התחל במצב לא מקוון עם האפליקציה Player FM !
I Was There: Stories of Production Incidents
סדרה בארכיון ("עדכון לא פעיל" status)
When? This feed was archived on December 12, 2020 08:26 (). Last successful fetch was on March 02, 2025 02:12 ()
Why? עדכון לא פעיל status. השרתים שלנו לא הצליחו לאחזר פודקאסט חוקי לזמן ממושך.
What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.
Manage episode 294560476 series 2501898
Corey Martin values storytelling. It's just one way developers can share their experiences in order for others to take lessons. To that end, this episode takes a close look at production issues from two different applications to examine what went wrong and how it was fixed.
Meg Viar is a Senior Software Developer at Nomadic Learning, an e-learning platform. One day, they noticed that, for a certain group of users, a column of information in their database row was nulled. It didn't look like any user--either internally or externally--intentionally changed these values, and there hadn't been any new code deployed in days. The only clue was that the data was all changed at the same time. It turned out that a weekly cron job was deleting some data on an in-memory list. However, the database ORM they use also overloads the delete keyword, and was actually deleting the production data. Restoring the data from a backup was easy, and reworking the code to not use the data was a quick fix. However, going forward, Meg and her team came up with several ways to adjust the process around code changes like this from occurring again.
Brendan Hennessy is the co-founder and CTO at Launchpad Lab, a studio that builds custom web and mobile applications. One of their clients is an SAT/ACT test prep app, and students complained that the app was extraordinarily slow. Brendan was accustomed to seeing such feedback on testing days, when heavy volume brought added strain to servers, and they accounted for this by increasing capacity. But this was different: there weren't any tests scheduled during the period. Instead, one of their own services was inadvertently DDOSing an endpoint, expecting a response; when one didn't arrive, it just kept making requests. They reworked this code to make a request once and simply wait for a response without trying again. In the future, they committed themselves to doing more in-person blitzes of new features, since issues like this only arise after multiple users use the app--something automated tests have trouble simulating.
Links from this episode
- Nomadic Learning builds digital academies
- Launchpad Lab builds custom web and mobile applications for startups and established businesses
131 פרקים
סדרה בארכיון ("עדכון לא פעיל" status)
When? This feed was archived on December 12, 2020 08:26 (). Last successful fetch was on March 02, 2025 02:12 ()
Why? עדכון לא פעיל status. השרתים שלנו לא הצליחו לאחזר פודקאסט חוקי לזמן ממושך.
What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.
Manage episode 294560476 series 2501898
Corey Martin values storytelling. It's just one way developers can share their experiences in order for others to take lessons. To that end, this episode takes a close look at production issues from two different applications to examine what went wrong and how it was fixed.
Meg Viar is a Senior Software Developer at Nomadic Learning, an e-learning platform. One day, they noticed that, for a certain group of users, a column of information in their database row was nulled. It didn't look like any user--either internally or externally--intentionally changed these values, and there hadn't been any new code deployed in days. The only clue was that the data was all changed at the same time. It turned out that a weekly cron job was deleting some data on an in-memory list. However, the database ORM they use also overloads the delete keyword, and was actually deleting the production data. Restoring the data from a backup was easy, and reworking the code to not use the data was a quick fix. However, going forward, Meg and her team came up with several ways to adjust the process around code changes like this from occurring again.
Brendan Hennessy is the co-founder and CTO at Launchpad Lab, a studio that builds custom web and mobile applications. One of their clients is an SAT/ACT test prep app, and students complained that the app was extraordinarily slow. Brendan was accustomed to seeing such feedback on testing days, when heavy volume brought added strain to servers, and they accounted for this by increasing capacity. But this was different: there weren't any tests scheduled during the period. Instead, one of their own services was inadvertently DDOSing an endpoint, expecting a response; when one didn't arrive, it just kept making requests. They reworked this code to make a request once and simply wait for a response without trying again. In the future, they committed themselves to doing more in-person blitzes of new features, since issues like this only arise after multiple users use the app--something automated tests have trouble simulating.
Links from this episode
- Nomadic Learning builds digital academies
- Launchpad Lab builds custom web and mobile applications for startups and established businesses
131 פרקים
כל הפרקים
×ברוכים הבאים אל Player FM!
Player FM סורק את האינטרנט עבור פודקאסטים באיכות גבוהה בשבילכם כדי שתהנו מהם כרגע. זה יישום הפודקאסט הטוב ביותר והוא עובד על אנדרואיד, iPhone ואינטרנט. הירשמו לסנכרון מנויים במכשירים שונים.