We are aware of an issue that's causing some customers to not be able to access the application
Incident Report for Limble
Postmortem

Date: May 14, 2024
Status: Resolved

Summary

On May 14, 2024, our service experienced downtime because multiple frontend versions were being served concurrently. The cause was a discrepancy in our build process that produced different JavaScript chunks from the same Git commit. We redeployed the code to resolve the issue and have implemented measures to prevent recurrence.

Impact

Users affected by the downtime were unable to load the app when attempting to access our site. The incident primarily impacted customers on the main version of the app; those on Canary were unaffected.

Root Causes

The incident was triggered by a container restart that resulted in different JavaScript chunks being served. Although our build process used the same Git commit, its output was not reproducible from one run to the next, which caused one of our webApp containers to serve chunks that did not match those served by the rest.
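As an illustration of how this kind of drift can be detected, the following is a minimal sketch that fingerprints the output of two builds and lists the files that differ. It is not our actual tooling: the function names `chunk_manifest` and `diff_manifests` and the idea of comparing two build output directories are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def chunk_manifest(build_dir: str) -> dict[str, str]:
    """Map each file under build_dir (relative path) to the SHA-256 of its contents."""
    root = Path(build_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return the sorted paths that are missing from one side or differ in content."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

If two builds of the same Git commit are reproducible, `diff_manifests(chunk_manifest(first), chunk_manifest(second))` returns an empty list. Any path it does return is a chunk that one container could serve while another could not.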

Resolution and Improvements

Immediate actions included redeploying the rollback branch, ensuring all containers served the correct chunks, and implementing a fix to skip build/push for the container image if the SHA tag already exists. Additionally, the following improvements are planned:

  • Logging Enhancements
  • Container Health Checks
  • Immutable Tags
  • Build Process Update
  • Nginx Configuration
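The immediate fix described above, skipping build/push when the SHA tag already exists, could be sketched roughly as follows. This is an illustrative example under stated assumptions, not our pipeline code: the helper names, the `build_and_push` callback, and the use of `docker manifest inspect` as the existence check are all assumptions.

```python
import subprocess
from typing import Callable


def tag_exists_in_registry(image_ref: str) -> bool:
    """Return True if the registry already holds a manifest for image_ref.

    Assumes the Docker CLI is installed and authenticated. Note that
    `docker manifest inspect` also exits non-zero on network or auth errors,
    so a real pipeline would want to tell those apart from "not found".
    """
    result = subprocess.run(
        ["docker", "manifest", "inspect", image_ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def build_and_push_once(
    image: str,
    commit_sha: str,
    build_and_push: Callable[[str], None],
    tag_exists: Callable[[str], bool] = tag_exists_in_registry,
) -> bool:
    """Build and push image:commit_sha only if that tag is not already published.

    Returns True if a build ran, False if it was skipped.
    """
    image_ref = f"{image}:{commit_sha}"
    if tag_exists(image_ref):
        # The image for this commit is already published; rebuilding could
        # produce different chunks under the same tag, so leave it untouched.
        return False
    build_and_push(image_ref)
    return True
```

With a check like this, a tag derived from the Git commit SHA is written at most once, so a container that restarts and pulls the tag again receives the same chunks as the containers already running.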

Description of Events

  • 6:04 AM MST: Incident detected.
  • 6:15 AM MST: Investigation and communication initiated.
  • 6:46 AM MST: Redeployment initiated using rollback branch.
  • 6:51 AM MST: Successful redeployment and app loading restored.
  • 7:50 AM MST: Incident resolved.
Posted May 20, 2024 - 13:42 MDT

Resolved
Services are recovering and we have identified the cause of the outage. We are working to implement a fix to prevent the same thing from happening in the future.
Posted May 14, 2024 - 06:45 MDT
Investigating
We are currently investigating this issue.
Posted May 14, 2024 - 06:21 MDT
This incident affected: Limble CMMS Web Application.