Jira outage for some customers
Incident Report for Jira Software
Postmortem

SUMMARY

On August 2nd, 2021, between 11:00 am and 01:30 pm UTC, some customers on Atlassian’s Cloud Platform were unable to use Jira Software, Jira Service Management, and Jira Work Management. The event was triggered by our inability to scale our infrastructure to handle the traffic: we exhausted network resources (subnet IP addresses) in the US-east region, which prevented the stack from scaling out. Atlassian staff detected the incident within 2 minutes and mitigated it by manually cleaning up resources, which returned Atlassian systems to a known good state. The total time to resolution was about 2 hours and 30 minutes.

IMPACT

The impact lasted from 11:00 am to 01:30 pm UTC on August 2nd, 2021, and mainly affected Jira Software, Jira Service Management, and Jira Work Management. Applications that rely on data from those products (Confluence, Compass, Team Central, Bitbucket, Opsgenie) were also unable to display that data. The incident disrupted service for customers in the US-east region, resulting in either high latency when performing actions in Jira or a complete outage.

  • Jira Software, Jira Service Management, Jira Work Management: Users were unable to perform actions such as creating, viewing, or transitioning an issue, or posting a comment
  • Confluence Cloud: Users were unable to access the information linked to Jira tickets mentioned on a page
  • Bitbucket Cloud: Users were unable to access the information of the Jira ticket linked to a pull request or a commit
  • Compass: Users were unable to access the information of the Jira ticket linked to a pull request or a commit
  • Team Central: Users were unable to preview the information of a Jira ticket mentioned in or linked to a project
  • Opsgenie: Users were unable to access the information of a Jira ticket linked to an incident

ROOT CAUSE

The issue was caused by an emergency deployment to the US-east region, performed outside our regular deployment hours, that used up all of the IP addresses in the subnets we had configured. As a result, we could not spin up enough nodes to handle the load. This happened because, to handle all of the traffic, the emergency deployment requested more resources than were available.
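The failure mode above comes down to simple arithmetic: each new node needs an IP address from a subnet, and a subnet of a given size has a fixed number of assignable addresses. The sketch below is a hypothetical illustration, not Atlassian's tooling or configuration. The subnet CIDRs, the one-IP-per-node ratio, and the five reserved addresses per subnet (the number AWS VPC reserves) are all assumptions made for the example.

```python
import ipaddress

# Assumption: the provider reserves 5 addresses per subnet. AWS VPC, for
# example, reserves the first four and the last address of every subnet.
RESERVED_PER_SUBNET = 5


def usable_ips(cidrs: list[str], reserved: int = RESERVED_PER_SUBNET) -> int:
    """Return the total number of assignable addresses across the subnets."""
    return sum(ipaddress.ip_network(c).num_addresses - reserved for c in cidrs)


def can_scale_out(
    cidrs: list[str],
    ips_in_use: int,
    nodes_requested: int,
    ips_per_node: int = 1,  # hypothetical: one IP address per node
) -> bool:
    """Check whether the requested nodes fit in the remaining address space."""
    free = usable_ips(cidrs) - ips_in_use
    return nodes_requested * ips_per_node <= free


if __name__ == "__main__":
    subnets = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]  # hypothetical
    print(usable_ips(subnets))  # 3 * (256 - 5) = 753
    print(can_scale_out(subnets, ips_in_use=700, nodes_requested=50))  # True
    print(can_scale_out(subnets, ips_in_use=700, nodes_requested=60))  # False
```

With 700 of 753 addresses already allocated, only 53 remain, so a request for 60 new nodes cannot be satisfied no matter how much traffic is waiting.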

REMEDIAL ACTIONS PLAN & NEXT STEPS

We know that outages impact your productivity. We deploy our changes progressively (by cloud region) to avoid broad impact. However, in this case, the rollout to one cloud region (US-east) didn't scale as expected. Going forward, to minimize the impact of deploying new changes to our environments, we will implement preventive measures including the following:

  • Improve the detection mechanisms that monitor current cloud resource usage, to ensure we have enough capacity for deployments.
  • Implement safeguards that abort an ongoing deployment when provisioning capacity is at risk.
  • Add safety guards to our deployment process for changes deployed during the business hours of a given cloud region.
  • Simulate high-traffic conditions during deployments so we can tune the configuration limits currently set at the cloud platform level.
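The first two measures can be pictured as a guard that runs before each step of a progressive rollout: check how much address space remains and abort rather than provision into exhaustion. The sketch below is a hypothetical illustration, not Atlassian's deployment system. The `Capacity` model, the 10% safety margin, the one-IP-per-node ratio, and the batch-based rollout are all assumptions made for the example.

```python
from dataclasses import dataclass


class CapacityAtRisk(Exception):
    """Raised when a rollout step would eat into the safety margin."""


@dataclass
class Capacity:
    total_ips: int  # assignable addresses in the region's subnets
    used_ips: int   # addresses currently allocated

    @property
    def free(self) -> int:
        return self.total_ips - self.used_ips


def check_step(cap: Capacity, nodes: int, ips_per_node: int = 1,
               margin: float = 0.10) -> None:
    """Raise CapacityAtRisk if provisioning `nodes` would leave less than
    `margin` of the total address space free (hypothetical threshold)."""
    needed = nodes * ips_per_node
    reserve = int(cap.total_ips * margin)
    if cap.free - needed < reserve:
        raise CapacityAtRisk(
            f"need {needed} IPs, {cap.free} free, {reserve} held in reserve"
        )


def rollout(cap: Capacity, batches: list[int]) -> int:
    """Provision nodes batch by batch, aborting before the first batch that
    would breach the margin. Returns the number of nodes provisioned."""
    provisioned = 0
    for nodes in batches:
        try:
            check_step(cap, nodes)  # assumes one IP per node
        except CapacityAtRisk as err:
            print(f"aborting rollout: {err}")
            break
        cap.used_ips += nodes  # stand-in for actually launching the nodes
        provisioned += nodes
    return provisioned
```

For example, `rollout(Capacity(total_ips=753, used_ips=600), [20, 20, 20, 20])` provisions three batches (60 nodes) and aborts the fourth, because it would leave 73 free addresses against a reserve of 75. Checking before every batch, rather than once up front, means the guard also reacts to capacity consumed by other workloads while the rollout is in progress.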

We apologize to those customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,

Atlassian Customer Support

Posted Aug 10, 2021 - 00:00 UTC

Resolved
Between 1:02 PM and 2:28 PM UTC, Jira Work Management, Jira Service Management, Jira Software, and Jira Align were unusable for some customers. The issue has been resolved and the service is operating normally.
Posted Aug 02, 2021 - 15:40 UTC
Monitoring
We have identified the root cause of Jira being unusable for some customers and have mitigated the problem. We are now monitoring closely.
Posted Aug 02, 2021 - 15:02 UTC
Investigating
We are investigating an issue with Jira being unusable for some customers of Jira Work Management, Jira Service Management, Jira Software, and Jira Align Cloud. We will provide more details within the next hour.
Posted Aug 02, 2021 - 14:10 UTC