On March 7, 2024, between 12:02 pm UTC and 12:21 pm UTC, some Atlassian customers using Jira in the EU-Central region were unable to interact with core experiences of the product, and some customers using Confluence in the EU-Central region had a degraded experience. The cause was a combination of unexpected traffic and recent changes to automated scaling policies, which left nodes unable to scale up quickly enough. Automated monitoring detected the incident within two minutes, and we mitigated it by manually scaling up nodes, which returned Atlassian systems to a known good state. The total time to resolution was 19 minutes.
The incident affected Jira and Confluence on March 7, 2024, between 12:02 pm UTC and 12:21 pm UTC. It disrupted service for some customers accessing the EU-Central region. For Jira, core experiences of the product were unavailable during this time. For Confluence, some non-core experiences were affected.
This incident was caused by unexpected traffic combined with recent changes to automated scaling policies, which left nodes in the EU-Central region unable to scale up quickly enough. As a result, the Atlassian GraphQL Gateway could not service all requests in EU-Central, so parts of Jira were unavailable for some customers and non-core features in Confluence were degraded during the incident.
We understand that outages impact your productivity.
We have already implemented improvement actions to avoid repeating this type of incident and to reduce the time to recovery. We have made our scale-out policies substantially more aggressive. This will help mitigate many different types of incidents.
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support