From 12/Aug/24 10:39 UTC to 13/Aug/24 14:38 UTC, some customers experienced degraded performance on Issue View in Jira. This was caused by a data processing compatibility problem between a cache, the underlying database, and the new deployment.
Due to a slow increase in failure rate and a small initial surface area of impact, the problem didn’t immediately trigger our continuous error monitoring and alerting.
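As an illustration of why a small surface area of impact can stay below an alert threshold, the sketch below compares an aggregate error rate with a per-tenant view. It is a minimal example under stated assumptions, not a description of our monitoring: the tenant names, request counts, 5% threshold, and function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-tenant stats: tenant id -> (requests, errors).
TenantStats = Dict[str, Tuple[int, int]]


def global_error_rate(stats: TenantStats) -> float:
    """Error rate across all tenants combined."""
    total = sum(requests for requests, _ in stats.values())
    errors = sum(errs for _, errs in stats.values())
    return errors / total if total else 0.0


def tenants_over_threshold(
    stats: TenantStats, threshold: float, min_requests: int = 1
) -> List[str]:
    """Tenants whose own error rate exceeds the threshold.

    min_requests must be at least 1 so that idle tenants are skipped
    and no division by zero occurs.
    """
    return sorted(
        tenant
        for tenant, (requests, errs) in stats.items()
        if requests >= min_requests and errs / requests > threshold
    )


# 99 healthy tenants and one tenant where most Issue View loads fail.
stats: TenantStats = {f"tenant-{i}": (1000, 0) for i in range(99)}
stats["tenant-99"] = (1000, 900)

ALERT_THRESHOLD = 0.05  # assumed 5% threshold, for illustration only

aggregate_fires = global_error_rate(stats) > ALERT_THRESHOLD
per_tenant_hits = tenants_over_threshold(stats, ALERT_THRESHOLD)
```

In this made-up data set the aggregate rate stays under the assumed threshold, so `aggregate_fires` is false, while the per-tenant check still surfaces `tenant-99`.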
When we identified the issue, it was already resolving itself through self-healing mechanisms in the infrastructure. However, in a few outlier cases, we had to intervene with tenant-specific cache recalculations. All but 6 tenants were fully remediated by 12/Aug/24 21:30 UTC.
The issue occurred on the read layer of our architecture, so while customer experience was degraded, there was no data loss.
About 1% of instances were impacted over the lifetime of the incident. Users on those instances would have experienced degradation when loading Issue View in one specific scenario: a Multi Select Custom Field was enabled on an Issue, and that Custom Field also had a Default Value set.
We introduced a change in our code which caused processing of Custom Fields in specific configurations to fail. This prevented Issue View from loading issues for projects with the above specific configuration applied.
This problem occurred because of different representations of the data in the database, in the code base, and in the cache in the production environment. These multiple representations caused an exception when translating the data from one representation to the next.
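The failure mode described above can be sketched in a few lines. The data shapes here are hypothetical (this review does not specify the actual representations): an older cache entry stores a Multi Select default value as a list of option IDs, while newer code expects a list of objects. A decoder that knows only one shape raises an exception on the other; a tolerant decoder accepts both.

```python
import json
from typing import List


def decode_default_strict(raw: str) -> List[str]:
    """Decoder that only understands the (assumed) new shape: [{"id": "..."}]."""
    return [item["id"] for item in json.loads(raw)]


def decode_default_tolerant(raw: str) -> List[str]:
    """Decoder that accepts both assumed shapes during a rollout."""
    option_ids = []
    for item in json.loads(raw):
        if isinstance(item, dict):
            option_ids.append(str(item["id"]))  # new shape
        else:
            option_ids.append(str(item))  # old shape: bare option ID
    return option_ids


# Hypothetical cache entries written by old and new code respectively.
old_cache_entry = json.dumps(["10001", "10002"])
new_cache_entry = json.dumps([{"id": "10001"}, {"id": "10002"}])

try:
    decode_default_strict(old_cache_entry)
    strict_failed = False
except TypeError:
    # Indexing a string with "id" fails, much like a translation exception.
    strict_failed = True

tolerant_result = decode_default_tolerant(old_cache_entry)
```

Reading both shapes for the duration of a rollout is one common way to keep cached data and newly deployed code compatible in either direction; it is shown here only as an illustration.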
The problem largely self-healed as the cache expired and was refreshed with compatible data; however, we chose to force cache re-computation for affected tenants in order to expedite this process.
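The two recovery paths above, expiry-driven refresh and forced re-computation, can be sketched with a small per-tenant cache. This is a minimal illustration under stated assumptions: the `TenantCache` class, its `loader` callback, and the TTL value are hypothetical and do not describe the production cache.

```python
import time
from typing import Callable, Dict, Tuple


class TenantCache:
    """Per-tenant cache with time-based expiry and forced recomputation."""

    def __init__(
        self,
        ttl_seconds: float,
        loader: Callable[[str], object],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._loader = loader  # rebuilds the value from the source of truth
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, tenant_id: str) -> object:
        """Return the cached value, refreshing it if missing or expired."""
        entry = self._entries.get(tenant_id)
        if entry is None or self._clock() - entry[0] >= self._ttl:
            return self.recompute(tenant_id)  # the "self-healing" path
        return entry[1]

    def recompute(self, tenant_id: str) -> object:
        """Rebuild the entry immediately, without waiting for expiry."""
        value = self._loader(tenant_id)
        self._entries[tenant_id] = (self._clock(), value)
        return value


# Usage: force a refresh for a hypothetical list of affected tenants.
cache = TenantCache(ttl_seconds=3600.0, loader=lambda tenant: {"tenant": tenant})
for affected_tenant in ["tenant-a", "tenant-b"]:
    cache.recompute(affected_tenant)
```

With expiry alone, each entry is replaced only after its TTL elapses; calling `recompute` for a known set of tenants shortens that wait.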
We chose not to roll back the deployment at that point as that would have created the reverse compatibility issue with the already healed tenants.
Instead, we focused on forward fixing with a hotfix and accelerating remediation for still-affected tenants. For a very small number of tenants, forced re-computation did not immediately rectify the problem, and we had to roll forward a code hotfix to remediate.
We are prioritizing the following improvement actions to avoid repeating this type of incident:
We apologize for any disruption this issue may have caused and are taking steps to help ensure it does not occur again.
Thanks,
Atlassian Customer Support