Notice history

On Thursday, August 30 at 09:46 EDT (13:46 UTC), the Studio Professional team observed API failures during a release rollback procedure. These failures disrupted some of our studio customer applications.

### Incident Details

After a scheduled release of Studio Professional was completed, client applications were observed not behaving as expected, with certain theming elements applied incorrectly. To avoid further corruption of layout data, the team decided to roll back the release. During the rollback, the Studio Professional API failed to respond, which made the Studio Professional Portal, CMS metadata APIs and Studio Professional Client Applications unavailable. This was an unexpected effect of one of the database sub-systems temporarily going offline.

### Resolution Details

At 10:13 EDT (14:13 UTC), the rollback was complete, and all systems and related data were available again in their original state prior to the release.

### Mitigation & Planned Actions

The Studio Professional team is investigating alternative rollback procedures that would cause minimal disruption to client applications that consume Studio Professional. In addition, the team will look at adding client-side caching to improve the overall availability of client applications in situations like this one.

Performance and stability are our team's top concerns, and we will continue to work in these areas for our users. We apologize for the inconvenience this may have caused. Thank you for your understanding and support.

On Friday, July 6 at 16:00 CEST (14:00 UTC), the Accedo One team observed an increasing number of errors and elevated response times on Publish API endpoints.

### Incident Details

The investigation identified the cause as an abnormally large increase in traffic on Publish API endpoints. Observed traffic was approximately ten times the overall normal traffic on the system (including high-load events, such as World Cup 2018), with the majority of the traffic targeting Publish endpoints.

### Resolution Details

As an immediate resolution, an additional short-lived cache tier was placed in front of Publish endpoints. Additional compute capacity was also deployed to offload the underlying services and allow systems to return to a regular operational state. At 17:09 CEST (15:09 UTC), all Accedo One APIs returned fully to their operational state.

### Mitigation & Planned Actions

The changes implemented during last night’s incident remain in place, and further fixes will follow, as outlined below.

Accedo One has identified opportunities for scalability improvements for Publish endpoints and will immediately prioritize improvements to the related services. These changes will provide better availability during abnormal load spikes on Publish endpoints, with the goal of minimizing service degradation times. These upgrades will be rolled out as soon as possible, but not before the end of the World Cup. Additionally, the Accedo One team is investigating other complementary solutions, such as rate limiting, to improve the resilience of the entire system and increase protection of our customers.

Specifically, during the remaining World Cup games, which are expected to produce significant load spikes, we will maintain high operational capacity, including additional live monitoring capabilities, and will deploy the short-lived caching and additional compute capacity that proved to be successful mitigation measures. We expect this to resolve any issues for the remaining games.

As with all distributed systems, it is always recommended to retain a fallback cache in the middleware and to implement an incremental back-off mechanism that allows systems to recover from failures, especially those caused by excessive spikes in load.
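The recommendation above can be illustrated with a minimal sketch of a middleware helper that retries with an incremental (here exponential) back-off and serves the last known good response when the upstream API stays unavailable. This is not part of any Accedo One SDK: the names `FallbackCache`, `backoff_delays` and `fetch_with_fallback`, and the default delays, are hypothetical and chosen only for illustration.

```python
import random
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class FallbackCache:
    """Hypothetical in-memory store of the last successful response per key."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        self._store[key] = value

    def has(self, key: str) -> bool:
        return key in self._store

    def get(self, key: str) -> object:
        return self._store[key]


def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> List[float]:
    """Delay for each attempt: base * factor**i, never exceeding cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def fetch_with_fallback(
    key: str,
    fetch: Callable[[], T],
    cache: FallbackCache,
    attempts: int = 4,
    base: float = 0.5,
    factor: float = 2.0,
    cap: float = 8.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call fetch() with back-off between failures; fall back to the cache."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(base, factor, cap, attempts)
    last_error = None
    for i, delay in enumerate(delays):
        try:
            value = fetch()
        except Exception as exc:  # assumption: any exception means "upstream failed"
            last_error = exc
            if i < attempts - 1:
                # Random jitter spreads retries so clients do not retry in lockstep.
                sleep(random.uniform(0, delay) if jitter else delay)
            continue
        cache.put(key, value)  # remember the last good response
        return value
    if cache.has(key):
        return cache.get(key)  # upstream still failing: serve the stale copy
    raise last_error
```

In a real deployment the cache would more likely be persistent and bounded, and only errors known to be transient (timeouts, HTTP 5xx or 429 responses) would be retried. The point of the sketch is that waiting longer between attempts gives an overloaded service room to recover, while the cached copy keeps client applications usable in the meantime.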

All systems operational

Sep 2018

Aug 2018

Jul 2018
