Leyra - Notice history

All systems operational

Admin Console - Operational

100% - uptime
Jul 2018 · 100.0% | Aug 2018 · 100.0% | Sep 2018 · 100.0%

Delivery API - Operational

100% - uptime
Jul 2018 · 100.0% | Aug 2018 · 100.0% | Sep 2018 · 100.0%

Web Platform - Operational

100% - uptime
Jul 2018 · 100.0% | Aug 2018 · 100.0% | Sep 2018 · 100.0%

Notice history

Sep 2018

Aug 2018

Studio Professional API Errors
  • Update

    On Thursday, August 30 at 09:46 EDT (13:46 UTC), the Studio Professional team observed API failures during a release rollback procedure. These failures caused disruptions for some of our studio customer applications.

    ### Incident Details
    After a scheduled release was completed for Studio Professional, client applications were observed not behaving as expected, with certain theming elements being applied incorrectly. To avoid further corruption of layout data, a decision was taken to roll back the release. During the rollback, the Studio Professional API failed to respond, which made the Studio Professional Portal, the CMS metadata APIs and the Studio Professional Client Applications unavailable. This was an unexpected effect of one of the database sub-systems temporarily going offline.

    ### Resolution Details
    At 10:13 EDT (14:13 UTC), the rollback was complete, and all systems and related data were available again in their original state prior to the release.

    ### Mitigation & Planned Actions
    The Studio Professional team is investigating alternative rollback procedures that would cause minimal disruption to client applications that consume Studio Professional. In addition, the team will look at adding client-side caching to improve the overall availability of client applications during situations like these.

    Performance and stability are our team's top concerns and we'll continue to work in these areas for our users. We apologize for the inconvenience this may have caused. Thank you for your understanding and support.

  • Resolved

    At 9:46 AM EDT (13:46 UTC), the Studio Professional Admin Portal and Client Applications became unavailable due to outages on the Studio Pro Admin API. As of 10:13 AM EDT (14:13 UTC), the APIs were available again and all systems resumed regular operation.
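
The planned actions above mention client-side caching as a way to keep client applications available during an API outage. The following is a minimal sketch of one way that could look, not the Studio Professional team's implementation: a small cache that returns the last good response when a refresh fails. The class name `StaleOnErrorCache`, the `fetch` callable and the TTL value are hypothetical and chosen only for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StaleOnErrorCache:
    """Hypothetical client-side cache that serves stale data on fetch errors."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # assumed function that calls the remote API
        self._ttl = ttl_seconds      # how long an entry counts as fresh
        self._clock = clock          # injectable clock, useful for testing
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # entry is still fresh, skip the API call
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]      # API failed: fall back to the stale copy
            raise                    # nothing cached yet, surface the error
        self._entries[key] = (now, value)
        return value
```

With a cache like this, a client that had already loaded its theming or layout data would keep rendering it during a short outage, and only a first-time request with nothing cached would fail.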

Jul 2018

Elevated API Errors
  • Update

    On Friday, July 6 at 16:00 CEST (14:00 UTC), the Accedo One team observed an increasing number of errors on Publish API endpoints and elevated response times.

    ### Incident Details
    During the investigation, the issue was found to be caused by an abnormally large increase in traffic on Publish API endpoints. Observed traffic was approximately ten times the overall normal traffic on the system (including high-load events, such as World Cup 2018), with the majority of the traffic targeting Publish endpoints.

    ### Resolution Details
    As an immediate resolution, an additional short-lived cache tier was placed in front of Publish endpoints. Additional compute capacity was also deployed to offload the underlying services and allow systems to return to a regular operational state. At 17:09 CEST (15:09 UTC), all Accedo One APIs returned fully to their operational state.

    ### Mitigation & Planned Actions
    The changes implemented during last night’s incident remain in place, and further fixes will be added as outlined below.

    Accedo One has identified opportunities for scalability improvements for Publish endpoints and will immediately prioritize improvements to the related services. These changes will provide better availability during abnormal load spikes for Publish endpoints, with the goal of minimizing service degradation times. These upgrades will be rolled out as soon as possible, but not before the end of the World Cup. Additionally, the Accedo One team is investigating other complementary solutions, such as rate limiting, to improve the resilience of the entire system and increase protection of our customers.

    Specifically, during the remaining World Cup games, which are expected to produce significant load spikes, we will continue to maintain high operational capacity, including additional live monitoring capabilities, and will deploy the short-lived caching and additional compute capacity that proved to be successful mitigation measures. This will resolve any issues for the remaining games.

    With all distributed systems, it is always recommended to retain a fallback cache in the middleware and to implement an incremental back-off mechanism so that systems can recover from failures, especially those caused by excessive spikes in load.

  • Resolved

    This incident has been resolved.

  • Monitoring

    With the help of the measures that were put in place, Accedo One APIs have returned to an operational state. We are continuing to monitor the situation.

  • Update

    We are still working on resolving the issue with the elevated response times and error rates of the Publish endpoints. Several measures are being put in place and we will update the status as soon as more information is available.

  • Identified

    We have identified the problem causing increased error rates and response times for API requests to Publish endpoints and are currently working on a resolution.

  • Investigating

    We are experiencing an elevated level of API errors and are looking into the issue. We will provide more updates as soon as possible.
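
The post-incident update for this incident recommends that integrators keep a fallback cache in their middleware and use an incremental back-off mechanism. As a rough illustration of that advice, the sketch below retries a failing request with exponentially growing, randomized delays and falls back to a cached value if every attempt fails. The function names, default delays and the `fallback` callable are assumptions made for this example and are not part of any Accedo One SDK or API.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> Iterator[float]:
    """Yield the maximum wait before each retry: base, base*factor, ... up to cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(
    request: Callable[[], T],
    *,
    retries: int = 4,
    base: float = 0.5,
    factor: float = 2.0,
    cap: float = 30.0,
    fallback: Optional[Callable[[], T]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying with jittered exponential back-off on failure."""
    delays = list(backoff_delays(base, factor, cap, retries))
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return request()
        except Exception as error:
            last_error = error
            if attempt < retries:
                # Random wait up to the current ceiling, so that many clients
                # do not all retry at the same instant.
                sleep(random.uniform(0, delays[attempt]))
    if fallback is not None:
        return fallback()            # e.g. read from the middleware cache
    assert last_error is not None
    raise last_error
```

The randomized wait is a design choice rather than a requirement: without it, clients that failed at the same moment would retry in lockstep and could recreate the load spike they are trying to ride out.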
