protel Cloud Service Degraded Performance
Incident Report for protel Cloud Center
Postmortem

Customer Confidential | protel | Incident Root Cause Analysis

Problem Description

On Wednesday, 15 June 2022, starting at approx. 12:20 (UTC+2:00), protel AIR customers were unable to access the NG environment and connected applications.

The applications were reachable again at approx. 14:55 (UTC+2:00).

To prevent such an incident from recurring, we have performed a thorough analysis of what happened and of the corrective measures taken. Our assessment of the fault, including future preventive actions, can be found below.

Affected systems

IAM and connected applications (for example pAir, dSignature, SMP)

Impact

Users were intermittently unable to log in to the applications.

Root Cause

In preparation for the upcoming update of the Identity Server (WSO2) from version 5.08 to 5.11 on our production/live environment, scheduled for 6 July, two new servers had to be created and configured.

When the software update was to be deployed to the newly created servers, the current servers were mistakenly selected as the deployment target.

As a result, the Identity Server could no longer be reached, so logging in to IAM was no longer possible. This affected all connected applications.

To undo the “incorrect” update of the production/live servers, the following steps had to be taken:

  • A misconfigured DevOps script had to be corrected, in collaboration with the DevOps department.
  • Resources of the Identity Server were no longer linked and had to be restored.
  • Each service had to be restored step by step.

Unfortunately, all of these actions had to be performed manually, which prolonged the downtime.

At approx. 14:55 (UTC+2:00), the revert of the “incorrect” update of the live Identity Servers was completed. All systems were restored and the applications were accessible again.

Afterwards, the Identity Server was closely monitored, and smaller follow-up issues, such as the loss of some user permissions, were resolved. Certain applications had to be restarted because new certificates had to be configured.

Mitigation / Preventive Actions

For planned deployments and updates, a rollback strategy is normally in place. Because this update was applied to the live servers unintentionally, no such strategy was available. The following actions will be taken to reduce the chance of recurrence:

  • Access to the live Identity Servers will only be granted to dedicated user accounts.
  • The configuration screens of the production environment will use a distinct colour scheme, so users can see immediately that they are working on production.
  • Updates to any Identity Server-related system will only be performed under the four-eyes principle (two-person approval and oversight).

Disclaimer: This document has been compiled for information purposes only by protel Hotelsoftware GmbH (protel) to the best of its knowledge and belief based on information currently available and at hand. However, protel does not guarantee that the information is correct, complete, up-to-date and/or in the correct order. Protel reserves the right to make changes and/or additions without prior notice. Protel makes no express or implied warranty (including but not limited to any warranty of merchantability or fitness for a particular purpose or use, etc.) with respect to this information. Information from protel is provided to users "as is". Protel shall not be liable to users or any other person for any interruptions, inaccuracies, errors or omissions etc. in protel's information, regardless of the cause, for any resulting damages (including, without limitation, direct or indirect, consequential damages, etc.). In all other respects, protel's General Terms and Conditions, which can be downloaded from the protel website at http://www.protel.net/de/agb/ , shall apply.
Posted Jun 21, 2022 - 12:05 CEST

Resolved
This incident has been resolved.
Posted Jun 15, 2022 - 16:49 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 15, 2022 - 15:03 CEST
Update
We are continuing to investigate this issue.
Posted Jun 15, 2022 - 14:02 CEST
Investigating
We are aware that some customers are currently receiving an error message during login. We are working with high priority to restore normal service. View the current status and impacted services via https://cloudstatus.protel.net.
Posted Jun 15, 2022 - 13:48 CEST
This incident affected: protel Cloud Solutions | Europe, North America (protel Air, Identity and Access Management (IAM)) and protel Cloud Solutions | Australia, Asia (protel Air, Identity and Access Management (IAM)).