protelAir Outage

Incident Report for protel Cloud Center

Postmortem

Problem Description

On Friday, 22nd September 2023, protel Cloud experienced a service disruption starting at approximately 7:00 AM UTC.

After high load was noticed on the Front Office backend servers, investigation showed that the environment had high CPU usage and was failing to establish connections to the global caching database servers. The automatic health checks failed, which led to frequent restarts of all backend server instances and caused long wait times and frequent error messages. After the monitoring capabilities were enhanced and the connection pooling library was downgraded, the environment returned to a stable state.
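The interaction between a failing dependency and automatic restarts can be illustrated with a small simulation. This is only a sketch of the general failure mode, not protel's actual health check or orchestration setup: the `Instance` class, the `supervise` function, and the threshold of three consecutive failures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical backend instance tracked by a supervisor."""
    name: str
    consecutive_failures: int = 0
    restarts: int = 0


def health_check(cache_reachable: bool) -> bool:
    # Assumption: the check reports unhealthy whenever the cache is unreachable.
    return cache_reachable


def supervise(instance: Instance, cache_states, failure_threshold: int = 3) -> int:
    """Restart the instance after `failure_threshold` consecutive failed checks."""
    for cache_reachable in cache_states:
        if health_check(cache_reachable):
            instance.consecutive_failures = 0
            continue
        instance.consecutive_failures += 1
        if instance.consecutive_failures >= failure_threshold:
            # A restart does not repair the dependency, so the cycle repeats.
            instance.restarts += 1
            instance.consecutive_failures = 0
    return instance.restarts


# Nine failed checks in a row with a threshold of three.
backend = Instance("front-office-1")
restarts = supervise(backend, [False] * 9)
```

Because restarting an instance does not restore the connection to the cache, the instance keeps failing its checks and is restarted again, which matches the pattern of repeated restarts described above.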

The development team is still investigating the root cause of the library malfunction in order to implement a workable solution. The downgrade ensured that all systems were fully operational again by approximately 3:00 PM UTC.

In order to prevent such an incident from reoccurring, the development team is performing a thorough analysis of what happened so that corrective measures can be taken. Our assessment of the fault, including future preventative actions, can be found below.

Affected systems

protel Cloud Front Office

Impact

Because connections to the global caching database failed, many client requests timed out. As a result, users frequently received 504 error messages and were unable to log in to the Front Office.
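How an unavailable connection can surface as a 504 response can be sketched as follows. This is a simplified illustration under stated assumptions, not the Front Office code: `BoundedPool` and `handle_request` are hypothetical names, and the pool is modelled with a plain standard-library queue. In practice a 504 is usually produced by a gateway or load balancer in front of the backend; the sketch collapses that into a single function.

```python
import queue


class BoundedPool:
    """Hypothetical fixed-size connection pool backed by a queue."""

    def __init__(self, connections):
        self._idle = queue.Queue()
        for conn in connections:
            self._idle.put(conn)

    def acquire(self, timeout_s: float):
        # Raises queue.Empty if no connection becomes free within timeout_s.
        return self._idle.get(timeout=timeout_s)

    def release(self, conn) -> None:
        self._idle.put(conn)


def handle_request(pool: BoundedPool, timeout_s: float = 0.05):
    """Return an (HTTP status, body) pair for one simulated request."""
    try:
        conn = pool.acquire(timeout_s)
    except queue.Empty:
        # No connection to the cache in time: the request times out.
        return 504, "Gateway Timeout"
    try:
        return 200, f"served via {conn}"
    finally:
        pool.release(conn)


healthy = BoundedPool(["conn-1"])
exhausted = BoundedPool([])  # stands in for a pool that cannot connect
ok_status, _ = handle_request(healthy)
timeout_status, _ = handle_request(exhausted)
```

With a working pool the request succeeds, while a pool that cannot supply a connection makes every request wait for the full timeout and then fail.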

Root Cause

The connection pooling library was updated during the last release on Wednesday, 20th September, and is not fully backward compatible with the current Front Office infrastructure.

Mitigation / Preventive Actions

Unfortunately, the error only occurred in an exceptional, previously unknown use case that had not been properly tested in advance. With the knowledge now obtained, the next version will undergo an intensive testing phase to prevent this error from recurring.
In addition, the development team will prepare an emergency response plan for this kind of error to ensure a fast rollback of the affected systems.

Disclaimer: This document has been compiled for information purposes only by protel Hotelsoftware GmbH (protel) to the best of its knowledge and belief based on information currently available and at hand. However, protel does not guarantee that the information is correct, complete, up-to-date and/or in the correct order. Protel reserves the right to make changes and/or additions without prior notice. Protel makes no express or implied warranty (including but not limited to any warranty of merchantability or fitness for a particular purpose or use, etc.) with respect to this information. Information from protel is provided to users "as is". Protel shall not be liable to users or any other person for any interruptions, inaccuracies, errors or omissions etc. in protel's information, regardless of the cause, for any resulting damages (including, without limitation, direct or indirect, consequential damages, etc.). In all other respects, protel's General Terms and Conditions, which can be downloaded from the protel website at http://www.protel.net/de/agb/, shall apply.
Posted Sep 26, 2023 - 16:46 CEST

Resolved

Dear Customers,

The incident has been resolved.
Protel Cloud PMS is running stably again.
We will post a root cause analysis on our status page next week.

Please also subscribe to our status page if you have not already done so.
https://cloudstatus.protel.net/

We apologize for any inconvenience caused.

Your Support Team
Posted Sep 22, 2023 - 21:50 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Sep 22, 2023 - 17:01 CEST

Identified

Dear customers,

We are now implementing several fixes and will monitor the results. We will keep you informed.
Posted Sep 22, 2023 - 15:55 CEST

Update

Our development department continues to investigate the causes of the recurring system performance issue.

We apologize for the inconvenience.
Posted Sep 22, 2023 - 15:32 CEST

Update

We are continuing to investigate this issue. We apologize for the inconvenience.
Posted Sep 22, 2023 - 12:44 CEST

Investigating

We are currently investigating this issue.
Posted Sep 22, 2023 - 10:51 CEST

Update

We are continuing to work on a fix for this issue.
Posted Sep 22, 2023 - 10:44 CEST

Identified

The issue has been identified and a fix is being implemented.
Posted Sep 22, 2023 - 09:41 CEST

Investigating

Access to individual protel services is currently interrupted for some customers. We are working to resolve this issue as soon as possible. View the current status and impacted services via https://cloudstatus.protel.net.
Posted Sep 22, 2023 - 09:38 CEST
This incident affected: protel Cloud Solutions | Europe, North America (protel Air) and protel Cloud Solutions | Australia, Asia (protel Air, Credit Card Interface, Other Interfaces to local systems (not IDS or credit cards)).