Clock Issue

Incident Report for Skeddly

Postmortem

This is a repeat of the incident that occurred on August 10, 2018. However, this time, the problem was caught faster.

There are 4 locations where the time issue could have originated: the application process, the NTP server, the EC2 instance (hardware or VM level), or the OS.

Based on deeper investigation following today’s incident, we observed the following:

The time jump was not localized to the application process, because the time jump was recorded in the OS logs.
The time jump was not caused by the NTP server, because no NTP-initiated time change was logged in the OS logs at the time of the jump. The time was later corrected by the NTP client, and that correction was logged.

Based on the above, the time jump originated at either the EC2 instance (hardware or VM level) or the OS.
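
For readers who want to watch for this kind of event, one common approach is to compare the wall clock against a monotonic clock between two samples. The following is a minimal sketch in Python, not a description of how Skeddly detects clock jumps. The function names and the 5 second threshold are assumptions chosen for illustration.

```python
import time


def clock_drift(prev_wall, prev_mono, now_wall, now_mono):
    """Return how many seconds the wall clock moved beyond the monotonic clock.

    A value near zero means both clocks advanced together. A large positive
    or negative value suggests the wall clock was stepped forward or backward.
    """
    return (now_wall - prev_wall) - (now_mono - prev_mono)


def clock_jumped(prev_wall, prev_mono, now_wall, now_mono, threshold=5.0):
    """Return True if the two clocks disagree by more than `threshold` seconds.

    The default threshold is an arbitrary illustrative value.
    """
    return abs(clock_drift(prev_wall, prev_mono, now_wall, now_mono)) > threshold


def sample_clocks():
    """Take one paired reading of the wall clock and the monotonic clock."""
    return time.time(), time.monotonic()
```

A caller could take a reading with `sample_clocks()` on each scheduling tick and pause processing when `clock_jumped` returns True for consecutive readings. A check like this only reports that the wall clock diverged from the monotonic clock. It would not by itself show whether the cause was the instance or the OS.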

Further investigation will occur along with more discussions with AWS support.

Posted Aug 26, 2018 - 01:00 EDT

Action processing has been resumed and affected actions have been restored and/or cancelled as necessary.

Our SLA will be applied.

Posted Aug 26, 2018 - 00:57 EDT

The clock has been corrected by NTP.

Action processing has resumed. Affected actions are being attended to.

Posted Aug 26, 2018 - 00:44 EDT

We have identified a clock jump on one of our EC2 instances.

Action processing has been halted.

Posted Aug 26, 2018 - 00:25 EDT

This incident affected: Action Infrastructure.