Clock Issue
Incident Report for Skeddly
Postmortem

This incident is a repeat of the one that occurred on August 10, 2018. This time, however, the problem was caught faster.

There are four locations where the time issue could have originated:

  • The EC2 instance (hardware or VM level)
  • The OS
  • NTP server
  • The application process

Based on deeper investigation following today’s incident, we observed the following:

  • The time jump was not localized to the application process, because the jump was recorded in the OS logs.
  • The time jump was not caused by the NTP server, because no NTP adjustment matching the jump was logged in the OS logs. However, the time was later corrected by the NTP client, and that correction was logged.

Based on the above, the time jump occurred either at the EC2 instance (hardware or VM level) or in the OS.
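
As an illustration only (this is not the monitoring Skeddly actually uses), a process can notice a wall-clock jump of this kind by comparing elapsed wall-clock time against elapsed monotonic time between two checks. The class name `ClockJumpDetector`, the 5-second threshold, and the injectable clock parameters are all hypothetical. The sketch also assumes the monotonic clock itself remains trustworthy, which may not hold if the fault lies at the hardware or VM level.

```python
import time
from typing import Callable, Optional


class ClockJumpDetector:
    """Flags wall-clock jumps by comparing wall-clock and monotonic elapsed time.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(
        self,
        threshold_seconds: float = 5.0,  # assumed tolerance, tune as needed
        wall_clock: Callable[[], float] = time.time,
        monotonic_clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold_seconds
        self._wall_clock = wall_clock
        self._monotonic_clock = monotonic_clock
        # Record a baseline reading from both clocks.
        self._last_wall = wall_clock()
        self._last_mono = monotonic_clock()

    def check(self) -> Optional[float]:
        """Return the jump in seconds if it exceeds the threshold, else None.

        A positive value means the wall clock moved forward more than the
        monotonic clock did; a negative value means it moved backward.
        """
        wall = self._wall_clock()
        mono = self._monotonic_clock()
        drift = (wall - self._last_wall) - (mono - self._last_mono)
        # Re-baseline so each call measures only the most recent interval.
        self._last_wall, self._last_mono = wall, mono
        if abs(drift) > self._threshold:
            return drift
        return None
```

A scheduler loop could call `check()` before dispatching work and pause processing whenever it returns a value other than `None`. Because the detector re-baselines on every call, a later NTP correction shows up as a second jump in the opposite direction.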

We will continue to investigate and will hold further discussions with AWS Support.

Posted Aug 26, 2018 - 01:00 EDT

Resolved
Action processing has been resumed and affected actions have been restored and/or cancelled as necessary.

Our SLA will be applied.
Posted Aug 26, 2018 - 00:57 EDT

Update
The clock has been corrected by NTP.

Action processing has resumed. Affected actions are being attended to.
Posted Aug 26, 2018 - 00:44 EDT

Identified
We have identified a clock jump on one of our EC2 instances.

Action processing has been halted.
Posted Aug 26, 2018 - 00:25 EDT

This incident affected: Action Infrastructure.