tag:status.skeddly.com,2005:/historySkeddly Status - Incident History2024-02-29T22:43:52-05:00Skeddlytag:status.skeddly.com,2005:Incident/199228762024-02-10T12:55:54-05:002024-02-10T12:55:54-05:00RDS Database Upgrade<p><small>Feb <var data-var='date'>10</var>, <var data-var='time'>12:55</var> EST</small><br><strong>Completed</strong> - Our database upgrade completed successfully and actions are executing correctly. We are closing the maintenance window as a success.</p><p><small>Feb <var data-var='date'>10</var>, <var data-var='time'>12:51</var> EST</small><br><strong>Verifying</strong> - Our RDS database upgrade completed successfully. All actions and notifications have resumed execution.</p><p><small>Feb <var data-var='date'>10</var>, <var data-var='time'>12:00</var> EST</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Feb <var data-var='date'> 6</var>, <var data-var='time'>11:07</var> EST</small><br><strong>Scheduled</strong> - We will be upgrading our primary RDS database. Action starts and executions may be impacted, but hopefully only by a small amount. 
We will be honouring our SLA during this time.</p>tag:status.skeddly.com,2005:Incident/178911142023-07-18T14:25:22-04:002023-07-18T14:25:22-04:00Errors accessing web interface<p><small>Jul <var data-var='date'>18</var>, <var data-var='time'>14:25</var> EDT</small><br><strong>Resolved</strong> - The SSL certificate issues are resolved.</p><p><small>Jul <var data-var='date'>18</var>, <var data-var='time'>12:59</var> EDT</small><br><strong>Identified</strong> - CloudFront is issuing an invalid SSL certificate in some responses from the Skeddly API.</p><p><small>Jul <var data-var='date'>18</var>, <var data-var='time'>12:52</var> EDT</small><br><strong>Investigating</strong> - We are investigating issues accessing the primary Skeddly web interface.</p>tag:status.skeddly.com,2005:Incident/175828152023-06-15T09:41:00-04:002023-06-15T09:41:00-04:00Notifications are not being sent<p><small>Jun <var data-var='date'>15</var>, <var data-var='time'>09:41</var> EDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jun <var data-var='date'>15</var>, <var data-var='time'>09:30</var> EDT</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Jun <var data-var='date'>15</var>, <var data-var='time'>09:28</var> EDT</small><br><strong>Identified</strong> - The issue has been identified and a fix is being implemented.</p><p><small>Jun <var data-var='date'>15</var>, <var data-var='time'>09:19</var> EDT</small><br><strong>Investigating</strong> - We are currently investigating reports of email notifications not being sent.</p>tag:status.skeddly.com,2005:Incident/163827372023-03-08T10:51:12-05:002023-03-08T10:59:42-05:00ElastiCache Memcached Cluster Upgrade<p><small>Mar <var data-var='date'> 8</var>, <var data-var='time'>10:51</var> EST</small><br><strong>Completed</strong> - The upgrade is complete.</p><p><small>Mar <var data-var='date'> 8</var>, <var data-var='time'>10:24</var> 
EST</small><br><strong>Verifying</strong> - The new endpoint has been rolled out to the API and web front-end. We are now verifying everything is working correctly.</p><p><small>Mar <var data-var='date'> 8</var>, <var data-var='time'>10:03</var> EST</small><br><strong>Update</strong> - The new ElastiCache cluster is up and running. The new endpoint is being rolled out to the API and web front-end now.</p><p><small>Mar <var data-var='date'> 8</var>, <var data-var='time'>10:00</var> EST</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Mar <var data-var='date'> 8</var>, <var data-var='time'>07:59</var> EST</small><br><strong>Update</strong> - We will be undergoing scheduled maintenance during this time.</p><p><small>Mar <var data-var='date'> 7</var>, <var data-var='time'>19:13</var> EST</small><br><strong>Scheduled</strong> - We will be upgrading our ElastiCache Memcached cluster.<br /><br />Users of the web interface may periodically be signed out.</p>tag:status.skeddly.com,2005:Incident/151147162022-12-13T16:30:59-05:002022-12-13T17:05:11-05:00Actions are not executing<p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>16:30</var> EST</small><br><strong>Resolved</strong> - The version has been rolled back, and actions are executing again.</p><p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>15:58</var> EST</small><br><strong>Identified</strong> - Due to a deployment issue, actions are not executing.<br /><br />We are deploying the previous version to remedy the issue.</p>tag:status.skeddly.com,2005:Incident/83086142021-10-25T07:23:43-04:002021-10-25T07:24:41-04:00Notifications Are Temporarily Stopped<p><small>Oct <var data-var='date'>25</var>, <var data-var='time'>07:23</var> EDT</small><br><strong>Resolved</strong> - Notifications have been restarted.<br /><br />Our downstream provider has worked around the Heroku scheduled downtime: 
https://status.sendwithus.com/incidents/48rpl6x1kz7b</p><p><small>Oct <var data-var='date'>25</var>, <var data-var='time'>04:45</var> EDT</small><br><strong>Update</strong> - Heroku, which hosts a third-party service used in our notifications sub-system, is undergoing scheduled maintenance. Notifications have been disabled.<br /><br />https://status.heroku.com/incidents/2375</p><p><small>Oct <var data-var='date'>25</var>, <var data-var='time'>04:09</var> EDT</small><br><strong>Identified</strong> - Heroku, which hosts a third-party service used in our notifications sub-system, is undergoing scheduled maintenance. Notifications have been disabled.<br /><br />https://status.heroku.com/incidents/2375</p>tag:status.skeddly.com,2005:Incident/51679082020-09-24T12:29:15-04:002020-09-24T12:29:15-04:00Error during deployment<p><small>Sep <var data-var='date'>24</var>, <var data-var='time'>12:29</var> EDT</small><br><strong>Resolved</strong> - The deployment was successful.</p><p><small>Sep <var data-var='date'>24</var>, <var data-var='time'>11:44</var> EDT</small><br><strong>Monitoring</strong> - The cause of the deployment error has been identified and a fix has been created.<br /><br />A new version is deploying now and we are monitoring closely for any issues.</p><p><small>Sep <var data-var='date'>24</var>, <var data-var='time'>11:09</var> EDT</small><br><strong>Update</strong> - The version is being rolled back.<br /><br />Action executions have resumed. Email notifications have been restored.</p><p><small>Sep <var data-var='date'>24</var>, <var data-var='time'>11:07</var> EDT</small><br><strong>Identified</strong> - Due to an error during deployment, some notifications were not sent, and action executions were delayed.<br /><br />The bad deploy is being rolled back.<br /><br />Affected functions:<br />* Action executions were delayed for 15 minutes. 
Our SLA has been applied to all affected actions.<br />* Email notifications were not sent.</p>tag:status.skeddly.com,2005:Incident/33896512020-01-04T12:10:57-05:002020-01-04T12:10:57-05:00Notifications are not sending<p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>12:10</var> EST</small><br><strong>Resolved</strong> - Our notification processes have been restarted and emails are flowing again.</p><p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>12:04</var> EST</small><br><strong>Identified</strong> - At 12:14am EST, our notification processes stopped sending out notification emails.<br /><br />The issue has been identified and we are restarting the offending processes.</p>tag:status.skeddly.com,2005:Incident/29060232019-08-23T10:01:57-04:002019-08-23T10:01:57-04:00Redis Server Connection Failure<p><small>Aug <var data-var='date'>23</var>, <var data-var='time'>10:01</var> EDT</small><br><strong>Resolved</strong> - Between 10:30am and 10:35am UTC, connections to our Redis server failed. The issue was temporary and action executions resumed normally. 
However, some action executions were affected.<br /><br />All affected action executions have had our SLA applied.<br /><br />We are also enhancing our service to be more resilient against this type of error so that either:<br /> (a) the error does not happen, or <br /> (b) if it does happen, then actions won't fail like they did.</p>tag:status.skeddly.com,2005:Incident/26856602019-07-10T13:17:43-04:002019-07-10T13:17:43-04:00Skeddly Dashboard is failing<p><small>Jul <var data-var='date'>10</var>, <var data-var='time'>13:17</var> EDT</small><br><strong>Resolved</strong> - Between 15:41 and 16:05 UTC, Stripe had an issue with one of their APIs, which caused Skeddly's dashboard to fail.<br /><br />Stripe has resolved their API issue.<br /><br />In addition, we have improved the dashboard so that it won't fail in case of another Stripe API failure.</p><p><small>Jul <var data-var='date'>10</var>, <var data-var='time'>13:12</var> EDT</small><br><strong>Monitoring</strong> - The downstream API issue has been resolved and the Skeddly dashboard is operating normally again. We are monitoring the situation to ensure it doesn't break again.</p><p><small>Jul <var data-var='date'>10</var>, <var data-var='time'>12:45</var> EDT</small><br><strong>Identified</strong> - We are having a problem with a downstream API on Skeddly's main dashboard.<br /><br />To get around the issue, don't go to the main dashboard page. 
Instead, go to the actions page https://app.skeddly.com/Actions</p>tag:status.skeddly.com,2005:Incident/25492712019-06-14T14:13:24-04:002019-06-14T14:14:01-04:00Manually Triggered Actions Executed Twice<p><small>Jun <var data-var='date'>14</var>, <var data-var='time'>14:13</var> EDT</small><br><strong>Resolved</strong> - Between 21:30 on 2019-06-13 UTC and 12:00 on 2019-06-14 UTC, actions that were manually executed by the "Execute Now" button in the Skeddly user interface were inadvertently executed twice.<br /><br />This was caused by a bug whereby the button's click event was triggered twice from a single click, and two AJAX calls were sent.<br /><br />The issue has been fixed, and now only one action execution is started per button click.<br /><br />Our SLA has been applied to the duplicated action executions.</p>tag:status.skeddly.com,2005:Incident/25220112019-06-10T12:55:18-04:002019-06-10T12:56:55-04:00Database Connection Issue<p><small>Jun <var data-var='date'>10</var>, <var data-var='time'>12:55</var> EDT</small><br><strong>Resolved</strong> - Between 15:30 and 16:30 UTC, connections to our primary RDS instances failed.<br /><br />The RDS instances have now been restored. Everything is executing normally again.<br /><br />There were some action executions that failed during the outage. Those executions will have our SLA applied.</p><p><small>Jun <var data-var='date'>10</var>, <var data-var='time'>12:33</var> EDT</small><br><strong>Monitoring</strong> - Connections to the RDS instances have been restored and action executions are operating normally again. 
<br /><br />We are monitoring things closely.</p><p><small>Jun <var data-var='date'>10</var>, <var data-var='time'>12:17</var> EDT</small><br><strong>Update</strong> - Action executions are failing as well.<br /><br />Connections to our primary RDS instance appear to be impaired.</p><p><small>Jun <var data-var='date'>10</var>, <var data-var='time'>12:16</var> EDT</small><br><strong>Update</strong> - We are continuing to investigate this issue.</p><p><small>Jun <var data-var='date'>10</var>, <var data-var='time'>12:01</var> EDT</small><br><strong>Investigating</strong> - We are investigating an issue with the web interface.<br /><br />Action executions are executing normally.</p>tag:status.skeddly.com,2005:Incident/23517032019-04-26T01:43:00-04:002019-04-26T01:43:00-04:00Authorization Failures During Action Execution<p><small>Apr <var data-var='date'>26</var>, <var data-var='time'>01:43</var> EDT</small><br><strong>Resolved</strong> - Between 20:00 on Thursday, April 25 and 01:00 on Friday, April 26, some actions configured to run in "All Regions" experienced authorization issues while accessing the new AWS Hong Kong region.<br /><br />The region has been disabled in Skeddly while we resolve the issues.<br /><br />All actions are executing correctly now. All affected actions have had our SLA applied.</p>tag:status.skeddly.com,2005:Incident/20484512018-11-19T08:50:50-05:002018-11-19T08:50:50-05:00Notifications Are Delayed<p><small>Nov <var data-var='date'>19</var>, <var data-var='time'>08:50</var> EST</small><br><strong>Resolved</strong> - Due to an issue in our notification sub-system, notifications have been delayed. 
This includes forgotten email and username notifications and action execution notifications.<br /><br />Our notification system is working through the backlog now and should be caught up in a few hours.</p>tag:status.skeddly.com,2005:Incident/20266332018-11-09T07:22:25-05:002018-11-09T07:22:25-05:00Notifications are not sending<p><small>Nov <var data-var='date'> 9</var>, <var data-var='time'>07:22</var> EST</small><br><strong>Resolved</strong> - Email notifications are now being sent normally.</p><p><small>Nov <var data-var='date'> 9</var>, <var data-var='time'>07:00</var> EST</small><br><strong>Identified</strong> - Email verifications and action execution notifications are not being sent. This was due to an issue during a deploy. Our notification system is being rolled back, so notifications should resume shortly.</p>tag:status.skeddly.com,2005:Incident/19695032018-10-14T16:22:52-04:002018-10-14T16:22:52-04:00Clock Issue<p><small>Oct <var data-var='date'>14</var>, <var data-var='time'>16:22</var> EDT</small><br><strong>Resolved</strong> - The clock on a Skeddly worker instance jumped into the future again.<br /><br />AWS has confirmed this to be an issue with some EC2 instances.<br /><br />Due to measures put in place to deal with the issue, fewer actions were affected this time. All affected action executions have had our SLA applied.</p>tag:status.skeddly.com,2005:Incident/18783922018-08-26T00:57:27-04:002018-08-26T01:00:12-04:00Clock Issue<p><small>Aug <var data-var='date'>26</var>, <var data-var='time'>00:57</var> EDT</small><br><strong>Resolved</strong> - Action processing has been resumed and affected actions have been restored and/or cancelled as necessary.<br /><br />Our SLA will be applied.</p><p><small>Aug <var data-var='date'>26</var>, <var data-var='time'>00:44</var> EDT</small><br><strong>Update</strong> - The clock has been corrected by NTP.<br /><br />Action processing has resumed. 
Affected actions are being attended to.</p><p><small>Aug <var data-var='date'>26</var>, <var data-var='time'>00:25</var> EDT</small><br><strong>Identified</strong> - We have identified a clock jump on one of our EC2 instances.<br /><br />Action processing has been halted.</p>tag:status.skeddly.com,2005:Incident/18541172018-08-10T01:49:34-04:002018-08-10T11:32:09-04:00Errors saving action execution logs to S3<p><small>Aug <var data-var='date'>10</var>, <var data-var='time'>01:49</var> EDT</small><br><strong>Resolved</strong> - At 12:15am EST, one of our action worker instances was unable to authenticate against Amazon S3. This caused the worker to be unable to upload partial action execution logs to the S3 bucket in which logs are stored. Aside from this issue, action executions were proceeding correctly.<br /><br />After diagnosing the issue, at 1:30am EST the EC2 instance was terminated and a replacement was launched by Auto Scaling.<br /><br />Actions are executing normally, and execution logs are being stored in S3 correctly.<br /><br />Update 2:13am EST:<br /><br />We have also found that the clock on the problem EC2 instance jumped forward in time to December 12, 2018. This may have helped cause the authentication failures with S3. The cause for the clock shift has not been determined.<br /><br />Some action executions were able to be brought back to the present, but unfortunately, this has forced us to cancel some action executions which are "stuck" in the future. Our SLA will be applied to all affected action executions.</p><p><small>Aug <var data-var='date'>10</var>, <var data-var='time'>01:40</var> EDT</small><br><strong>Monitoring</strong> - The problem EC2 instance has been terminated and a replacement is being launched. We are monitoring the situation.<br /><br />Action execution is continuing normally. 
Action execution logs are being saved to S3.</p><p><small>Aug <var data-var='date'>10</var>, <var data-var='time'>01:30</var> EDT</small><br><strong>Identified</strong> - We have identified a problem with a single EC2 instance saving action execution logs to S3. Otherwise, actions have been executing correctly.</p><p><small>Aug <var data-var='date'>10</var>, <var data-var='time'>01:00</var> EDT</small><br><strong>Investigating</strong> - We are investigating an issue saving action execution logs to S3.</p>tag:status.skeddly.com,2005:Incident/18280522018-07-24T19:31:40-04:002018-07-24T21:31:56-04:00Degraded Database Performance<p><small>Jul <var data-var='date'>24</var>, <var data-var='time'>19:31</var> EDT</small><br><strong>Resolved</strong> - The issues with our primary RDS database have been resolved. Actions are executing normally again.</p><p><small>Jul <var data-var='date'>24</var>, <var data-var='time'>18:10</var> EDT</small><br><strong>Investigating</strong> - We are investigating degraded database performance in our primary RDS instance.<br /><br />Actions are still executing, but there are some minor delays.</p>tag:status.skeddly.com,2005:Incident/17678092018-06-13T07:38:32-04:002018-06-13T07:38:33-04:00Actions are not executing<p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>07:38</var> EDT</small><br><strong>Resolved</strong> - After rebooting our primary RDS instance (with failover), actions have resumed executing normally.<br /><br />All affected action executions will have our SLA applied.</p><p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>07:21</var> EDT</small><br><strong>Monitoring</strong> - We rebooted our primary RDS instance with failover, and actions are executing again. 
We are continuing to monitor.</p><p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>07:14</var> EDT</small><br><strong>Identified</strong> - We have identified an issue with our primary RDS instance.</p><p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>06:32</var> EDT</small><br><strong>Investigating</strong> - We are investigating an issue where action executions are not executing.</p>tag:status.skeddly.com,2005:Incident/17675992018-06-13T02:31:11-04:002018-08-02T12:38:01-04:00Action execution failures<p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>02:31</var> EDT</small><br><strong>Resolved</strong> - The issue has been resolved. Action executions are executing normally again.<br /><br />We are applying our SLA to affected action executions.</p><p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>02:07</var> EDT</small><br><strong>Monitoring</strong> - After terminating the offending EC2 instance, the time problem has been resolved. We are monitoring to ensure the problem is fully resolved.</p><p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>01:45</var> EDT</small><br><strong>Identified</strong> - We have identified a problem with the clock on one of Skeddly's action execution workers. The worker was terminated to stop the problem.</p><p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>00:53</var> EDT</small><br><strong>Investigating</strong> - Action executions are ending prematurely. 
We are investigating.</p>tag:status.skeddly.com,2005:Incident/15536852018-01-04T15:32:13-05:002018-01-04T15:32:13-05:00Email notifications are not being sent<p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>15:32</var> EST</small><br><strong>Resolved</strong> - The issues have been resolved.</p><p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>15:07</var> EST</small><br><strong>Monitoring</strong> - Our upstream provider has indicated that the issue is resolved. They are monitoring the situation, as are we.</p><p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>14:21</var> EST</small><br><strong>Identified</strong> - Email notifications are not being sent currently due to an issue with an upstream provider.</p>tag:status.skeddly.com,2005:Incident/13600622017-09-14T16:13:22-04:002017-09-14T16:13:22-04:00Amazon S3 Errors<p><small>Sep <var data-var='date'>14</var>, <var data-var='time'>16:13</var> EDT</small><br><strong>Resolved</strong> - The errors in Amazon S3 have been resolved.</p><p><small>Sep <var data-var='date'>14</var>, <var data-var='time'>15:11</var> EDT</small><br><strong>Identified</strong> - Amazon S3 is experiencing increased error rates. As a result, action executions are unable to save some log fragments.</p>tag:status.skeddly.com,2005:Incident/12673312017-06-15T12:41:42-04:002017-06-15T12:41:42-04:00Issue signing-in<p><small>Jun <var data-var='date'>15</var>, <var data-var='time'>12:41</var> EDT</small><br><strong>Resolved</strong> - The issues signing-in have been resolved.</p><p><small>Jun <var data-var='date'>15</var>, <var data-var='time'>12:25</var> EDT</small><br><strong>Identified</strong> - We have identified an issue signing-in to the Skeddly front-end. The issue is related to caching in CloudFront.
<br />
<br />We have addressed the issue and are waiting for the changes to propagate.</p>tag:status.skeddly.com,2005:Incident/11811082017-04-01T22:16:12-04:002017-04-01T22:16:33-04:00RDS Database Throttling Issues<p><small>Apr <var data-var='date'> 1</var>, <var data-var='time'>22:16</var> EDT</small><br><strong>Resolved</strong> - All issues with our RDS instances have been resolved. Actions are executing normally again.</p><p><small>Apr <var data-var='date'> 1</var>, <var data-var='time'>21:51</var> EDT</small><br><strong>Monitoring</strong> - We have resolved our RDS throttling issues. Actions are executing normally and we are monitoring the action execution pipeline to ensure it continues to operate normally.</p><p><small>Apr <var data-var='date'> 1</var>, <var data-var='time'>19:55</var> EDT</small><br><strong>Identified</strong> - We have identified the connection issues with our RDS database. Write IOPS are being throttled. We are working to resolve the throttling.</p><p><small>Apr <var data-var='date'> 1</var>, <var data-var='time'>19:13</var> EDT</small><br><strong>Investigating</strong> - We are currently investigating connection issues to our RDS databases.</p>