MATRIX Increased Error Rate
Incident Report for CYPHER Learning
Resolved
This incident lasted approximately 58 minutes. Services were fully restored at 3:58 AM CDT (UTC-5).

The incident was caused by an attack on our site that exploited a specific visitor-facing feature: SMS OTP (one-time password) verification during signup. The backend code that processes an SMS OTP verification made an SQL call that could take a few seconds to execute. The attacker requested that page very rapidly, which placed a heavy load on our database. This in turn caused a general slowdown, and some page loads timed out.

We have deployed a fix that does two things to prevent this situation in the future:

- it adds checks on the front end so that automated attacks on SMS OTP verification are not possible in the first place, and
- it optimizes the code that processes valid requests so that the SQL call takes milliseconds instead of seconds.

Please note that there was no security or database integrity risk during the attack. It was effectively a denial-of-service attack on a very specific, rarely used visitor-facing feature, and our fix has neutralized this particular attack vector.

We sincerely apologize for the outage. Rest assured that we take these situations seriously: when issues like this occur, we immediately assign senior engineers to investigate and fix them.
Posted Jun 21, 2023 - 08:00 UTC