Recent Hiccups, May 2016

We had some outages recently, and the cause was that we got very popular in China.

The images that were generating most of the traffic violated our TOS, but since they were bringing in so much traffic, I thought it was a good time to stress test our image delivery systems.

At first we handled the load fine. After some time our Redis server started acting weird. The issue was with dumping the DB to disk: the images were generating so much data that the Redis server crashed. We started migrating all GIF images to be served by our CDN while still collecting view data. At some point, though, our real-time stats handling script started to fail due to Redis connection issues. After that the data started to pile up, and the Redis server crashed every time it was started.

While all that was happening, we noticed that our PHP front end had started throwing errors. This was due to connectivity issues with our Redis server.

The image serving end does not use the Redis server and was not affected. It has no dependencies at all, which is meant to minimize downtime on the image serving end.

We then decided to run Redis in RAM and to automatically remove unnecessary data from the Redis server if it got too big. We also felt the stress test had gone on long enough and deleted all the images that were violating our TOS.
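
The "remove unnecessary data if it got big" idea can be sketched roughly as below. This is only an illustration, not our actual script: a plain in-memory `OrderedDict` stands in for the Redis data, and the function name `trim_oldest`, the `views:` key names, and the entry limit are all made up for the example. Redis also has built-in `maxmemory` and `maxmemory-policy` settings that can do this kind of eviction on the server side.

```python
from collections import OrderedDict


def trim_oldest(store: OrderedDict, max_entries: int) -> int:
    """Drop the oldest entries until at most max_entries remain.

    `store` is a hypothetical stand-in for the stats data kept in Redis.
    Returns the number of entries that were removed.
    """
    removed = 0
    while len(store) > max_entries:
        # last=False pops the entry that was inserted first (the oldest one).
        store.popitem(last=False)
        removed += 1
    return removed


# Usage: five view counters, keep only the three newest.
stats = OrderedDict((f"views:{i}", i) for i in range(5))
removed_count = trim_oldest(stats, max_entries=3)
```

Dropping the oldest entries first is just one possible rule for what counts as "unnecessary"; a real cleanup job might look at key age or at which stats were already written elsewhere.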

Even after those images were deleted, they kept generating a lot of traffic in the form of 404 errors, so we decided to automatically redirect all 404 traffic when server load is high.
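
A minimal sketch of that rule, assuming the decision is based on the 1-minute load average. The threshold value, the redirect target, and the function names are hypothetical and only there to make the example complete. `os.getloadavg()` is a standard Python call that is available on Unix-like systems.

```python
import os

LOAD_THRESHOLD = 4.0                   # hypothetical cut-off for "high load"
REDIRECT_URL = "https://example.com/"  # hypothetical redirect target


def current_load() -> float:
    """Return the 1-minute load average (Unix-like systems only)."""
    return os.getloadavg()[0]


def respond_to_missing_image(load_1min: float, threshold: float = LOAD_THRESHOLD):
    """Pick the response for a request that hit a deleted or missing image.

    Under high load, send a cheap redirect instead of a normal 404 response.
    Returns a (status_code, headers) pair.
    """
    if load_1min >= threshold:
        return 302, {"Location": REDIRECT_URL}
    return 404, {}


# Usage: the same missing image under heavy and under light load.
busy = respond_to_missing_image(10.0)
calm = respond_to_missing_image(0.5)
```

Passing the load in as an argument keeps the decision easy to test; in a live handler it would be called as `respond_to_missing_image(current_load())`.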

All the scripts were stress tested and optimized during and after the test, and we are working to improve our front end so that it is not affected by Redis or MySQL server connectivity issues.
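
One way a front end could keep working through that kind of connectivity problem is to fall back to a default value when a lookup fails. The sketch below is an assumption about the general approach, not the change we are making: `read_with_fallback` and `broken_lookup` are invented names, and it only catches Python's built-in `ConnectionError` and `TimeoutError`. A real Redis or MySQL client library may raise its own exception types, which would need to be added to the list.

```python
def read_with_fallback(fetch, default):
    """Call fetch() and return its result, or `default` if the backend is unreachable.

    `fetch` is any zero-argument callable that talks to a backend such as
    a cache or a database.
    """
    try:
        return fetch()
    except (ConnectionError, TimeoutError):
        # The backend is down or slow: show the page with a placeholder value
        # instead of throwing an error at the visitor.
        return default


def broken_lookup():
    """Simulate a backend that refuses connections."""
    raise ConnectionError("backend unreachable")


# Usage: a working lookup returns its value, a failing one returns the default.
views_ok = read_with_fallback(lambda: 42, default=0)
views_down = read_with_fallback(broken_lookup, default=0)
```

With this pattern a page would show a view count of 0 (or hide the counter) during an outage instead of failing outright.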

Thanks for reading.