Showing posts from October, 2014

October 6 server outage - Post Mortem

Below is the post-mortem report for the October 6 outage.

Our server had an outage at 13:30 GMT caused by a heavily trafficked image (http://beeimg.com/view/h8150456397/). According to the referral logs, the image was linked from several LiveJournal blogs. Pingdom recorded the outage at 13:31 GMT (http://stats.pingdom.com/pdc110r2vx7j/889337/2014/10) and notified me at 13:36 GMT via SMS and email. I quickly went online and started investigating the issue (https://twitter.com/beeimg/status/519120005695537152).

Normally the whole server is monitored, and if the Apache server crashes, it is restarted automatically. During this outage the whole server crashed, which made the monitoring mechanism useless. The server was still returning pings, meaning it was reachable, but it was under high load. I gave the server a hard reboot at 13:50 GMT via the DigitalOcean control panel, and it was back online at 13:52 GMT. The server was completely inaccessible for…
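
As an illustration of why a ping alone was not a useful signal here, below is a minimal sketch of an external check that treats "answers pings but does not serve HTTP" as its own state. This is not the monitoring setup described above. The function names, the state labels ("healthy", "overloaded", "unreachable") and the 5-second timeout are all assumptions made for the example.

```python
import urllib.error
import urllib.request


def http_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500 in time.

    Hypothetical helper: the timeout and the "below 500" rule are assumptions.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The web server answered, but with an error status (4xx or 5xx).
        return err.code < 500
    except OSError:
        # Covers URLError, connection failures and timeouts.
        return False


def classify(ping_ok: bool, http_ok: bool) -> str:
    """Combine a ping result and an HTTP result into one state label."""
    if http_ok:
        return "healthy"
    if ping_ok:
        # The host is reachable but not serving pages, as in this outage.
        return "overloaded"
    return "unreachable"
```

A check like this would have to run on a different machine than the one being watched, since a monitor on the overloaded server cannot be relied on to run at all.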
