okc_smoker

(Archived) Maintenance?

Usually when sites are down for scheduled maintenance they give an ETA for when they will be back up, or at least some advance warning. Is this regular maintenance or has there been some kind of catastrophe? Any idea when the elephant will wake up? I'm going into withdrawals here! :)

Thanks!

Unfortunately, we had a multi-hour outage of the service today due to a redundant pair of servers that both failed (one on Saturday, the other today). These servers hold the master user database of usernames and passwords. This data is simultaneously stored twice on each of those boxes for maximum redundancy and availability, and then backed up nightly to "near-line" storage in case of a catastrophic outage, for a total of 6 copies on different disks.

The hardware failure on the redundant database servers did not cause any data loss, but we needed to move the current database onto different hardware after the second machine went down. This took some time to properly prepare and test before bringing the service back online.

We apologize for any inconvenience this may have caused you. Our high level of redundancy has avoided this sort of outage in the past, but the timing of these two failures really hurt. We'll be looking at how we can reduce the likelihood of this type of problem in the future, and how we can better catch the warning signs of such problems before they occur.

Thanks

"Unfortunately, we had a multi-hour outage of the service today due to a redundant pair of servers that both failed (one on Saturday, the other today)."

Well, stuff happens, and this is understandable.

However, what is a bit unacceptable is that the status page claimed this was "scheduled maintenance", and that there was really no mention of the outage anywhere else (e.g., the blog) except here, in response to customer complaints. That's not good customer service.

Yeah, the load balancers directed all traffic to a separate web server that still had an old version of the "we're down" page. We noticed this a bit later and fixed the text, but it was misleading for a while.

This topic is now closed to further replies.