by Frank on August 1, 2011
July was an interesting month. We had our first large network outage in a long, long time. We won't repeat all the details; you can read about it here. The good news is that things basically worked as they should, and the outage ended up affecting only one server.
- dogato – 99.673% (1 outage, 2 hours 26 minutes)
- epsilon – 100.0%
- mars – 100.0%
- mail31 – 99.998% (1 outage, 1 minute)
- minbar – 100.0%
- vorlon – 99.987% (1 outage, 6 minutes)
- whitestar – 100.0%
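For anyone who wants to check the arithmetic, here is a minimal sketch of how figures like these can be derived. It assumes uptime is simply total downtime divided by the length of the month (31 days for July); the post does not say how the numbers were actually produced, and the function name `uptime_percent` is made up for this illustration.

```python
from datetime import timedelta


def uptime_percent(downtime: timedelta, period: timedelta) -> float:
    """Return uptime as a percentage of the reporting period."""
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (1 - downtime / period)


# Assumption: the reporting period is the full calendar month of July.
JULY = timedelta(days=31)

# Outage durations taken from the list above.
outages = {
    "dogato": timedelta(hours=2, minutes=26),
    "mail31": timedelta(minutes=1),
    "vorlon": timedelta(minutes=6),
}

for server, downtime in outages.items():
    print(f"{server}: {uptime_percent(downtime, JULY):.3f}%")
```

Under that assumption, the three servers with outages come out to the same percentages listed above when rounded to three decimal places.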
by Frank on July 22, 2011
We are currently having an issue with one of our distribution switches. The problem is affecting sites on Dogato and some of our VPSs. We do not have an ETA for recovery yet, but we will add updates to this blog post.
=== 00:55 23-Jul-2011 update ===
First, the good news: everything is back up and on-line.
Now the details. For an as yet undetermined reason, the distribution switch that failed had a corrupt configuration. This should not happen, but it did. Once the network technicians determined what the problem was, they invoked the disaster recovery plan for the switch. The configuration was restored, the switch was tested, and then it was brought back on-line. The network group will follow up with Cisco to determine why the configuration got corrupted in the first place.