Outage on 11-30-2018

Initial symptoms indicated a route block through one of our IP providers. We contacted our upstream provider to drop the IP route for this provider. After detailed log analysis, upstream provided additional information indicating that the issue was our BGP routes flapping. A detailed on-site inspection revealed that a core switch was showing both (redundant) 6000W power supplies in a failure state.

Both were fully operational up until the outage. The chances of both failing at the same time are extremely small. Our network engineer was able to stabilize the system on one of the power supplies. We suspect a short in one supply was causing both to report a failed status and to provide only partial power to the switch.

This partial power state was causing our BGP routes to flap, which is why the IP provider mentioned above was blocking the unstable routes. We are going to continue to troubleshoot both power supplies and our spares to get redundant power restored. We already had plans in progress to replace this switch.

