Outage on 11-30-2018
Posted Nov. 30, 2018 by: wilhelms
Initial symptoms indicated a route block through one of
our IP providers. We contacted our upstream provider to
drop the IP route for this provider. After detailed log
analysis, upstream provided additional information
indicating that the issue was BGP flapping on our side.
Detailed on-site inspection revealed that a core switch
was showing both of its (redundant) 6000W power supplies
in a failure state.
Both were fully operational up until the outage. The chances of both failing at the same time are extremely small. Our network engineer was able to stabilize the system on one of the power supplies. We suspect a short in one supply was causing both to indicate a failed status and to provide only partial power to the switch.
This partial power state was causing BGP routes to flap, which is why the IP provider mentioned above was blocking the unstable routes. We are going to continue to troubleshoot both power supplies and our spares to get redundant power restored. We already had plans in progress to replace this switch.