<div dir="ltr"><div>OPLIN would like to apologize for yesterday's extended internet outage, which affected so many of you. We are very aware of the importance of the online services you provide to your communities, and we regret that those critical services were interrupted for so long.</div><div><br></div><div><b>Outage timeframe</b>: 2017/9/19 13:45 - 20:43, 6 hours 57 minutes</div><div><br></div><div><b>Short answer</b>: </div><div><br></div><div> Yesterday at 1:45pm the two OPLIN core routers stopped routing traffic to each other, which cut the Spectrum-serviced libraries off from the network. The secondary core has been bypassed to restore connectivity while we determine exactly what destabilized the router pair.</div><div><br></div><div><b>Long answer</b>: </div><div><br></div><div> For the past eight weeks OPLIN has been operating two Juniper MX480 routers in a Virtual Chassis pair, instead of a single MX480 core router. This change is part of a larger project for the OPLIN core, in which we are adding redundancy and physically moving the core into the two primary network rooms of the SOCC. The first live Spectrum connections moved onto the new router in NR2 (Network Room 2) about four weeks ago, utilizing a new 20Gb Aggregate Ethernet handoff from the vendor. Two weeks after that, the remaining 150 Spectrum circuits were migrated onto the new trunk. </div><div><br></div><div> Yesterday at 1:45pm the two OPLIN cores stopped routing traffic to each other, which cut the Spectrum circuits off from the rest of the network. Attempts to reestablish communications between the two cores destabilized the live core and disrupted traffic to the entire OPLIN network at ~2:40pm. 
Rather than risk further disruption to the rest of the network, we decided to bypass the secondary core and focus our efforts on piping the Spectrum circuits directly to the functioning core.</div><div><br></div><div> Our new problem was that the Spectrum handoff is multiple rooms/floors away from the primary core, and the only direct path between the two was the set of links for the Virtual Chassis, which were the wrong type of fiber. We attempted to cannibalize those links and use a switch to convert the media and pass the trunk through, but ran into configuration trouble that kept the Aggregate Ethernet interface from coming up cleanly. In the end we resolved the issue by identifying a path of jumps through fiber panels that allowed us to jumper up to the live core, restoring connectivity for all Spectrum-serviced locations.</div><div><br></div><div><b>Moving forward</b>:</div><div> </div><div> Today I'll be sitting down with OIT and Juniper to determine exactly what went wrong with the Virtual Chassis link yesterday. If the issue can be isolated and corrected, then utilizing Virtual Chassis will make things more reliable and easier to manage. If any question remains about the reliability of the technology, then we'll simply fall back to an older, more labor-intensive redundancy technology. </div><div><br></div><div>Either way, I'm sure we're going to have some after-hours maintenance work to announce in the near future. :)</div><div><br></div><div>Sorry again for all the hassle. I know how quickly bad days for us turn into bad days for you.</div><div><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><pre cols="72"><font face="arial, helvetica, sans-serif"><span style="font-size:12.8px">Karl Jendretzky
IT Manager - Ohio Public Library Information Network
(614) 728-5252
<a href="mailto:karl@oplin.ohio.gov" target="_blank">karl@oplin.ohio.gov</a></span></font><br></pre></div></div></div></div></div></div></div></div></div>
</div>