Munson Healthcare officials distributed a memo on Wednesday explaining Tuesday’s systems failure, which affected Munson Medical Center, Paul Oliver Memorial Hospital, Kalkaska Memorial Health Center and various clinics. The following memo is attributed to Chris Podges, Munson’s vice president of information systems.
“As you are all aware, we experienced an unplanned network downtime (Tuesday) that had widespread operational and clinical implications. Briefly, here is what happened:
Munson’s data centers’ connectivity to the outside world runs primarily on two redundant high-speed fiber-optic circuits administered by Traverse City Light and Power. They informed us that they needed to take one of the circuits offline for maintenance.
This would leave us operating on one circuit for the duration of their planned 12-hour downtime. That should not have been a problem for us; it is precisely why we have parallel, redundant technology on our most important systems and infrastructure. We have frequently tested for an event like this (losing one of the circuits) by manually “switching off” a fiber circuit.
In our testing, the remaining circuit took on all the traffic, just as it was architected to do: no hiccups, no instability, no impact on users, no downtime. And that is what we fully expected yesterday morning when one of the circuits was taken offline.
Unfortunately, that isn’t what happened. The core switch of the remaining circuit became confused, could not take over the role of primary switch (a transition measured in milliseconds) and ultimately shut down. Once it was down, everything running on the network (applications, paging systems, wireless devices, IP phones, etc.) went down with it.
Given the circumstances, the network itself was recovered relatively quickly (in approximately 2 hours). The applications running on it, however, were harder to bring back online because they had shut down so abruptly. We were mostly back online by noon, approximately 4½ hours after the initial shutdown.
During the downtime, we assembled a small army working on three objectives:
1. Make sure the hospitals and clinics could operate on downtime procedures, especially in providing patient care
2. Communicate as comprehensively and as often as we could
3. Fix the technical issues
While reports indicate that all hospitals and clinics did a fantastic job getting through the downtime, we fully understand that the resulting chaos was very difficult to manage and that downtimes like this are unacceptable.
At 2 a.m. Thursday, we are going to reintroduce the second fiber-optic circuit into our network architecture. While we expect no issues, we are planning as though there could be some. This afternoon your organizations will receive specific instructions on how to prepare for another network outage: what to print in advance of 2 a.m., what resources are available to you during the downtime, how to get needed clinical information without the use of computers, who to call for help, etc.
Again, we do not expect any downtime tomorrow morning, but we relearned some valuable lessons yesterday, and the safety of our patients is our number one objective should the network experience another issue. We’ll be ready at 2 a.m., and we want your organizations to be ready, too.
We are working diligently to understand what happened yesterday and will share with you what we learn and our plans to remedy whatever may need attention.
Finally, we have made every effort to get this e-mail to the right people in your organizations. If anyone who should receive it has been omitted, especially members of your medical staff, please forward it to them.
I appreciate your patience and understanding. We’ll do everything possible to see that this does not happen again.
Thank you. Chris Podges”


