Dispatches From The Geeks

News and Announcements from the MCS Systems Group

The Saga Continues

I have good news and bad news. And worse news.

The good news: we scrounged up enough power to get some critical TG machines back online. The cluster itself is not affected by the outage, but the rack with all the management and login nodes is. I’m not sure if things are actually back there right now, but TG users should get a separate notice from the TG admins.

This leaves the affected systems at: Jazz, Cosmea, DSL testbed machines, and the NMPDR Mac cluster (I left that out of the last count). Plus a blower fan for BG/L, though I believe things are operating okay without it.

Okay, so that was the good news.

The bad news: the four power panels (panels 1 through 4) will not be coming back any time before Monday. These panels predate me. And when I say they predate me, I don’t mean they predate my joining Argonne, I mean they predate my joining the human race. In doing the work necessary to bring these back up, they made the happy discovery of what happens to really old wiring that’s been sitting in conduit for tens of years. The insulation becomes brittle and falls apart when moved.

This means there’s an extensive rewiring effort that needs to take place. Material and equipment need to be scrounged and ordered, and this is before any work can actually take place.

Okay, so that was the bad news.

The worse news: when I say things aren’t coming back before Monday, I need to stress that it doesn’t mean they will be coming back on Monday. If everything goes completely right (and, really, we’ve certainly not experienced that thus far), Monday is the soonest we can see it.

Also, this rewiring doesn’t in any way solve the problem alluded to in yesterday’s update; specifically, that we don’t know if we’ll be able to bring all four panels back online. At least three will be doable, and we’re exploring contingency plans that will either allow us to bring up the fourth panel, or power the things that are on it via alternate methods.

Things are really out of our hands, but we’re doing everything we can whenever we find an element of this situation in which we can be a factor. This is a thoroughly frustrating situation for everyone involved, and we really appreciate the lack of pitchforks and torches thus far.

Hang in there!


April 3, 2008

