Dispatches From The Geeks

News and Announcements from the MCS Systems Group

Author Archive

Systems Urgent File Server reboot at 5PM today, April 3rd

The server is back up and seems to be running cleanly.

We have not resolved the issue that has been hanging the system, but we should now be able to get reasonable logs to help us diagnose it.

Thank you again for your patience.

Written by Craig Stacey

April 3, 2017 at 5:44 pm

Posted in Uncategorized

Urgent File Server reboot at 5PM today, April 3rd

We must reboot the main MCS/CELS file server known as sto10.mcs.anl.gov at 5PM today.

We again experienced an unexpected outage on our main file server over the weekend, which took out our workstation and compute node infrastructure.

We have configured additional debug logging that should help in the event this happens again.

We anticipate the outage to last no more than one hour.

During this time, logins to the login machines and workstations will not be possible and access to most project directories will be offline.

You may need to reboot your machine once the system has come back on line.

We apologize for the lack of notice on this; it is an urgent issue.

Thank you for your patience.

Written by Craig Stacey

April 3, 2017 at 4:35 pm

Posted in Uncategorized

Emergency sto10 File Server Reboot at Noon Today 03/14/2017

Greetings,

The reboot was completed successfully. We encountered only minor, and not unexpected, configuration issues.

We expect that the system will remain stable, and we’ve added some additional monitoring to the configuration.

You may need to reboot your Linux workstation.

If you notice anything important out of order (after you reboot) please let us know immediately.

Thank you for your patience.

Written by Craig Stacey

March 14, 2017 at 1:39 pm

Posted in Uncategorized

Emergency sto10 File Server Reboot at Noon Today 03/14/2017

Greetings,

We have to reboot the main MCS/CELS file server known as sto10.mcs.anl.gov at Noon today.

We have experienced a number of unexpected outages on our main file server over the last week, which take out our workstation and compute node infrastructure.

We took the opportunity during the recent power outage to upgrade the system to a new patch level. We theorize that this upgrade might be a contributing factor in the current problem, so we are rolling back to a previous patch level.

We anticipate the outage to last no more than one hour.

During this time, logins to the login machines and workstations will not be possible and access to most project directories will be offline.

You may need to reboot your machine once the system has come back on line.

We apologize for the lack of notice on this; it is an urgent issue.

Thank you for your patience.

Written by Craig Stacey

March 14, 2017 at 10:56 am

Posted in Uncategorized

Dayforce on rdp.mcs.anl.gov

If you currently access Dayforce via http://dash.anl.gov, you can ignore this note; it doesn’t affect you in any way.

If you currently access Dayforce via Remote Desktop to rdp.mcs.anl.gov, we’ve had to make a small change. The latest version of Firefox breaks Silverlight, which is required for Dayforce.

The global desktop shortcut to Dayforce on rdp.mcs.anl.gov has been changed to use Internet Explorer instead of Firefox.

Written by Craig Stacey

March 8, 2017 at 11:17 am

Posted in Uncategorized

All CELS Systems affected by power outage are back in operation

If you notice anything out of the ordinary, please report it to us at help@cels.anl.gov.

Thanks for your patience.

Written by Craig Stacey

March 6, 2017 at 3:47 pm

Posted in Uncategorized

Reminder: Data Center and service outage today March 6, 2017

See https://mcssys.wordpress.com/2017/03/02/reminder-data-center-and-systems-outage-monday-march-6-2017/ for details. Systems are starting to come down now in anticipation of the 9AM outage. Follow the Twitter feed for updates.

We’re also fighting an unrelated issue taking out some services including collab.cels.anl.gov. Work on that is proceeding and we hope to have things back shortly.

Written by Craig Stacey

March 6, 2017 at 7:37 am

Posted in Uncategorized

Reminder: Data Center and systems outage, Monday, March 6, 2017.

This is a reminder of my previous announcement regarding Monday’s outage (see https://mcssys.wordpress.com/2017/02/13/data-center-and-systems-outage-monday-march-6-2017/)

There is one change worth noting from the initial announcement.  Wired desktop networks for CELS, MCS, and ALCF will also be down during this outage.  Linux desktops in 240 were already going to go down due to the back-end infrastructure going down, so there’s no real change there, but Mac and Windows users will have to use WiFi (which is unaffected) for network connectivity during the outage.

We will send a notice on Monday morning prior to the shutdown.  CELS compute and file servers will start going offline at 8 AM in preparation for the loss of power at 9AM.  We will send an all-clear announcement to the mailing list and the blog when things are back.  We will also provide periodic short updates via our Twitter feed at https://twitter.com/mcssys.  For any unexpected issues during the outage, please contact the CELS help desk at help@cels.anl.gov, or via phone at 630-252-6813.

Thanks.  And here’s to an uneventful outage.

Written by Craig Stacey

March 2, 2017 at 5:21 pm

Posted in Uncategorized

Data Center and systems outage, Monday, March 6, 2017.

In order to install a new breaker panel in the building 240 data center, electric power needs to be taken down for a subsection of the data center – unfortunately that section comprises the bulk of the computers in the room.

This outage will effectively take down the computers in the 240 data center. The outage window is expected to be from 9AM until 3PM on that day. The CELS IT systems affected will be:

* The CELS Linux/Unix services, including: file servers (project directories, home directories), login.mcs.anl.gov, building 240 Linux desktops and Linux print services, compute servers (see https://wiki.mcs.anl.gov/IT/index.php/General_MCS_Questions#computeservers for the list of compute servers), jenkins.cels.anl.gov, buildbot.mcs.anl.gov

* UPDATE: wired desktop networks for MCS, ALCF, and CELS.

* accounts.mcs.anl.gov

* BIO divisional servers (Y Drive, X Drive, print server – use CIS server instead)

* SVN repositories hosted at repo.anl-external.org.

* The license servers for PGI, NAG, starcd, mathematica, totalview, idl, and accelrys.

* other project-specific systems and services not provided by CELS IT – expect to get announcements from the systems administrators of those systems.

The following will *not* be affected:

* Most websites hosted by CELS systems (including wikis, confluence, wordpress)

* E-mail list servers

* License servers for Intel, esgee, matlab

* ANL Authentication (anl.gov domain accounts, Single Sign On, etc.)

* ANL or externally provided apps (Exchange, Box, Workday, etc.)

* Building 240 office-side operations. Only the data center will be affected.

We apologize for the inconvenience. This outage was scheduled to coincide with downtimes already lined up for the primary systems in the room. Due to the scope of the work being performed, there is no way to do this work safely without this level of power outage.

Written by Craig Stacey

February 13, 2017 at 9:39 am

Posted in Uncategorized

E-Mail Service upgrades at Argonne

In the coming months, Argonne will be moving its email service from on-site servers running an old version of Microsoft Exchange to a hosted solution run by Microsoft. I’ve been on this new service for at least a year, and it’s great – much more capable than the current offering at Argonne.

You’ll get more communications directly from CIS about this when you’re scheduled to upgrade, including pointers to documentation and letting you know what you can expect while the migration is happening.

The actual migration process for this is really painless, as well. In the end, as a user, you’d notice a 5-10 minute period where you couldn’t connect to your mailbox. Also, depending on what method you use to read your mail (POP/IMAP especially), you might need a little extra assistance getting your settings changed. But for most people the change will be nearly seamless.

For technical reasons, it’s easier if we migrate divisions together (shared calendars and mailboxes being a big part of this). These migrations will be starting this month, and run through September at the latest. If your division has an event or project in this time frame that you think would be negatively affected by the move, let me know and I’ll block it out for the division.

Thanks!


Craig

Written by Craig Stacey

February 1, 2017 at 8:54 am

Posted in Uncategorized