Dispatches From The Geeks

News and Announcements from the MCS Systems Group

COMPLETE: 11/12/2016 File Server Reboot

Greetings,

This is the ALL-CLEAR.

The file server reboot completed without incident. All filesystems should be available again.

If you are having trouble, try the following first:

Restart your terminal session.
Reboot your workstation.

If this doesn’t seem to help, contact the CELS Help Desk.

Thanks for your patience.

Written by Craig Stacey

November 12, 2016 at 10:59 am

Posted in Uncategorized

REMINDER: 11/12/2016 File Server Reboot

File Server Reboot on Saturday, November 12th from 10:00 AM until 12:00 Noon

In order to apply important system updates and security patches, we will be rebooting the file server sto10.mcs.anl.gov this Saturday morning. We expect the work to take no more than an hour, but we are scheduling a two-hour “outage window” so we have time to take care of any unanticipated problems.

What do you need to do?
Before the outage begins, you should suspend any automated scripts or processes that would access any of the subdirectories of the affected NFS mountpoints (listed below). You should save any work you are doing in them and move any active terminal sessions out of the affected mountpoints. If you need advice on how to do this, contact the CELS Help Desk.
After the outage is over (indicated by our “all-clear” message), you may need to reboot your workstation or log out and back in to your open ssh or terminal sessions.

If you feel that this outage window is unacceptable, get in touch with us immediately so we can try to address your needs.
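
One way to check whether a shell or script is sitting inside an affected mountpoint is to compare its working directory against the list at the end of this notice. The following is a minimal Python sketch, not an official CELS tool: the function name `affected_mount` is hypothetical, and only three of the listed mountpoints are included for brevity.

```python
import os
from pathlib import PurePosixPath

# Three entries copied from the mountpoint list in this notice.
# Extend with the remaining entries as needed.
AFFECTED = ["/nfs/mcs-proj-climate", "/nfs/proj-mpich", "/nfs/proj-flash"]


def affected_mount(path, mounts=AFFECTED):
    """Return the affected mountpoint that contains `path`, or None."""
    p = PurePosixPath(path)
    for m in mounts:
        mp = PurePosixPath(m)
        # Match the mountpoint itself or anything below it, component by
        # component, so /nfs/proj-mpich2 does not match /nfs/proj-mpich.
        if p == mp or mp in p.parents:
            return m
    return None


if __name__ == "__main__":
    hit = affected_mount(os.getcwd())
    if hit:
        print(f"Current directory is under {hit}; cd elsewhere before the outage.")
    else:
        print("Current directory is not under a listed mountpoint.")
```

Run it from the terminal session in question. It compares path strings only and does not resolve symbolic links, so a directory that reaches an affected mountpoint through a symlink would be missed.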

What are we doing?
We are shutting down and rebooting the file server named sto10.mcs.anl.gov.

Why are we doing this?
There are a number of important operating system, security and service software updates available for the system.

When are we doing this?
We will begin work at 10:00 AM, Saturday, November 12, 2016.

What services will be affected?
The UNIX NFS mounted “project directories” (listed below) will be offline for the duration of the outage.

How long will the outage last?
We anticipate that the outage will last only about an hour, but the file server could be offline for up to 2 hours.

How will you know that the service is back online?
We will send an “all clear” message to this list, post a notice on Twitter (follow us at @mcssys), and update the “Dispatches from the Geeks” blog at https://mcssys.wordpress.com/

Thank you for your patience, and we apologize for the inconvenience.

List of NFS mountpoints that will be OFFLINE during the outage:

/nfs/UncertaintyClimate
/nfs/alcf-admins
/nfs/cels-media
/nfs/cels-systems
/nfs/geassm
/nfs/gtr.globus.org
/nfs/mcs-proj-climate
/nfs/mcs-proj-dynamics
/nfs/mcs-proj-magnetic
/nfs/mcs-proj-magnetic2
/nfs/mcs-proj-source
/nfs/ms-software
/nfs/ms-users
/nfs/noaa
/nfs/proj-climate1
/nfs/proj-climate2
/nfs/proj-davidk
/nfs/proj-dsl
/nfs/proj-emconsta
/nfs/proj-fischer00
/nfs/proj-fischer01
/nfs/proj-fischer02
/nfs/proj-fischer03
/nfs/proj-flash
/nfs/proj-fluids
/nfs/proj-genogrp
/nfs/proj-genomics
/nfs/proj-lans1
/nfs/proj-mpich
/nfs/proj-swat
/nfs/proj-sysbio
/nfs/proj-tpeterka
/nfs/sharp
/nfs/uso
/nfs/proj-phasor

Written by Craig Stacey

November 11, 2016 at 11:03 am


NOTICE: 11/12/2016 File Server Reboot

File Server Reboot on Saturday, November 12th from 10:00 AM until 12:00 Noon

In order to apply important system updates and security patches, we will be rebooting the file server sto10.mcs.anl.gov this Saturday morning. We expect the work to take no more than an hour, but we are scheduling a two-hour “outage window” so we have time to take care of any unanticipated problems.

What are we doing?
We are shutting down and rebooting the file server named sto10.mcs.anl.gov.

Why are we doing this?
There are a number of important operating system, security and service software updates available for the system.

When are we doing this?
We will begin work at 10:00 AM, Saturday, November 12, 2016.

What services will be affected?
The UNIX NFS mounted “project directories” (listed below) will be offline for the duration of the outage.

How long will the outage last?
We anticipate that the outage will last only about an hour, but the file server could be offline for up to 2 hours.

How will you know that the service is back online?
We will send an “all clear” message to this list, post a notice on Twitter (follow us at @mcssys), and update the “Dispatches from the Geeks” blog at https://mcssys.wordpress.com/

What do you need to do?
Before the outage begins, you should suspend any automated scripts or processes that would access any of the subdirectories of the affected NFS mountpoints (listed below). You should save any work you are doing in them and move any active terminal sessions out of the affected mountpoints. If you need advice on how to do this, contact the CELS Help Desk.
After the outage is over (indicated by our “all-clear” message), you may need to reboot your workstation or log out and back in to your open ssh or terminal sessions.

If you feel that this outage window is unacceptable, get in touch with us immediately so we can try to address your needs.

Thank you for your patience, and we apologize for the inconvenience.

List of NFS mountpoints that will be OFFLINE during the outage:

/nfs/UncertaintyClimate
/nfs/alcf-admins
/nfs/cels-media
/nfs/cels-systems
/nfs/geassm
/nfs/gtr.globus.org
/nfs/mcs-proj-climate
/nfs/mcs-proj-dynamics
/nfs/mcs-proj-magnetic
/nfs/mcs-proj-magnetic2
/nfs/mcs-proj-source
/nfs/ms-software
/nfs/ms-users
/nfs/noaa
/nfs/proj-climate1
/nfs/proj-climate2
/nfs/proj-davidk
/nfs/proj-dsl
/nfs/proj-emconsta
/nfs/proj-fischer00
/nfs/proj-fischer01
/nfs/proj-fischer02
/nfs/proj-fischer03
/nfs/proj-flash
/nfs/proj-fluids
/nfs/proj-genogrp
/nfs/proj-genomics
/nfs/proj-lans1
/nfs/proj-mpich
/nfs/proj-swat
/nfs/proj-sysbio
/nfs/proj-tpeterka
/nfs/sharp
/nfs/uso
/nfs/proj-phasor

Written by Craig Stacey

November 8, 2016 at 12:27 pm


gitlab and xgitlab will be down from noon to 1 PM today for security patches.

Written by Craig Stacey

November 3, 2016 at 9:45 am


EndNote X7 Site Licensing

Thanks to the initiative of Anthony Avarca in CNM, who polled the various divisions to gauge interest, we now have site licensing for EndNote X7.

Please note these links are for CELS (MCS, ALCF, EVS, BIO, CELS & associated institutes) employees only. Do not distribute these files or licenses beyond your own use. The distribution links will require you to log in to your Box account to retrieve the files.

When X8 is released, we will make it available via the same distribution links. Please note that macOS Sierra is not fully supported by X7, but it will be under X8.

Written by Craig Stacey

November 1, 2016 at 12:56 pm


Emergency upgrade of login.mcs.anl.gov

Due to a zero-day exploit of a Linux kernel vulnerability, we need to upgrade the machines login1, login2, login3, and login4.mcs.anl.gov (collectively known as login.mcs.anl.gov).

These machines will be replaced with new builds matching what’s running currently on login2.mcs.anl.gov (Ubuntu Trusty 14.04), but with the latest Linux kernel.

These upgrades are happening as we speak. I apologize for the short notice, but the level of exposure from this vulnerability requires us to patch these machines as quickly as possible. Other machines throughout the infrastructure (but not externally exposed) will be updated throughout the next few days.

Thanks, and sorry for the hassle.


Craig

Written by Craig Stacey

October 20, 2016 at 5:38 pm


Postponed: File Server Maintenance

Once again we have to push this off. We’re awaiting resolution from a vendor on a problem we’re having with backups before we make this move. We don’t have a timeline on this resolution, so at this point the maintenance is postponed until further notice. I will announce it at least a week in advance when we’ve rescheduled it. Thank you.

Written by Craig Stacey

October 6, 2016 at 2:32 pm


compute001 back online

The upgrade took longer than expected (slightly exotic hardware, plus it takes forever to boot due to its 1.5TB of RAM). Please let us know about any issues you encounter, missing packages, etc.

Once the last of the compute servers is up to date next week, we’ll start on the remaining Linux desktops. If you want to be at the head of the queue, send us a note at help@cels.anl.gov.

Thanks!


Craig

Written by Craig Stacey

October 5, 2016 at 3:57 pm


Compute server upgrades: compute001 9AM tomorrow

We’ve pushed the previously announced schedule back a bit. We’ve got compute001 slated for 9AM tomorrow, Wednesday 10/5, and next week we’ll finish off with stomp.

As before, during each rebuild, the machine will be unavailable for some portion of that day. We’ll announce the shutdown on the machine itself to all logged-in users 30 minutes prior to shutdown. After the machine is rebuilt, you’ll need to recreate any crontabs you had in place. Also note /sandbox is not backed up and data will be lost – never keep data in /sandbox that can’t be easily reproduced.
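
Since crontabs need to be recreated after the rebuild, it may help to save a copy beforehand. Below is a minimal Python sketch that writes the output of the standard `crontab -l` command to a file. The function name `save_crontab` and the backup file name are hypothetical, and it assumes your home directory is preserved across the rebuild, which this announcement does not state.

```python
import subprocess
from pathlib import Path


def save_crontab(dest, run=subprocess.run):
    """Write the current user's crontab to `dest`; return True on success.

    `run` is injectable so the function can be exercised without a real crontab.
    """
    result = run(["crontab", "-l"], capture_output=True, text=True)
    if result.returncode != 0:
        # A non-zero exit usually means no crontab is installed for this user.
        return False
    Path(dest).write_text(result.stdout)
    return True


if __name__ == "__main__":
    # Hypothetical backup location; do not use /sandbox, which is wiped.
    target = Path.home() / "crontab.backup"
    if save_crontab(target):
        print(f"Saved crontab to {target}")
    else:
        print("No crontab found (or crontab -l failed); nothing saved.")
```

After the rebuild, running `crontab` with the saved file as its argument reinstalls the saved entries.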

If you notice software packages missing or other oddities, please report them to help@cels.anl.gov.


Craig

From: <cels-systems-announce-bounces@lists.anl.gov> on behalf of Craig Stacey <stace@anl.gov>
Date: Tuesday, September 6, 2016 at 8:58 AM
To: "cels-systems-announce@lists.anl.gov" <cels-systems-announce@lists.anl.gov>
Subject: [Systems Announce] Compute server upgrades continue

We’re pushing through on updating the remaining 64-bit compute nodes to Ubuntu 14.04 Trusty. Here’s the schedule:

This week (through Sep 9):
thwomp.mcs.anl.gov
vanquish.mcs.anl.gov

Next week (Sep 12-16):
trounce.mcs.anl.gov
churn.mcs.anl.gov

Week 3 (Sep 19-23):
crush.mcs.anl.gov
crank.mcs.anl.gov
grind.mcs.anl.gov

Week 4 (Sep 26-30):
compute001.mcs.anl.gov
steamroller.mcs.anl.gov

Week 5 (Oct 3-7):
stomp.mcs.anl.gov

During each rebuild, the machine will be unavailable for some portion of that day. We’ll announce the shutdown on the machine itself to all logged-in users 30 minutes prior to shutdown. After the machine is rebuilt, you’ll need to recreate any crontabs you had in place. Also note /sandbox is not backed up and data will be lost – never keep data in /sandbox that can’t be easily reproduced.

If you notice software packages missing or other oddities, please report them to help@cels.anl.gov.

We’ll start this week’s batch of machines tomorrow (Wednesday, September 7).

Let us know if this presents any problems.

Written by Craig Stacey

October 4, 2016 at 11:39 am


Gitlab Maintenance Complete: Proposed CIS Maintenance weekends for FY17.

First up, the Gitlab maintenance announced earlier today is complete. The UI is a bit different – seems to be a more unified interface between desktop/mobile, so if you don’t see what you expect, hit the “hamburger menu” up in the upper left corner.

Secondly, CIS has proposed the following potential maintenance weekends:

November 4-6, 2016

May 5-7, 2017 (APS Maintenance)

August 25-27, 2017 (APS Maintenance)

As noted, the May and August weekends are designed to coincide with APS Maintenance. There’s no set expectation of what is or isn’t going to be available for the above weekends, but here’s what you can expect:

* We (CELS) will likely time any upgrades we have to coincide with these.

* ANL business systems will likely be affected.

* ANL network access could be affected, though that doesn’t necessarily affect MCS/LCF. But it might.

Bearing that in mind, please send me (stace@anl.gov) any objections to this maintenance schedule so I can forward them up the chain.

Thanks!


Craig

Written by Craig Stacey

September 29, 2016 at 5:16 pm
