Dispatches From The Geeks

News and Announcements from the MCS Systems Group

ANL site wide power outage

More details as we have them.

July 28, 2016 at 2:51 pm

MacOS, iOS, and browser security updates needed.

See the attached note from Cyber. The short story is that if you’re not current on MacOS, you’ll start getting blocked by the proxy when you’re on Argonne Auth wifi.

Systems is taking care of machines we manage.

It appears that Apple has fallen victim to a vulnerability very similar to the StageFright bug found on Android systems last year.

Using almost any method (MMS, iMessage, Mail, web browsing, …) to get a trojan TIFF image onto the device, an attacker can exploit a buffer overflow to run anything the malware wants on the system.
This was patched in last week’s patch set from Apple. The patch needs to be applied to any iPhone, Mac, AppleTV, and even Apple Watch.
You’ll need to install one of these on your device.
iOS 9.3.3
El Capitan 10.11.6
tvOS 9.2.2
watchOS 2.2.2
latest patch set for 10.10.5 Yosemite.

Starting Tuesday, July 26, we will be updating the web filter block list to add MacOS 10.11.5 to the outdated software list. You should already have the 10.11.6 update applied.

So people are aware, Apple still supports 10.10.5, but on that release it is not easy to detect whether a machine has the latest patch set or is still at the original release. Please make sure these machines are up to date. Anything below 10.10.4 is going to be blocked. This will include any of the 10.9 and 10.8 releases. Those upgrades should have been completed some time ago.
On the web browser front,
Chrome has updated to version 51 on all platforms and is headed to version 52. We will be blocking anything indicating version 49 and below.
Firefox is at version 47 for Stable release and 45.2 for Extended support release. Anything below those releases will also be blocked.
Patching should be routine, so this shouldn’t impact many systems.
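
For anyone who wants to check a machine against these cutoffs, here is a rough Python sketch of the comparison. It is not the web filter’s actual logic, which isn’t described in the note. The function names are made up for illustration, and it assumes that a Firefox 45.x version string means the Extended Support Release and that any MacOS release other than 10.11.x is held to the 10.10.4 floor.

```python
def parse_version(text):
    """Turn a dotted version string such as '10.11.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def minimum_for(product, version):
    """Return the lowest version that is not blocked, per the cutoffs in the note."""
    if product == "macos":
        # 10.11.5 joins the outdated list, so El Capitan needs 10.11.6.
        if version[:2] == (10, 11):
            return (10, 11, 6)
        # Anything below 10.10.4 is blocked, which covers 10.9 and 10.8.
        return (10, 10, 4)
    if product == "chrome":
        # Version 49 and below is blocked.
        return (50,)
    if product == "firefox":
        # Assumption: a 45.x version string is the Extended Support Release.
        if version[:1] == (45,):
            return (45, 2)
        return (47,)
    raise ValueError(f"unknown product: {product}")


def is_blocked(product, version_text):
    """Return True if this product/version falls below its cutoff."""
    version = parse_version(version_text)
    return version < minimum_for(product, version)
```

Calling is_blocked("macos", "10.11.5") or is_blocked("chrome", "49.0.2623.112") should come back True, while the patched releases listed above should come back False.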

July 25, 2016 at 10:15 am

Systems Announce Upgrading v8.mcs.anl.gov 7/11/16

v8 is now rebuilt. Feel free to test it out and let us know what pieces you think are missing from the new Trusty linux environment.

July 11, 2016 at 12:18 pm

Upgrading v8.mcs.anl.gov 7/11/16

We’re in the process of a two-phase upgrade of the linux compute environment this year. Phase one is rolling out an Ubuntu 14.04 build (aka Trusty), and phase two will be rolling out a CentOS 7 build. Phase two won’t be starting until end of summer, but phase one is already under way, with some desktops and servers serving as early tests.

The server v8.mcs.anl.gov is running a very old Ubuntu build (10.04, aka Lucid), so we’re going to target that as the first of the compute servers to get the upgrade.

We’ll be taking the machine down on Monday morning, and reinstalling it as Trusty, at which point we’ll announce when it’s back up and encourage you to use it and let us know what’s missing from it. This will allow us to fine tune the Trusty environment and make sure all the machines running at that level are best suited to your needs.

What you need to do: If you’ve got data in the /sandbox on v8, make sure you’ve got it copied elsewhere, since that will be erased. If you’ve got cron jobs that run on v8, make a copy of your crontabs so you can replicate them on the new v8.
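
If you’d rather script those two steps, something like the following Python sketch could work. It is only an illustration: the function names and destination paths are hypothetical, and it assumes `crontab -l` is available on v8 and exits non-zero when you have no crontab, which is the usual behavior.

```python
import shutil
import subprocess
from pathlib import Path


def save_crontab(dest_file, list_cmd=("crontab", "-l")):
    """Write the current user's crontab to dest_file and return the path."""
    result = subprocess.run(list_cmd, capture_output=True, text=True)
    # A non-zero exit usually means there is no crontab; save an empty file.
    text = result.stdout if result.returncode == 0 else ""
    dest = Path(dest_file)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(text)
    return dest


def copy_tree(src_dir, dest_dir):
    """Copy a directory tree (for example, your data under /sandbox) to dest_dir.

    dest_dir must not already exist.
    """
    return Path(shutil.copytree(src_dir, dest_dir))
```

Run it on v8 before the reinstall, pointing both destinations at a location that is not on v8’s local disk. The saved crontab file can be loaded on the rebuilt machine with `crontab <file>`.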

After we’re done getting that machine in a good state for your needs, we’ll announce the schedule of the updates to the rest of the compute servers. At that time we’ll also ask for more volunteers to have their desktops updated.

If this outage poses a significant problem for you, please let us know so we can reschedule it.



July 7, 2016 at 3:48 pm

New IT Support staffing and conference rooms in 240

Hi, all! I’ve got a couple of quick announcements to knock out here, so let’s get to it.

New staff!

Jasan left our group a few weeks ago for a fantastic opportunity, and we’ve just now been able to bring on his replacement. Please welcome Steve Verdone to the group! He’ll be manning the Help Desk on and off and handling user support issues with a focus on the Mac side of the world. Steve joins us from his prior gig at Apple as an Apple Expert. He’s getting up to speed on how we do things here, but I’m sure you’ll like him as much as we do – he’s a smart, friendly guy. Stop by and say hi.

Starting in July, we’re also going to be getting some additional help in getting the BIO IT environment inventoried, updated, and migrated into the ANL and CELS infrastructures. Jeff Hinthorn will be lending us some of his time over the next few months, splitting his duties between us and HEP. Jeff’s a seasoned Windows admin and his time will almost entirely be focused on BIO through the end of this fiscal year. We’re still finalizing the schedule, but he’ll be spending his time in building 446 at our IT desk over there daily. I’ll send a separate note to BIO when the schedule is finalized.

240 conference rooms:

Based on some feedback I’ve been getting, I want to send a reminder out about booking the conference and meeting rooms in the main 240 building. Instructions on doing so can be found at http://tcs.anl.gov/for-tcs-tenants/meeting-and-conference-rooms-in-tcs/ and it’s worth paying special attention to the highlighted section:

When you reserve a room, pay attention to the email response you receive back. Unless you get an e-mail indicating the room has “accepted” the meeting, you have not reserved the room. If in doubt, check the “Web View” links on the calendars below, which update every 5 minutes.

We’ve had people who think they’ve reserved the room, but either ignored the response from the room saying the room was already booked, or didn’t actually include the room as part of the invite, so it was never booked to begin with.

I’m investigating options for putting live schedule views at the entrances to the rooms to help avoid these issues in the future. More on that as it develops.

June 29, 2016 at 12:12 pm

CELS IT Systems back online

The core IT services provided by CELS Systems are back online. This includes the BIO infrastructure, the MCS general computing infrastructure, and any virtual machines CELS Systems hosts. Other larger systems (such as the compute clusters) will come back online as previously announced by their admin teams.

We had a few issues during the outage, which I’ll do a full post mortem on later in the week.

When you return to your desk on Monday, if your desktop isn’t working as you’d expect, first try rebooting it. If it still isn’t working, let us know at help or at x6813, or in person at the Help Desk.

Thanks, all. Enjoy the rest of your weekend, I know I will.

June 18, 2016 at 5:35 pm

Reminder: Building 240 power outage June 17 through 19, CELS IT services affected

First announcement: https://mcssys.wordpress.com/2016/05/26/cels-systems-outage-june-17-19-2016/

Second announcement: https://mcssys.wordpress.com/2016/06/13/cels-systems-outage-june-17-19-2016-2/

For status updates during the outage you can follow our twitter account at https://twitter.com/mcssys

First, updates for BIO (MCS/CELS announcements follow this section):

I’m awaiting word that the switchover is complete, but we should have DHCP (network address assignment) switched over to CIS servers today, which means if a machine reboots during the outage, it will still come up. The BIO authentication server backup is running in the 221 data center and will stay up, so logins will continue to work. Y, Z, and X drives will be down. Linux systems will not be able to log in to BIO or ANL accounts, and linux shared file systems will be down.

The Canon copiers are now available on the CIS print server (which will stay up during the outage). You can see them here: \\printers.anl.gov. The printers installed are:

onewestcopier.bio.anl.gov (imageRUNNER2525)

q104copier.sbc.anl.gov (iR-ADV 6055)

a102copier.bio.anl.gov (iR-ADV 6055)

If you normally print directly to the copiers, that will work as usual. The Xerox and other printers are not yet on that print server; we’ll get those moved over after the outage and retire “BIOPRINT”.

Kat will be on-site at our desk in 446 to assist tomorrow, and the team will be monitoring the ticket queue.

Updates for MCS/LCF/CELS:

The CELS user-facing linux environment will be down, as previously announced, including login and home file servers. We will issue a system-wide shutdown command to all linux workstations tomorrow morning. If you run a Mac or a self-managed machine, please shut down the machine before you leave tonight. Mac file servers will shut down at COB today, as will our tape backup servers.

On Monday, if anything is not working as expected on your computer, please reboot first, then if that doesn’t fix the issue, report it to help or call the help desk at x6813.

The list of user-facing machines that will *not* go down is included below. The ALCF accounts webpages will have a brief downtime tomorrow for maintenance on the server, but will be up by afternoon. Note that repo.anl-external.org is one of the sites that will unfortunately be down during the outage.

The outage window is advertised to be until Sunday, though it’s very likely power will return on Saturday. However, it may take some time to get systems back, so please do not expect anything to be back in operation until we say it is. I will send a notice to this list, which will also post to the previously noted Twitter account and to the blog at https://mcssys.wordpress.com, when we believe things are back in normal operation.

Some systems not part of CELS core IT may not return until Monday – please watch for communications from the teams for those systems. Also note any all-clear I give is purely related to the IT systems – the building will not resume normal business operations until Monday morning.

These servers will not go down as part of the outage:

app001.cels.anl.gov (collab.cels.anl.gov – Confluence)

app003.cels.anl.gov (gitlab.cels.anl.gov – internal gitlab)

app006.cels.anl.gov (dev.esg.anl.gov, dev.esgf.anl.gov)

app007.cels.anl.gov (www.esg.anl.gov, www.esgf.anl.gov)

beehive0.mcs.anl.gov (waggle project)

beehive1.mcs.anl.gov (waggle project)

beehive2.mcs.anl.gov (waggle project)

ca.mcs.anl.gov (certificate authority)

caveat.mcs.anl.gov (MPICH svn/trac)

coredb.mcs.anl.gov (internal database server)

cyclone.mcs.anl.gov (jira.cels.anl.gov)

davmail.mcs.anl.gov (Exchange/WebDAV connector)


gust.mcs.anl.gov (WordPress sites)



hub-221.mcs.anl.gov (radius)

jenkins.mcs.anl.gov (build test server)

kerdap-2.mcs.anl.gov (MCS/CELS kerberos/LDAP server)

kerdap.jlse.anl.gov (JLSE LDAP/kerberos server)

lic001.cels.anl.gov (License server for some software packages)

mon001.cels.anl.gov (monitoring server)

newnewman-1.mcs.anl.gov (email relay)

newnewman-2.mcs.anl.gov (email relay)

nginx.mcs.anl.gov (web proxy front-end for CELS-hosted websites)

owney.mcs.anl.gov (CELS-hosted mailman lists)

rdp.mcs.anl.gov (Windows terminal server)

rt.mcs.anl.gov (Request Tracker ticketing system)

squall.mcs.anl.gov (ALCF websites)

typhoon.mcs.anl.gov (JLSE wiki)

variant.mcs.anl.gov (svn/trac server)

wilbur.mcs.anl.gov (MCS webpages not hosted by CIS)

wind.mcs.anl.gov (WordPress, MediaWiki sites)

xgitlab.cels.anl.gov (externally available gitlab server)

yubi-221.mcs.anl.gov (CELS One Time Password server)
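
If you want to confirm from offsite that one of these is still answering during the outage, a simple TCP connection test is usually enough. The Python sketch below is only an illustration: the function names are hypothetical, and port 443 is an assumption that fits the web-facing hosts but not services such as radius or kerberos, which listen elsewhere.

```python
import socket


def is_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_hosts(hosts, port=443, timeout=3.0):
    """Map each hostname to whether it accepted a TCP connection."""
    return {host: is_reachable(host, port, timeout) for host in hosts}
```

For example, check_hosts(["gitlab.cels.anl.gov", "rt.mcs.anl.gov"]) returns a dictionary of hostname to True or False. A False result only means the port did not answer from where you ran it, not necessarily that the server is down.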

June 16, 2016 at 1:41 pm

CELS Systems Outage, June 17-19, 2016

(Previous announcement here: https://mcssys.wordpress.com/2016/05/26/cels-systems-outage-june-17-19-2016/)

Here’s an update on the state of things for the outage this Friday. Please read the above for context. Another reminder/announcement will be sent on Thursday.


Aside from what’s noted in the prior announcement, please note we will be turning off the Mac file servers on Thursday evening at close of business. Please make sure you have the files you need moved to Box or your local machine prior to 5PM on June 16.

The morning of June 17 we will issue a building-wide shutdown command to all linux workstations managed by us. If you self-manage your machine, or if you run a Mac or Windows machine, please shut it down before you leave on Thursday to ensure no data loss on the local disk. Remember, you won’t be coming into building 240 at all on Friday, it will be off limits to everyone (including Systems).

The list of machines that will stay up is largely the same as last time (https://mcssys.wordpress.com/2015/05/27/reminder-tcs-power-outage-june-1-2015/) with the addition of gitlab.cels.anl.gov and xgitlab.cels.anl.gov and some back-end services. I’ll post the complete list later this week.


Work to reduce the impact of this outage on your work is progressing well. Our expectation is that login services for bio.anl.gov workstations and DHCP (the service that provides the network address for your workstation) will remain up and running. BIO file servers will, however, be down. The BIO print server will also be down.

If you need to print something on Friday, you can connect to the Canon copiers from your web browser (ex: http://a102copier.bio.anl.gov) and print PDFs that way. Save the document you want to print as a PDF, go to the printer in your web browser, and click “End User Mode”, then log in (no PIN). Click “Direct Print”, then under “Specify File”, choose “Browse” and select the PDF you want to print. Once the options below look like what you want, click “Start Printing”.

Kat will be over there through most of the day to help with issues, and we’ll be monitoring the support queue at help.

June 13, 2016 at 3:20 pm

gitlab.cels.anl.gov and xgitlab.cels.anl.gov maintenance, Wed Jun 8, 12-12:30pm

In order to apply some security patches to our gitlab servers, we’re going to have a brief outage on Wednesday afternoon (June 8) from noon to 12:30pm. During this time, the servers won’t be reachable, pushes and pulls will fail, etc. No data loss will occur, and everything will be back in operation within 30 minutes.

If this poses an undue hardship, please let me know and we’ll reschedule. Thanks!

June 3, 2016 at 1:21 pm

CELS Systems Outage, June 17-19, 2016.

As you may be aware by now, building 240 is undergoing a complete shutdown of power beginning in the morning of Friday, June 17, with an outage window extending into Sunday, June 19. We hope the outage will be shorter than that, but fully expect it will last until the evening of the 18th at the absolute earliest.

This affects all computers in the building 240 data center. Each IT organization is going to be notifying its users of the impact on them, and that’s what I’m writing to you about today. Our served customer base in CELS has grown since the last time we had to endure one of these, so I’m going to break it out a little bit, and I expect I will also have some more detailed mails that will only be targeted at the BIO division after the fact. This message’s goal is to give you a heads up that this is happening, and make sure you plan accordingly.

I’ll make some division-specific announcements below, but everyone can expect affected compute systems to start going down beginning by 6AM on June 17, and shutdowns will be complete by 9AM. Because your network files will not be available, we encourage you to make sure you have files and data you need locally for that day. Getting accustomed to working in Box can help with that. You can find more information at http://inside.anl.gov/services/box.

We’ll send more announcements on this as we get closer to the date. You can also keep up to date via twitter (@mcssys) or WordPress (https://mcssys.wordpress.com).


The effect of this work so closely mirrors the work that happened a year ago that I’m largely going to crib from that announcement. Pardon my unoriginality:

The short answer is that it’s easier to say what won’t be affected. Mail services we provide (forwarding for mcs.anl.gov, alcf.anl.gov, cels.anl.gov, ci.uchicago.edu, etc.) and mailing lists will be unaffected. Most web sites we host (WordPress, Confluence, etc.) will remain up. We’ll notify site owners of any exceptions to this. CIS-provided services (e-mail, web, business systems) are generally unaffected. Externally hosted services (Box, Dayforce, TAMS) are unaffected.

Now for the info you really need — what will be down. All MCS/CELS file and compute servers will be down. This includes SSH logins (login.mcs.anl.gov), unix and Mac home file servers, linux compute servers, all desktops, and all networking in building 240. We had planned to move the subversion server at repo.anl-external.org to the 221 data center, but have not been able to accomplish that in time for this work and thus it, too, will be down.

It’s outside the scope of this announcement, but I’ll also just remind you if it’s in the data center, it’s down. So that means LCRC, Mira and friends, Beagle, Magellan, Chameleon… you get the gist.

I will send an update closer to the outage detailing the exact services that will still be up, just like I did last year. (See https://mcssys.wordpress.com/2015/05/27/reminder-tcs-power-outage-june-1-2015/ for historical reference.)


I’m not sure what Rocky told you during last year’s power work. I can tell you that the net effect will be the same and we’ll take what steps we can to minimize the effect on you. That being said, all services hosted in 240 will go down, and that’s where your entire back-end services live at the moment, so the effect will be felt. We’ll reduce your dependency on the server that handles giving out network addresses to wired computers so that as long as you don’t reboot you’ll stay up on the network. Your file server will go down (bioxshared, Y drive, Z drive), and you may have issues logging in to your computer if you log out or reboot.

Unfortunately, the work to move BIO computers off the BIO child domain and remove those dependencies is starting in June, but won’t be complete in time for this outage (it will take some time). We will, however, get what pieces we can in place prior to the outage. I’m personally carving out a chunk of time next week to see if I can’t at least get the DHCP (IP address assignment) component put away and have that functional during the outage. File systems are a bigger fish to fry and will take a lot more time. As such, we’re strongly encouraging you to embrace Box if you haven’t already. I’ll spend a bit of time helping folks out with Box at our session next week.

We will be taking the opportunity this outage brings us to make some improvements in the power layout of your servers and switches to make them more resilient to partial power failures. Alas, nothing short of a big ole generator makes them resilient to a total outage like this.

I will send weekly updates to BIO on the progress to minimize the effect this outage has on the division. Next week’s update will be summarized at the previously announced tech session in the auditorium.

Thanks, and please understand – if we had any say in this, it wouldn’t be happening. But we’ll power through it as best we can.

May 26, 2016 at 5:41 pm

