Dispatches From The Geeks

News and Announcements from the MCS Systems Group

Author Archive

Power work in building 240 you need to know about.

We’ve received notice regarding upcoming power work for building 240. Time frames have not been set, but this work will take out the entire building for a weekend, along with many of the servers and services we provide. The current expectation is that a weekend in June will be scheduled for the work. This work is to install necessary power for the installation of Theta in ALCF, the newest supercomputer coming on-site.

As we approach the date (once it’s decided), we’ll have a more definite list of what will and won’t be affected, but it’s safe to say all compute resources, file servers, desktops, and anything else that’s housed in building 240 will go down.

Over the years, we’ve moved a lot of critical CELS resources to building 221 (at least, the ones we’re able to fit into half a rack), so things like mailing lists and websites will generally continue to work. We’ll keep you updated as dates are shored up, though the critical scheduling factors will be driven by the Argonne site, ComEd, and ALCF.

Written by Craig Stacey

April 18, 2016 at 4:08 pm

Posted in Uncategorized

Survey Reminder, upcoming staffing changes

Hi, all.

First of all, here’s a reminder about the user survey we’re conducting as announced in the last update. You can find the survey at the URL below, and it’s pretty quick. Should only take a few minutes to fill out. I’ll be closing the survey down at the end of the month.

http://goo.gl/forms/RA60ciGfB3

Next up, we’ve got some staffing changes coming up. On April 4th we’ll be welcoming our newest team member, Brad Fritz. He’ll be joining us as a Systems Administrator, filling the role that John Roberts had previously worked under before his promotion in LCRC. Brad will be joining us from Motorola where he has a long history maintaining various wireless and telecom installations. But his passion’s been doing Unix administration, and he’s very happy to be finally getting to do it for a living instead of a hobby. He’s excited to join Argonne, and we’re thrilled to have him!

Lastly, some bittersweet news. Many of you surely know this by now, but Ti Leggett is leaving our team to go on to bigger things. He’s not leaving Argonne, thankfully, as he’s joining ALCF as their new Deputy Project Director & Deputy Director of Operations. Our loss is most assuredly ALCF’s gain. I’m sure you’ll all join me in congratulating Ti and wishing him great success in his new role; I’m confident he’s going to be amazing.

As such, we’ve got a new hole to fill on this team, so if you know of people who might be interested in a leadership role dealing with production computing in CELS, let me know.

Thanks!

Written by Craig Stacey

March 21, 2016 at 10:47 am

Posted in Uncategorized

CELS Systems Announcements; Customer Survey

Hi, folks. We’ve had a few changes in the group since the new year, and I’m overdue in letting you know about them. Let’s dive right in:

Welcome BIO:

At the end of January, BIO’s IT admin Rocky Patel left Argonne for another opportunity. With that, our group took on supporting the BIO division along with the divisions we’re currently supporting. I sent a separate note earlier this month specifically to the BIO division after this happened, and have since had a meeting with the division, but I thought it would be good to have everyone aware of the situation, since we’re going to be spending some effort figuring out BIO’s IT architecture and incorporating them into what we do.

I’ve added the BIO division to this announcement list, so BIO folks, you’re going to see all the announcements we send to this list. It’s not very high traffic, and not everything announced on it is relevant to your division yet. You can also follow our exploits on our blog at https://mcssys.wordpress.com, and @mcssys on Twitter.

Welcome Kat and Jasan:

Martino left our group at the end of October to join CIS and help run their Service Desk, leaving a void for us. We filled that void (and added a little extra effort to help with BIO) since the break. You’ve probably already encountered our newest team members, but I wanted to introduce them all the same.

Jasan Krupka previously spent a number of years with Apple as a Technician/Genius/Product Specialist. He’s also an active member of the National Guard.

Kat Tylka has spent a number of years providing tech support and help desk services in the area, most recently with Porvisur Technologies in Mokena.

We’re thrilled to have both of them on board, so stop by and say “hi” if you haven’t already. (I also hope to be introducing one more team member in the coming month as we’re in the final stages of hiring a new junior sysadmin.)

BIO in-person coverage:

Now that we’ve got a fully staffed service desk, we’ve gone back to full hours again. And with the addition of the BIO division, we’re adding in-person hours in building 446. We’re trying out this schedule to see how it works, and will adjust accordingly as needs dictate, but for the moment we’ll have someone on the desk over there on Tuesday and Thursday mornings, from 8:30 – 11:30 AM. Either Tina, Kat, or Jasan will be located in A128A-1 in building 446 during those hours. Of course, if something comes up that requires an in-person visit outside those hours we’ll do what we’ve been doing, but this gives us a little more regular coverage in the building. You still get help the same way, by e-mailing help@cels.anl.gov, or calling extension 6813.

Services Survey:

We haven’t done a survey in some time, and I think we’re overdue. At the link below you’ll find a Google Form with a rather free-form questionnaire on our team’s services. There aren’t many questions, and it’s really an opportunity for you to let us know where we should be putting our efforts in the coming year. BIO folks, I know you’ve only had a month or so with us, but your input is helpful in this as well, so please dive in.

You can find the survey at http://goo.gl/forms/RA60ciGfB3, and I’ll post a summary in a couple of months. I’d like to keep it open for the month of March, closing it to answers at the end of the month.

Thanks!

Written by Craig Stacey

February 29, 2016 at 11:46 am

Posted in Uncategorized

Reduced in-person Help Desk hours today

Due to the weather, we’re going to be closing the CELS service desk at lunch today and working remotely for the rest of the day. That means no walk-ups, and phone calls will go to voicemail. However, we’ll be monitoring help@cels.anl.gov and will be handling any issues we can remotely.

Thanks!

Written by Craig Stacey

February 24, 2016 at 11:10 am

Posted in Uncategorized

Monday’s power work

Monday’s power work will not generally affect servers run by us. Systems Administrators of affected systems will notify their users directly, but the affected systems are Mira, Beagle, TRACC, and some portions of Magellan.

The power outage in December took out much of the data center, but left most of the office side of the building up. This is the opposite of that situation – most of the data center (with the exception of two Power Distribution Units) will stay up, but the office side of things will generally lose power.

Written by Craig Stacey

January 20, 2016 at 1:40 pm

Posted in Uncategorized

B240/TCS to open at 10:30 am, this Monday, January 25th

For those of you in B240 who are supported by us in CELS Systems, a couple of notes:

  1. Your desktop will almost certainly lose power and reboot. Before you leave on Friday, you should cleanly shut down your machine.
  2. Upon your return on Monday, even if you did cleanly shut it down, it’s possible it will have come back on its own. However, it may have come back before things were ready for it to do so. As such, please reboot your computer if anything seems out of the ordinary before opening a trouble ticket with us – I promise it’s the first thing we’re going to do once we get a ticket and it might get you up and running quicker.
  3. After you’ve rebooted your computer, if things are still messed up, please let us know at help@cels.anl.gov (or systems@mcs.anl.gov) and we’ll look into it. Please be patient; we’re short-staffed and will be dealing with a building that just rebooted. :)

Written by Craig Stacey

January 20, 2016 at 11:31 am

Posted in Uncategorized

Update: CIS Maintenance Weekend this weekend.

Here’s the official list of affected services:

http://today.anl.gov/2016/01/it-maintenance-weekend-jan-15-17/

Note that Single Sign On (SSO) which allows external apps to authenticate against Argonne accounts (like Box, Workday, etc.) is being temporarily migrated offsite so that it will remain up during the network outage. And CIS is confident this outage won’t last more than 15 minutes.

Thanks!

Written by Craig Stacey

January 15, 2016 at 2:53 pm

Posted in Uncategorized

CIS Maintenance Weekend this weekend.

This is a reminder of the outage happening this weekend. The takeaway is still the same: anything behind the lab’s firewall will be unavailable for a brief time tomorrow morning, with some periods of up and down time throughout the morning. Things should all be normal by noon.

The official story from CIS is that e-mail and other services are not affected, but that’s not going to be the case. As such, the official announcement from them isn’t accurate, and I don’t want to send it on since it gives a false sense of what the situation really is.

So be prepared for brief outages of lab-hosted services tomorrow between 8 and noon.

Thanks. My original notice follows below.

===

CIS Maintenance weekend is the weekend after this coming one. I normally forward these announcements on to you, but I noticed some possible errors in the list of affected services, and I want to get some clarification from them on it. I’ll send a complete list when I have it. However, because the Argonne network will be undergoing a major upgrade on the morning of Saturday, January 16, we can expect services behind the lab’s firewall to be inaccessible from offsite.

While this does not include services Systems runs here in building 240, nor does it include ALCF’s machines, it does include services hosted in building 221. This includes lab e-mail, lab web servers (including http://www.mcs.anl.gov), and services that Systems runs in that building. Because that building has the most stable power situation (generator-backed UPS), we’ve housed all our critical servers there including Systems-provided websites and services such as WordPress, Mediawiki, Confluence, Jira, gitlab, etc.

The official outage window is from 8AM through noon, though the work is expected to take no more than an hour if things go as expected; the extra time is for the unexpected.

I’ll send an official list of affected services once it’s nailed down, but I wanted to get this word out quickly in case there are issues we need to be aware of regarding sustaining the outage at that time. If it poses a particular problem for you, let me know.

Written by Craig Stacey

January 15, 2016 at 8:30 am

Posted in Uncategorized

CIS Maintenance Weekend January 15-17

CIS Maintenance weekend is the weekend after this coming one. I normally forward these announcements on to you, but I noticed some possible errors in the list of affected services, and I want to get some clarification from them on it. I’ll send a complete list when I have it. However, because the Argonne network will be undergoing a major upgrade on the morning of Saturday, January 16, we can expect services behind the lab’s firewall to be inaccessible from offsite.

While this does not include services Systems runs here in building 240, nor does it include ALCF’s machines, it does include services hosted in building 221. This includes lab e-mail, lab web servers (including http://www.mcs.anl.gov), and services that Systems runs in that building. Because that building has the most stable power situation (generator-backed UPS), we’ve housed all our critical servers there including Systems-provided websites and services such as WordPress, Mediawiki, Confluence, Jira, gitlab, etc.

The official outage window is from 8AM through noon, though the work is expected to take no more than an hour if things go as expected; the extra time is for the unexpected.

I’ll send an official list of affected services once it’s nailed down, but I wanted to get this word out quickly in case there are issues we need to be aware of regarding sustaining the outage at that time. If it poses a particular problem for you, let me know.

Thanks.

Written by Craig Stacey

January 7, 2016 at 4:51 pm

Posted in Uncategorized

Phishing mail

Some of you may be getting very obvious phishing notices saying your anl.gov mail account is over limit. ANL Cyber has been notified, and you can safely ignore and delete the message. Thanks, and happy new year!

Written by Craig Stacey

January 4, 2016 at 9:53 am

Posted in Uncategorized
