Dispatches From The Geeks

News and Announcements from the MCS Systems Group

Another outage

Well, that didn’t work. The server went down again. We’re going to move the VMs to another hypervisor for now. This will result in some degraded performance, but it should at least stay up.

(I’m sending this while the list server is down, but also posting it to http://systemsblog.mcs.anl.gov, which is hosted offsite.)

Written by Craig Stacey

May 27, 2012 at 3:24 pm

Posted in Uncategorized

Outages this weekend, and other updates

This weekend is a planned maintenance weekend for CIS, and some work being done will affect us. Specifically:

– Short network interruptions (less than 60 seconds per interruption) (Saturday)
— These short rolling outages will affect mail, calendar, web, and any other servers we have in the 221 data center.
– Wireless network interruptions (Saturday)
– All business systems (most of Sat.)

MCS is also performing maintenance on some of its systems this weekend. Specifically:

– MCS-hosted mailing list server upgrade (5PM on Friday through 7PM on Saturday).

— During this outage, mail sent to mailing lists will queue up and be held until the new server is operating. This covers all mailing lists hosted in the following domains:

lists.kbase-group.org
lists.kbase.us
lists.kbt.mcs.anl.gov
lists.lcrc.anl.gov
lists.accessgrid.org
lists.mcs.anl.gov
lists.alcf.anl.gov
lists.metagenomics.anl.gov
lists.anl-external.org
lists.nmpdr.org
lists.bgconsortium.org
lists.terragenome.org
lists.cels.anl.gov
lists.theseed.org
lists.cogkit.org
lists.cosmea.mcs.anl.gov
lists.earthmicrobiome.org
lists.globus.org
lists.globusonline.org
lists.i2u2.org
lists.icis.anl.gov
lists.ieeetcsc.org
lists.igsb.anl.gov
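
For anyone who wants to check whether a particular list address falls under this outage, here is a minimal Python sketch. The domain set is copied from the list above; the function name `is_affected` and the example addresses are hypothetical and used only for illustration.

```python
# Domains whose mailing lists will queue during the upgrade (from the list above).
AFFECTED_LIST_DOMAINS = frozenset({
    "lists.kbase-group.org", "lists.kbase.us", "lists.kbt.mcs.anl.gov",
    "lists.lcrc.anl.gov", "lists.accessgrid.org", "lists.mcs.anl.gov",
    "lists.alcf.anl.gov", "lists.metagenomics.anl.gov", "lists.anl-external.org",
    "lists.nmpdr.org", "lists.bgconsortium.org", "lists.terragenome.org",
    "lists.cels.anl.gov", "lists.theseed.org", "lists.cogkit.org",
    "lists.cosmea.mcs.anl.gov", "lists.earthmicrobiome.org", "lists.globus.org",
    "lists.globusonline.org", "lists.i2u2.org", "lists.icis.anl.gov",
    "lists.ieeetcsc.org", "lists.igsb.anl.gov",
})


def is_affected(address: str) -> bool:
    """Return True if the address's domain is one of the affected list domains."""
    # Take everything after the last "@" and compare case-insensitively.
    _, sep, domain = address.rpartition("@")
    return bool(sep) and domain.strip().lower() in AFFECTED_LIST_DOMAINS


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for addr in ("some-list@lists.mcs.anl.gov", "someone@mcs.anl.gov"):
        print(addr, "->", "queued during outage" if is_affected(addr) else "not affected")
```

Mail to an affected address is only delayed, not lost; it will be delivered once the new server is operating.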

There will be a brief interruption of the RT service and the Macintosh file server, silver.mcs.anl.gov, on Saturday morning. We do not expect these systems to be offline for more than a few minutes.

Between 8AM and noon there will be intermittent outages that could prevent logging in to mail, workstations, login.mcs.anl.gov, and compute nodes. The actual downtime should be shorter than that window.

If these outages pose an extreme hardship for you, please let us know ASAP.

Also, our Confluence instance is now in production. The new hostname is https://collab.mcs.anl.gov. You can request an account on it via the accounts interface at https://accounts.mcs.anl.gov.

Thanks!

Written by Craig Stacey

May 15, 2012 at 7:42 pm

Posted in Uncategorized

Mail problems resolved

A critical server crashed in the middle of the night, taking down CI authentication. Normally this would not cause much of a problem, as we have backup authentication servers. However, there appears to be a misconfiguration on the Zimbra servers that caused authentication to fail against the backup servers as well. This set off a cascading problem that made the Zimbra servers unresponsive. Mail was still coming in, but nobody could log in to check it.

The initial failure has been fixed (the authentication server is now back up), and we’re digging through the mess trying to make sure we fully understand why the other servers didn’t work as expected. We’ll have this bolted down such that the next failure will result in a proper fallback to redundant servers.
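
As a rough illustration of the fallback behaviour described here (not the actual Zimbra or authentication server configuration), the Python sketch below tries each authentication server in order and only gives up when every one is unreachable. The names `authenticate_with_fallback`, `AuthServerDown`, `crashed_primary`, and `healthy_backup` are hypothetical stand-ins invented for this example.

```python
from typing import Callable, Sequence


class AuthServerDown(Exception):
    """Raised by a (hypothetical) backend when its server cannot be reached."""


# A backend takes (username, password) and returns True or False,
# or raises AuthServerDown if the server is unreachable.
Backend = Callable[[str, str], bool]


def authenticate_with_fallback(
    backends: Sequence[Backend], username: str, password: str
) -> bool:
    """Try each backend in order; fall back only when a server is down."""
    for backend in backends:
        try:
            # A definite answer (accept or reject) is final; do not retry elsewhere.
            return backend(username, password)
        except AuthServerDown:
            # This server is unreachable, so move on to the next one.
            continue
    # Every server was unreachable.
    raise AuthServerDown("no authentication server is reachable")


def crashed_primary(username: str, password: str) -> bool:
    # Simulates the crashed authentication server.
    raise AuthServerDown("primary is down")


def healthy_backup(username: str, password: str) -> bool:
    # Toy credential check, for the example only.
    return (username, password) == ("alice", "s3cret")


if __name__ == "__main__":
    servers = [crashed_primary, healthy_backup]
    print(authenticate_with_fallback(servers, "alice", "s3cret"))
```

The design point is that an unreachable server and a rejected password are different outcomes: only the former should trigger a fallback to the next server.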

Sorry for the inconvenience.

Written by Craig Stacey

April 25, 2012 at 2:33 pm

Posted in Uncategorized

Problem with e-mail authentication

We’re aware of the problem with authentication to e-mail and are working on the solution. It will be fixed for some users very soon. CI users will have a slightly longer delay while we fix the authentication server. Sorry for the trouble.

Written by Craig Stacey

April 25, 2012 at 1:56 pm

Posted in Uncategorized

RT outage

RT (trouble ticket system) is currently down. We’re working to bring it back — it should be back shortly, after which we’ll send more details.

Written by Craig Stacey

March 7, 2012 at 5:37 pm

Posted in Uncategorized

RT issue resolved

We identified the issue and turned on the mail queue again. New tickets are now being created. Any tickets or correspondence sent during the outage from someone without an account on RT would not have generated a ticket. This issue is fixed, and we’ll be pushing through the missing messages over the next 30 minutes or so. If you maintain or read an RT queue on our system, you should see this backlog of messages start shortly.

Sorry about this.

Written by Craig Stacey

February 21, 2012 at 6:20 pm

Posted in Uncategorized

RT issue

We’re tracking down a problem in RT that has prevented new tickets from being created. We’ll send an update when the issue is fixed. We have all the messages, so nothing will be lost, only delayed.

Written by Craig Stacey

February 21, 2012 at 5:51 pm

Posted in Uncategorized

Reminder: Linux Workstation / NFS fileserver maintenance

From: Dan Olson <dolson@mcs.anl.gov>
Date: December 22, 2011 11:10:07 AM CST
To: systems-announce@mcs.anl.gov
Subject: Linux Workstation / NFS fileserver maintenance
Reply-To: systems@mcs.anl.gov


The MCS NFS fileservers will be undergoing maintenance from 6AM CST December 27 to 6PM CST December 28.  

We are moving home directories and our softenv environment to a new server during this window.  This work
is being performed to provide performance improvements and more available space.  

During this period you may not be able to log on to Linux workstations, and many of our filesystems will
be marked read-only.  The Linux workstations, login and compute servers will be restarted at the completion
of this maintenance.  

--
Daniel Murphy-Olson       
Systems Administrator
Mathematics & Computer Science Division
Argonne National Laboratory
630-252-0055

Written by Craig Stacey

December 27, 2011 at 9:13 pm

Posted in Uncategorized

New Copiers/Printers

The new copier/printers are being installed today. We’re in the process of getting them set up. We’ll send an announcement with a link to instructions when everything’s all set and ready to go. You should be able to use the copy function now, but the rest will require a little time before it’s all online.

Thanks!

Written by Craig Stacey

December 5, 2011 at 5:32 pm

Posted in Uncategorized

Zimbra outage on Saturday, November 26, noon to 5PM.

This outage and upgrade is going ahead. The time frame is from noon until 5PM. Happy Thanksgiving!

Written by Craig Stacey

November 23, 2011 at 8:56 pm

Posted in Uncategorized
