Dispatches From The Geeks

News and Announcements from the MCS Systems Group

Power Outage this weekend (11/4-11/6)

Please note this will affect all machines in the datacenter.  Primary websites will stay up, as will cryptocard, database servers, mailing lists, and RT, as we have hosted those servers in another building.  Compute servers, file servers, and clusters hosted in 240 will be down.  The power will not return until 5PM on Sunday, at which point we can start turning things back on.

Also, the Zimbra mail/calendar service will be getting a maintenance upgrade this Saturday from 9 AM until 1 PM.  Except for that window, your mail will stay up.  During that window, mail will be queued up and will be delivered after service returns.  A more detailed announcement will be sent to Zimbra account holders.

Also, I will remind you again — if you have a black “AASTRA” phone in your office, it’s going to stop working at 5PM, along with all your networking in the building.

Thanks for your understanding.

Written by Craig Stacey

November 2, 2011 at 7:30 pm

Posted in Uncategorized

2011 MCS User Feedback Survey

It’s time to conduct our User Feedback survey. (Well, past time, really, since it’s been over a year.)

You can find the survey here: http://www.surveymonkey.com/s/KVTHY6Y. It is also linked from our blog (systemsblog.mcs.anl.gov) and the IT Wiki (mcs.anl.gov/help). The survey is open until 10/31, and all answers are completely anonymous.

I’ve simplified the questions somewhat and gone a little more free-form with the answers. There are six questions in total.

Thanks.

Written by Craig Stacey

October 6, 2011 at 10:28 pm

Service outage resolved

(earlier, blog only)

> I’ve identified and replaced the failed server, and authentication is working again. We’ve got some NFS issues lingering that I’m tracking down that are preventing people from logging into the login servers. I’ll post another update once that’s taken care of.

Update: NFS has been restored. Some machines may need a reboot. login2.mcs.anl.gov was rebooted in the process of debugging this. login1.mcs.anl.gov was done in by a combination of the service outage and a user job on the login node that ran the machine out of memory. login3 and login4 remained up.
Zimbra issues seem to have cleared up as soon as the authentication service was restored. This outage affected:

* CELS Zimbra users
* MCS LDAP/Kerberos authentications (including MCS Unix workstations)
* NFS
* MCS license servers

We will investigate why backup authentication methods did not help, as well as why CI Zimbra users were affected. Thanks for your patience.

Written by Craig Stacey

September 11, 2011 at 10:32 pm

Authentication fixed

I’ve identified and replaced the failed server, and authentication is working again. We’ve got some NFS issues lingering that I’m tracking down that are preventing people from logging into the login servers. I’ll post another update once that’s taken care of.

Written by Craig Stacey

September 11, 2011 at 9:02 pm

Service outage affecting Zimbra, MCS logins

We’re tracking down a problem in MCS that’s affecting all logins for MCS users, as well as Zimbra authentication. Zimbra may have other issues; it’s not yet clear whether they are caused by the outage on the MCS side. I also have reports that CI users are unable to authenticate for mail.

We’re working on this. See http://systemsblog.mcs.anl.gov for updates, and thanks for your patience.

Written by Craig Stacey

September 11, 2011 at 7:20 pm

CIS: Emergency VPN Maintenance, Thursday, August 4, 17:15

Begin forwarded message:

> Subject: Emergency VPN Maintenance
>
> Impact: All VPN Users
> Begin: August 4th, 5:15PM
> End: August 4th, 5:30PM
>
> Summary:
> We are experiencing hardware failure on our VPN, it is currently up and running but in a very unstable condition. We have received the new hardware and will be swapping it out this evening at 5:15 in order to minimize impact to users. When this work begins all users will be disconnected from the VPN, and VPN will be unavailable until this work is completed. Another email will be sent when all work is completed.
>
> Contacts:
> Please always submit new requests for CIS assistance to help@anl.gov or call the CIS Helpdesk @ 2-9999.
>
> For assistance regarding this communication and project, please contact:
> Brandon Siegel
> bsiegel@anl.gov
> 2-0261
>
> Thank you for your patience in this matter.

Written by Craig Stacey

August 4, 2011 at 3:38 pm

MacOS Lion upgrades

I’m getting asked more about upgrades to Mac OS 10.7 (OS X Lion). Here’s the situation, as I know it:

* The lab has a list of over 700 clients interested in upgrading to Lion (this includes everyone MCS Systems has reported as using OS X).
* The lab does not yet have a method to purchase Lion individually, let alone a group purchase.
* The lab is trying to light a fire under Apple, but Apple has not yet addressed how to handle Lion updates and the various DOE labs.

By the terms of Lion’s licensing and the App Store, if you buy Lion for a home machine, it can be installed on all the computers you use with that App Store account (Section 1.B.ii). So if you feel like you can’t wait, you have that option. But even if you jump the gun and place a requisition for a copy, it won’t go anywhere until the lab and Apple figure this out.

I’ll send an update when I have more information.

BTW: if you’re interested in Mac OS and don’t already subscribe, you should subscribe to http://lists.mcs.anl.gov/mailman/listinfo/mac-users — lots of good tips and discussions show up there.

Written by Craig Stacey

July 28, 2011 at 3:59 pm

Power restored

Power has been restored and we are working to restore services.

Written by Craig Stacey

July 6, 2011 at 3:48 pm

ANL power outage

A good portion of the ANL site has lost power, including TCS.

Written by Craig Stacey

July 6, 2011 at 2:35 pm

Mailing list and RT server back up.

The mailing list and RT server has been brought back. It suffered a network failure, which has been rectified.

==
Craig

Written by Craig Stacey

June 23, 2011 at 2:20 pm
