Dispatches From The Geeks

News and Announcements from the MCS Systems Group

Author Archive

Blue Jeans video conferencing and Chrome under Mac OS (and Linux).

If you’re running the latest version of Chrome (and, really, you should always be running the most recent stable version of any browser for security reasons), you will have issues using Blue Jeans on either Mac OS X or Linux.

“Service Advisory: Mac customers upgraded to Chrome version 39 or above will not be able to download the browser plugin. You will need to use Firefox or Safari as a substitute. Our engineering team is working on a fix. For details, see: http://bluejeans.force.com/KnowledgeSearch/articles/Knowledge_Base/Browser-Plugin-will-not-work-on-certain-Chrome-versions-on-Linux-MAC/”

As noted on the above page, Blue Jeans claims it will have the Mac problem licked by next week. There is no word yet on when the Linux fix is coming.

In each case, you can work around the problem by using Firefox (Linux/Mac) or Safari (Mac).

Written by Craig Stacey

November 20, 2014 at 9:42 am

Posted in Uncategorized

CELS Helpdesk closed Wednesday 11/19 through Friday 11/21.

Due to the installation of cubicles, walk-up service to the CELS help desk will be unavailable Wednesday through Friday. During this period, if you require assistance, please contact us via email (systems@mcs.anl.gov or help@cels.anl.gov) or via telephone (x6813). In a pinch, you can stop by room 2136.

Thanks!

Written by Craig Stacey

November 17, 2014 at 10:19 am

Posted in Uncategorized

Systems Announce Git, SVN, trac currently down

These services should now be back online. During the networking maintenance over the weekend, a configuration change was missed on the routers, and some hosts in 221 lost their routes. If you notice any other hosts that are down, please let us know and we’ll get them taken care of.

From: Ti Leggett
Date: Monday, November 10, 2014 at 8:18 AM
To: cels-systems-announce
Subject: [Systems Announce] Git, SVN, trac currently down

The Git, SVN and trac services are currently down. We are working to get them fixed ASAP and will announce when all services are back and functional.

Written by Craig Stacey

November 10, 2014 at 9:36 am

Posted in Uncategorized

Git, SVN, trac currently down

The Git, SVN and trac services are currently down. We are working to get them fixed ASAP and will announce when all services are back and functional.

Written by Craig Stacey

November 10, 2014 at 8:18 am

Posted in Uncategorized

A series of announcements (Wifi changes, Maintenance Weekend, and summary of this weekend’s outage)

Rather than inundate you with mailings, I’m sending this omnibus note with pointers to blog postings outlining the issues. I’ll also note you can follow @mcssys on twitter or check the blog at http://mcssys.wordpress.com (or http://mcs.anl.gov/systems/blog) for updates as well.

First up, for a summary of the repo.anl-external.org outage, see http://wp.me/p3jwfN-77.

Next, a notice on wifi changes from CIS: http://wp.me/p3jwfN-79

The short version of the story is that next week, you will notice a change in the wireless network names used onsite. Specifically, instead of the many options you currently see, there will primarily be only two: Argonne-auth and Argonne-guest. As with the current setup, connections to the “auth” network require authenticating with your Argonne credentials and gain you access to a trusted VPN network. Connections to the “guest” network require no authentication, but do require filling out a web registration form, which you will see on your first browser connection.

Finally, a notice on the upcoming maintenance weekend in CIS: http://wp.me/p3jwfN-7b.

This notice also covers other maintenance activity. Aside from phones and voicemail being unreliable Friday evening and desktop networks being unavailable Saturday morning, you are unlikely to be affected by it.

Written by Craig Stacey

November 3, 2014 at 2:21 pm

Posted in Uncategorized

CIS Maintenance Weekend, November 7-9

Please see the following notice from CIS regarding other maintenance activity this weekend. CELS Systems has not scheduled any outages during this window, so the only outages you should notice are outlined below; they are generally restricted to Argonne central systems, including wifi and voicemail. Servers and services not listed below (including all those provided by CELS Systems) should remain up.

WHAT ARE WE DOING?

Argonne’s quarterly IT maintenance weekend is scheduled for Friday, November 7th, through Sunday, November 9th. Any laboratory network or core IT service may be affected during the weekend.

Specifically:
• All laboratory phones will be intermittently unavailable from 5 to 9 p.m., Friday, November 7th. In the event of an emergency, dial 1-630-252-1911 from any cell phone.

• Voice mail will be unavailable from 6 to 8 p.m., Friday, November 7th. During this time, voice mail messages will not be received nor will they be retrievable.

• Inside Argonne Document Center and Material Safety Data Sheets (MSDS) will be unavailable from 10 to 11 p.m., Friday, November 7th.

• All business applications will be unavailable from 5 to 6 a.m., Saturday, November 8th.

• All networking services (except the Data Center) in Building 240 will be unavailable from 6 to 8 a.m., Saturday, November 8th.

• SharePoint will be unavailable from 1 to 5 p.m., Saturday, November 8th.

• Wireless networking will be unavailable from 7 a.m. to 1:30 p.m., Sunday, November 9th. (See Inside Argonne Communication for more information.)

• CIS will perform verification of IT services on Sunday, November 9th, to ensure all services are functioning for business hours on Monday, November 10th.

WHEN WILL THIS OCCUR?

November 7th, 2014, 5:00 p.m. through November 9th, 2014, ~12:00 p.m. We expect the maintenance to be complete Sunday morning; it will be followed by a verification process, after which an “all clear” message will be sent.

WHAT IS THE EFFECT ON YOU?

Unless there is an unforeseen issue with the maintenance activities, you should not be affected outside of the maintenance window.

FOR MORE INFORMATION

Report issues with services after the maintenance is complete to the CIS Service Desk at ext. 2-9999, option 2.

Written by Craig Stacey

November 3, 2014 at 2:18 pm

Posted in Uncategorized

Upcoming Wireless Network Changes

Please see the following notice from CIS.

The short version of the story is that next week, you will notice a change in the wireless network names used onsite. Specifically, instead of the many options you currently see, there will primarily be only two: Argonne-auth and Argonne-guest. As with the current setup, connections to the “auth” network require authenticating with your Argonne credentials and gain you access to a trusted VPN network. Connections to the “guest” network require no authentication, but do require filling out a web registration form, which you will see on your first browser connection.

CELS Systems staff will be visiting the known Apple TVs in the conference rooms in 3178, 4172, and 4313 to make sure they’re on the new network, as well as updating the building kiosks.

WHAT ARE WE DOING?

On Sunday, November 9, 2014, CIS will be reducing the number of published Service Set Identifiers (SSIDs) on both the “Authenticated” and “Guest” wireless networks. We are doing this to make it easier for Argonne employees and visitors to know which WiFi SSID to use.

Certain special and specific WiFi SSIDs at Argonne-Lodging, the DC office, and certain areas of the Advanced Photon Source will not be affected or changed. However, the “Authenticated” network SSID will change everywhere.

WHEN WILL THIS OCCUR?

Sunday, November 9, 2014, from 7:00 a.m. to 1:30 p.m.

WHAT DO YOU NEED TO DO?

After Sunday, November 9, 2014, employees and visitors will need to remove, delete, or ignore (“Forget This Network”) the following SSID names from the Wi-Fi Network Preferences or Managed Wireless Networks Profile (depending on operating system):

• ArgonneG-auth
• ArgonneA-Auth
• ArgonneG-guest
• ArgonneA-guest
• ANLTCSG-guest
• ANLTCSA-guest
• In the Argonne RAP office: RAP-OfficeG-guest

They will then need to connect to and set up the new SSIDs: Argonne-auth (Argonne employees) and/or Argonne-guest (Argonne visitors). These new SSIDs will automatically support both the 5GHz and 2.4GHz bands (used by 802.11a/n and 802.11b/g/n, respectively) and will allow devices to choose the best and least congested frequency on the chosen SSID.

If you have Apple TVs or other digital displays that depend on a wireless connection, those devices will also need to be reconnected to the appropriate network. CIS will be re-establishing wireless connectivity on these devices starting Monday, November 10th. If an employee needs assistance with a critical meeting that may use some of these devices (Mondo Pad, Visix digital displays or Apple TVs), please contact the Service Desk at ext. 2-9999, option 2.

WHAT IS THE EFFECT ON YOU?

Choosing a wifi network will be easier for Argonne employees and visitors while on site.

FOR MORE INFORMATION

Report issues with services after the maintenance is complete to the CIS Service Desk at ext. 2-9999, option 2.

Written by Craig Stacey

November 3, 2014 at 2:10 pm

Posted in Uncategorized

Summary of repo.anl-external.org downtime

Executive summary

The VM hypervisor showed signs of failing on Friday night, and mitigating steps were taken. The server failed on Sunday night, too late for anyone to work on it until Monday. It was restored at 11:30 AM Monday. The hypervisor is now stable; however, critical services hosted on it will be moved to more resilient hardware to reduce the likelihood of downtime.

What happened

An MCS virtual server hypervisor (hereafter referred to as vserver8) had a system disk go into a bad state, taking down vserver8 and all virtual machines hosted on it. The affected VMs were:

  • login1.mcs.anl.gov
  • login2.mcs.anl.gov
  • buildbot.mcs.anl.gov
  • pwca.alcf.anl.gov
  • horde.alcf.anl.gov
  • repo.anl-external.org

The short-term fix

We noticed instability with the server on Friday night, as the hypervisor had gone offline and rebooted a couple of times. A reboot seemed to clear the issue with the disk, though there did appear to be corruption in some previously retired VMs. As a first step, we made sure we had a current backup of the data stored on repo. We moved the IP addresses of login1 and login2 over to login3 and shut those VMs down. buildbot, pwca, and horde were kept down to minimize the load on the hypervisor, in the hope of keeping it up through the weekend. Once we were confident the data on repo.anl-external.org was duplicated, we brought it back up and kept an eye on the service, with the goal of migrating it to a new hypervisor on Monday.

The second failure

On Sunday night, the disk on vserver8 failed in a different manner than before. Unfortunately, nobody was available to handle the situation, so it had to wait until morning. First thing Monday morning, we attempted to bring the VM back. Due to the configuration of that machine, we were unable to recover from a bad system disk using the usual methods. Ultimately, we had to burn a bootable Linux LiveCD to boot the machine and initiate the data transfer to the new disk.

Early progress on that transfer gave us an initial estimate of about an hour to copy the data to the new disk, at which point the machine would be resurrected as it was before the crash. Had it looked like it would take longer, our recovery path would have been to migrate the data from the repocafe backup made on Friday to a new disk pool and set up a new VM on that. The disk copy looked to be the fastest path to recovery, and unlike the Friday backup it would preserve any changes made over the weekend, so we continued on that route.

The copy slowed down somewhat and ultimately finished just before 11:30 AM (about 1.5 hours after the initial estimate). After the copy finished, the server rebooted and operated as normal with no loss of data. It currently appears healthy.

Next steps

login1 and login2 are still directed at login3. Later today, we will send a notice to users who may be affected when we move the IP addresses back to their original hosts. If you have logged into login1 or login2 since Friday evening and are still logged in, you will be among the affected users.

repo.anl-external.org is currently up and stable, and we will begin the process of moving it to a more resilient VM infrastructure. When we first deployed it, it was intended to be a “best effort” self-service SVN repo to ease collaborations with external users. Because we never made that explicitly clear, and because it is *so* much easier to self-serve these sorts of things, many users gravitated towards using it as their primary SVN repo over the more “production level” svn.mcs.anl.gov.

Bearing this in mind, we’re reclassifying repo.anl-external.org as a critical service. We’re going to move the VM to hardware that’s better designed to weather these sorts of failures and that can move trivially from hypervisor to hypervisor as needed, as we currently do with other critical servers. There will be an announced outage on this service as we migrate the last of the repository data to the new server. We’ll do the bulk of this work in the background, so that the outage is needed only to copy last-minute data changes and ultimately move the VM. There will be further updates on this as we progress, and we’ll coordinate to ensure the migration does not happen at a critical time.

Written by Craig Stacey

November 3, 2014 at 1:50 pm

Posted in Uncategorized

repo.anl-external.org is now back online. Will send detailed explanation of downtime later for those interested.

Written by Craig Stacey

November 3, 2014 at 11:48 am

Posted in Uncategorized

Progress slowed slightly. At current transfer rate we’re expecting recovery around 11:30.

Written by Craig Stacey

November 3, 2014 at 11:08 am

Posted in Uncategorized
