Box is coming on site (date TBD) and is looking to have an open and frank discussion with Linux users about document sharing and collaboration. They want to know how they can improve the product for Linux users. They’re looking for 7-10 users from across the lab. If you have an opinion on this and are willing to give a couple of hours of your time, please let me know. We need to have the list finalized by next week. I was going to send this to firstname.lastname@example.org, but I wanted to cast a wide net, so I’m using the general announcement list.
If there are specific things you want to make sure are addressed, let me know that as well. Ideally, I’d want you in the room to be your own advocate, but if you can’t be (or aren’t on the final list), I want to be sure the big opportunities for improvement are expressed.
Power work in the data center has taken a handful of compute nodes offline for a few hours. They should be back online early this afternoon. The affected machines are:
Sorry for the inconvenience. We didn’t believe these machines would be affected by the work, but we were wrong.
For a list of alternative machines, see https://wiki.mcs.anl.gov/IT/index.php/General_MCS_Questions#computeservers.
The disk migration is finally finished. User home directories are now on their own partition, and the full disk problem has been rectified. There’s currently over 50GB available in user home directories on RDP for any files and programs that need local storage.
Thanks for your patience!
rdp.mcs.anl.gov is offline until further notice. The outage window is through 5PM, but I don’t expect it to take that long. I’ll post here when the work is done.
Unfortunately, the home directory migration was not yet successful, so we’re in the same boat we were in before the outage with space being very tight.
I’m going to take another crack at it on Sunday, which means you can expect the machine to be unavailable from around noon to 5PM. If anything changes, I’ll send a note to the blog and Twitter feeds linked below. Thanks.
Those of you who use rdp.mcs.anl.gov (Remote Desktop server for Windows) may have noticed the disk is quite full. I need to migrate users to a new partition to free up space. This, however, requires the machine to be offline during the migration. At the moment, the plan is to take the machine offline tomorrow at noon. I’m estimating a three-hour outage, though it may be less than that. In any case, I’ll post an announcement at the start and end of work on the blog and Twitter feed linked below.
Thanks, and sorry for any inconvenience this causes. I’d like to do this on a weekend, but schedules don’t align to have it happen this coming weekend and I don’t want it to wait another week as the disk is quite full.
Quick summary: I just got back from the 221 data center (gee, it’s hot outside) having replaced what we suspect are bad power supplies in a Virtual Machine Host server. We isolated the issue to this specific server, which was rebooting without leaving any useful information in its logs as to why. This was coupled with a bad set of configs that prevented the virtual machines hosted on it from restarting without human intervention.
We’ve addressed the config issue, and replaced the power supplies as there were indications that one or possibly both were bad.
We’re ready to migrate these affected virtual machines to a new host if this last fix doesn’t stabilize things, but we’re feeling pretty good about this at the moment.
Thanks so much for your patience, and sorry for any troubles.
An outage similar to this morning’s is occurring (though limited in scope at the moment, since we know what *won’t* work to bring things back). Stand by…
Addendum: One of the web servers (personal pages and project pages under http://www.mcs.anl.gov) is still booting up. It has been a while since it last rebooted, and it’s doing a filesystem check. It should be back within the next hour.