News from the staff team

August 30, 2012

Unexpected interruption of services

Because of a problem with how NFS clients connect to the NFS server, we rebooted several machines, including sandstorm (mail server), death (web server), and tsunami (login server). The problem appears to have started around 10:00AM. We apologize for the inconvenience.

All issues should have been resolved as of 2:20PM.

August 16, 2012

Unexpected downtime

NFS home directories were unavailable between 6:30pm yesterday and 9:30am this morning. Most OCF services were affected. The outage appears to be related to networking changes for the move to Hearst Gym.

Edit 4:00pm: We discovered that the mail server ran out of space during the outage and was mishandling mail between 6:25am and now. Domino effect.

Edit Fri 4:00am: Unfortunately, parts of the mail server became corrupted. Mail was temporarily rejected until we resolved numerous software issues by 3:00am. Delayed mail should now be coming in.

August 15, 2012

Mail outage

Mail was unavailable between 9:30pm Monday and 7pm Tuesday as a result of electrical work for the UPS, which broke NFS (see the post "Moving to Hearst Gym"). We apologize for the inconvenience.

August 12, 2012

Backup server compromised

We discovered that our backup server (pollution.ocf) had recently been misconfigured and was subsequently compromised. pollution maintains copies of account data (home directories, web directories, MySQL databases, and mail inboxes) and administrative credentials.

We immediately investigated, changed the affected credentials, and notified campus. Please assume that the attackers had access to a copy of your account data, and take appropriate measures, especially if you store sensitive or restricted data. Account credentials (e.g., password hashes) are not stored on pollution, and we determined that they were not compromised. Even so, feel free to reset your password as an additional security measure.

We sincerely apologize for this inconvenience. If you need any assistance, please let us know.

August 4, 2012

Mail is down

OCF email service abruptly stopped due to an issue in our authentication system. Please check back for updates.

Problem was resolved. (2:27 pm 8/4)

July 29, 2012

File server and user directory migration

We will be migrating the NFS disk array (file server) and LDAP server (user directory) to new hardware after 8pm tonight. This is necessary to maintain uptime and acceptable performance as much as possible during and after our move to Hearst Gym in August.

We will attempt to keep files read-only where possible so that services including web hosting will not be as severely affected.

We do not have spare hard drives on hand, so the existing drives will be reused in the new hardware, which means they must be backed up, formatted, and restored. Partial downtime of all OCF services will therefore be unavoidable.

Edit 07/29 02:00am: LDAP migration completed successfully without service interruption. NFS migration (during which file access will be read-only) postponed to later today.

Edit 07/29 06:00pm: NFS migration started. Mail service is offline. Web hosting and SSH are read-only.

Edit 07/30 01:10am: All services except mail restored.

Edit 07/30 01:25am: All services restored. NFS disk array may require minor downtime in the near future. Good night.

July 24, 2012

Systemwide downtime

Unscheduled maintenance on virtual machine hypervisors made many OCF services unavailable between 3am and 6:30am. A somewhat-related issue affected web hosting yesterday.

July 22, 2012

Mail maintenance

Mail services (IMAP/POP/SMTP/webmail) are going down as we transition the mail server to a virtual machine. By doing so, we hope to keep mail available on a temporary server during our move out of Eshleman in August, since virtual machines (guests) can be migrated on-the-fly between physical hypervisors (hosts).

Update 2:04pm: All services back up. Mail spool (inbox files) will be migrated later.

July 6, 2012

Scheduled webserver downtime again

We are (again) scheduling downtime on our primary server (hal) between 9:00pm and 10:30pm July 5th, to add two new processors. Notably, this will affect web hosting, MySQL, IRC, and wiki.

Update 9:15pm: Services going down.

Mid-downtime action shot!
(via Kenny Do)

Update 10:24pm: Services are back up.

July 2, 2012

Scheduled webserver downtime

We are scheduling downtime on our primary server (hal) to physically inspect the machine between 6:30pm and 7:30pm July 1st. Notably, this will affect web hosting, MySQL, IRC, and wiki. We will again schedule downtime in one or two weeks to add two additional processors to hal.

Update 6:47pm: hal is being taken offline.
Update 7:25pm: hal and all services except IRC are back.
Update 7:32pm: All services operating normally.