News from the staff team

Unexpected interruption of services

Because of a problem with how NFS clients connect to the NFS server, we rebooted several machines, including sandstorm (mail server), death (web server), and tsunami (login server). The problem appears to have started around 10:00AM. We apologize for the inconvenience.

All issues should have been resolved as of 2:20PM.

Unexpected downtime

NFS home directories were unavailable between 6:30pm yesterday and 9:30am this morning. Most OCF services were affected. The outage appears to be related to networking changes for the move to Hearst Gym.

Edit 4:00pm: We discovered that the mail server ran out of space during the outage and was mishandling mail between 6:25am and now. Domino effect.

Edit Fri 4:00am: Unfortunately, parts of the mail server became corrupted. Mail was temporarily rejected until we finished resolving numerous software issues at 3:00am. Delayed mail should now be coming in.

Mail outage

Mail was unavailable between 9:30pm Monday and 7pm Tuesday because electrical work for the UPS broke NFS (see the post "Moving to Hearst Gym"). We apologize for the inconvenience.

Backup server compromised

We discovered that our backup server (pollution.ocf) had recently been misconfigured and was subsequently compromised. pollution maintains copies of account data (home directories, web directories, MySQL databases, and mail inboxes) and administrative credentials.

We immediately investigated, changed the affected credentials, and notified campus. Please assume that the attackers had access to a copy of your account data, and take appropriate measures, especially if you store sensitive or restricted data. Although we determined that account credentials (e.g., password hashes), which are not stored on pollution, were not compromised, feel free to reset your password as an additional security measure.

We sincerely apologize for this inconvenience. If you need any assistance, please let us know.

Mail is down

OCF email service abruptly stopped due to an issue in our authentication system. Please check back for updates.

Problem was resolved. (2:27 pm 8/4)

File server and user directory migration

We will be migrating the NFS disk array (file server) and LDAP server (user directory) to new hardware after 8pm tonight. This is necessary to maintain uptime and acceptable performance as much as possible during and after our move to Hearst Gym in August.

We will attempt to keep files read-only where possible so that services including web hosting will not be as severely affected.

We do not have spare hard drives on hand: the new hardware will reuse the existing drives, which must be backed up, formatted, and restored. Partial downtime of all OCF services will therefore be unavoidable.

Edit 07/29 02:00am: LDAP migration completed successfully without service interruption. NFS migration (during which file access will be read-only) has been postponed to later today.

Edit 07/29 06:00pm: NFS migration started. Mail service is offline. Web hosting and SSH are read-only.

Edit 07/30 01:10am: All services except mail restored.

Edit 07/30 01:25am: All services restored. NFS disk array may require minor downtime in the near future. Good night.

Systemwide downtime

Unscheduled maintenance on virtual machine hypervisors made many OCF services unavailable between 3am and 6:30am. A somewhat-related issue affected web hosting yesterday.

Mail maintenance

Mail services (IMAP/POP/SMTP/webmail) are going down as we transition the mail server to a virtual machine. By doing so, we hope to keep mail available on a temporary server during our move out of Eshleman in August, since virtual machines (guests) can be migrated on the fly between physical hypervisors (hosts).

Update 2:04pm: All services back up. Mail spool (inbox files) will be migrated later.

Scheduled webserver downtime again

We are (again) scheduling downtime on our primary server (hal) between 9:00pm and 10:30pm July 5th to add two new processors. Notably, this will affect web hosting, MySQL, IRC, and wiki.
Update 9:15pm: Services going down.
Mid-downtime action shot!
(via Kenny Do)
Update 10:24pm: Services are back up.

Scheduled webserver downtime

We are scheduling downtime on our primary server (hal) to physically inspect the machine between 6:30pm and 7:30pm July 1st. Notably, this will affect web hosting, MySQL, IRC, and wiki. We will again schedule downtime in one or two weeks to add two additional processors to hal.

Update 6:47pm: hal is being taken offline.
Update 7:25pm: hal and all services except IRC are back.
Update 7:32pm: All services operating normally.