News from the staff team

Our print server crashed a week ago and users' print quotas were wiped. A Board of Directors resolution was passed to set each user's quota to 175 pages, given that we are in the 7th week of the semester. Users can email staff, and requests will be handled on a case-by-case basis.

Login servers

We are taking apocalypse down for maintenance. Most users should not be affected; however, SSHing directly into apocalypse.ocf.berkeley.edu will not work.

To SSH into a login server, use the hostname ocf.berkeley.edu, as specified by our documentation.
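As a small illustration of the advice above, here is a minimal sketch that builds an ssh command targeting the generic hostname rather than a specific machine. The helper name `ssh_command` and the placeholder username are hypothetical and not part of any OCF tooling.

```python
# Hostname taken from the post above; connecting to it instead of a
# specific machine keeps working when one login server is taken down.
LOGIN_HOST = "ocf.berkeley.edu"


def ssh_command(username):
    """Return an argv list for connecting to the generic login hostname.

    Hypothetical helper for illustration only.
    """
    if not username or "@" in username or username.startswith("-"):
        raise ValueError(f"invalid username: {username!r}")
    return ["ssh", f"{username}@{LOGIN_HOST}"]


if __name__ == "__main__":
    # "yourname" is a placeholder, not a real account.
    print(" ".join(ssh_command("yourname")))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues.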

Tuesday outage

On an unrelated note, tsunami (our primary login server) was down between 7:00am and 2:40pm on Tuesday, Feb 8. We apologize for the inconvenience.

Update: Tuesday, February 22

apocalypse.OCF has been permanently discontinued as a SPARC login server. We will be discontinuing support for the SPARC architecture.

tsunami is currently our 32-bit x86 login server, and was upgraded on Sunday to Debian 6.0.

OCF Not Open Until Further Notice

Due to unexpected problems with the Windows Server, the Windows lab machines do not work.
We will post updates as we make progress.

Update 01/19 at 09:17 PM: We were able to implement a temporary solution so that some of the Windows computers in the lab work (as well as some of the Linux computers). The computers with blue post-it notes definitely do not work. The others worked when last checked. We apologize for the inconvenience.

Update 01/20: The machines previously with blue post-it notes are now working. All computers should be working as before.

Mail scheduled maintenance

As part of our ongoing consolidation and maintenance of mail infrastructure, OCF mail will undergo maintenance in one or more short windows at the beginning of the spring semester.

We apologize for the inconvenience and hope for better performance, reliability, and redundancy in the long run.

Mail is currently scheduled for maintenance on Monday, January 17, while /var/mail is migrated to a permanent mail server. Incoming mail will not be accepted during this time, but read-only access will be available as much as possible through the login servers (but not on POP/IMAP and webmail).

As always, you can follow our progress here and on IRC.

UPDATE 05:25p: Maintenance will begin shortly. /var/mail will now be mounted read-only.

UPDATE 06:00p: /var/mail is currently being migrated to a permanent server. It will still be mounted read-only, and accessible through the login servers (but not on POP/IMAP and webmail).
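For readers on a login server who want to confirm the read-only state themselves, the following is a rough sketch, assuming a Unix system where Python's `os.statvfs` is available. The function name `is_mounted_read_only` is hypothetical; only the path `/var/mail` comes from the updates above.

```python
import os


def is_mounted_read_only(path):
    """Return True if the filesystem containing `path` is mounted read-only.

    Uses the ST_RDONLY mount flag reported by statvfs (Unix only).
    """
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)


if __name__ == "__main__":
    mail_spool = "/var/mail"  # path named in the maintenance updates
    if os.path.exists(mail_spool):
        state = "read-only" if is_mounted_read_only(mail_spool) else "read-write"
        print(f"{mail_spool} is mounted {state}")
    else:
        print(f"{mail_spool} does not exist on this machine")
```

This inspects the mount flag rather than attempting a write, so it does not touch the mail spool.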

UPDATE Tuesday 01:40p: Mail should now be back up, including the login servers, POP/IMAP, and webmail.

OCF Mail Down

OCF mail is down; please stay tuned for updates.


UPDATE (10 December): The server which hosts /var/mail had a drive failure (one of the pins on the boot drive was bent, which short-circuited something and caused a system shutdown). We've restored the server using its other drive and are rebuilding RAID. All mail services should now be functional.

UPDATE (11 December): The mailspool server is down again, and its remote management system seems to be malfunctioning. Mail is down until an OCF staffer living in Berkeley can investigate; if the server is irreparably damaged, we'll have to wait until finals are over to rebuild it. Sorry for the inconvenience.

UPDATE (12 December): Some of you have expressed concern that your data may have been lost. Sorry for any alarm — rest assured, your data is safe, and we do have recent backups of /var/mail.

As part of our transition to a new mail infrastructure, we recently migrated /var/mail from our central disk array to a new machine with a better NFS server. We didn't realize until the server shut down two days ago that one of its drives was damaged, at which point we swapped out the drive and turned the machine on again. The server shut down again yesterday, and S.M.A.R.T. is predicting another drive failure, so we fear the motherboard or drive controller may have been damaged by the faulty drive. The server runs and your data is intact, but rather than just rebuilding RAID again and risking a major meltdown during finals week, we've decided to leave everything off until after finals, when we'll have time to take care of this properly.

UPDATE (14 December): As of early morning today, mail is tentatively operational. Last night we transitioned /var/mail to an alternate hard disk in a temporary server. Maintenance work will be scheduled soon, especially over break. Special thanks to the ASUC for their cooperation.

Unexpected Downtime

The OCF's DNS server just underwent spontaneous massive existence failure and we are trying to get the system to boot. Further bulletins as events warrant.


In other news, there was a hardware failure of unknown origin on the new mail server and /var/mail was down for a few minutes while we recovered the RAID array. Sorry for the inconvenience.

Update (10:30PM): DNS is back up. Sorry for the inconvenience again.

Update (11:00PM): The login servers had the blues during the DNS outage. They have been rebooted, and are fully functional again.

Scheduled downtime this weekend

Hey OCFers,

In order to give OCF mail a much-needed reliability boost, we plan to incrementally replace our current mail infrastructure with a centralized, redundant new system. The current system is scattered and full of single points of failure — as an example, our SpamAssassin server didn't survive last week's power outage, and our mail servers were forced to reject all incoming mail until we were able to build a replacement spam filter on a different machine. Our hope is that this new system will be significantly less volatile and more maintainable.

To that end, we'll be taking OCF mail down this Saturday. We'll be migrating mail storage, outgoing SMTP, IMAP/POP, and webmail to the new mail server. Incoming mail will not be accepted while we're working on the servers, but we'll try to keep read-only access to your old mail online for most of the day. As always, you can follow our progress here and on IRC.

UPDATE (20 November, 12:15PM): It begins! We've taken incoming mail offline and made /var/mail read-only for the mailspool migration.

UPDATE (20 Nov, 2:30PM): We've finished copying data from the old mailstore to the new one and are in the process of switching NFS servers. IMAP/POP access to mail is now down.

UPDATE (20 Nov, 5:30PM): We're having a bit of trouble with the new NFS server; IMAP/POP is still down, and our login servers hung and had to be rebooted a short while ago. tsunami and conquest should be working now, though.

UPDATE (20 Nov, 6:45PM): The NFS server is now up and running, now that we've squashed a pesky NFSv4-related bug (thanks to sluo and dwc for their help!). /var/mail is no longer read-only. Once we finish migrating disk quota information, we'll bring the mail servers back online. Thanks for your patience.

UPDATE (21 Nov, 12:15AM): /var/mail has been fully migrated to the new NFS server, and we've re-enabled the old mail servers. Incoming mail and POP/IMAP still live on the old infrastructure, but now that the NFS server migration is complete we should be able to set up the new infrastructure with minimal downtime.

To be clear, we have not yet migrated SMTP, IMAP/POP, or webmail to the new mail server, but we plan to do so in the near future. Watch this space for updates.

ASUC Building Power Outage

Hi,
Tonight from 2am to 6am, Eshleman Hall will have a power outage. Stay tuned for updates.

Update 8:44: Everything except the OCF webserver, disk array, and authentication servers has now been shut down.

Update 9:00pm: Shutting down MySQL.

Update 9:15pm: Webserver shut down.

Update 9:20pm: Disk array and authentication servers shut down. The only things still up are infrastructure-related servers, which allow us to manage machines remotely. The UPS says 2:15 of runtime and the outage is 4 hours; let's hope 4 hours is a conservative estimate.
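To make the gap between the UPS runtime and the outage concrete, here is a minimal sketch of the arithmetic. It assumes that "2:15" means 2 hours 15 minutes and that the outage lasts the full 4 hours; the function name `runtime_shortfall` is hypothetical.

```python
from datetime import timedelta


def runtime_shortfall(ups_runtime, outage):
    """Return how long the outage outlasts the UPS (zero if it does not)."""
    return max(outage - ups_runtime, timedelta(0))


# Assumption: "2:15" in the update above is hours:minutes.
ups_runtime = timedelta(hours=2, minutes=15)
outage = timedelta(hours=4)

shortfall = runtime_shortfall(ups_runtime, outage)
print(f"UPS runtime falls short of the outage by {shortfall}")  # 1:45:00
```

Under those assumptions, the remaining servers would lose power 1 hour 45 minutes before the outage ends unless it finishes early.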

Update 6:36am: We have marginally restored our infrastructure and are running tests to make sure everything that is up is working. Login servers will be up shortly.

Update 7:04am: A few issues have come up while bringing things back online; we are working to resolve them ASAP.

Update 7:31am: We had some disk array issues, which seem to be resolved for the time being. All the Windows machines work and printing works. We will boot up the login servers once we are sure permissions and such are working properly.

Update 7:56: Our DNS server is being stubborn and seems to be the root cause of the recent issues. MySQL should be back up.

Update 8:04am: FSCK time. What a fun way to start the morning: fsck'ing broken filesystems.

Update 8:12am: The webserver should be working again.

Update 8:22am: DNS is plodding along. Expect a delay between 9 and 10:30, since I have class at that time.

Update 8:41am: Running FSCK on the DNS server, which will likely be down for a while. Login servers should be up and running; docs and webmail should work too.

Update 8:44am: Spoke too soon; disregard the previous post.

Update 10:56am: Still working on getting DNS up.

Update 11:11am: DNS should be up now.

Update 6:40pm: We are reaching hour 30 of this adventure, and most of our services have been restored. Mail is a work in progress, but your stored email should be fully accessible now. apocalypse.ocf.berkeley.edu doesn't seem to turn on, so we will keep it off for now (while we straighten out everything else).

Other failures

In a rather unlucky streak of timing, here are the other known failures at the OCF (today was not the best day).

2 printers (1 critically)
3 infrastructure-related servers

We will update you as we get things fixed.

MySQL Down

MySQL went down for a short period yesterday; we hosted the service from backups in read-only mode for the night.

UPDATE (15:41):
ETA of 8-10 hours before we get MySQL up and running again.

UPDATE (19:41):
MySQL and PostgreSQL are back in business. Props to jaws.ocf.berkeley.edu for performing admirably.