Upcoming Downtime
The OCF will be going down for maintenance on Sunday, July 1 for approximately 2 hours beginning at 6 PM. Mail will be queued up during the downtime and delivered when service resumes.
One of the features that the OCF is rolling out this summer is increased web server security for our users and their web applications. We've added some tools that'll monitor web traffic and intercept many common types of attacks, giving our users an extra level of protection from newly discovered vulnerabilities.
Most importantly, though, these tools will help protect our users from comment spam (unsolicited commercial postings). The OCF web server receives more than one million requests every day, and a growing number of these requests are attempts by automated programs to post spam on our users' blogs, galleries, and web sites. Besides leaving distasteful messages, these programs also place a great burden upon our infrastructure, slowing down our services.
We've currently enabled an aggressive set of filters to catch most attacks and spam, and we will continue to add filters as new attacks are discovered. As with any automated monitoring system, though, there is always the possibility of legitimate requests being incorrectly flagged. If you're having problems with your web site, please let us know.
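For the curious, here's a minimal sketch of the kind of check such a filter performs. The patterns below are illustrative only, not the actual rules we run, and the real filter set is much larger:

import re

# Illustrative patterns only; real rule sets grow constantly as
# new attacks are discovered.
BLOCKED_PATTERNS = [
    re.compile(r"(?i)union\s+select"),          # SQL injection attempt
    re.compile(r"<script\b"),                   # script injection in a comment
    re.compile(r"(?i)cheap\s+(viagra|pills)"),  # common comment spam
]

def is_suspicious(request_body):
    """Return True if the request body matches a known attack/spam pattern."""
    return any(p.search(request_body) for p in BLOCKED_PATTERNS)

# A filter like this sits in front of the web applications: matching
# requests are rejected (e.g., with HTTP 403) before they reach the app.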
Also, please keep in mind that these security systems are designed to help us help you. We encourage OCF users to keep their web applications up-to-date, as bugs and security holes are being fixed every day, and it's impossible for us to protect every single application that our diverse user base employs. We also strongly encourage users to install plugins that reduce the flow of spam, such as captchas or other anti-spam measures.
As of August 1, 2007, the OCF will require all users to use secure connections to read their OCF email. Users who exclusively use webmail and/or local mail access (a program such as pine, elm, or mutt run from an OCF Unix machine) will not be affected. Otherwise, you will need to reconfigure your mail client to use secure connections when receiving mail; we have instructions on our wiki for some popular mail clients. (If you have a correction or clarification for us, or instructions for another client, please email them to staff@ocf.berkeley.edu.)
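As a quick illustration, a secure connection simply means speaking IMAP or POP over SSL (ports 993 and 995 instead of the plaintext 143 and 110). In Python, the difference looks like this; the hostname below is a placeholder, so check the wiki for the actual server name:

import imaplib
import poplib

MAILHOST = "mail.ocf.berkeley.edu"  # placeholder; see the wiki for the real hostname

# Secure IMAP: SSL on port 993 (instead of plain IMAP on 143)
imap = imaplib.IMAP4_SSL(MAILHOST, 993)
imap.login("username", "password")

# Secure POP: SSL on port 995 (instead of plain POP3 on 110)
pop = poplib.POP3_SSL(MAILHOST, 995)
pop.user("username")
pop.pass_("password")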
The latest information on the changes can be found on our wiki at http://docs.ocf.berkeley.edu/wiki/Major_service_changes.
If you have any questions or concerns, please don't hesitate to contact us (email staff@ocf.berkeley.edu).
As announced on our website and login server MOTDs, we rebooted one of our core servers on Wednesday for operating system maintenance. Downtime was about as long as anticipated, but we also ended up having to reboot our login servers (conquest, apocalypse, and tsunami), which were having issues of their own. Everything's working now, and we thank you for your continued patience as we make changes to our infrastructure to deliver more reliable services to all OCF users.
We also have some really neat features in development this summer. One of the most anticipated projects is a self-serve backup system, which will allow OCF users to access daily snapshots of their data for at least the past seven days. More information will be posted as we approach a public beta test.
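We haven't settled on the final implementation, but the general idea behind rotating daily snapshots is well known: use rsync with hard links so unchanged files don't consume extra space. Here's a rough sketch of one common approach; the paths and retention period are hypothetical, not our actual setup:

import os
import shutil
import subprocess

HOME = "/home/"                  # hypothetical source path
SNAPDIR = "/backup/snapshots"    # hypothetical snapshot root
KEEP = 7                         # keep at least seven daily snapshots

# Drop the oldest snapshot, then shift the rest down by one day.
oldest = os.path.join(SNAPDIR, "snap.%d" % (KEEP - 1))
if os.path.exists(oldest):
    shutil.rmtree(oldest)
for i in range(KEEP - 1, 0, -1):
    src = os.path.join(SNAPDIR, "snap.%d" % (i - 1))
    if os.path.exists(src):
        os.rename(src, os.path.join(SNAPDIR, "snap.%d" % i))

# Take today's snapshot: files unchanged since yesterday are
# hard-linked via --link-dest, so each snapshot only stores that
# day's changes while still looking like a complete copy.
subprocess.check_call([
    "rsync", "--archive", "--delete",
    "--link-dest=" + os.path.join(SNAPDIR, "snap.1"),
    HOME, os.path.join(SNAPDIR, "snap.0"),
])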
We've updated the compilers available on our Solaris systems from Sun Forte 6 to Sun Studio 11. Enjoy!
Accounts starting with "a" through "b[a-l]" had their home directories accidentally erased; they have since been restored from last month's backup. It's complex.
Both official OCF skins, "Tux" and "Arctic", have been visually updated, primarily in the headers, for improved color matching. Since February we've also implemented a number of smaller features, including "Add search engine" autodiscovery from the Firefox 2 search bar, based on OpenSearch (supported by Firefox 2 and Internet Explorer 7).
Another feature in the works is a web-based tool for uploading files to your website space (the public_html folder), courtesy of Jonathan Chu. It's feature complete, and we hope to roll it out sometime soon. There are also plans to look into a web-based editor for editing files in your webspace; we'll have news about that as things progress.
As always, if you have any suggestions, criticism, or pick-up lines, feel free to send them our way! Ciao.
Edit: You can try out the website file uploader, naturally at your own risk.

OK, so we missed the target I posted here by a few hours (I forgot to account for staffers needing to travel home for the break), but we should be back up. Please note that quotas are once again being enforced; see our email from last week if you've forgotten what this means. You have 30 days to get back under quota; those of you who are over quota should each receive mail about this sometime in the next few days.
There was a mistake in that email, by the way: the primary (soft) mail quota will stay at 10 MB for now, not go up to 25 MB as we announced. Raising mail quotas wouldn't necessarily leave us short of disk space (though I'm not sure anyone's run the numbers), but our system currently stores mail quite inefficiently, which means 25 MB inboxes would be pretty slow to access. Recall that if you need more space for mail, you can save it to another folder (which counts against the much larger home quota). Sorry!
We didn't have a copy of our old (pre-disk-array-failure) quotas handy, so everyone has the default soft quota set at the moment. We'll see if we can find a copy of the old quotas in our backups, but in the meantime, if you previously obtained a quota increase, you may want to email us to remind us about it. Hard quotas were set high enough that no one would be over hard quota (though if your hard quota was raised because of this, don't expect it to stay that way forever), so you have 30 days to either clean up or remind us about your quota increase. (There were a few exceptions for people who were clearly abusing disk space.)
Again, enjoy your break!
We finished synchronizing user data between disk arrays overnight; we still need to sync our collection of installed software and some administrative files. Provided nothing goes wrong, that puts us on track to restore service this afternoon or evening.
We're also taking advantage of this downtime to do some upgrading on one of our backend machines; that work has been completed. We might do some small physical reorganization of our servers and their power connections, but that's about it for the other work we're doing during this maintenance window.
Hope you're enjoying your break, and thanks for your patience!
Brief History, Current State, and Future Plans
Executive Summary: Downtime in the near future to migrate disk array. Check back for exact date/times.
Some users have voiced concerns about being kept out of the loop regarding the state and future plans of the OCF. Hopefully, this post will bring everybody up to speed on what's been happening and what we have planned.
Many of our problems seem to be caused by the experimental hardware in the server holding our backup disk array. Now that the primary array is ready to be brought back into service, we expect that moving back to it will fix many of these problems (mainly the web server uptime and printing problems). After much consideration, we determined it would be best to connect the disk array to the machine that runs the web server (famine), in part because this would eliminate a large amount of network traffic.
Before migrating data to the primary disk array, we decided it was best to make sure famine was completely up to date. It had been running an older version of Solaris 10, and the latest update contained a lot of bug fixes and security fixes. Through the use of virtual servers, famine runs our web server, print server, and database server. This update was going to happen regardless, and downtime would be needed for it; by doing the upgrade first, we ensured that users could still access our servers and their data during the downtime.
While it is possible to update the installed version in place, we found that it would be better to do a clean install of the latest version. According to our research, the normal upgrade path often results in unexpected errors with the virtual servers (called Containers or Zones); the general recommendation is to back up the Containers and perform a clean install. The process boiled down to five major steps (a rough sketch of the backup step follows the list):
1. back up zone data and configuration files (for the 8 installed zones)
2. install Solaris 10 Update 3 and set up RAID devices
3. install and configure new zones
4. manually merge data from zone backups into new zones
5. restore services
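As a rough illustration of step 1: on Solaris 10, a zone's configuration can be exported with zonecfg and its filesystem archived. The zone names and paths below are hypothetical, not our actual layout:

import subprocess

# Hypothetical zone names and paths; the real server has 8 zones.
ZONES = ["web", "print", "db"]
BACKUP = "/backup/zones"

for zone in ZONES:
    # 'zonecfg export' prints the commands needed to recreate the
    # zone's configuration on the freshly installed system.
    with open("%s/%s.cfg" % (BACKUP, zone), "w") as cfg:
        subprocess.check_call(["zonecfg", "-z", zone, "export"], stdout=cfg)

    # Archive the zone's root filesystem (path is an assumption).
    subprocess.check_call(
        ["tar", "-cf", "%s/%s.tar" % (BACKUP, zone), "/zones/%s" % zone])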
This process takes a long time to complete, so we scheduled the upgrade for a weekend (3/2-3/4). Unfortunately, we didn't give enough advance warning about this downtime. I apologize for that and for any inconvenience it may have caused; we will work on getting downtime warnings out earlier.
While we had originally planned to migrate the disk array data at the end of this upgrade process, we ran into some problems. During the upgrade, we started getting hardware fault errors concerning one of the CPUs and memory modules. At that point, we decided it would be best to hold off on the disk array migration until after the hardware issues had been resolved, so we finished with the upgrade and restored services.
After many days of troubleshooting and several on-site Sun service calls, we determined the problems were caused by a faulty memory controller on the mainboard. The mainboard has been replaced, and it looks like we are ready to move ahead with the disk array migration. Since most of the setup has already been done, this is not too difficult a task, but it will require another extended downtime. The process boils down to:
1. make a recent sync of all data
2. kick all users off and shut down the web server
3. make final update of any changed data
4. reconfigure servers
5. bring everything back up
The final sync of user data must be performed when nobody can access the data, so we don't end up with inconsistencies between the copies. This will require all users to be logged off and the web server shut down (i.e., downtime) sometime in the near future. The whole process will probably take around 10 hours, depending on how much data changes between rsyncs. After the data is synced and the servers are reconfigured, we can bring everything back up, and everybody should be happy.
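To make the idea concrete, the sync boils down to two rsync passes: a long one while the system is live, and a quick final one once everyone is logged off and nothing can change underneath it. A minimal sketch, with hypothetical mount points:

import subprocess

SRC = "/mnt/backup-array/"    # hypothetical mount points for the
DST = "/mnt/primary-array/"   # old and new disk arrays

def sync():
    # --archive preserves permissions, ownership, and timestamps;
    # --delete removes destination files that are gone from the source.
    subprocess.check_call(["rsync", "--archive", "--delete", SRC, DST])

sync()   # first pass while users are still online (the slow one)
# ...kick users off and shut down the web server here (downtime)...
sync()   # final pass over a quiescent filesystem (quick: only changes)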
Please post comments or email staff@ocf.berkeley.edu if you have any questions or concerns.