Author Archives: Brock Tice

Whew! Jobs running on the cluster. (And travel.)

I’ve been totally absent from most of my life for the last week as a result of some problems we had with our code on the cluster. My jobs kept dying for no apparent reason, taking down compute nodes in the process. After a while I narrowed it down to the point where restart files (from a previous simulation) are read. It turns out that the way the files were read (and it was done that way for a good reason) was really brutal on the network: it involved way too much communication. This was okay for smaller models, but I currently have the largest model we’ve ever run in the lab.

After a conversation with our current programmer and one with our former programmer, and about 6 hours of coding last night, the restart files are now read in a less naughty way, and my jobs are reliably running.

-----

Tomorrow I am leaving for about two and a half weeks in New Orleans and Mandeville! I have an early flight, preceded by an even earlier train ride to the airport. I should be hooked up to the “tubes” and (New Year’s resolution, here I come) updating the blog more often with the blow-by-blow as I try to get enough data for a Heart Rhythm conference abstract in time for the deadline, despite all of the sundry delays with the cluster.

*gasps for breath*

-----

Also, Penguin liked my cluster video so much that they put it on their front page.

System Administrator Wanted

I am currently the system administrator for our cluster, but we’re looking for someone to tend it and other machines full time.

The job is posted on the JHU jobs site here. Key information is excerpted below:

General Description: Position will provide systems administration to the Institute for Computational Medicine. The Institute’s mission is to understand the mechanisms and to improve the diagnosis, prediction and treatment of human disease through applications of mathematics and computational science. This individual will be responsible for operating and maintaining high-performance computing equipment in support of the mission of the Institute. Responsibilities include setting up new high-end compute clusters and installing new software releases, and analyzing and resolving problems associated with server hardware and applications software. The individual will be responsible for installing and maintaining commercial and open source software used in the Institute, and modifying the programs/scripts whenever needed. For this purpose, a basic knowledge of shell scripting, file system permissions, security, and UNIX/Linux best practices is essential.

Qualifications: Bachelor’s degree; 4 years’ experience in a support environment or equivalent combination of education and experience. Knowledge of UNIX/Linux systems administration best practices. Extensive knowledge of Linux, specifically Red Hat Enterprise Linux 4 – 5, Fedora 6 – 7; Rocks Cluster software, and components such as Sun Grid Engine, LAM/MPI, and MPICH. Knowledge of Windows 2003 Server and Windows XP operating systems is also required. Excellent verbal and written communication skills are necessary. Candidate must possess the ability to work professionally with faculty, staff, students, collaborators and vendors within and outside of JHU.

If you are interested, please apply directly through the linked JHU jobs site. If you have any questions about what the job currently entails, feel free to comment or otherwise contact me.