Category Archives: Science


Full Queue

I’ve been running simulations over the holiday break, trying to get some data for a deadline, and I’ve had the queue on our cluster mostly to myself for two weeks.

As you can see here (if you look soon), there are now plenty of jobs in the queue, submitted by several users. My lovely vacation of nearly-unlimited CPU power is over. Unfortunately, I just discovered today that some old simulations of mine had a few incorrect parameters, and they need to be re-run. This could be done overnight on an empty cluster. Now I shall have to wait a little while.

The cluster includes some open-source, commodity status monitoring software called Ganglia, as well as some proprietary load monitoring software. I’m curious to see how the cluster and queue loads will look over time: how holidays, weekends, and the time of day affect them. In my experience with other clusters, Fridays, Saturdays, Sunday mornings, and early mornings on weekdays are the best times to run jobs on shared computing resources.
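For the curious, here is a minimal sketch of how that kind of question could be answered once some load samples have been collected. It assumes the samples are already available as (timestamp, load) pairs; how they get exported from Ganglia or the queue is not covered here, and the function names and the numbers in the example are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def mean_load_by_weekday_hour(samples):
    """Average the load in each (weekday, hour) bucket.

    samples: iterable of (datetime, load) pairs. The load can be any
    number, e.g. a load average or a count of queued jobs.
    Returns a dict mapping (weekday, hour) -> mean load, where weekday
    is 0 for Monday through 6 for Sunday.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for when, load in samples:
        key = (when.weekday(), when.hour)
        totals[key] += load
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def quietest_slots(samples, n=3):
    """Return the n (weekday, hour) buckets with the lowest mean load."""
    means = mean_load_by_weekday_hour(samples)
    return sorted(means, key=means.get)[:n]


if __name__ == "__main__":
    # Hypothetical samples, not real measurements from any cluster.
    example = [
        (datetime(2008, 1, 4, 3, 0), 2.0),    # Friday, 3 am
        (datetime(2008, 1, 5, 10, 0), 4.0),   # Saturday, 10 am
        (datetime(2008, 1, 7, 14, 0), 30.0),  # Monday, 2 pm
        (datetime(2008, 1, 14, 14, 0), 34.0), # Monday, 2 pm, a week later
    ]
    print(quietest_slots(example, n=2))
```

Bucketing by weekday and hour is the simplest thing that could work. Holidays would need their own flag, since a holiday Monday would otherwise be averaged in with ordinary Mondays.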

The 2008 Post

Happy New Year! My friend Rob literally rang in the new year last night with a pot and spoon at my wife’s apartment in New Orleans. (No falling bullets were encountered.)

2007 was a very interesting year, one I spent living entirely in Baltimore, with regular jaunts to the Big Easy. This year that should change. This is also the year that I plan to finish my Ph.D. Remaining steps include:

  • Finishing two papers.
  • Doing a graduate board oral exam (qualifier) at Hopkins. I’ve done one at Tulane already.
  • Writing and defending my thesis.

I want to graduate by December.

My current focus, however, is finishing my abstract in time for our big conference deadline on the 4th.

Whew! Jobs running on the cluster. (And travel.)

I’ve been totally absent from most of my life for the last week because of some problems we had with our code on the cluster. My jobs kept dying for no apparent reason, taking down compute nodes in the process. After a while I narrowed it down to the point where restart files (from a previous simulation) are read. It turns out that the way the files were read (a way chosen for a good reason) was really brutal on the network: it involved way too much communication. This was okay for smaller models, but I currently have the largest model we’ve ever run in the lab.

After a conversation with our current programmer and one with our former programmer, and about 6 hours of coding last night, the restart files are now read in a less naughty way, and my jobs are reliably running.

—–

Tomorrow I am leaving for about two and a half weeks in New Orleans and Mandeville! I have an early flight, preceded by an even earlier train ride to the airport. I should be hooked up to the “tubes” and (New Year’s resolution, here I come) updating the blog more often. Expect the blow-by-blow as I try to get enough data for a Heart Rhythm conference abstract in time for the deadline, despite all of the sundry delays with the cluster.

*gasps for breath*

—–

Also, Penguin liked my cluster video so much that they put it on their front page.