Category Archives: Linux


Finding Duplicates with sort and uniq

Imagine this: You have two text files full of information, with one data entry on each line. You want to find out which lines occur in both files. Now, if the files are mostly the same, it’s probably best to use a program called diff. However, if the files are mostly different, you can use this little incantation:

cat file1.txt file2.txt | sort -n | uniq -d

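This will concatenate file1.txt and file2.txt, sort the combined data numerically (sort -n), and display only the lines that are not unique (uniq -d). Note that a line repeated within a single file will also be reported, so this works best when neither file contains internal duplicates.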
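To see the pipeline in action, here is a minimal sketch using made-up sample data. The temporary directory, the file contents, and the variable names `naive` and `common` are all assumptions for illustration, and the lines are assumed to be plain integers. It also shows one possible variant that de-duplicates each file first with sort -u, so that a value repeated inside only one file is not reported.

```shell
# Hypothetical sample data: one node number per line.
workdir=$(mktemp -d)
printf '%s\n' 12 7 7 30 > "$workdir/file1.txt"    # 7 is repeated within file1
printf '%s\n' 30 5 12 99 > "$workdir/file2.txt"

# The pipeline as written in the post. It also reports 7, because 7
# occurs twice in file1 even though it never appears in file2.
naive=$(cat "$workdir/file1.txt" "$workdir/file2.txt" | sort -n | uniq -d)

# Variant: de-duplicate each file on its own (sort -u) before combining,
# so any line that uniq -d reports must come from both files.
common=$(
    { sort -n -u "$workdir/file1.txt"; sort -n -u "$workdir/file2.txt"; } \
        | sort -n | uniq -d
)

echo "naive:  $(echo $naive)"     # word-splitting puts the values on one line
echo "common: $(echo $common)"

rm -rf "$workdir"
```

With this sample data the original pipeline reports 7, 12, and 30, while the variant reports only 12 and 30.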

This came in handy for manipulating electrode files today. Our electrode files just contain lists of node numbers. The simulator gets unhappy when you try to do things with overlapping electrodes, so in this way we were able to remove the offending overlap without too much trouble.
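The post does not say how the overlap was removed once it was found. One possible way, sketched here with hypothetical file names (`electrode_a.txt`, `electrode_b.txt`, `overlap.txt`) and made-up node numbers, is to feed the list of shared nodes to grep as a set of patterns to exclude:

```shell
# Hypothetical electrode files: lists of node numbers, one per line.
workdir=$(mktemp -d)
printf '%s\n' 101 102 103 104 > "$workdir/electrode_a.txt"
printf '%s\n' 103 104 105 106 > "$workdir/electrode_b.txt"

# Find the node numbers that appear in both electrodes.
cat "$workdir/electrode_a.txt" "$workdir/electrode_b.txt" \
    | sort -n | uniq -d > "$workdir/overlap.txt"
overlap=$(cat "$workdir/overlap.txt")

# Remove the shared nodes from electrode_b only.
#   -v  print lines that do NOT match
#   -x  match whole lines (so 10 does not match 103)
#   -F  treat patterns as fixed strings, not regular expressions
#   -f  read the patterns from a file
trimmed_b=$(grep -v -x -F -f "$workdir/overlap.txt" "$workdir/electrode_b.txt")

echo "overlap:   $(echo $overlap)"
echo "trimmed b: $(echo $trimmed_b)"

rm -rf "$workdir"
```

Which electrode gives up the shared nodes is a modeling decision; this sketch arbitrarily trims the second file and leaves the first untouched.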

Using UNIX (and Linux) for Research in silico

There is a ranked list of the most powerful computers on the planet: Top500.org. (It looks like they’ve recently done a site redesign for the better.) Statistics on these computers are available here. Of the top 500 most powerful computers on the planet, a measly 1.20% run a non-UNIX operating system (OS). 76.20% run an unspecified Linux (as opposed to, say, a specific version of Red Hat or SuSE Linux). According to the overall family rankings here, a total of 85.20% run some variant of Linux. Aside from that 1.20%, the remaining operating systems on the list are all various UNIXes, including BSD and Mac OS X.

Full Queue

I’ve been running simulations over the holiday break, trying to get some data for a deadline, and I’ve had the queue on our cluster mostly to myself for two weeks.

As you can see here (if you look soon), there are now plenty of jobs in the queue, submitted by several users. My lovely vacation of nearly-unlimited CPU power is over. Unfortunately, I just discovered today that some old simulations of mine had a few incorrect parameters, and they need to be re-run. This could be done overnight on an empty cluster. Now I shall have to wait a little while.

The cluster includes some open-source, commodity status monitoring software called Ganglia, as well as some proprietary load monitoring software. I’m curious to see how the cluster and queue loads will look over time: how holidays, weekends, and the time of day affect the cluster load. In my experience using other clusters, Fridays, Saturdays, and Sunday mornings, as well as early mornings on weekdays, are the best times to run jobs on shared computing resources.

Treo 650 vs. RAZR2 v8

I have been a Palm OS user since the original Palm Pilot (1000), having also owned/used a Palm Professional, Palm V, Palm Vx, Treo 600, and Treo 650. I’ve also used a couple of Windows CE/Mobile devices, and hated them.

Palm OS has always been very usable. The simple interface combined with a touch screen was easy to get around, and Graffiti handwriting input was fast enough for use in taking class notes. Somewhere along the way, though, Palm got lost. I jumped from Palm OS 3 to Palm OS 5 when I got the Treo 600. Palm OS 5 felt odd — the addition of data and phone capabilities left something of a “seam”. It was a pretty good PDA, but not a great phone. It crashed a bit. Upgrading to the 650 provided a few extra features, but the software was less stable still.

The Treo devices are also bulky. I had no choice but to wear them on a belt holster, as there was no way to comfortably put them in my pocket. The features of the hardware were not being upgraded in line with other phones on the market, nor were the dimensions and weight suitably reduced. When my contract with Sprint was up (or should have been — perhaps I’ll write another post on that later), I went looking for a replacement phone with the following criteria:

Continue reading

Whew! Jobs running on the cluster. (And travel.)

I’ve been totally absent from most of my life for the past week as a result of some problems we had with our code on the cluster. My jobs kept dying for no apparent reason, taking down compute nodes in the process. After a while I narrowed it down to the point where restart files (from a previous simulation) are read. It turns out that the way the files were read (a way chosen for a good reason) was really brutal on the network: it involved far too much communication. This was okay for smaller models, but I am currently running the largest model we’ve ever run in the lab.

After a conversation with our current programmer and one with our former programmer, and about 6 hours of coding last night, the restart files are now read in a less naughty way, and my jobs are reliably running.

-----

Tomorrow I am leaving for about two and a half weeks in New Orleans and Mandeville! I have an early flight, preceded by an even earlier train ride to the airport. I should be hooked up to the “tubes” and (New Year’s resolution here I come) updating the blog more often with the blow-by-blow as I try to get enough data for a Heart Rhythm conference abstract in time for the deadline, despite all of the sundry delays with the cluster.

*gasps for breath*

-----

Also, Penguin liked my cluster video so much that they put it on their front page.