Category Archives: Mac OS X

C++ ifstream and the 2 GB Limit

Any system that encodes values in a fixed number of places has a limit on the values it can hold. For example, old mechanical cash registers were physically limited in the number of digits they could ring up. Likewise, modern LCD cash registers are limited by the number of digits available on the screen, though they may be able to hold longer numbers than they can show. The “Y2K” problem was also a result of such limitations.

Binary encoding of course has similar limits. Two bits can hold four values. Three bits can hold eight values. The relationship is exponential: x bits are limited to 2^x values.

One thing that a program might want to keep track of internally is its current location in a file that it is reading or writing. Like a bookmark, one or more variables can be used to store locations in a file. Typically these are simply integers, indicating an offset from the beginning or end of the file. The size of these bookmarks imposes a limitation: with x bits, no file position past 2^x offsets can be recorded, or often even reached.

It turns out that in C++ on Linux, x = 31. This provides 2^31 = 2147483648 positions. Given that a position is normally a byte (B), we are then limited to 2147483648 B, or 2097152 kB, or 2048 MB, or 2 GB (counting in multiples of 1024). With our newest, largest models, this is a problem. My latest simulations are producing data files that are around 21 GB when uncompressed.
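To double-check the arithmetic above, here is a small sketch in C++. The helper names (positions_for_bits, to_kB, to_MB, to_GB) are made up for this illustration, and the units are binary (1 kB = 1024 B), matching the numbers in the text.

```cpp
#include <cstdint>

// Number of distinct values representable in `bits` bits (valid for bits < 64).
constexpr std::uint64_t positions_for_bits(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// Binary unit conversions (1 kB = 1024 B), as used in the text.
constexpr std::uint64_t to_kB(std::uint64_t bytes) { return bytes / 1024; }
constexpr std::uint64_t to_MB(std::uint64_t bytes) { return to_kB(bytes) / 1024; }
constexpr std::uint64_t to_GB(std::uint64_t bytes) { return to_MB(bytes) / 1024; }

// These checks are evaluated by the compiler; a wrong number fails the build.
static_assert(positions_for_bits(2) == 4, "two bits hold four values");
static_assert(positions_for_bits(3) == 8, "three bits hold eight values");
static_assert(positions_for_bits(31) == 2147483648ULL, "2^31 positions");
static_assert(to_kB(positions_for_bits(31)) == 2097152, "2097152 kB");
static_assert(to_MB(positions_for_bits(31)) == 2048, "2048 MB");
static_assert(to_GB(positions_for_bits(31)) == 2, "2 GB");
```

If this compiles, every figure in the paragraph above checks out.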

It turns out that in 64-bit Linux, libc does not have this problem. It is therefore possible to use normal C file I/O commands to process large files (on most modern systems, which incorporate Large File Support or LFS). After searching high and low on the interwebs for a way to convince C++ to use larger variables for file I/O, and not finding much, I caved and spent all of five minutes changing my code over to use C I/O.
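For anyone who wants to try the same switch, here is a minimal sketch of what large-file access through C I/O can look like on a POSIX system. It is not my actual simulation code: the function names file_size_bytes and read_at are invented for this example, and it assumes the POSIX functions fseeko and ftello, which take and return an off_t instead of a long.

```cpp
#define _FILE_OFFSET_BITS 64  // ask glibc for a 64-bit off_t, even on 32-bit builds
#include <stdio.h>            // fopen, fread, fclose, fseeko, ftello (POSIX)
#include <sys/types.h>        // off_t

// Fail at compile time if offsets cannot address files past 2 GB.
static_assert(sizeof(off_t) >= 8, "off_t is too small for files past 2 GB");

// Returns the size of the file in bytes, or -1 on error.
long long file_size_bytes(const char* path) {
    FILE* f = fopen(path, "rb");
    if (!f) return -1;
    long long size = -1;
    if (fseeko(f, 0, SEEK_END) == 0) {
        size = static_cast<long long>(ftello(f));  // ftello returns -1 on error
    }
    fclose(f);
    return size;
}

// Reads up to `count` bytes starting at byte `offset`; returns the number read.
size_t read_at(const char* path, off_t offset, void* buffer, size_t count) {
    FILE* f = fopen(path, "rb");
    if (!f) return 0;
    size_t n = 0;
    if (fseeko(f, offset, SEEK_SET) == 0) {
        n = fread(buffer, 1, count, f);
    }
    fclose(f);
    return n;
}
```

The important detail is keeping positions in off_t (or another 64-bit type) all the way through. Storing an offset in a plain int anywhere along the way brings the 2 GB ceiling right back.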

What made this bug a little more difficult to track down than it might have been is that for some reason, Mac OS X Leopard does not suffer from this problem with C++ stream I/O. Someone at Apple must have allocated a few more bits for file position pointers. My code was therefore working fine on my sorta-64-bit Mac but not on our fully-64-bit cluster or our workstations running recent versions of Linux.
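One way to compare platforms is to ask the compiler how wide its stream offset type is. The sketch below uses the standard std::streamoff type; the two function names are made up for this example. Treat the result as a hint only: a wide offset type does not by itself prove that the library underneath handles large files correctly, and it does nothing for positions that the program itself stores in an int.

```cpp
#include <ios>     // std::streamoff
#include <limits>  // std::numeric_limits

// Number of value bits (not counting the sign bit) in the stream offset type.
constexpr int stream_offset_bits() {
    return std::numeric_limits<std::streamoff>::digits;
}

// True if a stream offset can represent positions at or beyond 2^31 bytes (2 GB).
constexpr bool stream_offsets_can_pass_2GB() {
    return stream_offset_bits() > 31;
}
```

Printing stream_offset_bits() on each machine (the Mac, the cluster, the workstations) would show whether the offset type itself differs between them.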

Ultimately it would be nice to write some kind of wrapper or overload the C++ I/O functions to do things correctly, but for the moment my code is working properly and I am happy.

Do you have any better ideas for getting around this 2 GB limit?

Advanced Bash Scripting

I have written before about the usefulness of command-line scripting in computational science.

Today, while looking for some information on various file test operators in bash (e.g. to check whether a file or directory exists), I found this amazing guide. As the author puts it,

This tutorial assumes no previous knowledge of scripting or programming, but progresses rapidly toward an intermediate/advanced level of instruction . . . all the while sneaking in little snippets of UNIX® wisdom and lore. It serves as a textbook, a manual for self-study, and a reference and source of knowledge on shell scripting techniques.

For instructional purposes, the examples are sprinkled with little comments like, “explain why this is the case…”, to test your knowledge as you go through the manual. This would make it excellent for use as a textbook on basic programming ideas. It is even available in PDF format, and was updated March 18th of 2008.

I can assure you that every new member of the lab will be getting a link to this guide from me. Proper knowledge of shell scripting is an amplifier of one’s productivity. An investment of a few hours learning the basics will probably return a hundred-fold savings of time over a few months. More advanced concepts are naturally learned as more difficult scenarios are encountered. I’ll be writing soon about some of the more sophisticated issues I’ve encountered using shell scripting.

GTD Tip: Finder’s Column View

For the Mac users among you, here’s something I’ve been doing the last few months that you might find useful.

When you get to the following part of your weekly review (which you are doing, aren’t you?):

Review “Pending” and Support Files
Browse through all work-in-progress support material to trigger new actions, completions, and waiting-fors.

try using the column view in Finder to go through your digital files. This assumes that you have some or most of your project support materials in digital form. Here’s my project view:

[Image: threecol, Finder column view of my project folders]

As you click through the list, it’s easy to delve into the sub-directories but keep track of where you are:

[Image: threecol_2, Finder column view after clicking into a sub-directory]

Do you have any weekly review tricks?

CESE Single-Cell Simulator

I recently discovered an interesting piece of open-source software, the CESE single-cell simulator. It’s based on Java and runs on a number of platforms.

The point of this simulator is strictly to run single-cell electrophysiological models. It comes with a few of the staples in the field (like the Luo-Rudy dynamic model), and you can buy more recent/complex models from a company called Simulogic. Alternatively, there are directions on the site for designing your own models.

Unfortunately, the program currently displays the output of all of the selected variables on the same plot, rather than breaking the plot into several panels, one for each current. The latter is the way we typically look at model data. Furthermore, I don’t see a way to import experimental data traces for comparison. I also had some rendering issues with pull-down selectors in Mac OS X’s Java implementation.

We currently have our own single-cell simulator with an ugly but functional GUI, linked to the ionic models in our tissue simulator. However, it might be nice going forward to make our models compatible with CESE, and to work with the CESE developers to improve the view mode. It would be nice to have CESE as a standard platform for model development.

If you want to download it and try it out, you just need a working Java installation, and you can get CESE itself here. Check the built-in help for a tutorial.

Have you used CESE before? What did you think? If you download it and try it out, please post something about your experience here as well (or email me).