Category Archives: Tech

Your Emails — They are not secure

In other news from the house-hunting front, we’ve been working with lenders to finance the purchase of a house. Lenders want a lot of information. They want bank statements, driver’s license copies, landlord information, tax returns, income statements, current address, credit card statements, letters of employment and so on. Of course, they also want that ubiquitous, unchangeable, universal secret password, the social security number.

You would think, given the nature of this collection of information, and the rising prevalence and cost of identity theft, that these people would be careful with this information (unless you’re cynical, or just a realist). You’d be wrong. One of the first lenders we dealt with EMAILED A COMPLETE, FILLED-OUT COPY of the application form to us for signatures. No encryption whatsoever. It was like an identity theft starter kit. After we confronted them about it, they said they had no idea this was insecure, and offered to fax or FedEx the documents instead.

If you don’t already know this, you really need to know: Email, without any special add-ons, is the opposite of secret. It is the digital equivalent of a postcard — anyone along the way can read it, and you have no idea who will be along the way. Would you tape your social security card to the back of a postcard and send it across the country? Furthermore, there’s no guarantee that an email’s “From:” address is accurate, as you may have deduced from spam email that you’ve received. All it takes to forge it is changing a string of text when putting the message together.

There are ways to use email to send secure, confidential communications. Probably the most universal and robust way is with PGP or (preferably) GPG. The main reason these solutions aren’t used more widely is that encrypted communication is difficult to do correctly. Keys have to be generated, passphrases selected, and keys exchanged, signed, managed, and sometimes even revoked. A number of pieces have to fit together, including the encryption engine, mail program plug-ins, and file encryption software. The difficulty of using proper encryption is not, however, an excuse for sending my SSN in plain text via email. When used with good enough ciphers, email can be safe even from the prying eyes of the US Government, who would have to spend hundreds or thousands of years of computer time attempting to crack your key. Furthermore, with or without encrypting the message, cryptographic signatures may be used to verify that the purported sender of the message is in fact the true sender of the message. For recipients who check the signature, this eliminates the problem of From address forgery.

Should you wish to send encrypted e-mail my way, you may find my public key here.

Google StreetView for Apartment/House Hunting

We are trying to find a place to live in St. Paul, while living in New Orleans and Baltimore. This Friday we will actually be going to St. Paul to look at places, but we need to have a good list of places to look at when we arrive. To that end, we’ve been using some online services, combined with a realtor.

The advent and recent expansion of Google StreetView has changed this process dramatically. Whereas before we were limited to seeing street layout and satellite images in Google Maps with Housing Maps (a mashup with Craigslist), it is now possible to take an address from an ad and go for a virtual stroll through the neighborhood.

It’s a lot easier to get a feel for a neighborhood by looking at storefronts, cars, intersections, parks, and yes, even people out and about, than it is to do so by map or satellite. Some people are up in arms about StreetView, and not without good reason, but the benefits are pretty tantalizing. People are antsy about being caught on camera, but in all my time using the service I’ve only seen 10 or so people in the images. Most of the time residential streets are empty during the day, since everyone’s at work and school. The people that I have seen are generally just walking down the street or riding bicycles. However, I did see some people with police cars parked out front of their house, apparently discussing something with officers and several family members. I couldn’t tell if it was a burglary report, a domestic violence complaint, or a murder investigation, but those people probably didn’t want the whole world watching.

Have you used StreetView?

I CAN HAZ PROGRAMMING LANGUAGE?

Okay, this is pretty much just for the programmers out there that are familiar with the LOLCATS phenomenon, but there’s an esoteric programming language called LOLCODE.

It made me lol for real. Here’s an example that prints the date followed by the beginning of the Fibonacci sequence:


IN MAI datetime GIMME date LIKE DATE

SO IM LIKE FIBBING WIT N OK?
    LOL ITERATE FIBONACCI TERMS LESS THAN N /LOL
    SO GOOD N BIG LIKE EASTERBUNNY
    BTW, FIBONACCI LIKE BUNNIES! LOL
    U BORROW CHEEZBURGER
    U BORROW CHEEZBURGER
    I CAN HAZ CHEEZBURGER
    HE CAN HAZ CHEEZBURGER
    WHILE I CUTE?
        I AND HE CAN HAZ HE AND I ALONG WITH HE
        IZ HE BIG LIKE N?
            KTHXBYE
        U BORROW HE

IZ __name__ KINDA LIKE "__main__"?
    COMPLAIN "NOW IZ" AND DATE OWN today THING
    IZ BIGNESS ARGZ OK KINDA LIKE 1?
        N CAN HAS 100
    NOPE?
        N CAN HAS NUMBR ARGZ LOOK AT 1!!
    GIMME EACH I IN UR FIBBING WIT N OK?
        VISIBLE I

I think that basically speaks for itself. You can find the Python-based interpreter here. Srsly.

Calculating Large Numbers in C

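As a corollary to my last post, it’s important to be careful when calculating file seek positions (if you’re skipping around that way). It turns out it’s necessary to cast the numbers used in calculating a seek position to a large integer type, such as unsigned long int, before the arithmetic happens. Otherwise the calculation is carried out in plain int and can overflow before the result is ever stored in the larger variable.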
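Here is a minimal sketch of the difference, not my actual simulation code. The function names and the fixed-size-record layout are made up for illustration, and I'm using int64_t as a fixed-width stand-in for "a large integer" (unsigned long int is 64 bits on 64-bit Linux, but only 32 bits on some other platforms).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of record number `index` in a file of
   fixed-size records. Both operands are widened to 64 bits BEFORE the
   multiplication, so the product cannot wrap around at 32 bits. */
static int64_t record_offset(int index, int record_size)
{
    assert(index >= 0 && record_size >= 0); /* offsets from the start are non-negative */
    return (int64_t)index * (int64_t)record_size;
}

/* The mistake to avoid: the multiplication is done in 32 bits and only the
   already-wrapped result is widened. Unsigned operands are used here so the
   wraparound is well defined; with plain int the overflow would be
   undefined behavior. */
static int64_t record_offset_narrow(uint32_t index, uint32_t record_size)
{
    return (int64_t)(index * record_size);
}
```

With 1000-byte records, record 5,000,000 should start at byte 5,000,000,000. The first function returns exactly that; the second wraps around and returns 705,032,704, which is a perfectly valid-looking seek position in the wrong part of the file.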

By the way, Rob had some helpful comments on that last post. (Thanks Rob!)

C++ ifstream and the 2 GB Limit

Any system that encodes values in some set number of places has a limit on the values that can be held. For example, old, mechanical cash registers were physically limited in the number of digits they could ring up. Likewise, modern LCD cash registers are limited by the number of digits available on the screen, though they may be able to hold longer numbers than they can show. The “Y2k” problem was also a result of such limitations.

Binary encoding of course has similar limits. Two bits can hold four values. Three bits can hold eight values. The relationship there is exponential: x bits are limited to 2^x values.

One thing that programs might want to keep track of internally is the current location in a file that one is reading or writing. One or more variables act like bookmarks, storing locations in the file. Typically these are simply integers, indicating some sort of offset from the beginning or end of the file. These bookmarks impose a limitation: with x bits, no file positions past 2^x offsets may be recorded, or often even reached.

It turns out that in C++ on Linux, x = 31. This provides 2^31 = 2147483648 positions. Given that a position is normally a byte (B), we are then limited to 2147483648 B, or 2097152 kB, or 2048 MB, or 2 GB. With our newest, largest models, this is a problem. My latest simulations are producing data files that are around 21 GB when uncompressed.
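The arithmetic above can be checked with a few lines of C. This is just a sketch; positions_for_bits is a name I made up for illustration, and it assumes fewer than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many distinct positions fit in `bits` bits.
   The 1 is widened to 64 bits first, so shifting by 31 cannot overflow. */
static uint64_t positions_for_bits(unsigned bits)
{
    assert(bits < 64); /* shifting a 64-bit value by 64 or more is undefined */
    return (uint64_t)1 << bits;
}
```

Calling positions_for_bits(31) gives 2147483648. Dividing by 1024 once, twice, and three times gives 2097152, 2048, and 2, matching the kB, MB, and GB figures above. A 21 GB file needs more than ten times as many positions.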

It turns out that in 64-bit Linux, libc does not have this problem. It is therefore possible to use normal C file I/O commands to process large files (on most modern systems, which incorporate Large File Support or LFS). After searching high and low on the interwebs for a way to convince C++ to use larger variables for file I/O, and not finding much, I caved and spent all of five minutes changing my code over to use C I/O.
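For the curious, the C side looks roughly like the sketch below. It is not my real code: read_at is a hypothetical helper, and I'm assuming a POSIX system where fseeko and off_t are available. The _FILE_OFFSET_BITS define asks glibc for a 64-bit off_t even on 32-bit builds, and has to come before any #include.

```c
#define _FILE_OFFSET_BITS 64 /* request a 64-bit off_t; must precede all includes */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read `count` bytes starting at byte `offset` from the
   beginning of the file. Returns the number of bytes actually read, or 0 if
   the seek fails. fseeko takes an off_t, so offsets past 2 GB are fine as
   long as off_t is 64 bits wide. */
static size_t read_at(FILE *fp, off_t offset, void *buf, size_t count)
{
    assert(fp != NULL);
    if (fseeko(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, count, fp);
}
```

The plain fseek takes a long instead of an off_t, which happens to be 64 bits on 64-bit Linux as well, but fseeko with off_t states the intent more clearly and also covers 32-bit builds with LFS.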

What made this bug a little more difficult to track down than it might have been is that for some reason, Mac OS X Leopard does not suffer from this problem with C++ stream I/O. Someone at Apple must have allocated a few more bits for file position pointers. My code was therefore working fine on my sorta-64-bit Mac but not on our fully-64-bit cluster or our workstations running recent versions of Linux.

Ultimately it would be nice to write some kind of wrapper or overload the C++ I/O functions to do things correctly, but for the moment my code is working properly and I am happy.

Do you have any better ideas for getting around this 2 GB limit?