I just made two very important discoveries. As much as I loathed buying Acrobat Standard, and as poorly as it runs on my machine with its case-sensitive filesystem, it does have a redeeming feature.
It has a built-in OCR engine, which I knew about but hadn’t tried. I decided to try it on an academic paper that I had in my archive. However, when I loaded the paper (which looked scanned), I was already able to select the text, though I didn’t know why. I could do the same in Preview, so the selectable text had to be in the file itself rather than something Acrobat was adding.
I took another paper that was clearly scanned and tried to run OCR on it. Unlike the first one, its text was not selectable to start with. However, after OCR… it was.
So, the two important discoveries are that Acrobat will transparently overlay your scanned documents with selectable text, and that Circulation Research appears to have already done this for the downloadable PDFs of its older articles.
This explains why PDFs that I thought were scanned have been showing up in Spotlight searches that pick up their contents.
ADDENDUM: Apparently Acrobat Pro can do this in batch mode. This has major implications for me. I might even consider buying it at some point, once they come out with a universal binary.
News flash: Acrobat 8 Professional *is* a Universal Binary application; there is no Acrobat 8 Standard version on the Mac.
I looked after writing the post and saw that Acrobat 8 had come out. I wasn’t aware; I’m still using Acrobat Standard 7 under Rosetta.