The plot thickens further: I have a copy of Readiris OCR, so I tried pulling this PDF file in to see if I could just OCR it. All I see in Readiris is boxes instead of letters. I tried a different PDF file and it worked fine (well, mostly fine, apart from the usual OCR-type errors). Note that in the "thumbnail preview" mode in the Finder on the Mac, I also see boxes instead of text, and I see the same boxes in the Mac's "Preview" application. (This isn't surprising, as I strongly suspect these two bits of software use the same code.)
Does anyone here know enough about PDF to guess what's happening? Again, when I look at the fonts (in Document Properties in Acrobat Reader) I see pretty weird names, e.g. "TTE1D974C0t00 (Embedded Subset)". It's a TrueType font, but the encoding is listed as "Custom." In files that behave more normally I see recognizable font names (variations on Arial or Times New Roman) and an encoding of "Ansi". Does anyone know how to work around this problem? Maybe I'm going to need the full version of Acrobat after all...