Hi, this PDF definitely contains real text rather than scanned pages. I can run pdftohtml on it and see the text, whitespace included, in the resulting HTML file.
Dumping just the offending line from the clippings file into another file and running chardet against it only reports the encoding as ascii. I have a bad feeling my Kindle is somehow misinterpreting the PDF and I'm losing the data at the moment the highlight is written to clippings.txt, in which case I'm stuffed!
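One caveat on the chardet result: "ascii" only means every byte in the file is below 0x80. It doesn't say whether the spaces between words are actually there. A minimal Python sketch like the one below can confirm whether the gaps are missing or were replaced by something odd. The function name `inspect_bytes` is hypothetical, and the sketch assumes you've already saved the offending line to its own file.

```python
import sys
from collections import Counter


def inspect_bytes(data: bytes) -> dict:
    """Summarise which kinds of bytes appear in a chunk of text.

    chardet saying "ascii" only means no byte is >= 0x80; this counts the
    actual whitespace and any stray control or non-ASCII bytes.
    """
    counts = Counter(data)
    return {
        "length": len(data),
        "spaces": counts[0x20],
        "tabs": counts[0x09],
        # ASCII control bytes other than tab, LF and CR.
        "other_controls": sum(
            n for b, n in counts.items() if b < 0x20 and b not in (0x09, 0x0A, 0x0D)
        ),
        # Anything outside 7-bit ASCII (would rule out a pure "ascii" verdict).
        "non_ascii": sum(n for b, n in counts.items() if b >= 0x80),
    }


if __name__ == "__main__":
    # Pass the file(s) holding the dumped line, e.g. a hypothetical offending_line.txt.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, inspect_bytes(f.read()))
```

If `spaces` comes back as 0 for a line that should contain several words, the whitespace really is gone from the clipping itself rather than being hidden behind an encoding issue.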