08-31-2011, 05:44 PM | #1 | |
Member
Posts: 13
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
Kindle clippings from pdfs have no whitespace
Hi
I often read pdfs on my Kindle and make notes and highlights. I then parse the clippings file and associate notes with highlights, which saves me time capturing the knowledge. Some pdfs save highlights without any whitespace, e.g. Quote:
|
|
08-31-2011, 08:52 PM | #2 |
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
|
I have seen this happen. It means that the text layer doesn't have spaces between words. The built-in dictionary won't recognize words either. If you can find a Sony reader, try loading the book onto it and see whether an attempt to highlight a word highlights the whole line instead.
The same happens if the text layer is missing and the pdf has only images of pages. |
|
09-01-2011, 05:37 PM | #3 |
Member
Posts: 13
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
Hi - this pdf certainly has genuine text as opposed to scanned pages. I can use pdftohtml and see the text in the resulting html file with whitespace.
Dumping just the offending line from the clippings file to another file and then running chardet against it just reports the encoding as ascii. I have a bad feeling that my Kindle is somehow misinterpreting the pdf and that I'm losing the data at the time the highlight is written to clippings.txt, and therefore I'm stuffed! |
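For anyone who wants to check how many highlights are affected, here is a minimal Python sketch. It assumes the clippings file separates entries with a line of ten `=` characters and that each entry is a title line, a metadata line starting with `- Highlight` or `- Note`, and then the text; that layout is an assumption and may differ between firmware versions. The names `parse_clippings` and `lacks_whitespace` are made up for illustration.

```python
import re

SEPARATOR = "=========="  # assumed entry delimiter in the clippings file


def parse_clippings(text):
    """Split clippings text into dicts with title, kind, meta and body."""
    entries = []
    for raw in text.split(SEPARATOR):
        # Drop a possible byte-order mark, carriage returns and blank lines.
        lines = [ln.strip("\ufeff\r ") for ln in raw.strip().splitlines()]
        lines = [ln for ln in lines if ln]
        if len(lines) < 3:
            continue  # entry without a body (e.g. a bookmark) or malformed
        title, meta = lines[0], lines[1]
        body = "\n".join(lines[2:])
        # Assumed metadata shape: "- Highlight ..." or "- Note ..."
        match = re.match(r"-\s*(?:Your\s+)?(\w+)", meta)
        kind = match.group(1).lower() if match else "unknown"
        entries.append({"title": title, "kind": kind, "meta": meta, "body": body})
    return entries


def lacks_whitespace(body, min_length=30):
    """True when some line is long enough to need spaces but has none."""
    return any(
        len(line) >= min_length and len(line.split()) == 1
        for line in body.splitlines()
    )
```

Reading the file, calling `parse_clippings` on its contents, and filtering on `kind == "highlight"` together with `lacks_whitespace(entry["body"])` should list the highlights that lost their spaces. The `min_length` cutoff of 30 is arbitrary; it only stops single-word highlights from being flagged.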
09-02-2011, 12:27 AM | #4 |
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
|
That's interesting. However, I'm still sure that the white spaces are missing from the original pdf. Instead, the characters are just drawn at prescribed positions within the line. Just recently we found the same issue with pdfs created by the Prince XML software if (and only if) the pdf generated from the xml file is fully justified: Adobe Reader Mobile, which Amazon and Sony use, cannot distinguish words for highlighting.
Could you make a sample of the problem file available for download? |
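To illustrate what "characters drawn at prescribed positions" means for a reader: when the text layer has no space characters, word boundaries have to be guessed from the horizontal gaps between glyphs. The Python sketch below shows one simple way to do that. It is not what Adobe Reader Mobile or pdftohtml actually does; the `Glyph` record, the `rebuild_line` function and the `gap_ratio` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph on the line
    width: float  # advance width of the glyph


def rebuild_line(glyphs, gap_ratio=0.3):
    """Join glyphs into text, inserting a space wherever the horizontal gap
    to the previous glyph exceeds gap_ratio times the mean glyph width."""
    if not glyphs:
        return ""
    ordered = sorted(glyphs, key=lambda g: g.x)
    mean_width = sum(g.width for g in ordered) / len(ordered)
    threshold = gap_ratio * mean_width
    out = [ordered[0].char]
    for prev, cur in zip(ordered, ordered[1:]):
        # Distance between the end of the previous glyph and the start of this one.
        gap = cur.x - (prev.x + prev.width)
        if gap > threshold:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)
```

A tool that applies some heuristic of this kind can produce spaced text from the same file in which a reader that only looks for explicit space characters sees one long run of letters.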
|