MobileRead Forums > E-Book Readers > Amazon Kindle

08-31-2011, 05:44 PM   #1
bmf
Member
Posts: 13
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
Kindle clippings from pdfs have no whitespace

Hi

I often read PDFs on my Kindle and make notes and highlights. I then parse the clippings file and associate notes with highlights, which saves me time when capturing what I've learned. Some PDFs save highlights without any whitespace, e.g.

Quote:
becauseitmeansonly
Any ideas for avoiding this? I can work around it by manually inserting spaces but would rather not.
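
For anyone who wants to find the affected highlights automatically before fixing them, here is a minimal Python sketch. It rests on assumptions not stated in the post: that entries in the clippings file are separated by a line of `==========`, and that each entry is a title line, a metadata line containing the word "Highlight" or "Note", and then the text. The function names and the `min_length` cutoff are made up for illustration.

```python
SEPARATOR = "=========="  # assumed entry delimiter in the clippings file


def split_entries(clippings_text):
    """Split raw clippings text into (title, meta, body) tuples."""
    entries = []
    for chunk in clippings_text.split(SEPARATOR):
        lines = [ln.strip() for ln in chunk.strip().splitlines()]
        if len(lines) < 2:
            continue  # skip empty chunks, e.g. after the last separator
        title, meta = lines[0], lines[1]
        body = " ".join(ln for ln in lines[2:] if ln)
        entries.append((title, meta, body))
    return entries


def looks_squashed(body, min_length=12):
    """Heuristic: a long highlight with no whitespace at all is suspicious."""
    return len(body) >= min_length and not any(ch.isspace() for ch in body)


def squashed_highlights(clippings_text):
    """Return highlight entries whose text appears to have lost its spaces."""
    return [
        entry for entry in split_entries(clippings_text)
        if "Highlight" in entry[1] and looks_squashed(entry[2])
    ]
```

This only flags the entries; it does not repair them. Highlights that are a single long word would be flagged too, so the cutoff may need adjusting.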
08-31-2011, 08:52 PM   #2
EbokJunkie
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
I have seen this happen. It means the PDF's text layer has no spaces between words. The built-in dictionary won't recognize the words either. If you can find a Sony reader, try loading the book on it: an attempt to highlight a single word will highlight the whole line instead.
The same thing happens if the text layer is missing and the PDF contains only images of the pages.
09-01-2011, 05:37 PM   #3
bmf
Hi - this PDF certainly has genuine text as opposed to scanned pages. I can run pdftohtml on it and see the text, with whitespace, in the resulting HTML file.

Dumping just the offending line from the clippings file to another file and running chardet against it just reports the encoding as ascii. I have a bad feeling my Kindle is somehow misinterpreting the PDF and I'm losing the data at the time the highlight is written to clippings.txt, and therefore I'm stuffed!
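
Since the text with whitespace can be recovered from the PDF separately, one possible workaround is to look up each squashed highlight in that extracted text while ignoring whitespace, then copy the matching span back out with its spaces. The sketch below is only an illustration under assumptions: `reference_text` is taken to be the plain text of the PDF (for example the pdftohtml output with the tags stripped), and the characters in the clipping are assumed to match it exactly. `restore_spaces` is a hypothetical helper name.

```python
def restore_spaces(squashed, reference_text):
    """Find `squashed` in `reference_text`, ignoring whitespace.

    Returns the matching span of the reference with runs of whitespace
    collapsed to single spaces, or None if there is no match.
    """
    compact_chars = []
    positions = []  # positions[i] = index in reference_text of compact char i
    for index, ch in enumerate(reference_text):
        if not ch.isspace():
            compact_chars.append(ch)
            positions.append(index)
    compact = "".join(compact_chars)

    needle = "".join(squashed.split())  # drop any whitespace in the clipping
    if not needle:
        return None
    start = compact.find(needle)  # first occurrence only
    if start == -1:
        return None
    first = positions[start]
    last = positions[start + len(needle) - 1]
    return " ".join(reference_text[first:last + 1].split())


reference = "This matters because it\nmeans only one thing."
print(restore_spaces("becauseitmeansonly", reference))
```

It takes the first match only, so a short highlight that occurs several times in the document may be mapped to the wrong place, and ligatures or words hyphenated across lines in the extracted text would make the lookup fail.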
09-02-2011, 12:27 AM   #4
EbokJunkie
That's interesting. However, I'm still sure that the whitespace is missing from the original PDF: instead of containing space characters, the file just draws each character at a prescribed position within the line. We recently found the same issue with PDFs created by Prince XML, if (and only if) the PDF generated from the XML file is fully justified: Adobe Reader Mobile, which Amazon and Sony use, cannot distinguish words for highlighting.
Could you make a sample of the problem file available for download?
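
To illustrate the point about characters drawn at prescribed positions: when a file contains no space characters, software has to guess the word boundaries from the gaps between glyphs. The Python sketch below shows one simple way such a guess could work. It is not how Adobe Reader Mobile or pdftohtml actually do it; the `(char, x_start, x_end)` glyph tuples, the function name, and the `gap_ratio` threshold are all assumptions made for the example.

```python
def words_from_glyphs(glyphs, gap_ratio=0.3):
    """Rebuild a line of text from positioned glyphs.

    glyphs: list of (char, x_start, x_end) tuples, sorted left to right.
    A space is inserted when the gap to the previous glyph exceeds
    gap_ratio times the average glyph width on the line.
    """
    if not glyphs:
        return ""
    total_width = sum(x_end - x_start for _, x_start, x_end in glyphs)
    threshold = gap_ratio * total_width / len(glyphs)
    pieces = [glyphs[0][0]]
    for (_, _, prev_end), (char, x_start, _) in zip(glyphs, glyphs[1:]):
        if x_start - prev_end > threshold:
            pieces.append(" ")  # gap is wide enough to count as a word break
        pieces.append(char)
    return "".join(pieces)


# Hypothetical positions for the glyphs of "it is" with no space character.
line = [("i", 0, 2), ("t", 2, 5), ("i", 9, 11), ("s", 11, 15)]
print(words_from_glyphs(line))
print(words_from_glyphs(line, gap_ratio=2.0))
```

With a threshold that is too high for the line, the second call runs the words together, which is the kind of output seen in the clipping. A reader that does no gap analysis at all would always produce the run-together form.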