09-20-2018, 04:07 AM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2014
Device: Kindle Paperwhite 3
|
Extra spaces on Kindle (e.g. "Ganzhe i t swor t e") but not in Acrobat DC (e.g. "Ganzheitsworte")
I scanned a German document at 600 dpi, used Briss to split each scanned page into two PDF pages, and then ran Acrobat DC's OCR with 600 dpi output. The OCR worked, as copying and pasting the text confirms.
When I send the PDF to my Kindle, however, virtually every word has spaces inside it. For example, a word that appears correctly in Acrobat DC as "Ganzheitsworte" comes out as "Ganzhe i t swor t e" when selected on the Kindle. This makes the Kindle's built-in dictionary useless. Any ideas?
09-20-2018, 04:32 AM | #2 |
The Grand Mouse 高貴的老鼠
Posts: 71,507
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Use the text from Acrobat DC's OCR to create a Kindle book instead. You shouldn't expect the same results from two different OCR systems.
09-21-2018, 12:25 AM | #3 |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
I don't understand. Did you somehow use Adobe to create a new PDF with an OCR layer in it and send that PDF to the Kindle? Or did you send the scanned PDF (after cropping with Briss) to the Kindle without performing any OCR beforehand? I don't know enough about Adobe DC to know whether it will create a PDF with an OCR layer.
Last edited by willus; 09-21-2018 at 12:25 AM. Reason: Fixed typo |
09-21-2018, 01:40 AM | #4 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2014
Device: Kindle Paperwhite 3
|
willus: I scanned the book as a PDF, ran it through Briss, and then used Acrobat DC to add an OCR layer.
pdurrant: Exporting the text from the PDF is not an option. The document contains too many quotations in foreign languages, including Greek set in the Greek alphabet, and the OCR made quite a few mistakes in the footnotes. I don't think the Kindle runs its own OCR; rather, it processes the PDF's OCR layer and adds spaces in the process.
09-21-2018, 03:03 AM | #5 |
The Grand Mouse 高貴的老鼠
Posts: 71,507
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
09-24-2018, 04:40 AM | #6 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2014
Device: Kindle Paperwhite 3
|
Whenever I've sent non-OCR'd PDFs to my Kindle, they have had no text layer. The same is true of this document when I send a version without the OCR text layer.
|
09-24-2018, 06:07 AM | #7 |
The Grand Mouse 高貴的老鼠
Posts: 71,507
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Oh, how interesting. Could it be that the spaces are there in the text layer already?
What happens if you convert the PDF with the text layer in calibre?
09-24-2018, 10:26 AM | #8 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2014
Device: Kindle Paperwhite 3
|
When I copy text within Acrobat the spaces are absent.
I just used calibre to convert the PDF to TXT and RTF. The TXT output contains only the document outline (none of the body text), and the outline does not have the extra spaces. The RTF output contains the image layer, not the text. I have posted my quandary in the Kindle forum (https://www.mobileread.com/forums/sh...d.php?t=310958), hoping that someone there has run into the same issue.
10-19-2018, 04:15 PM | #9 |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Double-posting is typically frowned upon at MR, though the site definitely needs a way to cross-post questions like this to multiple forums. I downloaded the PDF sample you posted in the other thread and looked at it. There are definitely no spaces inside the words in the OCR layer (see the excerpt from the decompressed PDF stream below), so it's a mystery why Amazon's conversion puts them in.
Code:
... 0.05 Tc 9.4807 0 0 9.1 63.27 418.57 Tm (der )Tj 9.2469 0 0 9.1 79.35 418.57 Tm (Ganzheitsworte )Tj 9.65 0 0 9.1 146.38 418.57 Tm (mag )Tj /Suspect <</Conf 0 >>BDC 9.1849 0 0 9.1 167.15 418.57 Tm (salom )Tj ... |
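For anyone who wants to repeat this check on their own file, here is a minimal Python sketch (standard library only, not a real PDF parser). It inflates FlateDecode streams with `zlib` and lists the string drawn by each `Tj`, together with the x/y position taken from the preceding `Tm` (whose six operands `a b c d e f` set the text matrix, with `e` and `f` as the position). Assumptions: literal strings without nested parentheses, bytes decoded as Latin-1 (i.e. a simple single-byte font encoding), and only `Tm`/`Tj` handled; `TJ` arrays, `Td`/`T*` moves, hex strings and uncompressed streams are ignored. On the excerpt above, each word arrives as a single `Tj` string such as `(Ganzheitsworte )`, with the only space at the end.

```python
import re
import zlib

# Number token: 12, -3.5, .25, 418.57
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)"

# Tokens this sketch understands. Literal strings are tried first so that
# text inside parentheses is never split into operators.
TOKEN = re.compile(
    r"\((?:[^()\\]|\\.)*\)"   # literal string (nested parentheses not supported)
    r"|<<|>>"                 # dictionary delimiters, e.g. in BDC properties
    r"|/[^\s/\[\]()<>{}%]*"   # name, e.g. /Suspect
    r"|" + NUMBER +
    r"|[A-Za-z'\"*]+",        # operator, e.g. Tc, Tm, Tj, BDC
    re.S,
)

ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}


def unescape(body):
    """Decode backslash escapes in the body of a PDF literal string."""
    def repl(m):
        s = m.group(1)
        if s[0] in "01234567":
            return chr(int(s, 8))    # octal escape such as \101
        if s in "\r\n":
            return ""                # backslash + end of line = continuation
        return ESCAPES.get(s, s)     # \( \) \\ map to the character itself
    return re.sub(r"\\([0-7]{1,3}|.)", repl, body, flags=re.S)


class Lit(str):
    """A decoded literal string, kept distinct from name/delimiter tokens."""


def extract_words(content):
    """Return (x, y, text) for every Tj, using the position from the last Tm."""
    operands, matrix, words = [], None, []
    for tok in TOKEN.findall(content):
        if tok.startswith("("):
            operands.append(Lit(unescape(tok[1:-1])))
        elif re.fullmatch(NUMBER, tok):
            operands.append(float(tok))
        elif tok.startswith("/") or tok in ("<<", ">>"):
            operands.append(tok)
        else:
            # Any other token is an operator, which consumes its operands.
            if tok == "Tm" and len(operands) >= 6:
                matrix = operands[-6:]           # a b c d e f; e, f = position
            elif tok == "Tj" and operands and isinstance(operands[-1], Lit):
                x, y = (matrix[4], matrix[5]) if matrix else (None, None)
                words.append((x, y, str(operands[-1])))
            operands = []
    return words


STREAM = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.S)


def content_streams(pdf_bytes):
    """Yield each stream body that inflates with zlib (FlateDecode).

    Streams that are uncompressed or use other filters fail to inflate and
    are skipped. Output is decoded as Latin-1 (an assumption, see above).
    """
    for m in STREAM.finditer(pdf_bytes):
        try:
            data = zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue
        yield data.decode("latin-1")
```

To try it on a file, something like `for s in content_streams(open("sample.pdf", "rb").read()): print(extract_words(s))` should work for simple PDFs (`sample.pdf` is a placeholder name). Any `Tj` string with a space in the middle of a word would point to the text layer itself; if none shows up, the extra spaces are being added later.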
Tags |
german language ebook, ocr |
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Extra spaces between words | Drybonz | Conversion | 4 | 12-14-2015 08:15 PM |
Extra spaces in AZW3 format on Kindle | ozshots | Calibre | 5 | 09-17-2013 05:04 AM |
Extra spaces in Sigil | noteon | Sigil | 2 | 04-08-2011 02:42 PM |
PDF->Mobi extra spaces inserted? | tapar | Conversion | 8 | 01-29-2011 08:33 PM |
I'm having a problem with extra paragraph spaces | akosimike | Calibre | 10 | 05-27-2010 06:53 PM |