#1 |
Enthusiast
Posts: 48
Karma: 2112464
Join Date: Apr 2017
Device: Kindle 10th gen. Paperwhite (2018)
|
Converting DJVU with bad OCR
I have a DJVU file that I want to read on my Kindle. It has high-quality scans with little noise, all properly oriented, etc. However, when I try to convert it to any other format (including PDF) with Calibre, I end up with the underlying OCR text, which is horrible: no paragraph breaks, probably never checked for spelling, and any bold/italic text is completely garbled. How do I get rid of this and get the scanned images instead?
|
#2 |
Unicycle Daredevil
Posts: 13,944
Karma: 185432100
Join Date: Jan 2011
Location: Planet of the Pudding Brains
Device: Aura HD (R.I.P. After six years the USB socket died.) tolino shine 3
|
I have no idea how to do that in Calibre; I guess you will have to run your file through a proper OCR program like ABBYY FineReader (or one of the free OCR applications based on Tesseract). I don't know which of them can process DJVU, though; perhaps you will have to save the individual pages as images first.
Anyway, no matter how well your OCR turns out, you will always have to manually proofread the result.
#3 |
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
In addition, if you have no OCR program, you can instead print the DJVU file to a PDF printer (included in Windows) and open the PDF in MS Word 2016 (or 2013). The OCR built into Word did a surprisingly good job. It's not perfect, but in combination with the spell checker and search & replace you can get a decent result too.
|
#4 |
Enthusiast
Posts: 48
Karma: 2112464
Join Date: Apr 2017
Device: Kindle 10th gen. Paperwhite (2018)
|
Can't I just remove the OCR text inside the DJVU file somehow? I'd prefer to have the images in PDF format on Kindle: the letters are clear and the pages are narrow, and I wouldn't have to deal with the whole OCR thing.
|
#5 |
Unicycle Daredevil
Posts: 13,944
Karma: 185432100
Join Date: Jan 2011
Location: Planet of the Pudding Brains
Device: Aura HD (R.I.P. After six years the USB socket died.) tolino shine 3
|
I've just done a quick web search for "djvu to pdf conversion". It seems there are tons of free online tools available; why don't you try one of those?
|
#6 |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
If I am understanding correctly, you want your DJVU (scanned images with hidden OCR) to be turned into a PDF (scanned images only).
There are multiple ways to handle this. If you aren't afraid of the command line, you could use ddjvu to go from DJVU -> PDF: http://djvu.sourceforge.net/doc/index.html
For example, here is one usage of ddjvu off of Stack Exchange: https://superuser.com/questions/1005.../596167#596167
Last edited by Tex2002ans; 12-21-2017 at 06:09 AM.
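For anyone who wants to script the ddjvu route, here is a minimal sketch in Python. It assumes ddjvu (part of DjVuLibre) is installed and on your PATH. The helper names `build_ddjvu_command` and `convert` and the file names are made up for illustration. As far as I know, ddjvu's PDF output is built from the rendered page images, so the hidden OCR text should not come along, but check the result on your own file.

```python
import shutil
import subprocess
from pathlib import Path


def build_ddjvu_command(src, dst, quality=None):
    """Return the argument list for a DJVU -> PDF conversion with ddjvu.

    Hypothetical helper, not part of DjVuLibre itself.
    """
    cmd = ["ddjvu", "-format=pdf"]
    if quality is not None:
        # -quality controls the compression of the page images;
        # see the ddjvu man page for the values it accepts.
        cmd.append(f"-quality={quality}")
    cmd += [str(src), str(dst)]
    return cmd


def convert(src, dst, quality=None):
    """Run ddjvu and return the path of the PDF it was asked to write."""
    if shutil.which("ddjvu") is None:
        raise RuntimeError("ddjvu not found on PATH (it ships with DjVuLibre)")
    # check=True raises CalledProcessError if ddjvu exits with an error
    subprocess.run(build_ddjvu_command(src, dst, quality), check=True)
    return Path(dst)


if __name__ == "__main__":
    # Example file names; replace with your own.
    print(" ".join(build_ddjvu_command("book.djvu", "book.pdf")))
```

Calling `convert("book.djvu", "book.pdf")` runs the same command that the script prints. A lower `quality` value should give a smaller file at the cost of sharper-looking text, so it may take a couple of tries to find what reads well on a Kindle.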
#7 |
Enthusiast
Posts: 48
Karma: 2112464
Join Date: Apr 2017
Device: Kindle 10th gen. Paperwhite (2018)
|
Solved
I recently had the same problem again (and had forgotten the solution), but this time around I found a GUI program before I remembered this thread. Apologies for bumping this long-dead thread; hopefully this helps someone else who googles for this.
In case anyone doesn't want to use the command line, a freeware GUI program does just what I wanted: STDU Converter. It converted the DJVU to PDF, preserving the image layer and TOC but discarding the OCR layer. It has lots of options, but I just hit Convert and it did what I wanted, and gave a fairly small output file too.