12-18-2017, 11:51 PM   #1
Ridcully
Enthusiast
Posts: 48
Karma: 2112464
Join Date: Apr 2017
Device: Kindle 10th gen. Paperwhite (2018)
Converting DJVU with bad OCR

I have a DJVU file that I want to read on my Kindle. The scans are high quality: little noise, all pages properly oriented, etc. However, when I try to convert it to any other format (including PDF) with Calibre, I end up with the underlying OCR text, which is horrible: no paragraph breaks, apparently never spell-checked, and any bold/italic text is completely garbled. How do I get rid of the OCR text and get the scanned images instead?

12-19-2017, 01:33 AM   #2
doubleshuffle
Unicycle Daredevil
Posts: 13,944
Karma: 185432100
Join Date: Jan 2011
Location: Planet of the Pudding Brains
Device: Aura HD (R.I.P. After six years the USB socket died.) tolino shine 3
I have no idea how to do that in Calibre; I guess you will have to run your file through a proper OCR program like ABBYY FineReader (or one of the free OCR applications based on Tesseract). I don't know which of them can process DJVU, though; perhaps you will have to save the individual pages as images first.

Anyway, no matter how well your OCR turns out, you will always have to manually proofread the result.

12-19-2017, 09:19 AM   #3
Divingduck
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
In addition, if you have no OCR program, you can instead print the DJVU file to a PDF printer (included in Windows) and open the PDF in MS Word 2016 (or 2013). Word's built-in OCR did a surprisingly good job. It's not perfect, but in combination with the spell checker and search & replace you can get a decent result too.

12-19-2017, 03:01 PM   #4
Ridcully
Enthusiast
Can't I just remove the OCR text inside the DJVU file somehow? I'd prefer to have the images in PDF format on the Kindle: the letters are clear and the pages are narrow, so I wouldn't have to deal with the whole OCR thing.

12-19-2017, 03:07 PM   #5
doubleshuffle
Unicycle Daredevil
I've just done a quick web search for "djvu to pdf conversion". It seems there are tons of free online tools available; why don't you try one of those?

12-21-2017, 06:06 AM   #6
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
If I am understanding correctly, you want your DJVU (scanned images with hidden OCR) to be turned into a PDF (scanned images only).

There are multiple ways to handle this.

If you aren't afraid of the command line, you could use ddjvu to go from DJVU -> PDF:

http://djvu.sourceforge.net/doc/index.html

For example, here is one use of ddjvu from a Super User (Stack Exchange) answer:

https://superuser.com/questions/1005.../596167#596167
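
For anyone who wants to script the ddjvu route, here is a minimal sketch in Python. It is not taken from the linked answer. It assumes DjVuLibre's ddjvu is installed and on your PATH, and that it was built with PDF output support. The function names (build_ddjvu_command, djvu_to_pdf) are made up for illustration. As far as I know, ddjvu renders each page to an image, so the resulting PDF should contain the scans only, without the hidden OCR text, but check the output on your own file.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ddjvu_command(src: str, dst: str, quality: Optional[int] = None) -> List[str]:
    """Build the argument list for a DJVU -> PDF conversion with ddjvu."""
    cmd = ["ddjvu", "-format=pdf"]
    if quality is not None:
        # -quality selects JPEG compression for the page images;
        # leaving it out keeps ddjvu's default (lossless) compression.
        cmd.append(f"-quality={quality}")
    cmd += [src, dst]
    return cmd


def djvu_to_pdf(src: str, dst: Optional[str] = None, quality: Optional[int] = None) -> str:
    """Convert src to a PDF of page images and return the output path."""
    if dst is None:
        # Hypothetical default: same name as the input, with a .pdf suffix.
        dst = str(Path(src).with_suffix(".pdf"))
    if shutil.which("ddjvu") is None:
        raise FileNotFoundError("ddjvu not found on PATH; install DjVuLibre first")
    # check=True raises CalledProcessError if ddjvu exits with an error.
    subprocess.run(build_ddjvu_command(src, dst, quality), check=True)
    return dst
```

Calling djvu_to_pdf("book.djvu") would run the equivalent of `ddjvu -format=pdf book.djvu book.pdf`. Passing quality=80 should give a smaller, lossy file.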

Last edited by Tex2002ans; 12-21-2017 at 06:09 AM.

06-01-2020, 02:35 PM   #7
Ridcully
Enthusiast
Solved

I recently had the same problem again (and had forgotten the solution), but this time I found a GUI program before I remembered this thread. Apologies for bumping this long-dead thread; hopefully this helps someone else who googles for this.

In case anyone doesn't want to use the command line, a freeware GUI program does just what I wanted: STDU Converter. It converted the DJVU to PDF, preserving the image layer and TOC but discarding the OCR layer. It has lots of options, but I just hit Convert and it did what I wanted, and it produced a fairly small output file too.