03-09-2010, 04:29 AM   #1
Begemot
Zealot
Posts: 111
Karma: 18638
Join Date: Dec 2009
Device: Sony DPT-S1, Kindle DX, iPad, Kobo Mini,H20,iRiver StoryHD
DJVU to ePub best results?

What methods have you used to convert DJVU to ePub?

The current method I am using is as follows:
Open the DJVU file with DjVuLibre DjView 4.4
Export it as PDF
Then add the PDF to the Calibre library and convert it to ePub (this basically runs the pdftohtml utility)

The problem is the conversion-to-PDF step: a 4 MB DJVU explodes into a 35 MB PDF!
Converting the PDF to ePub brings that down to 20-25 MB, but the results are less than stellar.
A slightly smaller problem is that running pdftohtml on a 35 MB PDF takes about half an hour, but I could live with that if the quality were good.

All of this was done on Ubuntu 9.10, but I would be interested in hearing about DJVU to ePub solutions on Windows or Mac as well.
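
For anyone who wants to script the workflow above instead of clicking through it, here is a minimal sketch. It assumes the DjVuLibre command-line tool `ddjvu` stands in for the DjView export step (a swap, not what was actually used above) and that Calibre's `ebook-convert` is installed. The helper names `plan_conversion`, `size_ratio`, and `run_pipeline` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def plan_conversion(djvu_path):
    """Return the two commands for a DjVu -> PDF -> ePub pipeline."""
    src = Path(djvu_path)
    pdf = src.with_suffix(".pdf")
    epub = src.with_suffix(".epub")
    return [
        # Step 1: render the DjVu pages into a PDF (assumed CLI stand-in for DjView).
        ["ddjvu", "-format=pdf", str(src), str(pdf)],
        # Step 2: let Calibre convert the PDF into an ePub.
        ["ebook-convert", str(pdf), str(epub)],
    ]


def size_ratio(before_mb, after_mb):
    """How many times larger the output is than the input."""
    return after_mb / before_mb


def run_pipeline(djvu_path):
    """Run each step in order, stopping at the first failure."""
    for cmd in plan_conversion(djvu_path):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in plan_conversion("book.djvu"):
        print(" ".join(cmd))
    # The 4 MB -> 35 MB blow-up reported above:
    print(f"PDF is {size_ratio(4, 35):.2f}x the DjVu size")
```

This only automates the same steps, so the size blow-up and the quality problems stay exactly as described.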
03-09-2010, 09:04 AM   #2
charleski
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
DjVu doesn't contain text, it works on images, and that's your problem. It uses an image-compression technology that's highly optimised for text and allows far smaller file sizes than other formats that target more general image types.

You could export the images and OCR them (lots of work to catch the errors), or you could try slapping the images all together as-is (which is what you describe above, possibly with some down-ressing to make it look even worse). Go back to the author and get a file that contains text, because DjVu is useless for your purpose.
02-07-2011, 03:44 PM   #3
Websterny
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2011
Device: Kobo Mini
I know nothing about this topic, but I have the same problem, and I note that DJVU files do seem to contain text - that is, the documents are searchable. They can be converted to non-searchable PDF with the print command (assuming you have a PDF print driver installed). But this significantly detracts from the utility of the files. And their size is a multiple of the original DJVU. There has got to be a better way.
02-07-2011, 04:16 PM   #4
BobC
Guru
Posts: 691
Karma: 3026110
Join Date: Dec 2008
Location: Lancashire, U.K.
Device: BeBook 1, BeBook Pure, Kobo Glo, (and HD),Energy Sistem EReader Pro +
Quote:
Originally Posted by charleski View Post
DjVu doesn't contain text, it works on images, and that's your problem. It uses an image-compression technology that's highly optimised for text and allows far smaller file sizes than other formats that target more general image types.
DJVUs can contain a hidden text layer (which is used by the search feature). This layer can be extracted and used as the basis for any other conversion.

For example, most of the DJVU files on The Internet Archive (TIA) contain such a layer, and I have used them as the basis for FB2 books.

Of course, the files the OP is working on may not have such a layer, as the original pages may not have been OCR'd and the text associated with the image layer.

BobC
02-07-2011, 08:59 PM   #5
pholy
Booklegger
Posts: 1,801
Karma: 7999816
Join Date: Jun 2009
Location: Toronto, Ontario, Canada
Device: BeBook(1 & 2010), PEZ, PRS-505, Kobo BT, PRS-T1, Playbook, Kobo Touch
BobC - Can you tell us how to extract that hidden text layer? I haven't run across any DJVU books that I recall, but it would be good to know how to convert them when possible.
02-09-2011, 04:55 PM   #6
BobC
Guru
Quote:
Originally Posted by pholy View Post
BobC - Can you tell us how to extract that hidden text layer?
Either highlight the text in the image view and press Ctrl+C to copy it to the clipboard, or use the "Export Text" feature in WinDjView or a similar viewer.

BobC
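
On Linux, the same text layer can probably be dumped from the command line as well. The sketch below assumes the `djvutxt` tool from DjVuLibre is installed; the helper names (`djvutxt_command`, `has_text_layer`, `extract_text`) are hypothetical and only there for illustration.

```python
import subprocess


def djvutxt_command(djvu_path, page=None):
    """Build a djvutxt call that writes the hidden text layer to stdout."""
    cmd = ["djvutxt"]
    if page is not None:
        # Restrict the dump to one page (or a page spec such as "1-10").
        cmd.append(f"--page={page}")
    cmd.append(str(djvu_path))
    return cmd


def has_text_layer(dumped_text):
    """True if the dump holds anything besides whitespace and form feeds."""
    return bool(dumped_text.strip())


def extract_text(djvu_path, page=None):
    """Run djvutxt and return the dumped text (requires DjVuLibre)."""
    result = subprocess.run(
        djvutxt_command(djvu_path, page),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    text = extract_text("book.djvu")
    print("text layer found" if has_text_layer(text) else "no text layer, OCR needed")
```

An empty dump is a quick hint that the file was never OCR'd, in which case there is nothing to extract.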
02-11-2011, 02:53 AM   #7
Begemot
Zealot
OP here. I resorted to using Export Text in WinDjView.

This gets you a text dump with no formatting whatsoever. For my Libre it works well enough, but in general this procedure is suboptimal.

Most DJVU files do seem to have a text layer (unless there is some on-the-fly OCR happening when you select an area on the page, which seems unlikely).

Thus, there must be a way (at least theoretically, until someone writes a converter) to preserve the formatting in the text layer.
02-11-2011, 05:29 AM   #8
bugmen0t
Banned
Posts: 2
Karma: 10
Join Date: Feb 2011
Device: Kindle 3 3G+WiFi
I just print from any DjVu reader to a PDF printer, such as PrimoPDF. The only catch: a 240-page book, originally 4.5 MB, ballooned to 40 MB. Maybe trimming the quality of the PDF down would help.
02-12-2011, 10:04 AM   #9
BobC
Guru
Quote:
Originally Posted by Begemot View Post
OP here. I resorted to using Export Text in WinDjView.

This gets you a text dump with no formatting whatsoever. For my Libre it works well enough, but in general this procedure is suboptimal.

Most DJVU files do seem to have a text layer (unless there is some on-the-fly OCR happening when you select an area on the page, which seems unlikely).

Thus, there must be a way (at least theoretically, until someone writes a converter) to preserve the formatting in the text layer.
I can assure you that the text layer is just that: text. Its purpose is simply to provide the search capability. There is no formatting, and in many books there are OCR "mis-reads".

If you want to understand DJVUs then you need to get the spec and study it. I've done quite a bit of work with adding TOCs to existing DJVUs and have converted a couple of books to FB2 - this involves manually proof-reading and correcting the dumped text then formatting it to match the original (italics, bold etc).

Don't expect too much out of what is a by-product of the search function.

BobC
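
Part of the clean-up of a dumped text layer can be automated before the manual proof-reading starts. The following is only a rough sketch with a made-up helper name (`reflow`): it re-joins words hyphenated across line ends and merges hard-wrapped lines into paragraphs. It cannot fix OCR mis-reads or restore italics and bold, and it will also merge genuine hyphenated compounds that happen to break at a line end.

```python
import re


def reflow(dump):
    """Turn a hard-wrapped text dump into one line per paragraph."""
    # Re-join words split by a hyphen at a line end: "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", dump)
    # Blank lines (possibly containing stray spaces) separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Collapse remaining line breaks and runs of spaces inside each paragraph.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "The con-\nversion of DjVu\nfiles is hard.\n\nSecond para-\ngraph here.\n"
    print(reflow(sample))
```

The result still needs a human pass against the page images, as described above.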
02-12-2011, 02:10 PM   #10
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
There is a description of DJVU in the wiki.
05-05-2017, 11:17 AM   #11
Mex5150
Banned
Posts: 23
Karma: 7476
Join Date: Feb 2013
Location: To the left of your body, to the right of your mind, lost in a riddle.
Device: Android phone, Nook Touch Simple Glowlight
OK, resurrecting an old thread, sorry. But I'm now stuck with this problem: has conversion moved on since this was originally posted? I can extract the text in the DJVU file (in a few different ways), but I can't get inline images to convert. What I want is to go from DJVU to ePub; I'm not worried about extra steps in between as long as both text and images make it to the final ePub. Like the OP I'm running Linux, but I can probably borrow a Windows machine for a while if that's the only way to do this. Any ideas?
05-05-2017, 11:35 AM   #12
Doitsu
Grand Sorcerer
Posts: 5,583
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Mex5150 View Post
Any ideas?
Calibre has a built-in DJVU converter that allows you to convert DJVU files with text layers to epubs.
05-05-2017, 11:52 AM   #13
Mex5150
Banned
Quote:
Originally Posted by Doitsu View Post
Calibre has a built-in DJVU converter that allows you to convert DJVU files with text layers to epubs.
Calibre was the first thing I tried, but that just converts the text without the images. (probably should have mentioned I'd already failed with Calibre, oops)
05-05-2017, 01:41 PM   #14
Doitsu
Grand Sorcerer
Quote:
Originally Posted by Mex5150 View Post
Calibre was the first thing I tried, but that just converts the text without the images. (probably should have mentioned I'd already failed with Calibre, oops)
AFAIK, you can either extract the text layer or the images, but not both. In order to keep the inline images, you'll have to convert the DJVU files to PDF files and then OCR them like any other PDF file with ABBYY FineReader.

If your Nook is rooted, you might be able to install KOReader, which supports DJVU files.

Alternatively, if the Nook PDF app doesn't have a reflow function, you might be able to use k2pdfopt to reformat PDF files generated from your DJVU files for your NOOK. (In case you can't figure out the optimal conversion settings, the author, willus, provides tech support in the k2pdfopt forum.)
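
Since each tool pulls out only one of the two things, a possible manual fallback is to run both per page and keep the results side by side. This is a sketch only, assuming DjVuLibre's `djvutxt` and `ddjvu` are installed; `page_commands`, `export_page`, and the `pages/` output folder are invented names. It yields whole-page images, not inline figures, so the pictures still have to be cropped and placed by hand.

```python
import subprocess
from pathlib import Path


def page_commands(djvu_path, page, out_dir="pages"):
    """Build the text and image export commands for a single page."""
    stem = f"{out_dir}/page{page:04d}"
    return {
        # Hidden text layer of this page, written to a .txt file.
        "text": ["djvutxt", f"--page={page}", djvu_path, f"{stem}.txt"],
        # The rendered page itself, written as a TIFF image.
        "image": ["ddjvu", "-format=tiff", f"-page={page}", djvu_path, f"{stem}.tiff"],
    }


def export_page(djvu_path, page, out_dir="pages"):
    """Run both exports for one page (requires DjVuLibre on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in page_commands(djvu_path, page, out_dir).values():
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for name, cmd in page_commands("book.djvu", 7).items():
        print(name, " ".join(cmd))
```

The matching file names make it easy to see which page image belongs to which chunk of text when assembling the ePub by hand.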
05-05-2017, 03:37 PM   #15
Mex5150
Banned
Quote:
Originally Posted by Doitsu View Post
AFAIK, you can either extract the text layer or images, but not both.
Well, that's a pain. Looks like I'll have to survive without the images then.

Thanks for your help anyway.

Tags
djvu epub conversion

