03-09-2010, 05:29 AM   #1
Begemot
Device: Aluratek Libre, iRiver Story HD, Kindle DX Demo -> DXG
DJVU to ePub best results?

What methods have you used to convert DJVU to ePub?

The current method I am using is as follows:
Open the DJVU in DjVuLibre DjView 4.4.
Export it as PDF.
Add the PDF to the Calibre library and convert it to ePub (this basically runs the pdftohtml utility).

The problem is the PDF conversion step: a 4MB DJVU explodes into a 35MB PDF!
The PDF-to-ePub conversion then brings it down to 20-25MB, but the results are less than stellar.
A slightly smaller problem is that running pdftohtml on a 35MB PDF takes about half an hour, but I could live with that if the quality were good.

All of this was done on Ubuntu 9.10, but I would be interested in hearing about DJVU to ePub solutions on Windows or Mac as well.
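For reference, the same DJVU → PDF → ePub route can be run from the command line instead of through the GUIs, which makes it easier to batch or time. This is a minimal sketch assuming DjVuLibre's `ddjvu` and Calibre's `ebook-convert` are both on the PATH; the function names are made up for illustration. Because `ddjvu` decodes the pages to bitmaps before writing the PDF, expect the same size blow-up described above.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def djvu_to_epub_commands(djvu: str) -> list[list[str]]:
    """Build the two commands for the DjVu -> PDF -> EPUB route (hypothetical helper)."""
    src = Path(djvu)
    pdf = src.with_suffix(".pdf")
    epub = src.with_suffix(".epub")
    return [
        # ddjvu decodes every page and writes the result as a PDF
        ["ddjvu", "-format=pdf", str(src), str(pdf)],
        # ebook-convert chooses its input/output plugins from the file extensions
        ["ebook-convert", str(pdf), str(epub)],
    ]


def run_pipeline(djvu: str) -> None:
    """Run each step in order, stopping on the first failure."""
    for cmd in djvu_to_epub_commands(djvu):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    run_pipeline(sys.argv[1])
```

Usage: `python djvu2epub.py book.djvu` should leave `book.pdf` and `book.epub` next to the original.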
03-09-2010, 10:04 AM   #2
charleski
Device: PRS-505
DjVu doesn't contain text; it works on images, and that's your problem. It uses an image-compression technology that's highly optimised for text and allows far smaller file sizes than formats that target more general image types.

You could export the images and OCR them (lots of work to catch the errors), or you could try slapping the images all together as-is (which is what you describe above, possibly with some down-ressing to make it look even worse). Go back to the author and get a file that contains text, because DjVu is useless for your purpose.
02-07-2011, 04:44 PM   #3
Websterny
Device: Kobo Mini
I know nothing about this topic, but I have the same problem, and I note that DJVU files do seem to contain text: the documents are searchable. They can be converted to non-searchable PDF with the print command (assuming you have a PDF print driver installed), but this significantly reduces the usefulness of the files, and the resulting PDFs are several times the size of the original DJVU. There has got to be a better way.
02-07-2011, 05:16 PM   #4
BobC
Device: BeBook 1, BeBook Pure, Kobo Glo, Various Android Apps
Originally posted by charleski: "DjVu doesn't contain text, it works on images, and that's your problem. It uses an image-compression technology that's highly optimised for text and allows far smaller file sizes than other formats that target more general image types."
DJVUs can contain a hidden text layer (used by the search feature). This layer can be extracted and used as the basis for any other conversion.

For example, most of the DJVU files on the Internet Archive (TIA) contain such a layer, and I have used them as the basis for FB2 books.

Of course, the OP's files may not have such a layer, since the original text may never have been OCRed and associated with the image layer.

BobC
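Whether a given file has that hidden layer can be checked from the command line. This is a minimal sketch assuming DjVuLibre's `djvutxt` is installed; it relies on `djvutxt` printing the text layer to stdout when no output file is given, and treats an empty or whitespace-only dump as "no OCR text". The helper names are hypothetical.

```python
import subprocess
from typing import Optional


def djvutxt_command(djvu_path: str, page: Optional[int] = None) -> list[str]:
    """Build a djvutxt call; with no output file argument it prints to stdout."""
    cmd = ["djvutxt"]
    if page is not None:
        cmd.append(f"--page={page}")  # restrict the dump to a single page
    cmd.append(djvu_path)
    return cmd


def looks_ocred(dump: str) -> bool:
    """An empty or whitespace-only dump suggests there is no hidden text layer."""
    return bool(dump.strip())


def has_text_layer(djvu_path: str) -> bool:
    """Run djvutxt (assumed to be on PATH) and check whether any text comes back."""
    result = subprocess.run(djvutxt_command(djvu_path),
                            capture_output=True, text=True, check=True)
    return looks_ocred(result.stdout)
```

Checking a single page (for example `djvutxt_command(path, page=10)`) is a quicker spot check on large books, though front matter pages may legitimately be empty.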
02-07-2011, 09:59 PM   #5
pholy
Device: BeBook(1 & 2010), PEZ, PRS-505, Kobo BT, PRS-T1, Playbook, Kobo Touch
BobC, can you tell us how to extract that hidden text layer? I don't recall running across any DJVU books, but it would be good to know how to convert them when possible.
02-09-2011, 05:55 PM   #6
BobC
Device: BeBook 1, BeBook Pure, Kobo Glo, Various Android Apps
Originally posted by pholy: "BobC - Can you tell us how to extract that hidden text layer?"
Either highlight the text in the image view and press Ctrl+C to copy it to the clipboard, or use the "Export Text" feature in WinDJView or a similar viewer.

BobC
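A command-line counterpart to "Export Text" is DjVuLibre's `djvutxt`, which accepts an optional output file after the input. The sketch below pairs it with Calibre's `ebook-convert` to go from the plain-text dump straight to ePub. It assumes both tools are on the PATH; the helper names are illustrative only. Expect the same unformatted text that WinDJView produces, and no page images.

```python
import subprocess
import sys
from pathlib import Path


def text_route_commands(djvu: str) -> list[list[str]]:
    """Build the DjVu -> TXT -> EPUB commands (hypothetical helper)."""
    src = Path(djvu)
    txt = src.with_suffix(".txt")
    epub = src.with_suffix(".epub")
    return [
        # djvutxt dumps the hidden OCR text layer; the second argument is the output file
        ["djvutxt", str(src), str(txt)],
        # Calibre's TXT input turns the plain dump into an EPUB
        ["ebook-convert", str(txt), str(epub)],
    ]


def run(djvu: str) -> None:
    """Run each step in order, stopping on the first failure."""
    for cmd in text_route_commands(djvu):
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1])
```

Compared with the PDF route, this trades the page images for a file that is mostly text, so it should be far smaller, but any OCR misreads go straight into the ePub.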
02-11-2011, 03:53 AM   #7
Begemot
Device: Aluratek Libre, iRiver Story HD, Kindle DX Demo -> DXG
OP here. I resorted to using Export Text in WinDJView.

This gets you a text dump with no formatting whatsoever. For my Libre it works well enough, but in general this procedure is suboptimal.

Most DJVU files do seem to have a text layer (unless some on-the-fly OCR happens when you select an area on the page, which seems unlikely).

Thus there must be a way (at least in theory, until someone writes a converter) to preserve the formatting in the text layer.
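Short of real formatting, part of the cleanup of a raw text dump can be automated: rejoining words hyphenated across line breaks and merging lines back into paragraphs. The sketch below is a heuristic under stated assumptions, not a converter: it assumes blank lines in the dump separate paragraphs, and it assumes any line ending in letter + hyphen is a split word. The function names are hypothetical.

```python
import html
import re


def rebuild_paragraphs(dump: str) -> list[str]:
    """Heuristic cleanup of a plain OCR dump (assumes blank lines separate paragraphs)."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", dump):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        text = lines[0]
        for ln in lines[1:]:
            if re.search(r"[^\W\d_]-$", text):
                # letter followed by a hyphen at line end: treat as a split word and glue it
                text = text[:-1] + ln
            else:
                # ordinary line break inside a paragraph becomes a space
                text = f"{text} {ln}"
        paragraphs.append(text)
    return paragraphs


def to_xhtml_body(paragraphs: list[str]) -> str:
    """Wrap each paragraph in <p>, escaping &, < and > so the markup stays valid."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
```

The hyphen rule will wrongly merge genuine compounds that happen to break at the hyphen ("well-" / "known" becomes "wellknown"), so the result still needs the manual proofreading BobC describes below.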
02-11-2011, 06:29 AM   #8
bugmen0t
Device: Kindle 3 3G+WiFi
I just print from any DjVu reader to a PDF printer such as PrimoPDF. The only catch is size: a 240-page book that was 4.5MB originally became 40MB. Maybe lowering the PDF quality would help.
02-12-2011, 11:04 AM   #9
BobC
Device: BeBook 1, BeBook Pure, Kobo Glo, Various Android Apps
Originally posted by Begemot:
"OP here, I resorted to using export Text in WinDJView.

This gets you a text dump with no formatting whatsoever. For my Libre it works well enough, but in general, this procedure is suboptimal.

Most DJVU files do seem to have a text layer (unless there is some on-the-fly OCR happening when you select an area on the page, which seems unlikely).

Thus, there must be a way (at least theoretically until someone writes a converter) to preserve the formatting in the text layer."
I can assure you that the text layer is just that: text. Its purpose is simply to provide the search capability. There is no formatting, and many books contain OCR misreads.

If you want to understand DJVUs, you need to get the spec and study it. I've done quite a bit of work adding TOCs to existing DJVUs and have converted a couple of books to FB2; this involves manually proofreading and correcting the dumped text, then formatting it to match the original (italics, bold, etc.).

Don't expect too much out of what is a by-product of the search function.

BobC
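The point that the layer is "just text" can be seen in its structure: the hidden text is a tree of zones (page, column, region, para, line, word, char), each holding only a bounding box and text, with no font or style information. DjVuLibre's `djvutxt --detail=...` and `djvused`'s `print-txt` print this tree as an S-expression; the exact layout assumed here, `(kind xmin ymin xmax ymax body...)` with a quoted string at the leaf level, should be checked against your own files. The sketch parses such output and pulls out per-line text; the helper names and the simplified string unescaping are assumptions.

```python
import re

# One token: "(", ")", a double-quoted string (backslash escapes allowed), or a bare atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')


def parse_sexpr(src: str):
    """Parse a single S-expression into nested lists; integer atoms become int."""
    src = src.strip()
    stack: list[list] = [[]]
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        pos = m.end()
        lparen, rparen, string, atom = m.groups()
        if lparen:
            stack.append([])
        elif rparen:
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            stack[-1].append(node)
        elif string is not None:
            # simplified unescaping: \" -> " and \\ -> \ (octal escapes not handled)
            stack[-1].append(re.sub(r"\\(.)", r"\1", string))
        else:
            stack[-1].append(int(atom) if atom.lstrip("-").isdigit() else atom)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete expression")
    return stack[0][0]


def zone_text(zone: list) -> str:
    """A zone is [kind, xmin, ymin, xmax, ymax, *body]; body is one string or child zones."""
    body = zone[5:]
    if len(body) == 1 and isinstance(body[0], str):
        return body[0]
    return " ".join(zone_text(child) for child in body if isinstance(child, list))


def collect_lines(zone: list, out=None) -> list[str]:
    """Walk the zone tree and return the text of every 'line' zone in stored order."""
    if out is None:
        out = []
    if zone[0] == "line":
        out.append(zone_text(zone))
    else:
        for child in zone[5:]:
            if isinstance(child, list):
                collect_lines(child, out)
    return out
```

All that comes back is coordinates and strings, which is why italics, bold and headings have to be restored by hand. Word spacing is rebuilt by joining word zones with single spaces, so this assumes word-level (not char-level) detail.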
02-12-2011, 03:10 PM   #10
DaleDe
Device: EB 1150, EZ Reader, Literati, iPad 2
There is a description of DJVU in the MobileRead wiki.

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.