03-19-2013, 02:32 AM   #1
ghudod
Connoisseur
Posts: 56
Karma: 5502
Join Date: Oct 2012
Device: none
Request/Idea: Approach to converting complex documents like PDFs

I often need to convert complex documents to EPUB. They are so heavily formatted that no amount of automation gives acceptable results; the only acceptable output in these cases is to convert each page into an image. I have been using Dongsoft PDF to EPUB Converter (yes, I said Dongsoft) because it does the image conversion, retains the PDF's table of contents, and produces a fixed-layout EPUB in one process.

I can do roughly the same thing in Calibre, but I need to convert the pages to images, zip the folder, rename it to a CBZ, convert that to an EPUB, and build the table of contents by hand (although I'm open to suggestions from anyone who knows a simpler way).
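
For what it's worth, the zip-and-rename part of that workflow can be scripted. Below is a minimal sketch in Python, assuming the page images already sit in one folder and sort into reading order by filename (so zero-padded names like page_001.png). The function name `pages_to_cbz` and the file names are hypothetical, chosen only for illustration.

```python
import zipfile
from pathlib import Path

# Image types treated as pages (an assumption; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pages_to_cbz(image_dir, cbz_path):
    """Zip the page images in image_dir into a CBZ (a zip with a .cbz name)."""
    pages = sorted(
        p for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Images are already compressed, so store them without recompression.
    with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as cbz:
        for page in pages:
            cbz.write(page, arcname=page.name)
    # Return the page names in the order they were written.
    return [p.name for p in pages]
```

After `pages_to_cbz("pages", "book.cbz")`, the result can be passed to calibre's command-line converter, e.g. `ebook-convert book.cbz book.epub`. This only automates the packaging; the table of contents would still need to be built by hand.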

OK, so here is the idea:

Converting to an image is the most reliable method of retaining the formatting of the source document. The only problem with this (other than the file size) is that you can no longer search or highlight. One way around this would be to overlay a layer of transparent text. With a fixed layout EPUB you could replicate the layout of the original PDF fairly precisely. Do you think a similar feature could be added to Calibre?
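
To make the overlay idea concrete, here is a rough sketch in Python of what one page of such an EPUB might look like: the page image, with each piece of text absolutely positioned on top of it and made transparent. Everything here is an assumption for illustration: the function name `overlay_page`, the `(text, left, top, font_size)` tuple layout, and the idea that the positions come from some earlier OCR or PDF text extraction step that is not shown. Whether a given reader actually searches or highlights transparent text is also untested.

```python
from html import escape


def overlay_page(image_href, width, height, words):
    """Return XHTML for one fixed-layout page.

    words is a list of (text, left, top, font_size) tuples in pixels,
    e.g. from an OCR step (hypothetical, not shown here).
    """
    # One absolutely positioned, transparent span per piece of text.
    spans = "\n".join(
        f'    <span style="position:absolute; left:{left}px; top:{top}px; '
        f'font-size:{size}px; color:transparent; white-space:pre;">'
        f"{escape(text)}</span>"
        for text, left, top, size in words
    )
    return f"""<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Page</title>
  <meta name="viewport" content="width={width}, height={height}"/>
</head>
<body style="margin:0;">
  <div style="position:relative; width:{width}px; height:{height}px;">
    <img src="{escape(image_href)}" alt="" style="position:absolute; left:0; top:0; width:{width}px; height:{height}px;"/>
{spans}
  </div>
</body>
</html>
"""
```

For example, `overlay_page("page_001.png", 1200, 1600, [("Chapter 1", 100, 200, 24)])` returns the markup for one page. The viewport meta gives the fixed page size; the EPUB's package file would additionally need to declare the book as pre-paginated (`rendition:layout`) for readers to treat it as fixed layout.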

If it were possible to get a pixel-perfect overlay of text, it would also be possible to make the text of the PDF transparent before capturing the image of each page, and then overlay normal, opaque, appropriately colored text in the EPUB (but I realize that even with scaling font options, this might be unlikely).

Anyway, this would solve a lot of problems converting documents with advanced formatting (including adding some additional options for comics if a good OCR were applied first).

Is this something that could be pursued?

03-19-2013, 04:02 AM   #2
kovidgoyal
creator of calibre
Posts: 45,295
Karma: 27111240
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You're welcome to pursue it, but I don't see the point. How is viewing a collection of page images superior to viewing the original PDF in the first place? Most ebook readers have PDF viewers. Turn off PDF reflow in those viewers and the PDF will be viewed just like an image page, but with searchable text.

03-19-2013, 03:22 PM   #3
ghudod
Quote:
You're welcome to pursue it, but I don't see the point
Sorry, it was a request because I'm not a programmer. I do not have the experience to code it myself.

As for the point:
- *not* all readers support PDFs and those that do often treat the files differently in terms of both feature set and organizational scheme
- I wish to have consistency in my digital library
- EPUB is an open format, PDF is not. I would like to wean myself (and others) off of PDFs out of principle.
- Other programs also treat EPUBs differently than PDFs. Evernote for example parses PDFs as searchable (creating a lot of unrelated matches in search results) so I would like to store them as EPUBs
- Sometimes, I wish to distribute a file and have it opened in an eBook reader rather than simply the default PDF reader. From a user experience point of view, eBook readers tend to be more immersive and engaging. Loading the same file in a default PDF reader (which is what everyone naturally does when the file is sent via email) rarely gives the same experience.
- OCR software typically exports to PDF with a transparent layer of text. They do not support EPUB. I would like to try making my comic collection and a collection of menus searchable.
- Finally, I feel like there should be some option out there that does a good job converting a PDF to EPUB for a million reasons that pop up in life that aren't listed above. Currently, an accurate and searchable option does not exist. Judging from the quantity of posts regarding the topic and the staggering number of programs that attempt to do the conversion, this is a significant need. It simply hasn't yet been done in a way that is even remotely accurate to the original form. With fixed layout EPUBs, the possibility exists. I thought Calibre would be a good fit for this feature given its strength in many people's eyes is converting between document formats.

Last edited by ghudod; 03-19-2013 at 06:36 PM.

03-20-2013, 12:29 AM   #4
kovidgoyal
Note that PDF is a perfectly open format: a perfectly *bad* open format, but still an open format.

What you're asking for is called a non-reflowable conversion. Creating a non-reflowable EPUB version of a PDF is pointless: it doesn't get you the primary advantage of EPUB, resolution independence. As such, I am not personally interested in this, but feel free to pursue it yourself. Note that there are plenty of tools out there that convert PDF to non-reflowable HTML, or even to non-reflowable PDF optimized for small screens, for example k2pdfopt, pdftohtml, etc.

All times are GMT -4.