05-21-2012, 05:53 PM   #4
signum
Quote:
Originally Posted by ralphiedee
Well, I figured it one of two ways. If the client can get me those files in Word or InDesign, then no matter what I have to restyle the files for EPUB. I just finished copying and pasting text from one of the PDF pages I converted to MS Word. The client must understand that this will add a lot of time to the project, as I have to take the background image, make it a template in iBA or Pages, whichever the client chooses, then style each paragraph to get close to the PDF style.

If you know of an easier way let me know.

RD
Here's what works well for me: do the "heavy lifting" part of the conversion with Calibre and the final polishing with Sigil. Your favorite HTML editor can also come in handy.

As stated before, PDF is the WORST format to try to convert from. However, one of the amazing capabilities of the Calibre program is its ability to convert many PDFs. I usually have no particular need for Calibre's library management and syncing features, so I use just its command-line converter, called ebook-convert. It's not very well known, but it is part of the Calibre package. Use it to convert from PDF to HTML. This is not a wasted step, because EPUB is just HTML+CSS. Clean up the HTML in your editor, checking your progress from time to time by looking at it in your browser. Once you have it polished to your satisfaction, convert it to EPUB by opening the HTML file directly in Sigil. (Don't panic! This can take a while.) Remember to "save as" an EPUB file. Polish the EPUB a little more with Sigil. Proofread carefully, comparing to the original PDF, and you're done!
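For anyone who wants to script the ebook-convert step, here is a minimal sketch in Python. It only wraps the basic "ebook-convert input output" form of the command. The function names are made up for illustration, and the choice of HTMLZ (Calibre's zipped-HTML output format) as the target is my assumption about the most convenient way to get HTML out; adjust it to whatever works for your setup.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_path):
    """Return the argument list for: ebook-convert INPUT OUTPUT."""
    return ["ebook-convert", str(pdf_path), str(out_path)]


def convert_pdf(pdf_path, out_path=None):
    """Run ebook-convert on a PDF and return the path of the output file."""
    pdf_path = Path(pdf_path)
    # Assumption: HTMLZ (zipped HTML) is used as the HTML-style output.
    out_path = Path(out_path) if out_path else pdf_path.with_suffix(".htmlz")
    # ebook-convert ships with Calibre; fail early if it is not on PATH.
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_convert_command(pdf_path, out_path), check=True)
    return out_path
```

Calling convert_pdf("book.pdf") should leave a book.htmlz next to the PDF. An HTMLZ file is an ordinary zip archive, so you can unpack it with any zip tool to get at the HTML, CSS, and images for editing.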

The biggest problem with the PDF format is that it discards the entire document structure. Even paragraph boundaries are lost. Everything is just glyphs placed at x-y locations on some virtual paper. Even so, the Calibre people have done some amazing things to recognize most paragraphs, although not always perfectly. Graphic elements are saved, with links to them in the text. Likewise, styling touches such as italics and bold are handled nicely. Even headings and indents are usually recognized. However, if there is more than one column of text or a multi-column table, you've got some work to do, because these are converted in the order found in the PDF, that is, linearized. For instance, a two-column table might have all of column 1 converted first into appropriate HTML, followed by all of column 2. There's no way to tell from the PDF that it's even displaying a table. Newspaper columns sometimes have the opposite result: the lines of the columns are interleaved.
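To make the two failure modes concrete, here is a small Python sketch of how the text might be put back in order once you have pulled the affected lines out of the HTML. Both helper names are hypothetical, and the sketch assumes the tidy cases: every table column has the same number of cells, and interleaved newspaper lines alternate strictly between columns. Real conversions are often messier and need hand editing.

```python
def rebuild_table_rows(cells, n_cols):
    """Turn linearized cells (all of column 1, then column 2, ...) into rows."""
    if n_cols <= 0 or len(cells) % n_cols != 0:
        raise ValueError("cell count must be a positive multiple of n_cols")
    n_rows = len(cells) // n_cols
    # Slice the flat list back into its columns, then read across them.
    columns = [cells[i * n_rows:(i + 1) * n_rows] for i in range(n_cols)]
    return [list(row) for row in zip(*columns)]


def split_interleaved_columns(lines, n_cols=2):
    """Separate lines that alternate between side-by-side text columns."""
    # Every n_cols-th line belongs to the same column.
    return [lines[i::n_cols] for i in range(n_cols)]


# A two-column table that came out as column 1 followed by column 2.
print(rebuild_table_rows(["Title", "Author", "Moby-Dick", "Melville"], 2))
# prints [['Title', 'Moby-Dick'], ['Author', 'Melville']]

# Newspaper columns whose lines came out interleaved.
print(split_interleaved_columns(["L1", "R1", "L2", "R2"]))
# prints [['L1', 'L2'], ['R1', 'R2']]
```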

One final observation. Proofreading takes much more time than any of the converting and editing steps mentioned above. At least you know you have something to work with that's close to what you want.