11-13-2009, 12:56 PM   #3
Valloric
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
Quote:
Originally Posted by neilmarr
I need to convert almost 100 PDF versions of my own wee house's paperback titles to ePub. Our own technical side is up to the gills in other work and we can't afford to hire in extra help ... so the job's down to me. And I'm a bloody technodunce. Pretty well everything in this thread, for instance, may be plain English to my MR pals, but it's way over my head.

So my question is simply this: Is there a simple 'Sigil for Dummies' tutorial of any sort that will explain, step-by-step, how I first convert my PDFs to ePub using the Sigil programme I've installed and then how to edit the result ready to go?
You have a tough road ahead of you.

First off, your source documents are PDF. That's the worst possible format to convert to epub. Epub is a reflowable format, meaning the paragraphs are "marked up": text that needs to be displayed as a single paragraph is marked as such, like this:

Quote:
<p>This is all one paragraph. This is all one paragraph. This is all one paragraph. This is all one paragraph. This is all one paragraph</p>
The <p> "tags" mark the boundaries of a paragraph. This is all XHTML. Other elements of the document are marked up with different tags. This is done so that the Reading System (a computer application or a hardware device) can adjust how the text displays. Hand-held devices have smaller screens and a paragraph appears "longer", whereas a computer monitor has a bigger screen so the paragraph takes up fewer lines. In essence, the display of the text adjusts to the size of the screen. So a book could have 300 pages when displayed on a computer screen, and 800 pages when displayed on the Sony Reader. It's the same book though, just displayed in different page/screen sizes, automatically.
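To make the reflow idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any real Reading System lays out text: it assumes a "screen" is just a width measured in characters, and the function name and the two widths are made up for the example.

```python
import textwrap

# The same marked-up paragraph, with no line breaks stored in it.
paragraph = "This is all one paragraph. " * 5


def lines_needed(text, screen_width):
    """Return how many lines the text occupies on a screen that fits
    screen_width characters per line (a stand-in for a real display)."""
    return len(textwrap.wrap(text, width=screen_width))


# Hypothetical widths: a small hand-held screen and a wide monitor.
narrow = lines_needed(paragraph, 30)
wide = lines_needed(paragraph, 100)

print("hand-held:", narrow, "lines")
print("monitor:  ", wide, "lines")
```

The text itself never changes; only the number of lines (and therefore pages) does, because the line breaks are computed at display time rather than stored in the document.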

PDF is a problem. It is not a reflowable document type. The "page" is fixed when the document is created. You cannot change it afterwards. Every character is effectively "burned in" on the virtual page. There is no semantic information about paragraphs, tables, images, etc. Any converter (calibre, for instance) has to make guesses about the structure of your document, and these guesses often don't work. So converting PDF to any reflowable format is extremely difficult and error-prone.
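Here is a rough sketch of the kind of guessing involved. This is not calibre's actual algorithm, just a toy heuristic I made up for illustration: it assumes the PDF text arrives as a list of hard-wrapped lines, and it treats a short line ending in sentence punctuation as the end of a paragraph.

```python
def guess_paragraphs(lines, full_width):
    """Guess paragraph boundaries in hard-wrapped text.

    Toy heuristic (an assumption for this example): a line that ends a
    sentence and is clearly shorter than a full line closes a paragraph.
    """
    paragraphs, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no text to collect
        current.append(stripped)
        ends_sentence = stripped.endswith((".", "!", "?"))
        is_short = len(stripped) < full_width * 0.9
        if ends_sentence and is_short:
            paragraphs.append(" ".join(current))
            current = []
    if current:  # keep whatever is left over at the end
        paragraphs.append(" ".join(current))
    return paragraphs


# Made-up sample of what text pulled out of a PDF page can look like.
extracted = [
    "Chapter One",
    "It was a dark and stormy night, and the",
    "rain fell in torrents.",
    "The next morning was clear.",
]

result = guess_paragraphs(extracted, full_width=40)
for paragraph in result:
    print("<p>" + paragraph + "</p>")
```

Notice that the chapter heading gets glued onto the first paragraph, because nothing in the extracted text says it is a heading. Any heuristic you pick will have cases like this, which is why the output of PDF conversion needs so much cleanup.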

I would suggest you get access to the original source documents from which the PDF versions were created. The original documents were surely reflowable and conversion to epub from those formats would be much more accurate.

As kjk noted, you will need calibre or some other converter to convert your original documents to a format Sigil can import, like (X)HTML or epub. Calibre can also do this for PDF books, but the results are usually not pretty. Not because of calibre, but because of the PDF format itself.

So in a nutshell, you need to convert your documents to XHTML or epub and then open these files in Sigil for editing. That's it.
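With almost 100 titles, you may want to script the conversion step. Below is a minimal sketch that assumes calibre is installed and its `ebook-convert` command-line tool is on your PATH. The folder names, the file pattern, and the helper function names are hypothetical; adjust them to whatever your source files actually are.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target_dir):
    """Build the ebook-convert call for one file: input path, then output path.
    The output keeps the source file's name with an .epub extension."""
    source = Path(source)
    target = Path(target_dir) / (source.stem + ".epub")
    return ["ebook-convert", str(source), str(target)]


def convert_all(source_dir, target_dir, pattern="*.pdf"):
    """Convert every file matching pattern (hypothetical default) in source_dir.
    Requires calibre's ebook-convert to be installed and on the PATH."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    for source in sorted(Path(source_dir).glob(pattern)):
        subprocess.run(build_convert_command(source, target_dir), check=True)


# Example of the command built for a single (made-up) file.
command = build_convert_command("books/my-title.pdf", "converted")
print(" ".join(command))
```

Calling `convert_all("books", "converted")` would then produce one epub per source file, each of which you open in Sigil for cleanup. If you do get hold of the original source documents, change `pattern` to match them instead of the PDFs.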

Last edited by Valloric; 11-14-2009 at 08:35 AM. Reason: typo