07-22-2011, 10:50 AM | #1 |
Grand Sorcerer
Posts: 11,305
Karma: 43993832
Join Date: Feb 2010
Location: Monroe Wisconsin
Device: K3, Kindle Paperwhite, Calibre, and Mobipocket for Pc (netbook)
|
Converting PDF to HTML
I was wondering what the proper settings are for converting PDF to HTML using Calibre. I found some sermons by Charles Spurgeon that I want to combine into a single ebook, but they are PDF files. I thought that if Calibre can convert them to HTML, I could load the results into a Sigil file and then convert the combined file into a Kindle-compatible format. The problem is that I'm not certain what settings to use for converting the PDFs in Calibre.
|
07-22-2011, 11:04 PM | #2 | |
US Navy, Retired
Posts: 9,863
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
|
07-22-2011, 11:23 PM | #3 |
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
There are no good settings for converting PDFs to anything. The best option is to find some PDF conversion software (I use Acrobat Pro, but not everyone has that; one of the websites that offer free conversion can also work), convert to something you can edit, such as HTML, Word, or TXT, and then work with that, comparing it against the PDF to fix the formatting.
If you have the files from the Spurgeongems page, those look like they should convert fairly well. You'd need RegEx to get rid of the headers & footers, but since they were created from .docx files, they should convert to a more flexible format without losing much in formatting, just with the added page breaks.
***
I saved one out to a Word doc; it didn't even add page breaks. Formatting needs a bit of tweaking (especially the blue-text quotes), but it looks very clean and easy to work with.
If I were combining a bunch of those into a single ebook, I'd probably start by combining them in Acrobat and saving them out as a single file. But that's probably the wrong way to do it; I'm just more familiar with Acrobat than with HTML editors.
|
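A minimal sketch of the RegEx cleanup mentioned above, in Python, assuming the PDF has already been converted to plain text. Both patterns are hypothetical placeholders (a line holding only a page number, and a running header starting with "Sermon #"); check them against the actual converted text and adjust before relying on them.

```python
import re

# Hypothetical patterns: adjust them to the running headers/footers
# that actually appear in your converted text.
HEADER_FOOTER_PATTERNS = [
    re.compile(r"^\s*\d+\s*$"),          # a line containing only a page number
    re.compile(r"^\s*Sermon #\d+\b.*$"),  # assumed running header format
]


def strip_headers_footers(text, patterns=HEADER_FOOTER_PATTERNS):
    """Return text with every line that matches one of the patterns removed."""
    kept = [
        line
        for line in text.splitlines()
        if not any(p.match(line) for p in patterns)
    ]
    return "\n".join(kept)


sample = "Sermon #12 Example Title 3\nFirst paragraph.\n7\nSecond paragraph."
cleaned = strip_headers_footers(sample)
```

Dropping whole lines keeps the patterns simple; anything that shares a line with real text would need a substitution rather than a line filter.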
07-23-2011, 04:28 AM | #4 |
Grand Sorcerer
Posts: 11,305
Karma: 43993832
Join Date: Feb 2010
Location: Monroe Wisconsin
Device: K3, Kindle Paperwhite, Calibre, and Mobipocket for Pc (netbook)
|
Thanks for the tips, Elfwreck. Yep, those are the files I have. I figured it would be a good project, and why buy the Amazon ebooks when I could make my own at no cost? True, it will take a while, but at least I know it will look good and have a working TOC, unlike some of Amazon's books. I got a copy of their Biography of a Grizzly last evening and it looks terrible. No illustrations and no working TOC. It looks like someone just slapped it together in 5 minutes.
|
07-23-2011, 08:36 AM | #5 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Here are a couple of ways you can go about doing it. Which method suits you depends on how comfortable you are with the file type and on how much formatting work you want to do. #2, for example, needs the most formatting added by you, but this might be easier than trying to fix formatting errors caused by conversion from PDF.
1) Use pdfmanipulate (part of calibre but it's command line only) or a similar tool to merge all of the individual PDFs into one file. Convert this single PDF into any format you're comfortable working with: HTMLZ, EPUB, TXT, etc. Then use Sigil to make your changes.
2) Convert each individual PDF to TXT using either calibre or Acrobat. Combine the files by copy and paste. In this case I would recommend using Textile to do the bulk of the formatting. Convert to EPUB and make any minor formatting changes (if necessary) using Sigil.
3) Convert each individual PDF to HTML. Then create an index.html file that has a link to each individual HTML file. This is essentially a TOC. Import the index.html into calibre. calibre will read each link (<a> tags) in the index file and gather all of the individual HTML files, putting it all into a ZIP archive. You can then convert to EPUB or HTMLZ to get a combined file suitable for Sigil.
|
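For method 3, the index.html can be written by hand, but a short script saves typing when there are many files. This is only a sketch in Python: the function name `build_index`, the title, and the example file names are made up for illustration, and it assumes the index will be saved in the same folder as the converted HTML files so that plain file names work as links.

```python
import html
from pathlib import Path


def build_index(html_files, title="Collected Sermons"):
    """Return index.html markup with one <a> link per file, in the order given.

    Assumes the index is saved next to the linked files, so each link
    uses only the file name. The link text is the file name without
    its extension.
    """
    items = "\n".join(
        '    <li><a href="{href}">{text}</a></li>'.format(
            href=html.escape(Path(name).name, quote=True),
            text=html.escape(Path(name).stem),
        )
        for name in html_files
    )
    return (
        "<html>\n"
        "<head><title>{title}</title></head>\n"
        "<body>\n"
        "  <h1>{title}</h1>\n"
        "  <ul>\n{items}\n  </ul>\n"
        "</body>\n"
        "</html>\n"
    ).format(title=html.escape(title), items=items)


# Hypothetical file names, listed in reading order.
index_markup = build_index(["sermon_01.html", "sermon_02.html"])
```

To use it, write the returned string to index.html in the folder holding the converted files, for example with `Path("index.html").write_text(index_markup, encoding="utf-8")`, and then add that index.html to calibre as described in method 3.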
07-23-2011, 10:02 AM | #6 | |
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
|