|  07-22-2011, 10:50 AM | #1 | 
| Grand Sorcerer            Posts: 11,310 Karma: 43993832 Join Date: Feb 2010 Location: Monroe Wisconsin Device: K3, Kindle Paperwhite, Calibre, and Mobipocket for  Pc (netbook) | 
				
				Converting PDF to HTML
			 
			
			I was wondering what the proper settings are for converting PDF to HTML using Calibre. I found some sermons by Charles Spurgeon that I want to combine into a single ebook, but they are PDF files. I thought that if Calibre can convert them to HTML, I could load them into Sigil, combine them into one file, and then convert that combined file into a Kindle-compatible format. The problem is that I'm not certain what settings to use for converting the PDFs in Calibre.
		 | 
|   |   | 
|  07-22-2011, 11:04 PM | #2 | |
| US Navy, Retired            Posts: 9,897 Karma: 13806776 Join Date: Feb 2009 Location: North Carolina Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen | Quote: 
 | |
|   |   | 
|  07-22-2011, 11:23 PM | #3 | 
| Grand Sorcerer            Posts: 5,187 Karma: 25133758 Join Date: Nov 2008 Location: SF Bay Area, California, USA Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié) | 
			
			There are no good settings for converting PDFs to anything. Your best option is to find some PDF conversion software (I use Acrobat Pro, but not everyone has that; there are also websites that offer free conversion) and convert to something you can edit (HTML, Word, or TXT). Then work with that, comparing it against the PDF to fix the formatting. If you have the files from the Spurgeongems page, those look like they should convert fairly well. You'd need RegEx to get rid of the headers and footers, but since they were created from .docx files, they should convert to a more flexible format without losing much formatting, just with added page breaks.

*** I saved one out to a Word doc; it didn't even add page breaks. The formatting needs a bit of tweaking (especially the blue-text quotes), but it looks very clean and easy to work with. If I were combining a bunch of those into a single ebook, I'd probably start by combining them in Acrobat and saving them out as a single file. But that's probably the wrong way to do it; I'm just more familiar with Acrobat than with HTML editors. | 
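Stripping running headers and footers with RegEx, as suggested above, usually means deleting every line that matches a known pattern. Here is a minimal Python sketch of that idea. The two patterns are assumptions for illustration only (a line holding nothing but a page number, and a hypothetical "Sermon #123 ..." running header). Check a few pages of your own converted text and adjust them before trusting the result.

```python
import re

# Hypothetical patterns: inspect your converted text and change these to
# match the real running headers and footers before using them.
PATTERNS = [
    re.compile(r"^\s*\d+\s*$"),                            # line containing only a page number
    re.compile(r"^\s*Sermon\s+#\d+\b.*$", re.IGNORECASE),  # assumed running-header format
]


def strip_headers_footers(text: str) -> str:
    """Return `text` with every line matching a header/footer pattern removed."""
    kept = [
        line
        for line in text.splitlines()
        if not any(p.match(line) for p in PATTERNS)
    ]
    return "\n".join(kept)
```

For example, `strip_headers_footers("Sermon #42 The Example\nFirst line.\n7\nSecond line.")` keeps only the two body lines. Matching whole lines is deliberately conservative. The trade-off is that a body line consisting of just a number (a lone verse number, say) would also be dropped, so compare the output against the PDF.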
|   |   | 
|  07-23-2011, 04:28 AM | #4 | 
| Grand Sorcerer            Posts: 11,310 Karma: 43993832 Join Date: Feb 2010 Location: Monroe Wisconsin Device: K3, Kindle Paperwhite, Calibre, and Mobipocket for  Pc (netbook) | 
			
			Thanks for the tips, Elfwreck. Yep, those are the files I have. I figured it would be a good project, and why buy the Amazon ebooks when I could make my own at no cost? True, it will take a while, but at least I know it will look good and have a working TOC, unlike some of Amazon's books. I got a copy of their Biography of a Grizzly last evening and it looks terrible: no illustrations and no working TOC. It looks like someone just slapped it together in 5 minutes.
		 | 
|   |   | 
|  07-23-2011, 08:36 AM | #5 | 
| Sigil & calibre developer            Posts: 2,487 Karma: 1063785 Join Date: Jan 2009 Location: Florida, USA Device: Nook STR | 
			
			Here are a few ways you can go about doing it. Which method works best really depends on how comfortable you are with each file type, and on how much formatting work you want to do. Option 2, for example, will need the most formatting added by you, but that might be easier than trying to fix formatting errors caused by conversion from PDF.

1) Use pdfmanipulate (part of calibre, but command line only) or a similar tool to merge all of the individual PDFs into one file. Convert this single PDF into any format you're comfortable working with: HTMLZ, EPUB, TXT, etc. Then use Sigil to make your changes.

2) Convert each individual PDF to TXT using either calibre or Acrobat. Combine the files by copying and pasting. In this case I would recommend using Textile to do the bulk of the formatting. Convert to EPUB and make any minor formatting changes (if necessary) in Sigil.

3) Convert each individual PDF to HTML. Then create an index.html file that links to each individual HTML file (essentially a TOC) and import the index.html into calibre. calibre will follow each link (<a> tag) in the index file, gather all of the individual HTML files, and put them into a ZIP archive. You can then convert that to EPUB or HTMLZ to get a single combined file suitable for Sigil. | 
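For option 3, the index file is just an ordinary HTML page whose `<a>` links point at each converted file. Here is a minimal Python sketch that writes such an index.html for every other `.html` file in a folder. It assumes the converted files all sit in one folder and that filename order is the intended reading order (e.g. prefix them 01, 02, ...). The function name `build_index` is hypothetical and not part of calibre.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(folder: Path, index_name: str = "index.html") -> Path:
    """Write `index_name` in `folder`, linking to every other .html file there.

    Files are listed in filename order, so name them in reading order.
    """
    pages = sorted(p for p in folder.glob("*.html") if p.name != index_name)
    links = "\n".join(
        # quote() makes filenames with spaces valid hrefs; escape() makes the
        # visible link text safe to embed in HTML.
        f'    <li><a href="{quote(p.name)}">{html.escape(p.stem)}</a></li>'
        for p in pages
    )
    doc = (
        "<!DOCTYPE html>\n"
        '<html>\n<head><meta charset="utf-8"><title>Contents</title></head>\n'
        "<body>\n  <h1>Contents</h1>\n  <ol>\n"
        f"{links}\n"
        "  </ol>\n</body>\n</html>\n"
    )
    index_path = folder / index_name
    index_path.write_text(doc, encoding="utf-8")
    return index_path
```

Calling `build_index(Path("sermons"))` would write `sermons/index.html`, which you would then add to calibre as described in option 3. The script skips its own output, so re-running it after adding more sermons just regenerates the list.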
|   |   | 
|  07-23-2011, 10:02 AM | #6 | |
| Grand Sorcerer            Posts: 5,187 Karma: 25133758 Join Date: Nov 2008 Location: SF Bay Area, California, USA Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié) | Quote: 
 | |
|   |   | 
|  Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post | 
| Converting PDF to HTML | Nirf | Calibre | 7 | 06-24-2010 08:51 AM | 
| Converting Merged HTML file to Epub/PDF Not Working | MV64 | Calibre | 1 | 06-07-2010 07:48 PM | 
| Converting multiple HTML files into a single hyperlinked PDF? | Jürgen Hubert | Reading and Management | 6 | 01-11-2010 07:44 AM | 
| Converting from html | mysweety | Calibre | 16 | 09-23-2009 08:20 AM | 
| Converting HTML to Mobi? | Sonist | Calibre | 5 | 02-10-2009 01:23 PM |