MobileRead Forums > E-Book Software > Calibre > Conversion

Old 07-22-2011, 10:50 AM   #1
crich70
Grand Sorcerer
Posts: 11,305
Karma: 43993832
Join Date: Feb 2010
Location: Monroe Wisconsin
Device: K3, Kindle Paperwhite, Calibre, and Mobipocket for Pc (netbook)
Converting PDF to HTML

I was wondering what the proper settings are for converting PDF to HTML using Calibre. I found some sermons by Charles Spurgeon that I want to combine into a single ebook, but they are PDF files. I thought that if Calibre can convert them to HTML, I could load them into a Sigil file and then convert the combined file into a Kindle-compatible format. The problem is that I'm not certain what settings to use for converting the PDFs in Calibre.
Old 07-22-2011, 11:04 PM   #2
DoctorOhh
US Navy, Retired
Posts: 9,863
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by crich70:
I was wondering what the proper settings are for converting PDF to HTML using Calibre. I found some sermons by Charles Spurgeon that I want to combine into a single ebook, but they are PDF files. I thought that if Calibre can convert them to HTML, I could load them into a Sigil file and then convert the combined file into a Kindle-compatible format. The problem is that I'm not certain what settings to use for converting the PDFs in Calibre.
As a starting point, review this sticky post: "Read this before Posting PDF Questions".
Old 07-22-2011, 11:23 PM   #3
Elfwreck
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
There are no good settings for converting PDFs to anything. The best option is to find some PDF conversion software (I use Acrobat Pro, but not everyone has that; there are also websites that offer free conversion), convert to something you can edit--HTML, or Word, or TXT--and then work with that, comparing against the PDF to fix the formatting.

If you have the files from the Spurgeongems page, those look like they should convert fairly well. You'd need RegEx to get rid of the headers & footers, but since they were created from .docx files, they should convert to a more flexible format without losing much in formatting, just with the added page breaks.
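For anyone following along, here is a rough sketch of the kind of RegEx cleanup meant here, in Python, for text that has already been converted out of the PDF. The two patterns are made up for illustration (the real running headers and footers in these files may look different), so treat `HEADER_RE`, `FOOTER_RE`, and `strip_running_lines` as hypothetical names to adapt, not as something taken from the files.

```python
import re

# Hypothetical shapes for the running header and footer lines.
# Adjust both patterns after looking at the real converted text.
HEADER_RE = re.compile(r"^Sermon #\d+\s+.*\s+\d+$")
FOOTER_RE = re.compile(r"^Volume \d+\s+.*\s+\d+$")


def strip_running_lines(text: str) -> str:
    """Drop every line that looks like a page header or footer."""
    kept = []
    for line in text.splitlines():
        candidate = line.strip()
        if HEADER_RE.match(candidate) or FOOTER_RE.match(candidate):
            continue  # running header/footer: skip it
        kept.append(line)
    return "\n".join(kept)


# Small made-up sample: one header, one footer, two body lines.
sample = "\n".join([
    "Sermon #1 An Example Title 1",
    "First paragraph of body text.",
    "Volume 1 www.example.org 1",
    "Second paragraph mentions Volume 2 in passing.",
])
cleaned = strip_running_lines(sample)
```

Anchoring the patterns to the whole line (`^...$`) keeps body text that merely mentions a volume or sermon number from being deleted. The same patterns could probably be pasted into the search-and-replace box of an editor that supports regular expressions.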

***
I saved one out to a Word doc; it didn't even add page breaks. Formatting needs a bit of tweaking (especially the blue-text quotes), but looks very clean, easy to work with.

If I were combining a bunch of those to a single ebook, I'd probably start by combining them in Acrobat and saving them out as a single file. But that's probably the wrong way to do it; I'm just more familiar with Acrobat than HTML editors.
Old 07-23-2011, 04:28 AM   #4
crich70
Thanks for the tips, Elfwreck. Yep, those are the files I have. I figured it would be a good project, and why buy the Amazon ebooks when I could make my own at no cost? True, it will take a while, but at least I know it will look good and have a working TOC, unlike some of Amazon's books. I got a copy of their Biography of a Grizzly last evening and it looks terrible: no illustrations and no working TOC. Looks like someone just slapped it together in 5 minutes.
Old 07-23-2011, 08:36 AM   #5
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Here are a couple of ways you can go about it. Which method suits you depends on how comfortable you are with the file type and on how much formatting work you want or need to do. #2, for example, will need the most formatting added by you, but this might be easier than trying to fix formatting errors caused by conversion from PDF.

1) Use pdfmanipulate (part of calibre, but command line only) or a similar tool to merge all of the individual PDFs into one file. Convert this single PDF into any format you're comfortable working with: HTMLZ, EPUB, TXT, etc. Then use Sigil to make your changes.

2) Convert each individual PDF to TXT using either calibre or Acrobat. Combine the files by copy and paste. In this case I would recommend using Textile to do the bulk of the formatting. Convert to EPUB and make any minor formatting changes (if necessary) using Sigil.

3) Convert each individual PDF to HTML. Then create an index.html file that has a link to each individual HTML file (a TOC, essentially) and import the index.html into calibre. calibre will read each link (<a> tag) in the index file and gather all of the individual HTML files, putting them all into a ZIP archive. You can then convert to EPUB or HTMLZ to get a combined file suitable for Sigil.
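For method 3, the index.html can be written by hand, but with a lot of sermons a small script may save some typing. This is only a sketch in Python: the function names `build_index` and `write_index`, the "Collected Sermons" title, and the assumption that all the converted HTML files sit in one folder are mine, not part of calibre.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index(html_names, title="Collected Sermons"):
    """Return index.html markup with one <a> link per HTML file name."""
    items = "\n".join(
        # quote() makes the file name safe inside href; escape() protects the link text.
        f'    <li><a href="{quote(name)}">{escape(Path(name).stem)}</a></li>'
        for name in html_names
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <ul>\n{items}\n  </ul>\n"
        "</body>\n</html>\n"
    )


def write_index(folder):
    """Write index.html into folder, linking every other .html file there."""
    folder = Path(folder)
    names = sorted(
        p.name for p in folder.glob("*.html") if p.name != "index.html"
    )
    (folder / "index.html").write_text(build_index(names), encoding="utf-8")
    return names
```

Calling `write_index("sermons")` on a hypothetical folder named `sermons` would leave an index.html next to the converted files, ready to be added to calibre. The links come out in plain alphabetical order, so zero-padded names (sermon_01.html, sermon_02.html, ...) are the easy way to keep the sermons in sequence.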
Old 07-23-2011, 10:02 AM   #6
Elfwreck
Quote:
Originally Posted by user_none:
2) Convert each individual PDF to TXT using either calibre or Acrobat. Combine the files by copy and paste. In this case I would recommend using Textile to do the bulk of the formatting. Convert to EPUB and make any minor formatting changes (if necessary) using Sigil.
Not recommended in this specific case; there's a lot of formatting--bold text, italic text in spots, and indented/special colored quotes. And all that could be added back in, but with these particular files, the originals are clean and consistent. Converting to something that attempts to maintain the formatting, and fixing where it doesn't quite manage to do so correctly, is going to be a lot faster than stripping out all the formatting and starting from scratch.