#1
Addict
Posts: 378
Karma: 1624276
Join Date: Aug 2010
Location: South, South Texas
Device: Pocketbook 620
Noob Question: HTML --> EPUB Help???
I'm new to all this and I'd like to ask for some advice on converting HTML to EPUB.
OK, so I am looking at 30+ volumes spread across hundreds of web pages, and I'd like to pull that text into a format that I can carry with me on my brand new PocketBook. I'm guessing that I want to convert those web pages to EPUB somehow, but I'm absolutely clueless as to what the most efficient way of doing this is. Any ideas or links that you think might help bring me up to speed are welcome! Just keep in mind that I'm a noob. Thanks!
#2
Grand Sorcerer
Posts: 6,251
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
I've never used it myself, but there is a calibre command-line tool called web2disk. This is a link to the relevant part of the calibre manual.
#3
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
If you zip your HTML, Calibre can handle it as an input. Multiple files in the zip are fine. If you change the zip extension to .epub, Sigil can handle it. Calibre is a converter and ebook manager, while Sigil is an EPUB editor. Both are free downloads and have their own forums on this site.
Dale
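For anyone who would rather script the zipping step than do it by hand, here is a minimal Python sketch. The function name `zip_html_dir` and the choice to store files relative to the top of the downloaded folder are my own assumptions, not anything Calibre requires; it just packs every file under a folder of saved HTML into one zip that can then be dropped into Calibre.

```python
import zipfile
from pathlib import Path


def zip_html_dir(src_dir, zip_path):
    """Pack every file under src_dir into zip_path (hypothetical helper).

    Paths inside the archive are stored relative to src_dir, so relative
    links between the HTML pages and their images keep working.
    """
    src = Path(src_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # is being written inside the folder we are zipping.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            zf.write(path, path.relative_to(src).as_posix())
    return zip_path
```

Usage would look something like `zip_html_dir("my_book_folder", "my_book.zip")`, where both names are placeholders for your own download folder and output file.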
#4
Junior Member
Posts: 1
Karma: 10
Join Date: Dec 2010
Device: Kindle DXi
I'm trying to do the same thing, but to MOBI format, so I figured I would post my results so far in case they help anyone else. I also have two questions at the bottom of this post if anyone can help me.

Here is the URL of the book I am trying to convert (one of several dozen that use the same format): http://www.history.army.mil/books/ww...3-44/index.htm

Calibre was a great suggestion and works as advertised. I used wget to grab the book: `wget -k -r http://www.history.army.mil/books/ww...3-44/index.htm`

Some quick background on wget: it does the same job as web2disk (mentioned above) and is available on every major platform. It's installed by default on most Linux distributions, can be installed on a Mac using MacPorts (i.e. `sudo port install wget` from the command line), and is part of the Cygwin package on Windows.

Note that the URL I am downloading is very well contained and doesn't point to any other websites or even to other parts of the website that I am downloading it from. If your book links off to other parts of the website (for example "home") or to other websites, you need to fine-tune the parameters to wget so that you don't end up downloading the whole Internet by accident. The relevant part of the wget manual is here: http://www.gnu.org/software/wget/man...rieval-Options

After downloading, I found the files that had been downloaded and zipped up the whole directory (named "sp1943-44" in this instance). From there, I dropped the zip into Calibre, right-clicked on it and selected convert without changing any options. It seems to work well: it looks like it got all the formatting right, images are in the right places, and it automatically generated a table of contents. Note that I had to convert a second time in order to remove the limit on the number of links that would be included in the table of contents.

One other option for downloading this book: you can use the free version of Adobe Acrobat to download an entire website in PDF format. Simply select File->Create PDF->From Web Page... and drop in the URL. The formatting works great, but it suffers from all the usual PDF problems (you can't adjust the flow of the text to fit on your device's screen, and in the case of my Kindle DX you can't highlight or add notes). Running the PDF through Amazon or Calibre conversion only creates an ugly mess.

All that said, I have two questions:
1) For some reason, Calibre took a random image and made it the first page of the book. Any ideas on how to prevent that and / or ideas on easy ways to delete the first page of the book?
2) Is there some way to use the Calibre conversion from the command line? There are a few dozen websites I would like to convert, but it's a bit inconvenient to use the GUI in Calibre when I could just write a quick script.

Thanks, Adam
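On question 2: calibre does ship a command-line converter called `ebook-convert`, which takes an input file and an output file. Below is a rough Python sketch of the "quick script" idea, under a few assumptions of mine: the helper names are made up, the wget flags (`--no-parent`, `--page-requisites`) are one plausible way to keep a crawl from wandering off the book, and the `--max-toc-links` value of 500 is an arbitrary illustration rather than a recommended setting. Check `wget --help` and `ebook-convert --help` on your own install before relying on any of it. By default the script only prints the commands it would run.

```python
import subprocess
from pathlib import Path


def wget_command(url):
    """Build a wget call that stays inside the book's directory (a sketch)."""
    return [
        "wget",
        "--recursive",        # same as -r: follow links
        "--convert-links",    # same as -k: rewrite links for local viewing
        "--no-parent",        # never climb above the starting directory
        "--page-requisites",  # also fetch images and stylesheets
        url,
    ]


def convert_command(source, out_format="mobi", max_toc_links=500):
    """Build an ebook-convert call; the output name is derived from the input."""
    source = Path(source)
    target = source.with_suffix("." + out_format)
    return [
        "ebook-convert", str(source), str(target),
        # Illustrative value only: raises the cap on links added to the TOC.
        "--max-toc-links", str(max_toc_links),
    ]


def convert_all(zip_dir, out_format="mobi", dry_run=True):
    """Convert every .zip in zip_dir; with dry_run=True just print the commands."""
    commands = [convert_command(p, out_format)
                for p in sorted(Path(zip_dir).glob("*.zip"))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands
```

Calling `convert_all("zips")` on a folder of zipped books (the folder name is a placeholder) prints one `ebook-convert` line per zip, and passing `dry_run=False` runs them for real, assuming calibre's command-line tools are on your PATH.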
#5
Addict
Posts: 378
Karma: 1624276
Join Date: Aug 2010
Location: South, South Texas
Device: Pocketbook 620
Interesting. If I could turn just the text into a .pdf, my PocketBook could reflow everything. Maybe I'll test that.
Has anyone tried pulling the text into MS Word or OpenOffice Writer and then converting it? Seems like a hassle. I've discovered Sigil, eCub and Jutoh. I may play around with these tomorrow.