#1 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Hello,
I'd like to concatenate a bunch of web pages into a single EPUB to read on my e-reader. I tried pandoc, but it's very slow and pretty much freezes my computer, so I tried Calibre, which at least kept my computer responsive:
Code:
copy /b *.html full.html
pandoc -o full.epub full.html
"C:\Program Files\Calibre2\ebook-convert.exe" full.html full.epub
"-h" returns a bewildering number of options.
Alternatively, what about first converting the HTML files into a simpler layout (Markdown?) before joining them into a single file, and then calling an HTML to EPUB converter?
Thank you.
Last edited by Shohreh; 05-16-2020 at 06:48 AM.
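One thing worth noting about the `copy /b` step: byte-concatenating complete pages produces a single file containing several `<html>`, `<head>` and `<body>` elements, which a converter may handle unpredictably. A minimal Python sketch of an alternative, assuming each page has at most one `<body>` element, keeps only the body contents and wraps them once. The function name `merge_html_bodies` and the regex-based extraction are illustrative assumptions, not something pandoc or Calibre provide.

```python
import re
from html import escape

# Matches the contents of the first <body> element of a page.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def merge_html_bodies(pages, title="Merged pages"):
    """Return one HTML document holding the <body> contents of every page."""
    parts = []
    for page in pages:
        match = BODY_RE.search(page)
        # Fall back to the whole text when no <body> element is found.
        parts.append(match.group(1).strip() if match else page.strip())
    body = "\n<hr/>\n".join(parts)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"/>'
        f"<title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

The returned text could be written to `full.html` and passed to the same `ebook-convert` command as above. A regex is only a rough tool for HTML; pages with unusual markup would need a real parser.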
#2 |
A Hairy Wizard
Posts: 3,355
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
I think the answer would depend on how comfortable you are working with the raw HTML code...
If you are OK with it, then Sigil (and I'm pretty sure Calibre) has a 'merge' feature that will remove the separate headers/footers and leave the pages combined into a single file. That process works well if the CSS is similar, or if you make the CSS similar before merging.
If you are saying "what's raw HTML code?" then I would suggest leaving the pages separate. You can still bundle them into an ePub: there is no requirement to have everything as a single page. It is actually preferable to keep the files in an ePub separated logically, such as by chapter. Both Sigil and the Calibre editor can do this admirably. When you read the ePub with the pages as separate files, it just requires a swipe/tap when transitioning from one file to the next.
#3 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Thanks. I'm used to working with HTML with Python.
I'm looking for a way to automate the process, and end up with pages that are as clean as possible on e-readers. Can an EPUB contain multiple HTML pages?
--
Edit: Yup.
https://www.reddit.com/r/Calibre/com...a_single_epub/
https://manual.calibre-ebook.com/faq...specific-order
Last edited by Shohreh; 05-16-2020 at 08:35 AM.
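As I understand the Calibre FAQ entry linked above, the way to convert several HTML files in a chosen order is to create one extra HTML file that links to the others in that order, and convert that file. A minimal Python sketch of generating such a file follows. The name `build_index`, the default title and the use of the file stem as link text are assumptions for illustration only.

```python
from html import escape
from pathlib import PurePosixPath
from urllib.parse import quote


def build_index(html_files, title="Collected pages"):
    """Return an HTML page that links to each file in the order given."""
    items = "\n".join(
        # quote() percent-encodes spaces etc. so the href stays a valid URL.
        f'  <li><a href="{quote(name)}">{escape(PurePosixPath(name).stem)}</a></li>'
        for name in html_files
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"/>'
        f"<title>{escape(title)}</title></head>\n"
        f"<body>\n<h1>{escape(title)}</h1>\n<ol>\n{items}\n</ol>\n</body></html>\n"
    )
```

The result would be saved as, say, `index.html` next to the pages (file names are assumed to be relative paths with forward slashes) and then given to `ebook-convert` as the input file instead of a concatenated `full.html`.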
#4 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
Before starting to work on an EPUB, I prefer to clean the HTML down to basic tags without any styling.
Step 1: open the EPUB in the Calibre editor, delete all CSS files, then run the "Remove unused CSS rules" tool.
Step 2: use Custom cleaner plus in Sigil for the rest (bookmarks, ids...).
From Sigil you can export the (X)HTML to the editor of your choice, or continue working in the Sigil editor.
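For anyone who would rather script this kind of cleanup than click through the editors, here is a rough Python sketch using only the standard library. It is not what the Calibre or Sigil tools do internally. The choice of attributes to drop (`style`, `class`, `id`), the removal of `<style>`/`<script>` blocks and stylesheet `<link>` elements, and the names `StyleStripper` and `strip_styling` are all assumptions. Comments in the source HTML are dropped as a side effect.

```python
from html import escape
from html.parser import HTMLParser

DROP_ATTRS = {"style", "class", "id"}   # assumption: what counts as styling residue
DROP_ELEMENTS = {"style", "script"}     # removed together with their contents


class StyleStripper(HTMLParser):
    """Re-emit HTML without styling attributes, <style>/<script> blocks or CSS links."""

    def __init__(self):
        super().__init__()              # default mode: text arrives unescaped
        self.out = []
        self.skipping = False           # True while inside a dropped element

    def _open(self, tag, attrs, closer):
        if tag == "link" and "stylesheet" in (dict(attrs).get("rel") or "").lower():
            return                      # drop references to external CSS files
        kept = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
            if k not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}{closer}")

    def handle_starttag(self, tag, attrs):
        if tag in DROP_ELEMENTS:
            self.skipping = True
        elif not self.skipping:
            self._open(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        if not self.skipping and tag not in DROP_ELEMENTS:
            self._open(tag, attrs, "/>")

    def handle_endtag(self, tag):
        if tag in DROP_ELEMENTS:
            self.skipping = False
        elif not self.skipping:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))  # re-escape &, <, >

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")   # keep the doctype


def strip_styling(html_text):
    """Return html_text with styling attributes and CSS removed."""
    parser = StyleStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Dropping every `class` attribute is exactly the brute-force step, so anything the CSS expressed only through classes disappears with it.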
#5 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Thanks. I'll look into automating the process with a Python script.
#6 | |
Grand Sorcerer
Posts: 5,730
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
#7 | |
Guru
Posts: 691
Karma: 3026110
Join Date: Dec 2008
Location: Lancashire, U.K.
Device: BeBook 1, BeBook Pure, Kobo Glo, (and HD),Energy Sistem EReader Pro +
|
Quote:
Code:
<i> some italics </i>
Code:
<span class="it"> some italics </span>
Code:
.it { font-style: italic; }
By using a brute-force technique as suggested, all italics would be lost. There may also be "headers" that are just paragraphs styled to be centred, bold and larger than the main text. Look before you carry out drastic surgery.
BobC
#8 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
You are right about italics. I don't care about the headers, though.
Thread | Thread Starter | Forum | Replies | Last Post |
Creating epub/kepub books (docx→epub/kepub via MS Word→Calibre) | SJC-Caron | ePub | 18 | 04-21-2016 11:10 AM |
Clean HTML from word For EPub | holdit | ePub | 10 | 10-21-2013 07:00 AM |
Clean HTML from word | holdit | Workshop | 6 | 10-09-2013 05:20 PM |
How to Clean/Strip HTML from epub file? | Jimbo724 | General Discussions | 9 | 12-12-2012 11:22 AM |
Best way to get clean HTML | JSWolf | Kindle Formats | 18 | 04-02-2009 11:00 AM |