#1 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Hello,
Like everyone here, I tried using Calibre to convert a PDF to ePUB… with mixed results. It's mostly OK, but there are small issues every few pages. "Heuristic Processing" and regexes didn't remove headers/footers and other unwanted parts. I also tried a few online sites… with no better results.
Is Calibre the only open-source solution? Is there any tool to turn PDF → HTML, so I could clean up the HTML myself before turning it into ePUB?
Thank you.
-- Edit: I just read that an ePUB is actually a zip file with HTML, CSS, PNG etc. Renaming the extension from .epub to .zip is all that is required to check the files.
Last edited by Shohreh; 04-19-2020 at 11:11 AM.
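As a rough illustration of the edit note above (an EPUB is a ZIP archive), here is a minimal Python sketch that lists the files inside one without renaming it. The helper names and the sample file names are made up for the example; with a real book you would pass the path to the .epub file instead of the in-memory sample.

```python
import io
import zipfile


def list_epub_contents(epub_source):
    """Return the names of the files stored inside an EPUB (a ZIP archive)."""
    with zipfile.ZipFile(epub_source) as archive:
        return archive.namelist()


def build_sample_epub():
    """Build a tiny in-memory archive with an EPUB-like layout (made-up names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/chapter1.html", "<html><body><p>Hello</p></body></html>")
        archive.writestr("OEBPS/style.css", "p { margin: 0; }")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Replace build_sample_epub() with a path such as "book.epub" for a real file.
    for name in list_epub_contents(build_sample_epub()):
        print(name)
```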
#2 |
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
You could also use calibre's builtin editor or Sigil to check and edit the files. Very likely easier than extracting files from the epub to edit them. I use both since there are some tasks for which one is better/easier to use than the other.
#3 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Thanks for the tips.
Apparently, Sigil is not available for Windows as binaries.
Besides those below, is there a good, recent article about how to use Calibre to generate an EPUB with the fewest errors and, if need be, use its editor to clean up and finalize the EPUB? For instance, it seems not to like "inserts", tables, and headers/footers.
#4 | |
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
Quote:
|
|
#5 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Where?
https://sigil-ebook.com/get/ → https://github.com/Sigil-Ebook/Sigil/releases
Another couple of questions while I experiment:
1. In Search & replace, although the regexes work fine when tested with the wizard, the strings still end up in the EPUB. Any idea why?
2. Is there a way to tell Calibre to ignore certain pages in the input file, such as those that contain the ToC or tables, since I know they'll turn into garbage anyway? I might try replacing the tables in the EPUB with PNG images later.
#6 | |
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
Browse down to the 1.2.0 release and click on the arrow by Assets. See attached image.
One question, out of curiosity, about your regex: why are you looking for a period instead of an underscore? Your search pattern contains simmons\.qxd, while the text you are trying to match is c12_simmons_qxd.
Last edited by DNSB; 04-19-2020 at 08:59 PM.
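The mismatch is easy to reproduce. The sketch below assumes the Search & replace patterns behave like Python's re module; the second pattern is only one hypothetical way to accept either separator, not necessarily what the original poster ended up using.

```python
import re

text = "c12_simmons_qxd"  # the string quoted in the post

# Pattern as quoted: an escaped period between "simmons" and "qxd".
literal_dot = re.compile(r"simmons\.qxd")

# Hypothetical fix: accept either an underscore or a period as the separator.
either_separator = re.compile(r"simmons[._]qxd")

dot_match = literal_dot.search(text)
fixed_match = either_separator.search(text)

print(dot_match)              # None: "\." only matches a literal period
print(fixed_match.group(0))   # simmons_qxd
```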
#7 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Thanks, I didn't know the files were hidden in the Assets section.
You're right about the regex.
--- Edit: It says "Note that currently Sigil only provides binaries that work for Windows x86 and x64 and will only run on Vista or newer releases.", but only Sigil-1.2.0-Windows-x64-Setup.exe is available; the last 32-bit release is 0.9.14 (Jun 11, 2019).
Last edited by Shohreh; 04-19-2020 at 10:05 PM.
#8 | |
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
This has been discussed in the Sigil forum and is probably best continued there. I will admit the last 32-bit machine I ran was close to a decade ago, so the disappearance of the x86 version is not a big deal to me.
#9 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
My e-reader reads PDFs relatively well; it just doesn't support changing the font size, which makes them a bit harder to read than EPUB. Maybe that's a limitation of PDF rather than of this particular reader.
So it looks like, because of the nature of PDF, Calibre and other tools simply cannot handle more complex parts such as tables or insets. Incidentally, considering how many e-reader users need to access PDFs, I might not be the only person curious to learn more about PDF, without delving too deeply, and to understand how it works under the hood (PostScript, etc.) and why it's such a pain to turn into EPUB. Are there other sites/books you would recommend?
#10 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Is Calibre using pdftohtml before doctoring the HTML and turning it into an EPUB file?
I notice that…
Why doesn't Calibre use mutool instead? Do you know of a good tool to turn the HTML file from mutool into EPUB?
#11 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That is non-reflowable HTML; it is just as useless as the original PDF file. And calibre uses poppler, which took over pdftohtml from xpdf over a decade ago.
#12 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Good to know. The HTML files in the EPUB still mention pdftohtml, hence the confusion:
Code:
<meta name="generator" content="pdftohtml 0.36"/>
Edit: Is there a good article somewhere that goes through the options and, generally speaking, explains how best to use Calibre to convert a PDF into EPUB?
Last edited by Shohreh; 04-21-2020 at 08:57 AM.
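For anyone who wants to check which tool a given HTML file claims as its generator, here is a small sketch using only Python's standard library. The class and function names are made up for the example, and the sample string simply reuses the tag quoted above.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> are also routed here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content"))


def find_generators(html_text):
    """Return the generator strings declared in one HTML document."""
    finder = GeneratorFinder()
    finder.feed(html_text)
    return finder.generators


sample = '<html><head><meta name="generator" content="pdftohtml 0.36"/></head></html>'
print(find_generators(sample))  # ['pdftohtml 0.36']
```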
#13 | |
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
|
|
#14 |
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Thank you.
Turns out my e-reader isn't bad at displaying PDFs after changing a couple of settings, including a… "Reflow text" option. Since PDFs are so common, it's a good idea when buying an e-reader to check how well it supports them, rather than going to the trouble of converting them to EPUB.
Out of curiosity, I'll see if I can find information about what all the settings in Calibre's Convert dialog mean. Am I correct in understanding that Calibre first lets poppler convert the PDF into HTML + PNG/JPG, and then goes to work on the HTML based on the settings in the Conversion dialog?
Last edited by Shohreh; 04-22-2020 at 07:02 AM.
#15 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|