MobileRead Forums > E-Book Software > Calibre > Conversion

04-19-2020, 10:34 AM   #1
Shohreh
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
[PDF → ePUB] Alternative to Calibre?

Hello,

Like everyone here, I tried using Calibre to convert a PDF to ePUB… with mixed results. It's mostly OK, but there are small issues every few pages.

Neither "Heuristic Processing" nor regexes managed to remove the headers/footers and other unwanted parts.

I also tried a few online sites… with no better result.
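One common heuristic for running headers and footers, sketched minimally in Python under the assumption that the extracted text is already split into pages (for example from the intermediate HTML): a line that recurs on a large share of pages once its digits are masked is probably a header, footer, or page number. The function names, the 50% threshold, and the sample lines are illustrative assumptions, not how Calibre does it. It will also drop body lines that genuinely repeat, such as a lone number.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Collapse runs of digits so 'Page 12' and 'Page 13' compare equal."""
    return re.sub(r"\d+", "#", line.strip())


def strip_running_lines(pages: list[list[str]], min_fraction: float = 0.5) -> list[list[str]]:
    """Drop lines whose normalized form appears on at least `min_fraction`
    of the pages (and on at least 2 pages): likely running headers/footers."""
    counts = Counter()
    for page in pages:
        # A set, so a line repeated within one page is counted only once.
        counts.update({normalize(line) for line in page if line.strip()})
    threshold = max(2, int(len(pages) * min_fraction))
    recurring = {key for key, n in counts.items() if n >= threshold}
    return [[line for line in page if normalize(line) not in recurring] for page in pages]


if __name__ == "__main__":
    sample = [  # made-up page text
        ["c12_simmons.qxd 4/19/20 Page 1", "Chapter text starts here.", "1"],
        ["c12_simmons.qxd 4/19/20 Page 2", "More body text.", "2"],
        ["c12_simmons.qxd 4/19/20 Page 3", "Even more body text.", "3"],
    ]
    for page in strip_running_lines(sample):
        print(page)
```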

Is Calibre the only open-source solution?

Is there a tool that turns PDF → HTML, so I could try cleaning up the HTML myself before turning it into ePUB?
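For reference, poppler's `pdftohtml` (the tool calibre relies on, as confirmed later in this thread) can write a single HTML file that you can then clean up by hand. A minimal sketch of calling it from Python; `-s` (one document for all pages) and `-noframes` are pdftohtml options, but check `pdftohtml -h` for your version. The wrapper function names are hypothetical.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path: str, out_base: str) -> list[str]:
    """Poppler pdftohtml: -s = one HTML document for all pages,
    -noframes = no frameset wrapper around it."""
    return ["pdftohtml", "-s", "-noframes", pdf_path, out_base]


def pdf_to_html(pdf_path: str, out_base: str) -> None:
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml (poppler-utils) is not on PATH")
    subprocess.run(pdftohtml_command(pdf_path, out_base), check=True)


if __name__ == "__main__":
    # Only prints the command; call pdf_to_html() to actually run it.
    print(" ".join(pdftohtml_command("book.pdf", "book")))
```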

Thank you.

--
Edit: I just read that an ePUB is actually a ZIP file containing HTML, CSS, PNG, etc. Renaming the extension from .epub to .zip is all it takes to inspect the files.
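Since an EPUB is an ordinary ZIP archive, the renaming step is optional: any ZIP library can open it directly. A minimal Python sketch (the helper names are hypothetical) that lists and reads the members; the demo builds a tiny stand-in archive in memory so it runs without a real book.

```python
import io
import zipfile


def list_epub(source) -> list[str]:
    """`source` is a path or a binary file object; an EPUB opens as a ZIP as-is."""
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()


def read_member(source, name: str) -> str:
    """Return one member of the archive decoded as UTF-8 text."""
    with zipfile.ZipFile(source) as zf:
        return zf.read(name).decode("utf-8")


if __name__ == "__main__":
    # Stand-in archive built in memory; pass "book.epub" instead for a real file.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("OEBPS/chapter1.xhtml", "<p>Hello</p>")
    print(list_epub(buf))
    print(read_member(buf, "mimetype"))
```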

Last edited by Shohreh; 04-19-2020 at 11:11 AM.
04-19-2020, 01:29 PM   #2
DNSB
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Shohreh
Edit: I just read that an ePUB is actually a ZIP file containing HTML, CSS, PNG, etc. Renaming the extension from .epub to .zip is all it takes to inspect the files.
You could also use calibre's built-in editor or Sigil to check and edit the files; that is very likely easier than extracting the files from the epub to edit them. I use both, since for some tasks one is better or easier to use than the other.
04-19-2020, 06:34 PM   #3
Shohreh
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Thanks for the tips.

Apparently, Sigil isn't available as binaries for Windows.

Besides those below, is there a good, recent article on how to use Calibre to generate an EPUB with as few errors as possible and, if need be, use its editor to clean up and finalize it? For instance, it doesn't seem to like "insets", tables, or headers/footers.
Attached image: 5DA98A5F-FF73-4B39-BA16-D04A46016EE4.png
04-19-2020, 06:54 PM   #4
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Quote:
Apparently, Sigil isn't available as binaries for Windows.
Why do you say this? I see both 64-bit and 32-bit versions available for Windows.
04-19-2020, 07:58 PM   #5
Shohreh
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Where?

https://sigil-ebook.com/get/
https://github.com/Sigil-Ebook/Sigil/releases

Another couple of questions while I experiment:
1. In Search & Replace, the regexes work fine when I test them in the wizard, yet the matched strings still end up in the EPUB. Any idea why?

2. Is there a way to tell Calibre to ignore specific pages in the input file, such as those containing the ToC or tables, since I know they'll turn into garbage anyway? I might later replace the tables in the EPUB with PNG images instead.
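As far as I know, Calibre's PDF input has no page-range option, so one workaround is to convert only the wanted page ranges outside Calibre. A sketch assuming you know the page count and which 1-based pages to skip; `-f` and `-l` are poppler pdftohtml's first/last-page options, while `keep_ranges`, `pdftohtml_commands`, and the `part_NNNN` output names are hypothetical.

```python
def keep_ranges(page_count: int, skip: set[int]) -> list[tuple[int, int]]:
    """Inclusive (first, last) runs of 1-based page numbers not listed in `skip`."""
    ranges = []
    start = None
    for page in range(1, page_count + 1):
        if page in skip:
            if start is not None:
                ranges.append((start, page - 1))
                start = None
        elif start is None:
            start = page
    if start is not None:
        ranges.append((start, page_count))
    return ranges


def pdftohtml_commands(pdf: str, page_count: int, skip: set[int]) -> list[list[str]]:
    """One pdftohtml call per kept range (-f first page, -l last page)."""
    return [
        ["pdftohtml", "-s", "-f", str(first), "-l", str(last), pdf, f"part_{first:04d}"]
        for first, last in keep_ranges(page_count, skip)
    ]


if __name__ == "__main__":
    # e.g. a 12-page PDF whose ToC is on pages 2-3 and a table on page 8
    for cmd in pdftohtml_commands("book.pdf", 12, {2, 3, 8}):
        print(" ".join(cmd))
```

The resulting HTML parts would still need to be joined (or fed to Calibre one by one), and the skipped tables could then be added back as images.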
Attached image: FA718F5A-2BD7-47E3-8E74-4D0C08AEAAA1.png
04-19-2020, 08:46 PM   #6
DNSB
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Shohreh
Disregard the Sigil 1.2.1 source-only release; as it says, it is for an issue with Python and Linux.

Browse down to the 1.2.0 release and click the arrow next to Assets. See attached image.

One question, out of curiosity, about your regex: why are you looking for a period instead of an underscore? Your search string contains simmons\.qxd while the text you are searching for is c12_simmons_qxd.
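The period/underscore point in a few lines of Python (Python's `re` syntax behaves the same as Calibre's regex engine for a pattern this simple, as far as I know): `\.` matches only a literal period, so it can never match the underscore in `c12_simmons_qxd`.

```python
import re

name = "c12_simmons_qxd"  # the text being searched, as quoted above

# "\." matches only a literal period, so this finds nothing.
print(re.search(r"simmons\.qxd", name))  # None

# Matching the underscore that is really there works...
print(re.search(r"simmons_qxd", name).group())  # simmons_qxd

# ...and a character class accepts either separator.
print(re.search(r"simmons[._]qxd", name).group())  # simmons_qxd
```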
Attached image: Sigil_assets.png

Last edited by DNSB; 04-19-2020 at 08:59 PM.
04-19-2020, 09:52 PM   #7
Shohreh
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Thanks, I didn't know the files were hidden in the Assets section.

You're right about the regex.

---
Edit: It says "Note that currently Sigil only provides binaries that work for Windows x86 and x64 and will only run on Vista or newer releases.", yet only Sigil-1.2.0-Windows-x64-Setup.exe is available; the last 32-bit release is 0.9.14 (Jun 11, 2019).

Last edited by Shohreh; 04-19-2020 at 10:05 PM.
04-19-2020, 11:26 PM   #8
DNSB
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Shohreh
Edit: It says "Note that currently Sigil only provides binaries that work for Windows x86 and x64 and will only run on Vista or newer releases.", yet only Sigil-1.2.0-Windows-x64-Setup.exe is available; the last 32-bit release is 0.9.14 (Jun 11, 2019).
The issue is that some of the binaries Sigil depends on are not supplied in 32-bit versions, so a 32-bit build would require custom-compiling those binaries as well as Sigil itself.

This has been discussed in the Sigil forum and is probably best continued there.

I will admit that the last 32-bit machine I ran was close to a decade ago, so the disappearance of the x86 version is not a big deal to me.
04-20-2020, 08:49 AM   #9
Shohreh
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
My e-reader displays PDFs relatively well; it just doesn't support changing the font size, which makes them a bit harder to read than EPUBs. Maybe that's a limitation of PDF rather than of this particular reader.

So it looks like, because of the nature of PDF, Calibre and other tools simply cannot handle more complex parts such as tables or insets.

Incidentally, considering how many e-reader users need to read PDFs, I'm probably not the only one curious to learn a bit more about PDF, without delving too deeply, to understand how it works under the hood (PostScript, etc.) and why it's such a pain to turn into EPUB.
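To illustrate why reflow is hard: in an untagged PDF, the page content is essentially text fragments drawn at coordinates, with no notion of paragraphs, tables, or sidebars, so a converter has to guess that structure from geometry. A toy sketch of only the very first guess, grouping fragments into visual lines by baseline; the `Fragment` type, the 2-point tolerance, and the top-down y axis are assumptions, and real converters must also handle columns, hyphenation, and paragraph breaks.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float  # left edge in points
    y: float  # baseline in points; assumed to grow down the page
    text: str


def group_lines(fragments: list[Fragment], tolerance: float = 2.0) -> list[str]:
    """Rebuild visual lines: fragments whose baselines lie within `tolerance`
    points of a line's first fragment join that line, then read left to right."""
    lines: list[list[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) < tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]


if __name__ == "__main__":
    # Fragments arrive in drawing order, which need not be reading order.
    frags = [
        Fragment(120, 100.4, "to"),
        Fragment(72, 114.0, "PDF"),
        Fragment(72, 100.0, "Hard"),
        Fragment(150, 100.2, "reflow"),
    ]
    print(group_lines(frags))  # ['Hard to reflow', 'PDF']
```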

Are there other sites/books you would recommend?
04-20-2020, 08:31 PM   #10
Shohreh
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Does Calibre run pdftohtml first and then doctor the HTML before turning it into an EPUB file?

I notice that…
  1. pdftohtml hasn't been updated since 2006-08-03 (!)
  2. mutool, which is open-source like pdftohtml, does an excellent job (graphically speaking) converting a PDF into a single HTML file with pictures embedded as base64.
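If you want to pull the embedded pictures back out of such an HTML file, here is a minimal sketch that decodes base64 `data:` URIs. It assumes the images appear as `data:image/...;base64,...` strings, which is the usual way such embedding is written, but I have not checked mutool's exact markup; `extract_images` is a hypothetical name.

```python
import base64
import re

# data:<image mime type>;base64,<payload>; the payload stops at a quote or ")".
DATA_URI = re.compile(r"data:(image/[\w.+-]+);base64,([A-Za-z0-9+/=\s]+)")


def extract_images(html: str) -> list[tuple[str, bytes]]:
    """Return (mime type, decoded bytes) for each base64 image data URI."""
    return [
        (mime, base64.b64decode("".join(payload.split())))
        for mime, payload in DATA_URI.findall(html)
    ]


if __name__ == "__main__":
    payload = base64.b64encode(b"not really a PNG").decode("ascii")
    html = f'<img src="data:image/png;base64,{payload}"/>'
    for mime, data in extract_images(html):
        print(mime, len(data))
```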

Why doesn't Calibre use mutool instead?

Do you know of a good tool to turn the HTML file from mutool into EPUB?
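Calibre itself accepts HTML input (`ebook-convert page.html book.epub`). To see what the EPUB container involves, here is a bare-bones EPUB 3 packaging sketch: `mimetype` goes first and uncompressed, `META-INF/container.xml` points to the OPF package file, and the OPF lists the content and the reading order. The file names, the single-chapter layout, and the `en` language are assumptions; run the result through a validator before relying on it. Note that packaging only wraps the files; it does not make fixed-layout HTML reflow.

```python
import io
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{uid}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

NAV = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body><nav epub:type="toc"><ol><li><a href="ch1.xhtml">Start</a></li></ol></nav></body>
</html>"""


def build_epub(target, title: str, uid: str, chapter_xhtml: str) -> None:
    """Write a one-chapter EPUB 3 to `target` (a path or binary file object)."""
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = OPF.format(uid=escape(uid), title=escape(title), modified=modified)
    with zipfile.ZipFile(target, "w") as zf:
        # OCF rule: "mimetype" comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, data in [
            ("META-INF/container.xml", CONTAINER),
            ("OEBPS/content.opf", opf),
            ("OEBPS/nav.xhtml", NAV),
            ("OEBPS/ch1.xhtml", chapter_xhtml),
        ]:
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    chapter = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Start</title></head>
<body><p>Cleaned-up text goes here.</p></body></html>"""
    buf = io.BytesIO()  # or build_epub("book.epub", ...)
    build_epub(buf, "My Book", "urn:uuid:00000000-0000-0000-0000-000000000000", chapter)
    print(zipfile.ZipFile(buf).namelist())
```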
04-20-2020, 10:49 PM   #11
kovidgoyal
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That is non-reflowable HTML; it is just as useless as the original PDF file. And calibre uses poppler, which took over pdftohtml from xpdf over a decade ago.
04-21-2020, 07:54 AM   #12
Shohreh
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Good to know. The HTML files in the EPUB still mention pdftohtml, hence the confusion:

Code:
<meta name="generator" content="pdftohtml 0.36"/>
--
Edit: Is there a good article somewhere that goes through the options and, generally speaking, explains how best to use Calibre to convert a PDF into EPUB?
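On the generator tag quoted in this post: it is only a label that the producing program writes into its output. A small sketch using Python's standard `html.parser` to list such tags in an HTML string (for example one read out of the EPUB); `GeneratorFinder` and `find_generators` are hypothetical names.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collects the content of every <meta name="generator" ...> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> is routed here too by the default
        # handle_startendtag implementation.
        if tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "generator":
                self.generators.append(attributes.get("content") or "")


def find_generators(html: str) -> list[str]:
    finder = GeneratorFinder()
    finder.feed(html)
    finder.close()
    return finder.generators


if __name__ == "__main__":
    print(find_generators('<head><meta name="generator" content="pdftohtml 0.36"/></head>'))
```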
Attached image: 701F1210-3636-414F-9FCA-A16C87A91871.png

Last edited by Shohreh; 04-21-2020 at 08:57 AM.
04-21-2020, 11:29 AM   #13
DNSB
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Shohreh
Edit: Is there a good article somewhere that goes through the options and, generally speaking, explains how best to use Calibre to convert a PDF into EPUB?
The best way to convert PDF is not to do so (IMO). In the real world, be prepared to spend some quality time with an epub editor to fix up the issues from converting a page oriented format to a reflowable format.
04-22-2020, 05:57 AM   #14
Shohreh
Addict
Posts: 207
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Thank you.

It turns out my e-reader isn't bad at displaying PDFs after changing a couple of settings, including a… "Reflow text" option.

Since PDFs are so common, when buying an e-reader it's a good idea to check how well it handles PDFs directly, without having to bother converting them to EPUB.

Out of curiosity, I'll see if I can find info about what all the settings in Calibre's Convert dialog mean.

Am I correct in understanding that Calibre first lets poppler convert the PDF into HTML + PNG/JPG, and then goes to work on the HTML based on the settings in the Conversion dialog?

Last edited by Shohreh; 04-22-2020 at 07:02 AM.
04-22-2020, 07:40 AM   #15
kovidgoyal
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by Shohreh

Am I correct in understanding that Calibre first lets poppler convert the PDF into HTML + PNG/JPG, and then goes to work on the HTML based on the settings in the Conversion dialog?
Yes.
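Since the conversion settings operate on that poppler-generated HTML, a Search & Replace regex has, as I understand it, to match the HTML markup and not just the visible text. The same conversion can be scripted with calibre's `ebook-convert`; below is a sketch that only builds the command, assuming the `--enable-heuristics`, `--sr1-search` and `--sr1-replace` options of current calibre (check `ebook-convert in.pdf out.epub -h` for your version). The helper name and the sample regex are hypothetical.

```python
import shlex
from typing import Optional


def ebook_convert_command(pdf: str, epub: str, strip_regex: Optional[str] = None,
                          heuristics: bool = True) -> list[str]:
    """Build a calibre ebook-convert call mirroring a few Convert-dialog settings."""
    cmd = ["ebook-convert", pdf, epub]
    if heuristics:
        cmd.append("--enable-heuristics")
    if strip_regex:
        # Matches are replaced with the (empty) replacement text. The regex is
        # applied to the HTML produced from the PDF, so it must match that markup.
        cmd += ["--sr1-search", strip_regex, "--sr1-replace", ""]
    return cmd


if __name__ == "__main__":
    # Hypothetical pattern: the stray file-name text up to the next tag.
    cmd = ebook_convert_command("book.pdf", "book.epub", r"c12_simmons[._]qxd[^<]*")
    print(shlex.join(cmd))
```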

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.