08-03-2023, 09:23 AM   #2
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by michaelbr
I understand that PDF is not very friendly when converting to ePub, [...]
For PDF->EPUB, see my recent posts in:

Quote:
Originally Posted by michaelbr
[...] and the result requires a lot of work.
Yep. It always will.

PDF is meant as a final output format, NOT as any sort of easy-to-use input format.

You might be able to create an "okay" rough ebook very quickly... especially if it's very basic like a Fiction book with only text + a few chapter headings.

But if you are dealing with anything complicated, or a crappier scan... converting PDF requires much more elbow grease.

Quote:
Originally Posted by michaelbr
1 - open PDF file in browser (Brave in my case)
2 - copy the text from browser and paste into an ePub editing app (Sigil)
No. I would not touch this method with a ten-foot pole.

PDFs have two layers. Imagine them like a front/back:
  • Image Layer
  • Text Layer

The image layer is the visible stuff you can see—like the yellowed paper, dust, etc., when a book is scanned in.

The text layer is the hidden stuff—this is where the OCR + text is placed. (This is how you can search through a document for certain words, highlight sections of the book, etc.)

Quite often, what you see with your own two eyes IS NOT what the PDF's text actually says underneath.

- - -

For example, just see the Archive.org book files I linked in this post:

You can see the ENORMOUS difference in quality+formatting just from a few tweaks I did.

- - -

Side Note: Even if the PDF was purely digital, like a document you saved straight out of LibreOffice/Word... I probably wouldn't trust a copy/paste out of PDF.

Depending on how someone saved the PDF, there's a ton of nasty stuff that can occur, like:
  • Hard hyphens
  • Ligatures
  • Headers/Footers
  • Broken paragraph breaks
  • Paragraphs out of order

All of that gets carried over (and often mangled further) when you do a simple copy from PDF -> paste into a program.
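
To give a feel for what cleaning up a paste involves, here is a rough Python sketch that tackles three of the items above: ligatures, hard hyphens, and broken paragraph breaks. The function name `clean_pasted_text` and the sample text are made up for illustration, and it assumes paragraphs are separated by blank lines, which many PDFs won't give you.

```python
import re

# Common Latin ligature code points and their plain-letter equivalents.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def clean_pasted_text(raw: str) -> str:
    """Hypothetical helper: undo a few common PDF copy/paste artifacts."""
    # 1. Expand ligature characters back into separate letters.
    for ligature, plain in LIGATURES.items():
        raw = raw.replace(ligature, plain)

    # 2. Rejoin words that were hyphenated across a line break.
    #    Caveat: this also glues real compounds ("well-\nknown" -> "wellknown").
    raw = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)

    # 3. Treat blank lines as paragraph breaks, and join the hard-wrapped
    #    lines inside each paragraph with single spaces.
    paragraphs = re.split(r"\n\s*\n", raw)
    joined = [
        " ".join(line.strip() for line in para.splitlines() if line.strip())
        for para in paragraphs
    ]
    return "\n\n".join(para for para in joined if para)


sample = (
    "The \ufb01rst conver-\nsion was dif\ufb01cult\nto read.\n\n"
    "Second para-\ngraph here."
)
print(clean_pasted_text(sample))
```

Even this tiny version has a known false positive (step 2), which is exactly why the results still need a human proofread afterwards.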

This is why Calibre has all those PDF settings to try to detect/undo a lot of that garbage "automatically"... but even that can't help you in many cases.
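
As an illustration of the kind of guesswork involved, here is a minimal Python sketch of one header/footer heuristic: drop the first or last line of a page if (ignoring digits) it repeats across most pages. This is NOT Calibre's actual algorithm, just an assumption-laden example. The function name `strip_running_lines`, the `threshold` parameter, and the sample pages are all hypothetical.

```python
import re
from collections import Counter


def strip_running_lines(pages: list[str], threshold: float = 0.6) -> list[str]:
    """Hypothetical helper: remove first/last lines that repeat across pages."""

    def key(line: str) -> str:
        # Replace digit runs so "Page 12" and "Page 13" count as the same line.
        return re.sub(r"\d+", "#", line.strip())

    # Count how many pages start or end with each normalized line.
    counts: Counter = Counter()
    for page in pages:
        lines = [ln for ln in page.splitlines() if ln.strip()]
        if lines:
            counts.update({key(lines[0]), key(lines[-1])})

    # A line is treated as a running header/footer if it shows up
    # on at least `threshold` of all pages.
    needed = threshold * len(pages)
    repeated = {k for k, n in counts.items() if n >= needed}

    cleaned = []
    for page in pages:
        lines = [ln for ln in page.splitlines() if ln.strip()]
        if lines and key(lines[0]) in repeated:
            lines = lines[1:]
        if lines and key(lines[-1]) in repeated:
            lines = lines[:-1]
        cleaned.append("\n".join(lines))
    return cleaned


pages = [
    "My Book Title\nFirst page text.\n1",
    "My Book Title\nSecond page text.\n2",
    "My Book Title\nThird page text.\n3",
]
print(strip_running_lines(pages))
```

A heuristic like this will happily eat a real line of the book if it happens to look like a page number, which is one reason automatic cleanup can't help you in many cases.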

Quote:
Originally Posted by michaelbr
the step 3 requires a lot of work if there are lot of links/photos, anyone knows any tools/tips to make it automatic or easier than manually?
Try to get the book/document in any other format, if possible. Like if you are working with an author, get the original DOCX file.

If the PDF is the only thing that exists, then you can convert it as a last resort, but know that there's no easy, quick, one-button push to do it.

With PDF, there's no avoiding it... You'll need to put in the work to get a perfect ebook out the other end!

Last edited by Tex2002ans; 08-03-2023 at 09:30 AM.