Quote:
Originally Posted by NovelFan
And if I convert the PDF version into ePub using calibre, the formatting is mostly gone except for a few bold and italic parts. Fonts are ignored too; it's always the generic font. Images are also tiny, unoptimized, or out of place.
PDF is the absolute WORST format to convert.
PDF is meant as an output-only format, not as an input into anything else.
As you can see, you get LOTS of pain and junk carried over if you try a "one-button push" conversion of a PDF.
Converting a PDF into a proper ebook requires lots of elbow grease.
Quote:
Originally Posted by NovelFan
Is there a more intelligent converter that retains design much more?
Yes, you need to use an actual OCR program... like ABBYY Finereader.
Quote:
Originally Posted by NovelFan
A GUI to guide calibre could solve all these problems. You simply mark parts as pagination or as headings, and it automatically marks all similar ones, like the automatic table detection in Tabula, which also works with PDFs.
It just needs some guidance. After all, most authors stick to a pattern in their book, shown by formatting and frames, so all you need to do is define what is what; similarity search then marks the rest, and you quickly proofread it before conversion.
That's exactly what Finereader (or some of the other OCR tools) does.
It automatically marks:
- Sections
  - Headers/Footers
  - Tables/Images
  - Footnotes
  - [...]
- Formatting
  - Bold/Italics/Smallcaps
  - Superscript/Subscript
  - [...]
- "Unsure Characters"
Then, it allows you to:
- See a side-by-side, magnified comparison of the original vs. the OCRed text.
- Quickly compare and make your corrections.
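For the curious, the "mark one heading, auto-mark all the similar ones" idea from the quote boils down to a nearest-neighbor match on visual style. Here's a minimal Python sketch of that idea. It is NOT how Finereader, calibre, or Tabula actually work internally. It assumes the PDF's text blocks (with font, size, bold/italic, and vertical position) have already been extracted, and the `Block` fields, the weights in `style_distance`, and the `threshold` value are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    """A hypothetical text block already extracted from one PDF page."""
    text: str
    font: str
    size: float    # font size in points
    bold: bool
    italic: bool
    top: float     # distance from the top of the page, in points


def style_distance(a: Block, b: Block) -> float:
    """Crude dissimilarity between two blocks' visual styles (hypothetical weights)."""
    d = abs(a.size - b.size)
    d += 0.0 if a.font == b.font else 5.0
    d += 0.0 if a.bold == b.bold else 3.0
    d += 0.0 if a.italic == b.italic else 3.0
    # Position matters for running headers/footers and page numbers.
    d += abs(a.top - b.top) / 100.0
    return d


def propagate_labels(blocks: list[Block], marked: dict[int, str],
                     threshold: float = 2.0) -> list[Optional[str]]:
    """Give every block the label of its most similar hand-marked example.

    `marked` maps block indices to labels the user picked by hand
    (e.g. "heading", "page-number"). Blocks with no example within
    `threshold` stay None, i.e. plain body text left for proofreading.
    """
    labels: list[Optional[str]] = []
    for i, block in enumerate(blocks):
        if i in marked:
            labels.append(marked[i])
            continue
        best_label, best_dist = None, threshold
        for j, label in marked.items():
            dist = style_distance(block, blocks[j])
            if dist <= best_dist:
                best_label, best_dist = label, dist
        labels.append(best_label)
    return labels


# Tiny made-up sample: two "pages", each with a heading, body text, and a page number.
sample = [
    Block("Chapter 1", "Garamond", 18.0, True, False, 120.0),
    Block("It was a dark and stormy night.", "Garamond", 11.0, False, False, 160.0),
    Block("1", "Garamond", 9.0, False, False, 780.0),
    Block("Chapter 2", "Garamond", 18.0, True, False, 120.0),
    Block("The rain kept falling.", "Garamond", 11.0, False, False, 160.0),
    Block("2", "Garamond", 9.0, False, False, 780.0),
]

# The user marks only the first heading and the first page number by hand.
labels = propagate_labels(sample, {0: "heading", 2: "page-number"})
print(labels)  # ['heading', None, 'page-number', 'heading', None, 'page-number']
```

The hard part in real life is the step this sketch skips: getting clean blocks with reliable font/position data out of the PDF in the first place, which is exactly where a proper OCR tool earns its keep.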
If you want even more knowledge...
I've explained PDF->ebook workflows extensively over the past 12 years, most recently a few months ago in: