Old 02-24-2023, 10:31 AM   #5
Lukusaukko
Connoisseur
Posts: 56
Karma: 392326
Join Date: Feb 2023
Device: Kobo Libra 2
Quote:
Originally Posted by BetterRed
Did you try passing the single-column output of k2pdfopt through MS Word or LO Writer to produce a DOCX, and then converting that to EPUB in calibre?
Writer can't open PDFs in an editable format, at least not in the version I have installed; it always opens them in Draw, if at all. Word would work better: I know it can open multi-column PDFs directly, with varying degrees of success, but as a Linux user that's not really an option for me.

So far the best bet looks like using pdftohtml to convert the file to XML, then using a text editor and various regular expressions to strip the XML tags or replace them with HTML tags, and finally converting the result with ebook-convert. It takes quite a bit of manual work, but it's doable, at least for the more interesting use cases: probably 1-2 hours per book, provided the layout and formatting are consistent enough for regular expressions to work effectively.
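For anyone wanting to script the regex step instead of doing it by hand in an editor, here is a rough Python sketch of the idea. It assumes the XML came from something like `pdftohtml -xml book.pdf` and that each line of text sits in its own `<text ...>` element inside `<page>` elements; the function name `xml_to_html` and the file names are just placeholders. A real book will still need extra rules for headings, hyphenation, and joining lines into proper paragraphs.

```python
import re

def xml_to_html(xml: str, title: str = "Untitled") -> str:
    """Rough conversion of pdftohtml-style XML to plain HTML (hypothetical helper)."""
    # Drop self-closing layout elements that carry no text.
    body = re.sub(r"<(?:fontspec|image)\b[^>]*/>", "", xml)
    # Turn every <text ...>...</text> element into a paragraph,
    # keeping inline markup such as <b> and <i> untouched.
    body = re.sub(r"<text\b[^>]*>(.*?)</text>", r"<p>\1</p>", body, flags=re.S)
    # Strip the XML declaration, the doctype and the wrapper tags.
    body = re.sub(
        r"<\?xml[^>]*\?>|<!DOCTYPE[^>]*>|</?(?:pdf2xml|page|outline|item)\b[^>]*>",
        "",
        body,
    )
    # Remove the blank lines left behind by the deletions.
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    return (
        "<html><head><title>%s</title></head><body>\n%s\n</body></html>"
        % (title, "\n".join(lines))
    )

# Made-up sample input in the assumed format.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<pdf2xml>\n"
    '<page number="1" position="absolute" top="0" left="0" height="1188" width="918">\n'
    '<fontspec id="0" size="12" family="Times" color="#000000"/>\n'
    '<text top="100" left="50" width="200" height="14" font="0">Hello <b>world</b></text>\n'
    '<text top="120" left="50" width="200" height="14" font="0">Second line</text>\n'
    "</page>\n"
    "</pdf2xml>"
)

html = xml_to_html(sample)
print(html)
```

The resulting HTML can be saved to a file and passed to calibre, e.g. `ebook-convert book.html book.epub`.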