Quote:
Originally Posted by BetterRed
Did you try passing the single-column output of k2pdfopt through MS Word or LO Writer to produce a DOCX, and then converting that to EPUB in calibre?
Writer can't open PDFs in an editable format, at least not in the version I have installed; it always opens them in Draw, if at all. Word would work better: I know it can open multi-column PDFs directly, with varying degrees of success, but as a Linux user, it's not really an option for me.
So far the best bet looks like using pdftohtml to convert the file to XML, then using a text editor and various regular expressions to strip the XML tags or replace them with HTML tags, and finally running ebook-convert on the result. It takes quite a bit of manual work, but it's doable, at least for the more interesting use cases. It's probably 1-2 hours of work per book, provided the layout and formatting are consistent enough that I can use regexps effectively.
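For anyone who wants to script the tag replacement instead of doing it by hand in an editor, here is a rough Python sketch of the idea. It assumes the XML has the shape that `pdftohtml -xml` normally produces (a `pdf2xml` root, `page` wrappers, `fontspec` elements, and one `text` element per line, with inline `b`/`i` tags). The function name `pdf2xml_to_html` is made up for illustration, and a real book will almost certainly need extra rules for headings, page numbers and hyphenation.

```python
import re


def pdf2xml_to_html(xml: str) -> str:
    """Turn pdftohtml-style XML into very plain HTML (one <p> per text line)."""
    # Drop the XML declaration, the DOCTYPE and the <fontspec .../> elements.
    out = re.sub(r'<\?xml[^>]*\?>|<!DOCTYPE[^>]*>|<fontspec\b[^>]*/>', '', xml)
    # Drop the <pdf2xml> and <page> wrapper tags but keep what is inside them.
    out = re.sub(r'</?(?:pdf2xml|page)\b[^>]*>', '', out)
    # Each <text ...> line becomes a paragraph; inline <b>/<i> pass through
    # unchanged because they are already valid HTML.
    out = re.sub(r'<text\b[^>]*>', '<p>', out)
    out = out.replace('</text>', '</p>')
    # Remove the blank lines left behind by the deleted tags.
    lines = [ln.strip() for ln in out.splitlines() if ln.strip()]
    return '<html><body>\n' + '\n'.join(lines) + '\n</body></html>'
```

The workflow around it would be roughly: run `pdftohtml -xml` on the PDF, feed the resulting XML file through the function, save the output as an `.html` file, then run `ebook-convert book.html book.epub`. Treating every line as its own paragraph is a deliberate simplification; merging lines into real paragraphs would need the `top`/`left` attributes, which this sketch throws away.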