08-31-2018, 06:21 PM   #44
jackie_w
Grand Sorcerer
jackie_w ought to be getting tired of karma fortunes by now.
Posts: 6,252
Karma: 16544692
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Quote:
Originally Posted by sealbeater
I don't think it would be very difficult at all, actually. Depends on the nature of the PDF, of course. Even if it's just images, it may be doable. It's just a matter of leveraging already existing tools. pdftotext, pdftohtml, pdftops and pdftodvi all already exist. I've never tried to make an epub before so I would need to read up on the structure. Anyway, it's not something I have time for anytime soon, I was just commenting that $99 seems high for something that could probably be scripted out in 20 minutes.
This sounds like a statement from someone who has not fully understood the problem!

Creating the epub is the easy bit. In my experience, extracting the PDF contents and creating algorithms to rejoin the text fragments into paragraphs, with the correct text in the correct order, is the hard bit. Not to mention making sure you don't lose bold, italics and scene breaks in the process. Getting rid of page headers/footers and unwanted end-of-line hyphens also presents a challenge. Extracting all the images is also a hit-and-miss affair. I could go on ...
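
To give a flavour of why this is fiddly, here is a minimal Python sketch of two of the heuristics involved: dropping lines that recur at the top or bottom of most pages, and rejoining lines into paragraphs while removing end-of-line hyphens. It assumes the text has already been extracted into a list of lines per page, however that was done. The function names and the `min_share` threshold are made up for illustration, not taken from any existing tool.

```python
import re
from collections import Counter


def _key(line):
    # Mask digits so "Page 3" and "Page 4" compare as the same line.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_edges(pages, min_share=0.6):
    """Drop first/last lines that recur on most pages (probable headers/footers)."""
    counts = Counter()
    for page in pages:
        if page:
            counts.update({_key(page[0]), _key(page[-1])})
    # Assumption: a line seen on at least min_share of pages (and at least 2) is furniture.
    threshold = max(2, int(len(pages) * min_share))
    repeated = {key for key, n in counts.items() if n >= threshold}
    cleaned = []
    for page in pages:
        last = len(page) - 1
        cleaned.append([
            line for i, line in enumerate(page)
            if not (i in (0, last) and _key(line) in repeated)
        ])
    return cleaned


def join_lines(lines):
    """Rejoin lines into paragraphs; blank lines are treated as paragraph breaks."""
    paragraphs, current = [], ""
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-") and line[0].islower():
            # Guess: the hyphen was only there because of the line break.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs


pages = [
    ["My Book", "It was a dark and stor-", "my night. The well-",
     "known author slept.", "", "Next para.", "Page 1"],
    ["My Book", "More text here.", "Page 2"],
]
body = strip_repeated_edges(pages)
print(join_lines(body[0]))
```

On this sample the header and footer are removed and "stor-" / "my" is correctly rejoined as "stormy", but "well-" / "known" comes out as "wellknown", because the rule cannot tell a real hyphen from a line-break hyphen. Every one of these heuristics has cases like that, and this sketch does not even attempt bold, italics, scene breaks or reading order.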

Once you've sorted out the above for simple fiction books, you'll also need to handle PDFs with text in multiple columns, and footnotes, if you're going to convert non-fiction PDFs.

If you think you can come up with a generic "magic button" solution for converting any/all PDFs to high-quality epub by writing a script in 20 minutes (or 20 hours or 20 days), I suggest you drop all your current projects, including your day job. I suspect you'd be able to retire on the proceeds. You may even be considered the New Messiah.