08-31-2018, 07:00 PM #45
sealbeater
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
Quote:
Originally Posted by jackie_w
This sounds like a statement from someone who has not fully understood the problem!
Perhaps. Perhaps you are overthinking it? What do I know? I've never, ever, ever had to convert or deal with PDFs before.

/s for those who missed it.

Quote:
Originally Posted by jackie_w
Creating the epub is the easy bit. In my experience extracting the PDF contents and creating algorithms to rejoin the text fragments into paragraphs with correct text in the correct order is the hard bit.
You've used the Poppler tools?
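For reference, a minimal sketch of the Poppler route: `pdftotext` does the extraction (passing `-` as the output file sends the text to stdout), and a crude heuristic rejoins wrapped lines into paragraphs. The function names and the "short line ending in punctuation closes a paragraph" rule are assumptions for illustration, and that rule is exactly the sort of thing that breaks on dialogue-heavy fiction.

```python
import subprocess


def pdf_to_text(pdf_path: str) -> str:
    """Extract text with Poppler's pdftotext; "-" as the output file means stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def rejoin_paragraphs(text: str, short_ratio: float = 0.8) -> list[str]:
    """Rejoin wrapped lines into paragraphs (hypothetical heuristic).

    A paragraph ends at a blank line, or at a line that ends in sentence
    punctuation and is clearly shorter than the longest line in the text.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    longest = max((len(ln) for ln in lines), default=0)
    paragraphs: list[str] = []
    current: list[str] = []
    for ln in lines:
        if not ln:  # blank line: close any open paragraph
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        # A short line ending a sentence is treated as the paragraph's last line.
        if ln[-1] in '.!?"\u201d' and len(ln) < short_ratio * longest:
            paragraphs.append(" ".join(current))
            current = []
    if current:  # flush whatever is left at the end of the text
        paragraphs.append(" ".join(current))
    return paragraphs
```

Usage would be along the lines of `rejoin_paragraphs(pdf_to_text("book.pdf"))`; the 0.8 ratio is a guess that would need tuning per book.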


Quote:
Originally Posted by jackie_w
Not to mention making sure you don't lose bold, italics and scenebreaks in the process.
I'm not that big of a stickler, but if PostScript supports it, I'm sure I could extract it.
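`pdftotext` drops font information, so bold and italics have to be recovered from the fonts themselves. Poppler's `pdffonts` lists a file's fonts, and embedded font names often (not always) carry the style, e.g. `Garamond-BoldItalic`, sometimes behind a six-letter subset tag like `ABCDEF+`. A hypothetical heuristic built on that naming convention might look like this; it assumes each text run has already been paired with its font name, which is the hard part.

```python
import html
import re

# Subset-embedded fonts carry a six-capital-letter tag, e.g. "ABCDEF+Garamond".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def font_style(font_name: str) -> set[str]:
    """Guess bold/italic from a font name. The naming is a convention, not a rule."""
    name = SUBSET_PREFIX.sub("", font_name).lower()
    styles = set()
    if any(key in name for key in ("bold", "black", "heavy")):
        styles.add("bold")
    if any(key in name for key in ("italic", "oblique")):
        styles.add("italic")
    return styles


def wrap_run(text: str, font_name: str) -> str:
    """HTML-escape a text run and wrap it in <i>/<b> according to its font name."""
    styles = font_style(font_name)
    out = html.escape(text)
    if "italic" in styles:
        out = f"<i>{out}</i>"
    if "bold" in styles:
        out = f"<b>{out}</b>"
    return out
```

Fonts with uninformative names (Type 3 fonts, `F1`, `F2`, ...) defeat this entirely, which is one reason formatting gets lost.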


Quote:
Originally Posted by jackie_w
Getting rid of page headers/footer and unwanted end-of-line hyphens also presents a challenge.
Sed works wonders.
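A rough Python counterpart to the sed approach, as a sketch: it assumes pages are separated by form feeds (how `pdftotext` marks page breaks unless told otherwise), treats any line that recurs on at least half the pages as a running header or footer, and joins lowercase words split by an end-of-line hyphen. Both rules are assumptions and both misfire: a repeated scene-break line such as `* * *` would be stripped, and a genuine compound like `well-known` broken at a line end gets merged.

```python
import re
from collections import Counter


def _line_key(line: str) -> str:
    """Normalise a line so that 'Page 12' and 'Page 13' compare equal."""
    return re.sub(r"\d+", "#", line.strip())


def strip_running_lines(pages: list[str], min_fraction: float = 0.5) -> list[str]:
    """Remove lines that recur on many pages (likely running headers/footers)."""
    counts = Counter()
    for page in pages:
        # Count each distinct (normalised) line once per page.
        counts.update({_line_key(ln) for ln in page.splitlines() if ln.strip()})
    threshold = max(2, int(min_fraction * len(pages)))
    cleaned = []
    for page in pages:
        kept = [ln for ln in page.splitlines()
                if not ln.strip() or counts[_line_key(ln)] < threshold]
        cleaned.append("\n".join(kept))
    return cleaned


def dehyphenate(text: str) -> str:
    """Join a lowercase word broken by a hyphen at the end of a line."""
    return re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)


# Usage, assuming raw pdftotext output in `raw`:
# pages = strip_running_lines(raw.split("\f"))
# text = dehyphenate("\n".join(pages))
```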

Quote:
Originally Posted by jackie_w
Extracting all the images is also a hit-and-miss affair. I could go on ...

What's hit or miss about it?
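For what it's worth, Poppler's `pdfimages` does the bulk extraction: `-all` keeps each image in its native format and `-list` prints an inventory without writing files. The wrapper below is a hypothetical sketch. Part of what makes the results hit-and-miss is that soft masks (transparency) come out as separate images, some PDFs store one picture as several strips, and vector artwork is not an image at all, so it never shows up.

```python
import subprocess
from pathlib import Path


def pdfimages_cmd(pdf_path: str, out_dir: str, list_only: bool = False) -> list[str]:
    """Build a Poppler pdfimages command line."""
    if list_only:
        # Print a table of every image (page, type, size, encoding) instead of writing files.
        return ["pdfimages", "-list", pdf_path]
    # -all keeps native formats, so JPEGs stay JPEG instead of being re-encoded.
    return ["pdfimages", "-all", pdf_path, str(Path(out_dir) / "img")]


def extract_images(pdf_path: str, out_dir: str) -> list[Path]:
    """Run pdfimages and return the files it wrote (img-000.*, img-001.*, ...)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_cmd(pdf_path, out_dir), check=True)
    return sorted(Path(out_dir).glob("img-*"))
```

Running the `-list` form first is a cheap way to spot masks and stripped images before deciding what to keep.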


Quote:
Originally Posted by jackie_w
Once you've sorted out the above for simple fiction books you'll need to solve the problem of PDFs with text in multiple columns and handling footnotes if you're going to convert non-fiction PDFs.
That is something I would have to think about, but I believe where there's a will, there's a way. Maybe something with PostScript and multi-line justification. I would have to investigate should I ever have enough time and interest.
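One common approach to multi-column pages (a different technique from the PostScript idea above) is to get word coordinates, for example from `pdftotext -bbox`, which writes each word with `xMin`/`yMin`/`xMax`/`yMax`, then look for a vertical gutter that no word crosses and read each column top to bottom. Parsing the `-bbox` output is omitted; the `Word` class and the 10-point minimum gap are assumptions. A heading or figure spanning both columns hides the gutter, so real pages would need to be cut into horizontal bands first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """One word and its box (hypothetical; e.g. parsed from pdftotext -bbox)."""
    x_min: float
    y_min: float
    x_max: float
    text: str


def find_gutter(words: list[Word], min_gap: float = 10.0) -> Optional[float]:
    """Return the x midpoint of the widest vertical gap no word crosses, if wide enough."""
    if not words:
        return None
    spans = sorted((w.x_min, w.x_max) for w in words)
    best_gap, best_x = 0.0, None
    reach = spans[0][1]  # rightmost word edge seen so far
    for x0, x1 in spans[1:]:
        if x0 - reach > best_gap:
            best_gap, best_x = x0 - reach, (reach + x0) / 2
        reach = max(reach, x1)
    return best_x if best_gap >= min_gap else None


def reading_order(words: list[Word]) -> list[str]:
    """Order words column by column; within a column, top to bottom, then left to right.

    Assumes y grows down the page, as in pdftotext -bbox output.
    """
    gutter = find_gutter(words)
    if gutter is None:
        columns = [words]
    else:
        columns = [[w for w in words if w.x_max <= gutter],
                   [w for w in words if w.x_min >= gutter]]
    ordered = []
    for col in columns:
        ordered.extend(w.text for w in sorted(col, key=lambda w: (w.y_min, w.x_min)))
    return ordered
```

Sorting by `(y_min, x_min)` alone would interleave the two columns line by line; splitting at the gutter first is what keeps each column's text together.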




Quote:
Originally Posted by jackie_w
If you think you can come up with a generic "magic button" solution for converting any/all PDFs to high-quality epub by writing a script in 20 minutes (or 20 hours or 20 days) I suggest you drop all your current projects, including your day job. I suspect you'd be able to retire on the proceeds. You may even be considered the New Messiah.
I think I could do a good enough job to meet my needs. I like how much you qualified my statement with requirements: "any/all PDFs", "high-quality epub" (whatever that means), etc. I think I could come up with a good solution, and I know I would try my hand at it rather than shell out $99. I appreciate your suggestion that I drop all my current projects, including my day job, to pursue this, but my day job pays quite well, and it's precisely my skill at solving problems like this that gets me paid so well. I could already retire if I chose to.

You are free to regard me as your New Messiah if you like, however.