MobileRead Forums - pdf->epub - using indents as a cue to line-unwrapping

retiredbiker · 12-31-2018, 12:21 PM

Quote:

Originally Posted by VcSaJen

If it's not possible, is there any other tool that has that option?

Read the sticky post about pdf conversion. As for other tools, I use "pdftotext" from the Poppler utilities. Its -layout option gives you a text file with all the leading spaces and any extra linefeeds between paragraphs intact. Then a little regex can rejoin the wrapped lines into the "real" paragraphs you want, treating each indented line as the start of a new paragraph. If you are really lucky, there may be, say, 5 linefeeds at each chapter break, so you can catch those with regex as well.
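A minimal Python sketch of that unwrapping step follows. The function name `unwrap_layout_text` and its rules are assumptions for illustration, not anything built into Poppler: a line indented deeper than the body text starts a new paragraph, a blank line also ends one, and a run of `chapter_gap` linefeeds (5 by default, matching the example above) splits chapters. The body margin is guessed as the smallest indent in the file.

```python
import re

def unwrap_layout_text(text: str, chapter_gap: int = 5) -> list[list[str]]:
    """Rejoin hard-wrapped lines from `pdftotext -layout` output.

    Hypothetical rules, to be tuned per book:
      * a line indented deeper than the body text starts a new paragraph;
      * a blank line also ends the current paragraph;
      * `chapter_gap` or more consecutive linefeeds mark a chapter break.
    Returns a list of chapters, each a list of paragraph strings.
    """
    # Drop form feeds (pdftotext's page separator) and trailing spaces,
    # so blank lines are truly empty and the linefeed-run regex works.
    lines = [ln.rstrip() for ln in text.replace("\f", "").split("\n")]
    text = "\n".join(lines)

    def indent(ln: str) -> int:
        return len(ln) - len(ln.lstrip(" "))

    # Assumption: the body-text margin is the smallest indent seen.
    body = min((indent(ln) for ln in lines if ln), default=0)

    chapters = []
    for chunk in re.split(r"\n{%d,}" % chapter_gap, text):
        paragraphs, current = [], []
        for ln in chunk.split("\n"):
            # A blank line or an extra indent closes the paragraph in progress.
            starts_new = not ln or indent(ln) > body
            if starts_new and current:
                paragraphs.append(" ".join(current))
                current = []
            if ln:
                current.append(ln.strip())
        if current:
            paragraphs.append(" ".join(current))
        if paragraphs:
            chapters.append(paragraphs)
    return chapters
```

To use it, produce the text with `pdftotext -layout book.pdf book.txt`, then call `unwrap_layout_text(open("book.txt", encoding="utf-8").read())`. It does not rejoin words hyphenated at line ends, and running headers or page numbers will still need their own regex.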

This assumes, of course, that the cues you want (indents, blank lines) exist in the first place. As with anything pdf, success depends on what is inside the source file. Pdftotext will at least show you what is there, and the result may range from excellent to impossible. Simple books like novels often work well this way, but if you have double columns or something complex like a science textbook, it's a lot more work.
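One quick way to see "what is there" before writing any regex is to tally how many lines start with each number of leading spaces. The helper name `indent_histogram` is hypothetical. As a rough heuristic only: a big count at the body margin plus a smaller, consistent spike a few spaces in suggests a usable paragraph indent, while widely scattered values may point to columns or other complex layout.

```python
from collections import Counter

def indent_histogram(text: str) -> Counter:
    """Count non-blank lines by their number of leading spaces.

    Form feeds (pdftotext page separators) are removed first so they
    do not count as line content.
    """
    return Counter(
        len(ln) - len(ln.lstrip(" "))
        for ln in text.replace("\f", "").split("\n")
        if ln.strip()
    )
```

For example, `sorted(indent_histogram(open("book.txt", encoding="utf-8").read()).items())` lists (indent, count) pairs in increasing indent order.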