12-31-2018, 11:21 AM   #2
retiredbiker
Evangelist
retiredbiker ought to be getting tired of karma fortunes by now.
 
 
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop!_OS, Kobo Forma
Quote:
Originally Posted by VcSaJen
If it's not possible, are there any other tools that have that option?
Read the sticky post about PDF conversion. As for other tools, I use "pdftotext" from the Poppler utilities. Its -layout option gives you a text file with all the leading spaces and any extra line feeds between paragraphs intact. Then a little regex will easily get you to the "real" paragraphs you want. If you are really lucky, there may be, say, 5 line feeds at each chapter break, so you can pick those out with regex as well.

Assuming, of course, that what you want is there to start with. As with anything PDF, success depends on what is inside the source file. Pdftotext will at least show you what is there, and that can range from excellent to impossible. Simple books like novels often work well this way, but if you have double columns or something complex like a science textbook, it's a lot more work.
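
For anyone wondering what that "little regex" might look like, here is a minimal Python sketch. It assumes the text came from something like `pdftotext -layout book.pdf book.txt`, that a new paragraph starts after a blank line or on an indented line, and that chapters are separated by runs of five or more line feeds as in the lucky case above. The function names `split_chapters` and `rejoin_paragraphs` and the `min_linefeeds` threshold are made up for illustration; you will almost certainly need to tune the patterns to whatever your particular PDF produces. pdftotext also puts a form feed at each page break unless you pass -nopgbrk; the sketch simply deletes them.

```python
import re


def split_chapters(text: str, min_linefeeds: int = 5) -> list[str]:
    """Split layout text into chapters on runs of at least `min_linefeeds`
    consecutive line feeds (an assumed chapter-break marker)."""
    text = text.replace("\f", "")  # drop pdftotext's form-feed page breaks
    chunks = re.split(r"\n{%d,}" % min_linefeeds, text)
    return [c for c in chunks if c.strip()]


def rejoin_paragraphs(chapter: str) -> list[str]:
    """Rebuild paragraphs, assuming a new paragraph starts after a blank
    line or on an indented line, and every other line break is a soft wrap."""
    # A line break followed directly by visible text is a wrap: join with a space.
    joined = re.sub(r"[ \t]*\n(?=\S)", " ", chapter)
    # Remaining line breaks precede a blank or indented line: paragraph breaks.
    # Collapse runs of whitespace (e.g. from justified text) to single spaces.
    paragraphs = (re.sub(r"\s+", " ", p).strip() for p in joined.split("\n"))
    return [p for p in paragraphs if p]


# Hypothetical sample of -layout output: indented paragraphs, chapter break of 6 LFs.
sample = (
    "CHAPTER ONE\n"
    "\n"
    "    The rain had stopped by the time she\n"
    "reached the station.\n"
    "    The platform was empty.\n"
    "\n\n\n\n\n"
    "CHAPTER TWO\n"
    "\n"
    "    Morning came.\n"
)

for number, chapter in enumerate(split_chapters(sample), start=1):
    print(number, rejoin_paragraphs(chapter))
# 1 ['CHAPTER ONE', 'The rain had stopped by the time she reached the station.', 'The platform was empty.']
# 2 ['CHAPTER TWO', 'Morning came.']
```

Running headers, footers and page numbers come through as ordinary lines and will get glued into the neighbouring paragraphs, and end-of-line hyphens are not rejoined, so compare the result with the PDF and strip those first if your book has them.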