Old 03-15-2019, 10:00 PM   #6
retiredbiker
Evangelist
You could simply open it in the Editor, delete any files you don't want in the result, then:

->Arrange the header of the top file as you desire, including removing CSS links if you like.
->Select all the text files you want, right-click, and merge them.
->Use some regex to get rid of all the IDs, attributes, spans, divs, and what-have-you.

->Run another regex search to turn all the various <p class="whatever"> tags into plain <p>, or maybe two or three searches if you want to keep several classes in your result. Do the same for the <hn...> heading lines.

(If you have a book based all on <div>s instead of <p>s, adjust accordingly.)
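
For anyone who wants to see what those two regex passes amount to, here is a rough sketch in Python. It assumes the searches behave like Python-style regular expressions; the function name simplify_html and the sample markup are made up for illustration, and a real book will likely need the patterns tweaked.

```python
import re

def simplify_html(html: str) -> str:
    """Hypothetical helper: strip wrapper tags and attributes from simple book markup."""
    # Pass 1: drop <span> and <div> opening/closing tags, keeping the text inside them.
    html = re.sub(r"</?(?:span|div)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Pass 2: reduce <p ...> and <h1 ...> through <h6 ...> to bare tags (no class, id, etc.).
    html = re.sub(r"<(p|h[1-6])\b[^>]*>", r"<\1>", html, flags=re.IGNORECASE)
    return html

# Made-up sample line of the kind found in converted books.
sample = '<div class="c1"><p class="calibre2" id="x1"><span class="s3">Hello</span> world</p></div>'
print(simplify_html(sample))  # prints: <p>Hello world</p>
```

The same two patterns can be pasted into a search box with the replacement left empty for the first and set to <\1> for the second. If the book uses <div>s as its paragraphs, leave div out of the first pattern and add it to the second instead.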

That should leave you with one file, as simplified as you desire. Just export it.

Given how varied the markup in books is, I doubt you could ever fully automate this, but work through one book and save your searches, and the next ones should take only minutes.

I basically do this when faced with some ancient, amateur scanned book that looks like a ransom note. Then I re-format and split the single file into chapters or whatever. But the basic clean-up only takes a couple of minutes with saved searches.

Last edited by retiredbiker; 03-15-2019 at 10:05 PM.