You could simply open it in the Editor, delete any files you don't want in the result, then:
->Arrange the header of the top file as you desire, including removing CSS links if you like.
->Select all the text files you want, right-click, and merge them.
->Use some regex to get rid of all the IDs, attributes, spans, divs, and what-have-you.
->Run another regex search to turn all the various <p class="whatever"> tags into plain <p>, or two or three searches if you want to keep several paragraph styles in your result. Do the same for the <hn ...> heading tags.
(If you have a book based all on <div>s instead of <p>s, adjust accordingly.)
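For anyone who wants to see what those two regex passes might look like, here is a rough sketch in Python. The patterns are only my guess at a reasonable starting point, and the function name simplify_html and the sample markup are made up for illustration. In the Editor you would paste the same patterns into saved searches rather than run a script.

```python
import re

def simplify_html(html: str) -> str:
    """Hypothetical helper: strip wrapper tags and attributes from book HTML."""
    # Pass 1: drop opening and closing <span> and <div> tags, keeping their text.
    html = re.sub(r"</?(?:span|div)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Pass 2: reduce <p ...> and <h1 ...> through <h6 ...> to bare tags,
    # which removes class, id, and any other attributes on them.
    html = re.sub(r"<(p|h[1-6])\b[^>]*>", r"<\1>", html, flags=re.IGNORECASE)
    return html

# Made-up sample input, not taken from any real book.
sample = (
    '<div id="c1"><p class="calibre3"><span class="x">Hello</span> world</p>'
    '<h2 class="t" id="a">Title</h2></div>'
)
cleaned = simplify_html(sample)
print(cleaned)
```

If the book uses <div>s for paragraphs, pass 1 would throw the paragraph breaks away, so in that case you would convert those <div>s to <p> first instead of deleting them.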
That should leave you with one file, as simplified as you desire. Just export it.
Given how much books vary, I doubt you could ever fully automate this, but work through one book and save your searches, and the next ones should take only minutes.
I basically do this when faced with some ancient, amateur scanned book that looks like a ransom note. Then I re-format and split the single file into chapters or whatever. But the basic clean-up only takes a couple of minutes with saved searches.
Last edited by retiredbiker; 03-15-2019 at 10:05 PM.