I've stumbled across this as well, and have also stubbed my toes on regex. My preferred (but labour-intensive) procedure is:
- export the PDF as plain text (.txt);
- manually remove headers & footers in the editor of your choice (I use NoteTab Light) - the search function can be useful here;
- apply a standard sequence of search & replace steps to remove hard line endings etc.
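If you'd rather script part of that than do it all by hand, here is a rough Python sketch of the last two steps. It is not what I actually use, just an illustration: the function names are made up, and it assumes that headers/footers are lines repeated word for word at least three times, that page numbers sit alone on a line, and that paragraphs are separated by blank lines. It will also wrongly join genuinely hyphenated words that happen to break at a line end, so check the result by eye.

```python
import re
from collections import Counter


def drop_repeated_lines(text, min_repeats=3):
    """Drop likely running headers/footers and bare page numbers.

    Assumption: any non-blank line that occurs min_repeats times or more
    is a header/footer, and a line holding only digits is a page number.
    """
    lines = text.splitlines()
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped and counts[stripped] >= min_repeats:
            continue  # repeated line: treat as header/footer
        if re.fullmatch(r"\d+", stripped):
            continue  # line is only a page number
        kept.append(line)
    return "\n".join(kept)


def unwrap_paragraphs(text):
    """Remove hard line endings inside paragraphs, keeping blank-line breaks."""
    # Rejoin words split across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on blank lines, then collapse each paragraph onto one line.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in joined if p)
```

To use it, read the exported .txt into a string, pass it through drop_repeated_lines and then unwrap_paragraphs, and write the result back out. A paragraph that runs across a page break will still end up split in two where the header was removed, so that part stays manual.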
From there on you can either play with Sigil or try dropping it into Calibre (I play with Sigil).
Hope this helps a little.