@user_none, where can I go to check out what you've done? I tried searching for pluginize through Launchpad, but didn't find anything. If pluginize refers to some sort of plugin architecture that users can enable disable functions that sounds like a better place to maintain them than html.py anyway.
Kovid's advice was enough to get me going, so I already started making some changes to my local copy of html.py. I don't want to duplicate any effort when it comes to submitting changes. I discovered some problems with at least one of the regexps there which causes some anomalies in some books, so I was going to submit those fixes at a minimum.
btw, I've tested on a number of other books. While the first regex doesn't get every wrapped line I haven't seen any scenario where it makes things worse. I've found the second regex will wrap things like page headers and footers(since they lack punctuation), which just winds up making those even more difficult to take out later.
|