An input plugin is defined as a subclass of InputFormatPlugin. By convention, these subclasses usually live in files called input.py. To add preprocessing support for a particular input format, reimplement the following method in the input plugin for that format:
Code:
def preprocess_html(self, html):
    '''
    This method is called by the conversion pipeline on all HTML before it
    is parsed. It is meant to be used to do any required preprocessing on
    the HTML, like removing hard line breaks, etc.

    :param html: A unicode string
    :return: A unicode string
    '''
    return html
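
As a rough illustration of what such an override might look like, here is a minimal sketch that joins hard-wrapped lines. The plugin class name MyFormatInput and the regular expression are made up for this example. The InputFormatPlugin class defined here is only a stand-in so the snippet runs on its own; in a real plugin you would subclass the actual InputFormatPlugin instead.

```python
import re


class InputFormatPlugin:
    """Stand-in for the real base class (assumption: only so this sketch runs alone)."""

    def preprocess_html(self, html):
        # Default behaviour shown above: return the HTML unchanged.
        return html


class MyFormatInput(InputFormatPlugin):  # hypothetical plugin name
    # A single newline with no newline directly before or after it,
    # i.e. a hard line break inside a paragraph rather than a blank line.
    _hard_break = re.compile(r'(?<!\n)\n(?!\n)')

    def preprocess_html(self, html):
        # Replace each hard line break with a space; blank lines are kept.
        return self._hard_break.sub(' ', html)
```

With this in place, MyFormatInput().preprocess_html('a\nb\n\nc') joins the first two lines with a space and leaves the blank line between paragraphs alone. What counts as a "hard line break" will depend on the input format, so the pattern would need adjusting for real data.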