Never mind, I just solved the problem:
As I said, I copied the index_to_soup function into my new recipe and renamed it to my_index_to_soup. Then I added the following lines just before its "return" statement:
Code:
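# Remove the erroneous <!#BeginList> and <!#EndList> markers from the input file
# before it is parsed (the recipe file needs "import re" at the top).
massage.append((re.compile(r'<!#BeginList>'), lambda match: ''))
massage.append((re.compile(r'<!#EndList>'), lambda match: ''))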
And voilà, the junk is removed.
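For anyone wondering why those two lines work: as far as I know, each entry in massage is a (compiled regex, replacement function) pair, and BeautifulSoup 3 runs every pair over the raw markup with pattern.sub(function, markup) before parsing it. Here is a rough standalone sketch of that idea, without calibre. Note that apply_massage is a hypothetical helper I made up for illustration, and the sample markup is invented too:

```python
import re

# Hypothetical helper: apply each (compiled_regex, replacement_function) pair
# to the raw markup in order, similar to how a markup-massage list is used.
def apply_massage(raw, massage):
    for pattern, repl in massage:
        raw = pattern.sub(repl, raw)
    return raw

# The same two pairs as in the recipe: replace each marker with an empty string.
massage = [
    (re.compile(r'<!#BeginList>'), lambda match: ''),
    (re.compile(r'<!#EndList>'), lambda match: ''),
]

sample = '<ul><!#BeginList><li>One</li><!#EndList></ul>'
print(apply_massage(sample, massage))  # → <ul><li>One</li></ul>
```

A single pattern like r'<!#(?:Begin|End)List>' would also match both markers, if you prefer one massage entry instead of two.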