Whoa, you are fast, haha. I got rid of the test files and completely refactored the book parsing algorithm. It now uses regex throughout instead of a mix of regex and other algorithms. As far as I can tell it's much more accurate, and since I no longer have to encode/decode the HTML, it should work better for books with non-ASCII characters.