Quote:
Originally Posted by odinokij
(*) Note: I don't speak German or Dutch, so if you find any mistakes in these languages, please report them so I can fix them.
I hope it will also be useful for you,
Odinokij.
|
Hey great!


...
I started working on the 'matching.py' of the original plug-in, but your version looks a lot more elaborate. Comparing it to the original matching.py, I have some questions:
1) In general: I do not speak Spanish/Portuguese, so I do not understand your comments.
2) In def fuzzy_it(text, patterns=None): why didn't you change the following entry? (tweaks.get('title_sort_articles', r'^(a|the|an)\s+'), ''),
3) In def get_title_tokens I added 'NL', 'ebook', 'e-Book' and 'druk' as possible alternatives in:
Quote:
(r'(?i)[({\[](\d{4}|ebook|e-book|NL|omnibus|anthology|hardcover|paperback|mass\s*market|edition|ed\.)[\])}]', ''),
|
4) In def get_title_tokens I think we need to add 'een', because 'de', 'het' and 'een' are the articles used in Dutch:
Quote:
tokens_du = ('een', 'de', 'het', 'van', 'met', 'naar')
|
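
To make points 3 and 4 concrete, here is a small stand-alone sketch of how I picture the two changes working together. It is not the plug-in's actual get_title_tokens: the names BRACKET_TAGS, TOKENS_DU and title_tokens are made up for this example, I am assuming the Dutch tokens are simply skipped as stop words, and I wrote the pattern with 'mass\s*market' without the extra space.

```python
import re

# Pattern from point 3 (hypothetical name). Assumption: 'mass\s*market'
# is meant to have no literal space before \s*.
BRACKET_TAGS = re.compile(
    r'(?i)[({\[](\d{4}|ebook|e-book|NL|omnibus|anthology|hardcover|paperback'
    r'|mass\s*market|edition|ed\.)[\])}]'
)

# Tuple from point 4, including the article 'een'.
TOKENS_DU = ('een', 'de', 'het', 'van', 'met', 'naar')


def title_tokens(title):
    """Yield title words after removing bracketed tags and Dutch stop words.

    Assumption: the tokens in TOKENS_DU are dropped from the result;
    the real plug-in may treat them differently.
    """
    cleaned = BRACKET_TAGS.sub('', title)
    for token in cleaned.split():
        token = token.strip('.,:;!?')
        if token and token.lower() not in TOKENS_DU:
            yield token


if __name__ == '__main__':
    print(list(title_tokens('Het Diner (NL) [ebook]')))
    print(list(title_tokens('De ontdekking van de hemel (2011)')))
```

With this sketch, a title such as 'Het Diner (NL) [ebook]' is reduced to the single token 'Diner', which is the kind of result I would expect for matching Dutch titles.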