MobileRead Forums - View Single Post

Northguy · 02-03-2018, 08:48 AM

Quote:

Originally Posted by odinokij

(*) Note: I don't speak German or Dutch, so if you find any mistakes in these languages, please report them so I can fix them.

I hope it will also be useful for you,

Odinokij.

Hey, great!

...

I started working on the original plug-in's matching.py, but your version looks a lot more elaborate. Comparing it with the original matching.py, I have some questions:
1) In general: I do not speak Spanish or Portuguese, so I do not understand your comments.

2) In def fuzzy_it(text, patterns=None):, why didn't you change the article-stripping entry (tweaks.get('title_sort_articles', r'^(a|the|an)\s+'), '')?

3) In def get_title_tokens I added 'NL', 'ebook', 'e-Book' and 'druk' as possible alternatives in the pattern, something like:

Quote:

(r'(?i)[({\[](\d{4}|ebook|e-book|NL|omnibus|anthology|hardcover|paperback|mass\s*market|edition|ed\.)[\])}]', ''),

4) In def get_title_tokens I think we need to add 'een', because 'de', 'het' and 'een' are the articles used in Dutch.

Quote:

tokens_du = ('een', 'de', 'het', 'van', 'met', 'naar')
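To check what the bracket pattern from item 3 actually removes, here is a minimal standalone sketch (not the plug-in's code; the names TAG_PATTERN and strip_bracket_tags are hypothetical). It writes mass\s*market without a space, so "massmarket" is matched too. Note that the alternation only fires when the whole bracketed text is one of the listed words, so "(NL)" or "[e-Book]" is stripped but something like "(2e druk)" is left alone.

```python
import re

# Bracket pattern from item 3, with the Dutch/e-book alternatives added.
# It only matches when the WHOLE bracketed text is one alternative,
# e.g. "(NL)" or "[ebook]", but not "(2e druk)".
TAG_PATTERN = re.compile(
    r'(?i)[({\[](\d{4}|ebook|e-book|NL|omnibus|anthology|hardcover|paperback'
    r'|mass\s*market|edition|ed\.)[\])}]'
)


def strip_bracket_tags(title: str) -> str:
    """Remove bracketed tags such as (2009), (NL) or [e-Book]."""
    cleaned = TAG_PATTERN.sub('', title)
    # Collapse the extra spaces left behind by removed tags.
    return ' '.join(cleaned.split())


if __name__ == '__main__':
    for t in ['De avond is ongemak (NL)', 'Het diner [e-Book] (2009)',
              'Max Havelaar (2e druk)']:
        print(strip_bracket_tags(t))
```

If "druk" editions should also be removed, the pattern would need something like a leading edition number inside the brackets; that is left open here.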
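For items 2 and 4, a minimal sketch of what Dutch-aware article stripping and joiner filtering could look like, assuming the Dutch articles are simply added to the default r'^(a|the|an)\s+' pattern. All names here (ARTICLES, TOKENS_EN, strip_leading_article, title_tokens) are hypothetical; TOKENS_EN is an assumed English joiner list, and TOKENS_DU is the tuple proposed in item 4.

```python
import re

# Assumed Dutch-aware version of the default article pattern
# r'^(a|the|an)\s+', with 'de', 'het' and 'een' added.
ARTICLES = re.compile(r'^(a|the|an|de|het|een)\s+', re.IGNORECASE)

# Joiners: an assumed English list plus the Dutch tuple from item 4.
TOKENS_EN = ('a', 'and', 'the', '&')
TOKENS_DU = ('een', 'de', 'het', 'van', 'met', 'naar')


def strip_leading_article(title: str) -> str:
    """Drop one leading article; requires whitespace, so 'Deventer' is kept."""
    return ARTICLES.sub('', title, count=1)


def title_tokens(title: str, strip_joiners: bool = True):
    """Yield search tokens, skipping English and Dutch joiner words."""
    joiners = set(TOKENS_EN) | set(TOKENS_DU)
    for token in title.split():
        token = token.strip().strip('"').strip("'")
        if token and (not strip_joiners or token.lower() not in joiners):
            yield token


if __name__ == '__main__':
    print(strip_leading_article('Het diner'))
    print(list(title_tokens('De ontdekking van de hemel')))
```

One trade-off: adding 'de' to the article pattern also strips it from non-Dutch titles such as "De Profundis", so it may be safer to apply it only when the book's language is Dutch.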