05-09-2018, 05:42 AM   #410
BeckyEbook
I have improved the cleanup_file_name procedure so that it also handles book titles containing characters outside basic Latin (accented letters, for example).
Applies to many plugins: Borkify, ePub2, ePub3-itizer, FolderOut, iBooksFix, KEPUB, NCXRemove, sampleOutput and probably some others.

Code:
# borrowed from calibre from calibre/src/calibre/__init__.py; updated by KevinH for epub3 output plugin
def cleanup_file_name(name):
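    # re is usually imported at the top of the plugin; imported here so the function stands alone
    import re
    import unicodedata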
    _filename_sanitize = re.compile(r'[\xae\0\\|\?\*<":>\+/]')
    substitute='_'
    one = ''.join(char for char in unicodedata.normalize(
            'NFKD', name
        ) if unicodedata.category(char) != 'Mn')
    one = one.replace(u'\u2013', '-').replace(u'\u2014', '-')\
                   .replace(u'\u0142', 'l').replace(u'\u0141', 'L')
    one = _filename_sanitize.sub(substitute, one)
    one = re.sub(r'\s', '_', one).strip()
    one = re.sub(r'^\.+$', '_', one)
    one = one.replace('..', substitute)
    # Windows doesn't like path components that end with a period
    if one.endswith('.'):
        one = one[:-1]+substitute
    # Mac and Unix don't like file names that begin with a full stop
    if len(one) > 0 and one[0:1] == '.':
        one = substitute+one[1:]
    return one

This line:
Code:
    one = one.replace(u'\u2013', '-').replace(u'\u2014', '-')\
                   .replace(u'\u0142', 'l').replace(u'\u0141', 'L')
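replaces the en dash (U+2013) and em dash (U+2014) with a plain hyphen, and the Latin small letter l with stroke (ł, U+0142) and Latin capital letter L with stroke (Ł, U+0141) with plain l and L. These characters have no Unicode decomposition, so the NFKD step leaves them untouched and they have to be replaced explicitly (https://en.wikipedia.org/wiki/%C5%81).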
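
To illustrate why the explicit replacements are needed, here is a small stand-alone sketch (Python 3). The helper name strip_marks and the sample title are made up for this example and are not part of any plugin; the helper only repeats the NFKD + 'Mn' filtering step from cleanup_file_name.

```python
import unicodedata

def strip_marks(text):
    """Decompose with NFKD, then drop combining marks (category 'Mn')."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if unicodedata.category(ch) != 'Mn')

# Hypothetical sample title: Polish pangram, an en dash, and a city name
title = 'Zażółć gęślą jaźń \u2013 Łódź'

# ą decomposes into a + combining ogonek, so the mark can be dropped...
print(repr(unicodedata.decomposition('\u0105')))
# ...but ł has no decomposition at all, so NFKD cannot turn it into l
print(repr(unicodedata.decomposition('\u0142')))

stripped = strip_marks(title)
print(stripped)   # ł, Ł and the en dash are still there

# The explicit replacements from the post take care of the leftovers
fixed = stripped.replace('\u2013', '-').replace('\u2014', '-') \
                .replace('\u0142', 'l').replace('\u0141', 'L')
print(fixed)      # Zazolc gesla jazn - Lodz
```

Other letters that lack a decomposition (ø or đ, for instance) would pass through the same way, so a plugin that needs them would have to extend the replace chain.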