MobileRead Forums - View Single Post

BeckyEbook · 05-09-2018, 05:42 AM

Here is an improved version of the cleanup_file_name procedure that can also handle book titles containing non-Latin characters.
It applies to many plugins: Borkify, ePub2, ePub3-itizer, FolderOut, iBooksFix, KEPUB, NCXRemove, sampleOutput and probably some others.

Code:

# borrowed from calibre (calibre/src/calibre/__init__.py); updated by KevinH for epub3 output plugin
def cleanup_file_name(name):
    import re
    import unicodedata
    # characters that are illegal or troublesome in file names
    _filename_sanitize = re.compile(r'[\xae\0\\|\?\*<":>\+/]')
    substitute='_'
    # decompose accented characters and drop the combining marks (category Mn)
    one = ''.join(char for char in unicodedata.normalize(
            'NFKD', name
        ) if unicodedata.category(char) != 'Mn')
    one = one.replace(u'\u2013', '-').replace(u'\u2014', '-')\
                   .replace(u'\u0142', 'l').replace(u'\u0141', 'L')
    one = _filename_sanitize.sub(substitute, one)
    # trim first: calling strip() after the substitution would have nothing left to remove
    one = re.sub(r'\s', '_', one.strip())
    one = re.sub(r'^\.+$', '_', one)
    one = one.replace('..', substitute)
    # Windows doesn't like path components that end with a period
    if one.endswith('.'):
        one = one[:-1]+substitute
    # Mac and Unix don't like file names that begin with a full stop
    if len(one) > 0 and one[0:1] == '.':
        one = substitute+one[1:]
    return one
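
To see what the sanitizing pattern and the whitespace rule do on their own, here is a minimal sketch. The helper name sanitize_chars and the sample title are made up for illustration and are not part of any plugin; only the pattern itself is copied from the procedure above.

```python
import re

# Same character class as in cleanup_file_name above.
_filename_sanitize = re.compile(r'[\xae\0\\|\?\*<":>\+/]')

def sanitize_chars(name, substitute='_'):
    """Hypothetical helper: replace forbidden characters, then whitespace."""
    one = _filename_sanitize.sub(substitute, name)
    return re.sub(r'\s', substitute, one)

print(sanitize_chars('What? A/B: "Test"'))  # prints What__A_B___Test_
```

Every forbidden character and every whitespace character becomes a single underscore, so runs of them are not collapsed.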

This line:

Code:

    one = one.replace(u'\u2013', '-').replace(u'\u2014', '-')\
                   .replace(u'\u0142', 'l').replace(u'\u0141', 'L')

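replaces the en dash (U+2013) and em dash (U+2014) with a plain hyphen, and the Latin small letter l with stroke (ł, U+0142) and Latin capital letter L with stroke (Ł, U+0141) with plain l and L (https://en.wikipedia.org/wiki/%C5%81). These two letters have no Unicode decomposition, so the NFKD step leaves them unchanged and they need an explicit replacement.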
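
A small sketch of why the explicit replacement is needed, using only the standard unicodedata module. The helper name strip_marks is made up for this example; it repeats just the NFKD step from the procedure.

```python
import unicodedata

def strip_marks(text):
    """Hypothetical helper: NFKD-decompose and drop combining marks (Mn)."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed
                   if unicodedata.category(ch) != 'Mn')

# The Polish word "Żółć": Ż, ó and ć lose their diacritics, but ł survives,
# because it is a separate letter with no decomposition.
word = '\u017b\u00f3\u0142\u0107'
print(strip_marks(word))  # prints Zołc

# Only after the explicit replace does the name become plain ASCII.
print(strip_marks(word).replace('\u0142', 'l'))  # prints Zolc
```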