Improved code for the cleanup_file_name procedure. This version can also handle book titles that contain non-Latin characters.
Applies to many plugins: Borkify, ePub2, ePub3-itizer, FolderOut, iBooksFix, KEPUB, NCXRemove, sampleOutput and probably some others.
Code:
# borrowed from calibre (calibre/src/calibre/__init__.py); updated by KevinH for the epub3 output plugin
import re
import unicodedata

def cleanup_file_name(name):
    # characters that are invalid or troublesome in file names
    _filename_sanitize = re.compile(r'[\xae\0\\|\?\*<":>\+/]')
    substitute = '_'
    # decompose accented characters and drop the combining marks
    one = ''.join(char for char in unicodedata.normalize(
        'NFKD', name
    ) if unicodedata.category(char) != 'Mn')
    # dashes and l with stroke are not changed by the step above, so map them explicitly
    one = one.replace(u'\u2013', '-').replace(u'\u2014', '-')\
        .replace(u'\u0142', 'l').replace(u'\u0141', 'L')
    one = _filename_sanitize.sub(substitute, one)
    one = re.sub(r'\s', '_', one).strip()
    one = re.sub(r'^\.+$', '_', one)
    one = one.replace('..', substitute)
    # Windows doesn't like path components that end with a period
    if one.endswith('.'):
        one = one[:-1] + substitute
    # Mac and Unix don't like file names that begin with a full stop
    if len(one) > 0 and one[0:1] == '.':
        one = substitute + one[1:]
    return one
This line:
Code:
one = one.replace(u'\u2013', '-').replace(u'\u2014', '-')\
    .replace(u'\u0142', 'l').replace(u'\u0141', 'L')
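replaces the en dash (U+2013) and em dash (U+2014) with a plain hyphen, and the Latin small letter l with stroke (U+0142) and Latin capital letter L with stroke (U+0141, https://en.wikipedia.org/wiki/%C5%81) with plain l and L. These characters have no Unicode decomposition, so the NFKD step leaves them unchanged and they have to be replaced explicitly.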
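To see why the explicit replacements are needed, here is a small sketch of the normalization step on its own. The helper name strip_marks is made up for this illustration and is not part of any plugin; it only repeats the NFKD + 'Mn' filtering from cleanup_file_name.

```python
import unicodedata

def strip_marks(text):
    """Hypothetical helper: NFKD-decompose text and drop combining marks ('Mn')."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if unicodedata.category(ch) != 'Mn')

# Letters such as ż, ó and ć split into a base letter plus a combining mark,
# so the filter reduces them to z, o and c. The ł has no decomposition and survives.
print(strip_marks('Zażółć'))

# Same for the dashes: they pass through the normalization step untouched.
print(strip_marks('a\u2013b\u2014c') == 'a\u2013b\u2014c')
```

So without the extra replace() calls, a Polish title would still contain ł/Ł and any en or em dashes in the output file name.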