MobileRead Forums - View Single Post

dhdurgee · 11-18-2015, 05:04 PM

I have now used the kindlegen plugin on over 100 documents and wound up tweaking plugin.py a little more to remove more characters from the title used for the filename:

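# replace unwanted characters in book title
# (inside [...] every character is already an alternative, so no '|' separators are needed;
# the u prefix lets the \uXXXX escapes work under Python 2 as well as Python 3)
title = re.sub(u'[/|\u2019\u201C\u201D\u2024\u2025\u2026?<>\\\\:;.,+=!&*"^\'\\s]+', '_', dc_title.group(1))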
title = title.strip("_")

You might want to consider updating your script as well, and you can probably improve on my tweaks. I was surprised by the Unicode characters in some of the document titles, and I think there might be a better way to deal with them, perhaps using ranges to remove non-alphabetic characters. In fact, I am starting to wonder whether the best bet might be to invert the sense: define a list of permitted characters and translate or trim anything else.
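
A rough sketch of that inverted approach might look like the following. This is only an illustration, not code from the plugin: the function name safe_filename, the fallback value, and the choice of ASCII letters, digits, and hyphen as the permitted set are all assumptions, and it is written for Python 3.

```python
import re
import unicodedata


def safe_filename(title, fallback="untitled"):
    """Hypothetical helper: keep only permitted characters in a title."""
    # Split accented letters into base letter + combining mark (NFKD also
    # turns the one-character ellipsis U+2026 into three plain dots).
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks so "é" becomes a plain "e".
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Assumed permitted set: ASCII letters, digits, and hyphen. Every run
    # of anything else collapses into a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "_", stripped)
    # Trim underscores left at either end, as the original tweak does.
    cleaned = cleaned.strip("_")
    # A title made up entirely of rejected characters would otherwise be empty.
    return cleaned or fallback
```

With an allow-list like this, curly quotes, ellipses, and any other unexpected Unicode punctuation are handled without having to be listed one by one. The trade-off is that titles in non-Latin scripts would collapse to the fallback name, so the permitted set would need widening for those.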

Dave
