11-18-2015, 04:04 PM   #19
dhdurgee
Guru
Posts: 910
Karma: 3000002
Join Date: Jun 2010
Device: K3W, PW4
I have now used the kindlegen plugin on over 100 documents and wound up tweaking plugin.py a little more to strip additional characters from the title when building the filename:

# replace unwanted characters in book title
# (inside [...] no "|" separators are needed; "|" appears once as a literal character)
title = re.sub(u'[/\u2019\u201C\u201D\u2024\u2025\u2026?<>\\\\:;.,+=!&*|"^\'\\s]+', '_', dc_title.group(1))
title = title.strip("_")

You might want to consider updating your script as well, and you can probably improve on my tweaks. I was surprised by the Unicode characters in some of the document titles and think there might be a better way to deal with them, perhaps using ranges to remove non-alphabetic characters. In fact, I am starting to wonder whether the best bet might be to invert the sense: define a list of permitted characters and translate or trim anything else.
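
Something like the following is a rough sketch of that inverted approach, not tested against the plugin itself. It assumes that only plain ASCII letters and digits are worth keeping in a filename, so accented letters get replaced along with the punctuation. The name sanitize_title is hypothetical and does not exist in plugin.py.

```python
import re

def sanitize_title(title):
    """Keep ASCII letters and digits; collapse everything else to '_'.

    Hypothetical helper: the permitted set below is an assumption,
    widen it if other characters should survive in the filename.
    """
    # [^...] negates the class, so any run of non-permitted characters
    # (punctuation, whitespace, curly quotes, ellipses) becomes one '_'.
    cleaned = re.sub(r'[^A-Za-z0-9]+', '_', title)
    # Trim underscores left over at either end of the title.
    return cleaned.strip('_')

print(sanitize_title(u'Tom\u2019s \u201cBook\u201d: Part 1\u2026'))
# prints: Tom_s_Book_Part_1
```

The advantage is that any new oddball character in a title is handled without touching the pattern again; the cost is that non-English titles lose their accented letters unless the permitted set is extended.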

Dave