What was the source material for this content? My bet is that the Japanese words were italicized or carried some other formatting or markup that got lost in the conversion and took a space with it (I've seen similar things happen with soft hyphens, for example). Without knowing where the content originated, there's no way to say exactly why it converted poorly, but since it apparently happens to every Japanese word that appears inline in a sentence, it was almost certainly a conversion issue.
As for fixing it, Sigil is probably your best bet. It has find/replace across all HTML files and works directly within the epub format. Another option is Calibre's "Tweak ePub" functionality (hotkey: T; called "Edit book" in newer versions), which expands the epub into a temp folder where you can use your favorite text or HTML editor to modify the files, and then packages everything back up when you're done. A similar option is to expand the epub yourself. There's nothing exotic about the epub format: it's just HTML and CSS files in a zip container renamed .epub. There are metadata files and certain file location requirements (most notably, the mimetype file has to be the first entry in the archive and stored uncompressed), but as long as you preserve those when re-zipping, you're not going to break the format. There's really no reason to use this approach, though, since it's just the manual version of Calibre's "Tweak ePub" option.
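If you do want to go the manual route, here is a minimal Python sketch of the expand/re-zip round trip. The function names `unpack_epub` and `repack_epub` are made up for illustration, and the sketch assumes the expanded folder contains a `mimetype` file at its top level, as a valid epub does. It writes `mimetype` first and uncompressed, then compresses everything else.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, out_dir):
    """Expand an epub (a zip archive) into out_dir."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out_dir)


def repack_epub(src_dir, epub_path):
    """Zip src_dir back into an epub, with mimetype first and uncompressed."""
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype entry goes in first and is stored, not deflated.
        zf.write(src / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            name = path.relative_to(src).as_posix()
            if path.is_file() and name != "mimetype":
                zf.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
```

Write the output epub somewhere outside the expanded folder, otherwise the half-written archive would be picked up by the directory walk.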
Personally, I'd just use Sigil to fix things and edit the file directly in the Calibre file store (yes, I know, it's not really recommended to go mucking about in the Calibre storage folders, but as long as the epubs exist on disk this is relatively safe). However, if you have another editor that you prefer (Notepad++, Visual Studio, whatever), you could expand the epub, load all of the HTML files into that editor, and use its find/replace functionality to do the cleanup.
For this specific instance, you could certainly write a script in Perl, Python, sed, PowerShell, etc. that would fix each instance of a lost space. There's no universal automated solution for cleanups like this, though, because every cleanup task is different (for example, your next cleanup might be removing newlines after hyphens, rather than inserting spaces before certain words).
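As a rough idea of what such a script could look like in Python, here is a sketch that assumes you can list the affected words up front. The word list, the function names, and the file extensions are all placeholders for illustration, not anything taken from your book. It inserts a space wherever a listed word is glued directly onto a preceding letter.

```python
import re
from pathlib import Path

# Hypothetical word list; replace with the words that lost their space.
WORDS = ["katana", "sensei", "dojo"]

# A listed word that directly follows a letter, e.g. "thekatana".
PATTERN = re.compile(
    r"(?<=[A-Za-z])(" + "|".join(re.escape(w) for w in WORDS) + r")\b"
)


def restore_spaces(text):
    """Insert the missing space before each glued-on word."""
    return PATTERN.sub(r" \1", text)


def fix_folder(folder):
    """Apply restore_spaces to every HTML file under an expanded epub."""
    for path in Path(folder).rglob("*"):
        if path.suffix.lower() in {".html", ".xhtml", ".htm"}:
            original = path.read_text(encoding="utf-8")
            fixed = restore_spaces(original)
            if fixed != original:
                path.write_text(fixed, encoding="utf-8")
```

For example, `restore_spaces("He drew thekatana.")` gives `"He drew the katana."`. The pattern is deliberately naive: it will also split any legitimate word that happens to end in one of the listed words, so work on a copy and review the changes before repackaging.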