I have quite a few e-books from Baen and noticed that they use straight quotes instead of curly quotes. So I searched for a script or program that automatically converts the quotes, but couldn't find one that I could use. Then I discovered the program "sed" and made a dirty little one-liner that converts the HTML file from the exploded Baen LIT:
Code:
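#!/bin/bash
# Convert straight quotes and spaced-out ellipses in an HTML file to HTML entities.
set -e

# Keep the original as a backup; the converted text is written to the original name.
mv "$1" "$1.backup"

ops=(
  # Pair up straight double quotes as “...”; any leftover quote becomes “.
  -e 's|"\([^"][^"]*\)"|“\1”|g' -e 's|"|“|g'
  # Restore straight quotes around attribute values (="...").
  -e 's|=“[^”]*”|="&"|g' -e 's|=“||g' -e 's|”"|"|g'
  -e 's|=”[^“]*“|="&"|g' -e 's|=”||g' -e 's|“"|"|g'
  # Apostrophe after a space opens a quote; every other apostrophe closes one.
  -e "s| '| ‘|g" -e "s|'|’|g"
  # Replace the curly characters with HTML entities.
  -e "s|“|\&ldquo;|g" -e "s|”|\&rdquo;|g" -e "s|‘|\&lsquo;|g" -e "s|’|\&rsquo;|g"
  # Ellipses: four-dot forms first, otherwise the three-dot rules leave a stray dot.
  -e "s|\. \. \. \.|\&hellip;|g" -e "s|\.\.\.\.|\&hellip;|g" -e "s|\.\ \.\ \.\ \.|\&hellip;|g"
  -e "s|\. \. \.|\&hellip;|g" -e "s|\.\.\.|\&hellip;|g" -e "s|\.\ \.\ \.|\&hellip;|g"
  # Undo the conversion inside the OEB DOCTYPE declaration.
  -e 's|&ldquo;+//|"+//|g' -e 's|//EN&rdquo;|//EN"|g'
  -e 's|&ldquo;http://openebook|"http://openebook|g' -e 's|\.dtd&rdquo;>|.dtd">|g'
)

sed "${ops[@]}" "$1.backup" > "$1"
exit 0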
Maybe someone will find this useful. It converts straight quotes to curly quotes and both ". . ." and ". . . ." to "…" (&hellip;), and of course it makes a backup of the original HTML file.
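To see the core idea in isolation, here is a minimal sketch that only handles paired double quotes and spaced-out ellipses. It assumes the output should use HTML entities, and the function name curlify is made up for this example. It leaves out the attribute and DOCTYPE fix-ups, so it is only safe on text without quoted tag attributes.

```shell
#!/bin/bash
# Hypothetical helper: reads text on stdin, writes the converted text to stdout.
curlify() {
  # 1. Wrap each "..." pair in &ldquo; / &rdquo; (\& is a literal ampersand in sed).
  # 2. Replace ". . . ." before ". . ." so no stray dot is left behind.
  sed -e 's|"\([^"][^"]*\)"|\&ldquo;\1\&rdquo;|g' \
      -e 's|\. \. \. \.|\&hellip;|g' \
      -e 's|\. \. \.|\&hellip;|g'
}

printf '%s\n' '<p>"Wait . . . what?" she asked.</p>' | curlify
```

Run on the sample line, this prints `<p>&ldquo;Wait &hellip; what?&rdquo; she asked.</p>`.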
I haven't tried it, but there's a sed for Windows too.
Are there other – maybe nicer – ways to do this task? My one-liner works well with Baen books, but has some limitations.
By the way, why is it that Baen Mobipocket e-books look nicer when you explode the MS Reader LIT and convert it to Mobipocket yourself?