Another way to do what theducks suggested with a regular expression is to use the plugin Diap's Editing Toolbag. With it you can delete the outer p, span, and strong tags, then convert the span around the chapter title to an h3. When I clean up books from Gutenberg, I strip all of their classes and other cruft, merge the files as DNSB said, copy everything between the body tags, and paste it into a new file. Then I split that file on the h3 tags, or whatever heading tags the book uses.
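
If you would rather script it, here is a rough Python sketch of the same cleanup done with regular expressions. It is not what the plugin does internally. The nested p/span/strong/span heading markup and the function names are my own assumptions for illustration, so adjust the pattern to match what your book actually contains. The split relies on Python 3.7 or later, where `re.split` accepts a zero-width pattern.

```python
import re

# Assumed heading markup (hypothetical; check your own files):
# <p ...><span ...><strong><span ...>Chapter 1</span></strong></span></p>
HEADING = re.compile(
    r'<p[^>]*>\s*<span[^>]*>\s*<strong>\s*<span[^>]*>(.*?)</span>'
    r'\s*</strong>\s*</span>\s*</p>',
    re.DOTALL,
)


def promote_headings(html):
    """Drop the outer p, span, and strong, and turn the inner span into an h3."""
    return HEADING.sub(r'<h3>\1</h3>', html)


def strip_classes(html):
    """Remove every class="..." attribute."""
    return re.sub(r'\s+class="[^"]*"', '', html)


def split_on_h3(body):
    """Split the body so that each chunk starts at an <h3>."""
    parts = re.split(r'(?=<h3>)', body)
    return [part.strip() for part in parts if part.strip()]


sample = (
    '<p class="a"><span class="b"><strong><span class="c">Chapter 1</span>'
    '</strong></span></p>\n'
    '<p class="x">Text one.</p>\n'
    '<p class="a"><span class="b"><strong><span class="c">Chapter 2</span>'
    '</strong></span></p>\n'
    '<p class="x">Text two.</p>'
)

chapters = split_on_h3(strip_classes(promote_headings(sample)))
for chapter in chapters:
    print(chapter)
    print('-----')
```

Promote the headings before stripping the classes if your pattern depends on a class name to tell chapter titles apart from ordinary paragraphs. Regular expressions are fragile on messy HTML, so test on a copy of the book first.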