Quote:
Originally Posted by JustinD
I have an epub book that has the following formatting:
Since long before the coming of Gods and mortals, the great rock of Krasnegar<br class="calibre1" />
had stood amid the storms and ice of the Winter Ocean, resolute and eternal.<br class="calibre1" />
Throughout long arctic nights it glimmered under the haunted dance of aurora and<br class="calibre1" />
the rays of the cold, sad moon, while the icepack ground in useless anger around<br class="calibre1" />
So, each line is unnaturally shortened by the <br class="calibre1" />
How do I edit to remove this while keeping my actual paragraphs? Sorry for what is a simple question but I am new to this. I am hoping I could have a regex but given there doesn't seem to be anything distinguishing the end of para from line I am a bit stumped.
any thoughts?
Justin
Jellby has it correct (and I had forgotten about those terribly formatted exceptions).
TEST your REPLACE code on a few matches before using the 'Replace All' button.
Save your work before starting the NEXT whole-document replace.
If a replace goes wrong: File, 1 (reopen from the Recent list), and 'Discard' is your friend.
Step 1: use a COUNT search BEFORE replacing anything (all HTML files) to get an idea of how bad it is.
Regex:
Code:
<br class="calibre1" />\s+<br class="calibre1" />
to see if they used that type of paragraph break.
If the files already have a lot of </p> tags (more than a few per section split), the double BRs are probably just scene breaks.
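For anyone who wants to try the count outside the editor first, here is a minimal sketch in Python. It assumes Python's re module behaves like the editor's regex for this simple pattern, and the sample text is made up, not taken from the book.

```python
import re

# The double-BR pattern from the post, written as a Python raw string.
DOUBLE_BR = re.compile(r'<br class="calibre1" />\s+<br class="calibre1" />')

def count_double_br(html: str) -> int:
    """Return how many double-BR runs appear in one HTML file's text."""
    return len(DOUBLE_BR.findall(html))

# Hypothetical sample: one double BR, followed by two lone BRs.
sample = (
    'first scene ends here.<br class="calibre1" />\n'
    '<br class="calibre1" />\n'
    'second scene starts here<br class="calibre1" />\n'
    'and carries on.<br class="calibre1" />\n'
)

print(count_double_br(sample))
```

Run it on the text of each HTML file and compare the counts per file; a low number per file points to scene breaks rather than paragraph breaks.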
Step 1.5: change the scene break to a scene marker (your choice).
The REPLACE for the search term above:
Code:
</p> <p class="scenebreak">* * *</p> <p class="whatever...">
Notes:
scenebreak is the name of your CSS styling selector. The first </p> closes the previous <p> tag. The last <p class="whatever..."> uses whatever class started the original P tag, so that the next paragraph starts. Tidy will make the code pretty, so don't worry about newlines.
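The scene-marker replace can be dry-run the same way. This is only a sketch: "bodytext" is a hypothetical stand-in for whatever class your original P tags use, and the sample paragraph is invented.

```python
import re

# Same double-BR search as in Step 1.
DOUBLE_BR = re.compile(r'<br class="calibre1" />\s+<br class="calibre1" />')

# "scenebreak" comes from the post; "bodytext" is a hypothetical
# placeholder for the class of the original <p> tag.
SCENE_MARKER = '</p> <p class="scenebreak">* * *</p> <p class="bodytext">'

def mark_scene_breaks(html: str) -> str:
    """Replace every double BR with a close-P, scene marker, open-P run."""
    return DOUBLE_BR.sub(SCENE_MARKER, html)

# Hypothetical sample paragraph containing one scene break.
sample = (
    '<p class="bodytext">End of scene.<br class="calibre1" />\n'
    '<br class="calibre1" />Next scene.</p>'
)

print(mark_scene_breaks(sample))
```

The output keeps the P tags balanced: the old paragraph is closed, the marker gets its own paragraph, and a new paragraph opens for the next scene.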
Step 2: now replace the lone BR.
Note: don't try to get all cases in a single pass; take real care to ONLY replace your current target case.
Search:
Code:
(\w)<br class="calibre1" />\s+<br class="calibre1" />(\w)
Replace:
Code:
\1</p> <p class="whatever...">\2
The \1 and \2 put back whatever was matched before and after the BR, with an end P and a start of the next P replacing the BR.
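To see the \1 and \2 groups at work, here is a small Python sketch of the same search and replace. It assumes Python's re handles these backreferences the way the editor does, and "bodytext" is again a hypothetical class name. The second sample ends in a full stop, so it is deliberately left alone, which is the "only your current target case" idea.

```python
import re

# Search from the post: a word character on each side of the BR run.
WORD_BREAK = re.compile(
    r'(\w)<br class="calibre1" />\s+<br class="calibre1" />(\w)'
)
# \1 and \2 restore the captured characters around the new P tags.
# "bodytext" is a hypothetical class name.
REPLACEMENT = r'\1</p> <p class="bodytext">\2'

def split_word_breaks(html: str):
    """Return (new_text, number_of_replacements)."""
    return WORD_BREAK.subn(REPLACEMENT, html)

# Hypothetical samples: the first is the target case, the second is not.
target = 'the end<br class="calibre1" />\n<br class="calibre1" />Then more'
other = 'in anger.<br class="calibre1" />\n<br class="calibre1" />"Quoted" text'

print(split_word_breaks(target))
print(split_word_breaks(other))
```

The replacement count from subn is a cheap check against the COUNT search before committing to a whole-document replace.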
Step 3:
You may have to create additional searches to handle punctuation and quote combinations (remember to escape wildcards in the search).
Take your time to learn what works.
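As one possible follow-up search, here is a sketch for sentence-ending punctuation with optional straight quotes on either side of the break. The pattern is my own guess at a useful case, not something from the book, and "bodytext" is a hypothetical class name. Inside the square brackets the full stop and question mark are literal; outside a character class they would need a backslash.

```python
import re

# re.escape guards any regex-special characters in the literal tag text.
BR = re.escape('<br class="calibre1" />')

# Sentence-ending punctuation, optional closing quote, the BR run,
# optional opening quote, then a word character.
PUNCT_BREAK = re.compile(r'([.?!]"?)' + BR + r'\s+' + BR + r'("?\w)')

def split_after_punctuation(html: str) -> str:
    """Turn punctuation + double BR into a paragraph boundary."""
    # "bodytext" is a hypothetical class name.
    return PUNCT_BREAK.sub(r'\1</p> <p class="bodytext">\2', html)

# Hypothetical sample: a quoted sentence, a break, then another quote.
sample = (
    '"Go away."<br class="calibre1" />\n'
    '<br class="calibre1" />"No," she said.'
)

print(split_after_punctuation(sample))
```

Curly quotes or other punctuation in your book would need their own variants, one target case at a time.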