MobileRead Forums - View Single Post - Simple edit/replace question from beginner

theducks · 03-19-2011, 09:43 AM

Quote:

Originally Posted by JustinD

I have an epub book that has the following formatting:

Since long before the coming of Gods and mortals, the great rock of Krasnegar 
had stood amid the storms and ice of the Winter Ocean, resolute and eternal. 
Throughout long arctic nights it glimmered under the haunted dance of aurora and 
the rays of the cold, sad moon, while the icepack ground in useless anger around 

So, each line is unnaturally shortened by the <br class="calibre1" /> tag.

How do I edit to remove this while keeping my actual paragraphs? Sorry for what is a simple question but I am new to this. I am hoping I could have a regex but given there doesn't seem to be anything distinguishing the end of para from line I am a bit stumped.

any thoughts?
Justin

Jellby has it correct (and I had forgotten about those terribly formatted exceptions).

TEST your REPLACE code on a few matches before using the 'Replace All' button.

Save your work before starting the NEXT whole-document replace.
If a replace goes wrong, reopening File 1 (from the Recent list) and choosing 'Discard' is your friend.
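If you would rather rehearse a replace outside the editor first, here is a minimal Python sketch of the "test on a few" idea. The helper name `preview`, the sample text, and the line-joining pattern are all made up for illustration; it only shows what the first few matches would turn into and changes nothing.

```python
import re
from itertools import islice

def preview(pattern, replacement, html, limit=3):
    """Return (matched text, what it would become) for the first few matches."""
    regex = re.compile(pattern)
    return [
        (m.group(0), m.expand(replacement))  # expand() fills in \1, \2 for this match
        for m in islice(regex.finditer(html), limit)
    ]

# Hypothetical sample and pattern, not taken from the book in question.
sample = 'a<br class="calibre1" />\nb c<br class="calibre1" />\nd'
for found, result in preview(r'(\w)<br class="calibre1" />\s+(\w)', r'\1 \2', sample):
    print(repr(found), '->', repr(result))
```

Only once the preview looks right would you run the real replace.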

Step 1: use a COUNT search first (across all HTML files) to get an idea of how bad it is

Regex:

Code:

 <br class="calibre1" />\s+<br class="calibre1" />

to see whether they used that type of paragraph break.

If you have a lot of these tags (more than a few per section split), those are probably just scene breaks.
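For anyone who wants to do the same count outside the editor, a rough Python sketch follows. It assumes you already have the HTML as a string; `count_double_br` and the sample text are invented for the example.

```python
import re

# Same pattern as the count search: two BR tags with only whitespace between them.
DOUBLE_BR = re.compile(r'<br class="calibre1" />\s+<br class="calibre1" />')

def count_double_br(html: str) -> int:
    """Return how many double-BR breaks appear in one HTML string."""
    return len(DOUBLE_BR.findall(html))

# Made-up sample: three line-ending BRs, one of them doubled.
sample = (
    'had stood amid the storms<br class="calibre1" />\n'
    'resolute and eternal.<br class="calibre1" />\n'
    '<br class="calibre1" />\n'
    'Throughout long arctic nights<br class="calibre1" />\n'
)
print(count_double_br(sample))
```

Running it over each HTML file in turn gives the per-file counts to compare.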

Step 1.5: change the scene break to a scene marker (your choice).
This is the REPLACE for the search term above:

Code:

</p> <p class="scenebreak">* * *</p> <p class="whatever...">

Notes: scenebreak is the name of your CSS styling selector. The first tag closes the previous paragraph. The last tag, <p class="whatever...">, should be whatever was used to start the original P tag, so that the next paragraph starts properly. Tidy will make the code pretty, so don't worry about newlines.
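The scene-break replace can be sketched in Python like this. It is only an illustration: `mark_scene_breaks` is a made-up name, and the paragraph class is passed in as a parameter because the real one depends on your book ("body" in the sample is a placeholder, not the actual class).

```python
import re

DOUBLE_BR = re.compile(r'<br class="calibre1" />\s+<br class="calibre1" />')

def mark_scene_breaks(html: str, para_class: str) -> str:
    """Swap each double BR for a close-P, a scene marker, and a fresh open-P."""
    replacement = (
        '</p> <p class="scenebreak">* * *</p> '
        f'<p class="{para_class}">'
    )
    # A function is used as the replacement so the text is inserted literally.
    return DOUBLE_BR.sub(lambda m: replacement, html)

# "body" is a hypothetical class name for this sample only.
before = '<p class="body">end of scene.<br class="calibre1" />\n<br class="calibre1" />Next scene</p>'
after = mark_scene_breaks(before, "body")
print(after)
```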

Step 2: now replace the lone BR

Note: don't try to get all cases in a single pass, and take real care to replace ONLY your current target case.
Search:

Code:

(\w)<br class="calibre1" />\s+<br class="calibre1" />(\w)

Replace:

Code:

\1</p> <p class="whatever...">\2

The \1 and \2 put back whatever was matched before and after the BR, with an end P and the start of the next P replacing the BR.
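Here is how the \1 and \2 put-back behaves, as a small Python sketch. The function name `br_to_paragraph` and the class name "body" are assumptions for the example; the search pattern is the one from this step. Note that the second sample is left alone because a full stop is not a \w character.

```python
import re

# Search from this step: a word character on each side of the BR pair.
SEARCH = re.compile(r'(\w)<br class="calibre1" />\s+<br class="calibre1" />(\w)')

def br_to_paragraph(html: str, para_class: str) -> str:
    """Replace the BR pair with </p><p ...>, keeping the characters around it."""
    # \1 and \2 re-insert the captured characters on each side.
    replace = r'\1</p> <p class="' + para_class + r'">\2'
    return SEARCH.sub(replace, html)

# "body" is a hypothetical class name; the samples are made up.
word_case = 'anger around<br class="calibre1" />\n<br class="calibre1" />The next'
punct_case = 'and eternal.<br class="calibre1" />\n<br class="calibre1" />Throughout'

print(br_to_paragraph(word_case, "body"))
print(br_to_paragraph(punct_case, "body"))  # unchanged: "." is not matched by \w
```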

Step 3:
You may have to create additional searches to handle punctuation and quote combinations (remember to escape wildcards in the search).
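One possible follow-up search, sketched in Python. The particular punctuation and quote characters chosen here are my guess at a common case, not a complete list, and `punct_br_to_paragraph` and "body" are invented names. `re.escape` is the Python way of escaping wildcards in literal text.

```python
import re

# Escape the literal tag so nothing in it can act as a wildcard.
BR = re.escape('<br class="calibre1" />')

# Hypothetical extra case: end punctuation plus an optional closing quote before
# the break, and an optional opening quote before the next word.
PUNCT_CASE = re.compile(r'''([.!?]["']?)''' + BR + r'\s+' + BR + r'''(["']?\w)''')

def punct_br_to_paragraph(html: str, para_class: str) -> str:
    """Same put-back idea as before, for the punctuation and quote case."""
    return PUNCT_CASE.sub(r'\1</p> <p class="' + para_class + r'">\2', html)

text = 'resolute and eternal.<br class="calibre1" />\n<br class="calibre1" />"Throughout'
result = punct_br_to_paragraph(text, "body")
print(result)
print(re.escape('.'))  # a bare "." would match any character, so it gets a backslash
```

As with the earlier steps, it is safer to run one narrow case like this at a time than to fold everything into one pattern.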

Take your time to learn what works