Old 09-17-2023, 05:02 PM   #1
btired
Using Regex to replace line breaks

I am editing and cleaning up a bunch of Calibre epub conversions and they're messy and full of trash code. Regex has been so helpful to find stuff that regular find and replace can't do efficiently.

I'm still pretty new to using Regex and I was wondering if it can help me with the following: find carriage returns / new lines, except the ones on lines that end with a tag, and replace them with a single space.

Here's an example of uncorrected text:
Code:
<p class="p2">‘Professor!’
It
was
Vesuvius.
She sounded frightened. ‘Professor!’</p>
<p class="p2">Sara
looked to Robert.</p>

<p class="p1"><br/>‘What is it?’ Sara asked. Robert put his arm around her,
but she barely seemed to notice.</p>
...and ideally, I'd like the corrected file to look like this:
Code:
<p class="p2">‘Professor!’ It was Vesuvius. She sounded frightened. ‘Professor!’</p>
<p class="p2">Sara looked to Robert.</p>

<p class="p1"><br/>‘What is it?’ Sara asked. Robert put his arm around her, but she barely seemed to notice.</p>
Is this possible to do with regex? I've tried to figure it out but can't quite get it.
Old 09-17-2023, 05:17 PM   #2
KevinH
Sigil Developer
Start with a copy of your epub and try running Sigil's Mend and Prettify tool.

It may do what you want. If not, regex can.
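
For the regex route, here is a minimal sketch in Python rather than a tested Sigil recipe. It assumes that a line break should be kept only when the text before it ends with a tag (that is, with `>`), and that every other break inside the text becomes a single space. The names `INNER_BREAK` and `join_inner_breaks` are made up for illustration.

```python
import re

# Assumption: a break is "wanted" only when the line before it ends with ">".
# The lookbehind requires a non-">" and non-whitespace character before the
# break; the lookahead requires real text after it, so blank lines survive.
INNER_BREAK = re.compile(r"(?<=[^>\s])[ \t]*\r?\n[ \t]*(?=\S)")

def join_inner_breaks(html: str) -> str:
    """Replace line breaks that do not follow a tag with one space."""
    return INNER_BREAK.sub(" ", html)

if __name__ == "__main__":
    sample = '<p class="p2">Sara\nlooked to Robert.</p>\n\n<p class="p1">next</p>'
    print(join_inner_breaks(sample))
```

The same pattern, with a single space as the replacement, may also work in Sigil's Find & Replace in Regex mode, but that is untested here, so try it on a copy of the epub first.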
Old 09-17-2023, 08:53 PM   #3
theducks
That is NOT a Calibre conversion (or artifact).

<p><br />...
Gimmie a proper Scene break or first paragraph by using a Top Margin.
I don't even try for a one-click repair. After EACH success, I save, in case a later fix goes off the rails.

Once that is done,
I fix the lowercase-lowercase ones. (This is the only one where I use Replace All, after a few tests.) I replace the Class= with the actual one used in the book.

The rest I step through with Replace-Next or Find (which skips that one). These are fairly few and take only a few minutes.
Then I fix Honorifics (many have a period).
Next is Initials (I can't remember if I do them before or after Honorifics).

Then I fix Uppercase (e.g. names).

Upper-Upper is mostly for acronyms.
Attached Thumbnails: ToLower.JPG, Honorifics.JPG, Initials.JPG, ToUpper.JPG, Upper-Upper.JPG
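
The regexes themselves exist only in the attached screenshots, so the sketch below is a guess at the general shape of such a pass-by-pass workflow, not a copy of them. It assumes each pass targets a paragraph that was split in two (`</p>` followed by a new `<p ...>`), and it only lists candidates so that each one can be accepted or skipped by hand. The pattern names, the `PASSES` table and `find_candidates` are all hypothetical.

```python
import re

# Hypothetical patterns; the real ones are in the screenshots, not in the text.
P_OPEN = r"</p>\s*<p(?:\s[^>]*)?>"  # a paragraph closing, then a new one opening
PASSES = {
    # previous paragraph ends in a lowercase letter or comma, next starts lowercase
    "lower-lower": re.compile(r"[a-z,]" + P_OPEN + r"(?=[a-z])"),
    # break right after an honorific with a period (assumed short list)
    "honorifics": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\." + P_OPEN + r"(?=[A-Z])"),
    # break between two capitals, e.g. inside an acronym
    "upper-upper": re.compile(r"[A-Z]" + P_OPEN + r"(?=[A-Z])"),
}

def find_candidates(html: str) -> list[tuple[str, str]]:
    """Return (pass name, matched text) pairs, in pass order, for manual review."""
    return [(name, m.group(0))
            for name, pattern in PASSES.items()
            for m in pattern.finditer(html)]

if __name__ == "__main__":
    text = "<p>said Mr.</p>\n<p>Smith went</p>\n<p>home</p>"
    for name, hit in find_candidates(text):
        print(name, repr(hit))
```

Listing matches instead of replacing them mirrors the step-through approach described above: only a pass that has proven safe after a few tests would be a candidate for a replace-all.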
Old 09-18-2023, 04:56 PM   #4
btired
Quote:
Originally Posted by KevinH View Post
Start with a copy of your epub and try running Sigil's Mend and Prettify tool.

It may do what you want. If not, regex can.
KevinH, thank you so much! That solved my problem and did exactly what I needed it to do - I didn't know that tool existed.

Quote:
Originally Posted by theducks View Post
That is NOT a Calibre conversion (or artifact).

<p><br />...
Gimmie a proper Scene break or first paragraph by using a Top Margin.
The text I gave as an example already had the code altered to how I wanted it; it was just the multiple unnecessary carriage returns / new lines that I wanted to fix. I should've been clearer in my explanation.

Thanks for those regexes - they'll come in handy in my future editing!
Old 09-28-2023, 12:25 PM   #5
Tex2002ans
Hey, btired, welcome to MobileRead!!!

Quote:
Originally Posted by btired View Post
I'm still pretty new to using Regex and I was wondering if it can help me with the following: find carriage returns / new lines, except the ones on lines that end with a tag, and replace them with a single space.
Like KevinH said, the easiest way is to:
  • Tools > Reformat HTML > Mend and Prettify All HTML Files

- - -

After that, if you wanted to merge "broken" paragraphs together, then see my answers in:

That would help combine things like:

Code:
<p>This is an example</p>
<p>of lines that accidentally</p>
<p>didn't merge into a single paragraph.</p>
and give you:

Code:
<p>This is an example of lines that accidentally didn't merge into a single paragraph.</p>
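
As a rough illustration of that kind of merge (not the regex from the linked answers), here is a minimal Python sketch. It assumes a paragraph counts as "broken" when it ends in a lowercase letter, comma or semicolon and the next plain `<p>` starts with a lowercase letter. `BROKEN_PARA` and `merge_broken_paragraphs` are illustrative names only.

```python
import re

# Assumption: "broken" means the paragraph ends mid-sentence (lowercase letter,
# comma or semicolon) and the following plain <p> starts with a lowercase letter.
# Paragraphs with attributes, e.g. <p class="p2">, are deliberately left alone.
BROKEN_PARA = re.compile(r"(?<=[a-z,;])</p>\s*<p>(?=[a-z])")

def merge_broken_paragraphs(html: str) -> str:
    """Join paragraphs that look like one sentence split across several <p> tags."""
    return BROKEN_PARA.sub(" ", html)

if __name__ == "__main__":
    broken = ("<p>This is an example</p>\n"
              "<p>of lines that accidentally</p>\n"
              "<p>didn't merge into a single paragraph.</p>")
    print(merge_broken_paragraphs(broken))
```

A rule like this will miss some breaks and wrongly join others (a heading followed by a lowercase line, for instance), so stepping through the matches one at a time is safer than replacing them all at once.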