03-18-2011, 09:51 PM   #1
JustinD
Simple edit/replace question from beginner

I have an epub book that has the following formatting:

Since long before the coming of Gods and mortals, the great rock of Krasnegar<br class="calibre1" />
had stood amid the storms and ice of the Winter Ocean, resolute and eternal.<br class="calibre1" />
Throughout long arctic nights it glimmered under the haunted dance of aurora and<br class="calibre1" />
the rays of the cold, sad moon, while the icepack ground in useless anger around<br class="calibre1" />


So, each line is unnaturally shortened by the <br class="calibre1" />

How do I remove these while keeping my actual paragraphs? Sorry for what is a simple question, but I am new to this. I was hoping to use a regex, but given that nothing seems to distinguish the end of a paragraph from the end of a line, I am a bit stumped.

Any thoughts?
Justin
03-19-2011, 04:46 AM   #2
Jellby
If real paragraphs are not marked in any special way (a <p>, a blank line, two consecutive <br>...), I'm afraid there's no other way than doing it manually. You could assume every <br> followed by an uppercase letter (with maybe a quote mark in between) is a new paragraph, but that's going to fail quite often.
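
A minimal sketch of the uppercase-letter guess described above, written in Python rather than in Sigil's Find/Replace. The helper name guess_paragraphs and the plain <p> output are assumptions for illustration only; the last line shows the kind of false positive that makes the guess fail.

```python
import re

BR = r'<br class="calibre1" />'
# Guess: a <br> followed by an optional opening quote and an uppercase letter
# starts a new paragraph. The lookahead leaves the letter itself untouched.
NEW_PARA = re.compile(BR + r'''\s*(?=["“'‘]?[A-Z])''')

def guess_paragraphs(text: str) -> str:
    # Mark the guessed paragraph breaks first, then join the remaining
    # soft line breaks with a single space.
    marked = NEW_PARA.sub('</p>\n<p>', text)
    return re.sub(BR + r'\s*', ' ', marked)

sample = (
    'the great rock of Krasnegar<br class="calibre1" />\n'
    'had stood amid the storms.<br class="calibre1" />\n'
    'Throughout long arctic nights<br class="calibre1" />\n'
    'the rays of the cold, sad moon'
)
result = guess_paragraphs(sample)

# A proper noun at the start of a wrapped line is wrongly treated as a new paragraph.
false_positive = guess_paragraphs('the city of<br class="calibre1" />\nKrasnegar')
```

The wrapped line starting with a lowercase word is rejoined, the one starting with "Throughout" becomes a paragraph break, and the proper noun case is split even though it should not be.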
03-19-2011, 09:43 AM   #3
theducks
Jellby has it correct (and I had forgotten about those terribly formatted exceptions).

TEST your REPLACE code on a few matches before using the 'Replace All' button.

Save your work before starting the NEXT whole-document replace.
File > 1 (reopen from the Recent list) and 'Discard' are your friends if a replace goes wrong.


Step 1: run a COUNT search FIRST (across all HTML files) to get an idea of how bad it is.

Regex:
Code:
<br class="calibre1" />\s+<br class="calibre1" />
This shows whether they used that type of paragraph break (two consecutive BRs).
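
Outside Sigil, the same count can be sketched in Python. This is only an illustration of what the pattern matches; the function name count_double_breaks and the sample text are made up.

```python
import re

# Two <br class="calibre1" /> tags separated only by whitespace.
DOUBLE_BR = re.compile(r'<br class="calibre1" />\s+<br class="calibre1" />')

def count_double_breaks(html: str) -> int:
    # findall returns one entry per non-overlapping match.
    return len(DOUBLE_BR.findall(html))

sample = (
    'end of scene.<br class="calibre1" />\n'
    '<br class="calibre1" />\n'
    'Next scene starts<br class="calibre1" />\n'
    'and continues.'
)
count = count_double_breaks(sample)
```

Note that \s+ needs at least one whitespace character, so two BRs with nothing at all between them would not be counted by this pattern.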

If the book already has a lot of </p> tags (more than a few per section split), those double BRs are probably just scene breaks.

Step 1.5: change the scene break to a scene marker (your choice).
The REPLACE for the search term above:
Code:
</p> <p class="scenebreak">* * *</p> <p class="whatever...">
Notes: scenebreak is the name of your CSS styling selector. The first </p> closes the previous <p> tag. The last <p class="whatever..."> (use whatever class started the original <p> tag) opens the next paragraph. Tidy will make the code pretty, so don't worry about newlines.

Step 2: now replace the lone BR.

Note: don't try to get all cases in a single pass, and take great care to replace ONLY your current target case.
Search:
Code:
(\w)<br class="calibre1" />\s+<br class="calibre1" />(\w)
Replace:
Code:
\1</p> <p class="whatever...">\2
The \1 and \2 put back whatever was matched before and after the BR, with an end P and a start of the next P replacing the BR.
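
To see what the two capture groups do, here is a rough Python equivalent of that search and replace. The class name "body" is a placeholder standing in for "whatever...", and the sample strings are invented.

```python
import re

SEARCH = re.compile(r'(\w)<br class="calibre1" />\s+<br class="calibre1" />(\w)')
# "body" is a placeholder class name, not one taken from the book.
REPLACE = r'\1</p> <p class="body">\2'

# Word characters on both sides: the pair of BRs becomes a paragraph boundary.
hit = SEARCH.sub(
    REPLACE,
    'the day ended<br class="calibre1" />\n<br class="calibre1" />Morning came',
)

# A full stop before the first BR is not a \w character, so nothing is replaced.
miss_source = 'the day ended.<br class="calibre1" />\n<br class="calibre1" />Morning came'
miss = SEARCH.sub(REPLACE, miss_source)
```

The second case is left unchanged, which is why further searches for punctuation and quotes are needed. The pattern also expects a word character directly after the second BR, so a newline in that position would prevent a match.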

Step 3:
You may have to create additional searches to handle punctuation and quote combinations (remember to escape wildcards in the search).

Take your time to learn what works
03-19-2011, 02:09 PM   #4
Faster
Hi JustinD,
No idea whether theducks' method works, but this is what I'd use on a copy of the file:
Go to Code View in Sigil. In the Find/Replace box check the following - Match Case, Minimal Matching, Regular Expression, All.

In the Find box copy and paste this:
Code:
([^.:\?\!"”'’0-9])<br class="calibre1" />
In the Replace box put this:
Code:
\1space
(that's the digit one after the backslash and hit the space bar rather than putting the word space)

Set Look in to All HTML Files.

Hit Replace All then take a look in the Book View.

Back to Code View -
In the Find Box:

Code:
<br class="calibre1" />
In Replace:

Code:
<p>
Set Look in to All HTML Files.

Hit Replace All then take a look in the Book View.
The reason for switching to and fro from Code View to Book View is that Sigil cleverly fixes up any incomplete tagging when you do this.

Can't be sure this will work as a full solution because your sample is quite brief so you may need to manually adjust any glitches.
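
The two passes above can be sketched in Python to show what each one does to the text. The function name rejoin_lines and the shortened sample are assumptions, and unlike Sigil this sketch does nothing to close the <p> tags it leaves behind.

```python
import re

BR = '<br class="calibre1" />'
# Pass 1 pattern: a <br> whose preceding character is NOT sentence-ending
# punctuation, a quote mark or a digit is treated as a soft line wrap.
SOFT_WRAP = re.compile(r'''([^.:\?\!"”'’0-9])''' + BR)

def rejoin_lines(html: str) -> str:
    # Pass 1: keep the captured character and swap the <br> for a space.
    joined = SOFT_WRAP.sub(r'\1 ', html)
    # Pass 2: every <br> that survived pass 1 becomes a paragraph start.
    return joined.replace(BR, '<p>')

sample = (
    'the great rock of Krasnegar' + BR + '\n'
    'had stood, resolute and eternal.' + BR + '\n'
    'Throughout long arctic nights'
)
result = rejoin_lines(sample)
```

The first break follows a letter and is replaced by a space; the second follows a full stop and becomes <p>. A wrapped line that happens to end in a digit or a quote mark would be turned into a paragraph by mistake, which is the kind of glitch that needs manual adjustment.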

Last edited by Faster; 03-19-2011 at 02:13 PM.
03-20-2011, 03:01 AM   #5
JustinD
Faster, your answer was perfect! Thanks also to theducks, but I found Faster's answer easier to execute and it worked very well. It turned a really annoyingly formatted book into something very readable.

I have another question. In another book I had a problem with
<p class="calibre2">
so I ran a regex that I saw somewhere:
Find: ([a-z])</p>\s+<p class="calibre2">
Replace: \1

This worked great but now I have no paragraphs of any real note. Just wondering what would be the easy way to recreate some paragraphs. Any thoughts? I know I cannot find the original breaks but I am not hung up on the breaks being the same but just easier for reading.
03-20-2011, 05:35 AM   #6
Toxaris
Why not just replace <p class="calibre2"> with plain <p> and style <p> to your liking in the stylesheet?
03-20-2011, 11:48 AM   #7
Faster
Hi again. First, the expression I gave you can be cleaned up by removing the backslashes within the character class. I realized this two minutes after posting, but I knew it wouldn't do any harm. Here's the change:
Code:
([^.:\?\!"”'’0-9])<br class="calibre1" />
becomes
([^.:?!"”'’0-9])<br class="calibre1" />
The reason is that only ]\^- need to be escaped within a character class.
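
A quick way to check that the two forms behave the same is to run both over a few strings. This is a Python sketch, so it only shows the behaviour of Python's re module; the sample strings are made up.

```python
import re

# Original form, with ? and ! escaped inside the character class.
escaped = re.compile(r'''([^.:\?\!"”'’0-9])<br class="calibre1" />''')
# Cleaned-up form without the backslashes.
plain = re.compile(r'''([^.:?!"”'’0-9])<br class="calibre1" />''')

BR = '<br class="calibre1" />'
samples = ['word' + BR, 'end.' + BR, 'why?' + BR, 'stop!' + BR, '1999' + BR]

# For each sample, record whether each pattern finds a match.
escaped_hits = [bool(escaped.search(s)) for s in samples]
plain_hits = [bool(plain.search(s)) for s in samples]
```

Both lists come out identical: only the sample ending in a plain letter matches, and the escaped and unescaped ? and ! are excluded in the same way.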

Toxaris, I think JustinD has already run the regex and lost the original, so he wants to know how to deal with the outcome.
My first suggestion would be to download the original file again if possible and follow Toxaris' suggestion.
If that's not an option, you would need to decide at which punctuation characters you want to create a paragraph break. For example, if you want the breaks where there is a full stop followed by a space (the full stop must be escaped, since a bare . matches any character):
Find:
Code:
(\.)space
Replace:
Code:
\1<p>
The reason for the space is that you don't want to find the full stop just before some quote mark, for example:
Code:
"That's my dog." he said.
If you want to find more than the full stop just include them in a character class [...].
Example find:
Code:
([.!?])space
- but nothing is going to make it look correct as you'll be making every sentence into a paragraph.
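
A short Python sketch shows the effect on a made-up sentence. It assumes the punctuation sits in a character class (or is escaped) so that only real sentence endings match; the last line shows what a bare dot would do instead.

```python
import re

text = '"That\'s my dog." he said. Then he left! Was it true?'

# Sentence-ending punctuation followed by a space becomes a paragraph start.
result = re.sub(r'([.!?]) ', r'\1<p>', text)

# For contrast: an unescaped dot matches ANY character before a space.
naive = re.sub(r'(.) ', r'\1<p>', 'my dog')
```

The full stop inside the quotation is skipped because a quote mark follows it rather than a space, while every other sentence ending turns into a paragraph break.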