#1
Member
Posts: 16
Karma: 10
Join Date: Mar 2011
Device: iPad mini, iPhone, Kindle
Simple edit/replace question from beginner
I have an epub book that has the following formatting:
Since long before the coming of Gods and mortals, the great rock of Krasnegar<br class="calibre1" /> had stood amid the storms and ice of the Winter Ocean, resolute and eternal.<br class="calibre1" /> Throughout long arctic nights it glimmered under the haunted dance of aurora and<br class="calibre1" /> the rays of the cold, sad moon, while the icepack ground in useless anger around<br class="calibre1" />
So, each line is unnaturally shortened by the <br class="calibre1" />. How do I edit the file to remove these while keeping my actual paragraphs?
Sorry for what is a simple question, but I am new to this. I was hoping a regex could do it, but since nothing seems to distinguish the end of a paragraph from the end of a line, I am a bit stumped. Any thoughts?
Justin
#2
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
If real paragraphs are not marked in any special way (a <p>, a blank line, two consecutive <br>...), I'm afraid there's no other way than doing it manually. You could assume every <br> followed by an uppercase letter (with maybe a quote mark in between) is a new paragraph, but that's going to fail quite often.
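The uppercase-letter heuristic mentioned above could be sketched roughly as follows in Python. This is only an illustration, not something from the thread: the function name `guess_paragraphs` and the choice of `</p>` / `<p>` as the paragraph boundary are assumptions made for the example.

```python
import re

BR = r'<br class="calibre1" />'

# Heuristic: a <br> followed by optional whitespace, an optional opening
# quote mark, and an uppercase letter is assumed to start a new paragraph.
PARA_BREAK = re.compile(BR + r'\s*(?=["“\'‘]?[A-Z])')
# Any other <br> is assumed to be a wrapped line inside a paragraph.
LINE_BREAK = re.compile(BR + r'\s*')


def guess_paragraphs(text: str) -> str:
    """Apply the uppercase heuristic (hypothetical helper, for illustration)."""
    text = PARA_BREAK.sub('</p>\n<p>', text)
    return LINE_BREAK.sub(' ', text)


if __name__ == "__main__":
    good = ('the great rock of Krasnegar<br class="calibre1" /> had stood.'
            '<br class="calibre1" /> Throughout long nights')
    print(guess_paragraphs(good))
    # A wrapped line that happens to start with a proper noun is split wrongly:
    bad = 'the great rock of<br class="calibre1" /> Krasnegar had stood'
    print(guess_paragraphs(bad))
```

The second sample shows the failure mode: any wrapped line that begins with a capitalised word (a name, "I", and so on) is mistaken for a paragraph start, so the result still needs manual checking.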
#3
Well trained by Cats
Posts: 30,889
Karma: 59840450
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
File 1 (open from the Recent list): 'Discard' is your friend.
Step 1: Use a COUNT search BEFORE you change anything (all HTML files) to get an idea of how bad it is.
Regex: Code:
<br class="calibre1" />\s+<br class="calibre1" />
If you have a lot of </p> tags (more than a few per section split), those are probably just scene breaks.
Step 1.5: Change the scene break to a scene marker (your choice). The REPLACE for the search term above: Code:
</p> <p class="scenebreak">* * *</p> <p class="whatever...">
Step 2: Now replace the lone BR. Note: don't try to get all cases in a single pass; take real care to replace ONLY your current target case.
Search: Code:
(\w)<br class="calibre1" />\s+<br class="calibre1" />(\w)
Replace: Code:
\1</p> <p class="whatever...">\2
Step 3: You may have to create additional searches to handle punctuation and quote combinations (remember to escape wildcards in the search). Take your time to learn what works.
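For anyone who wants to try the count-then-replace idea outside an editor, here is a minimal Python sketch of Steps 1 and 2. The function names and the class name `body` are hypothetical stand-ins (the post leaves the class as "whatever..."), so adjust them to your own book.

```python
import re

# Step 1 pattern: two <br> tags separated only by whitespace.
DOUBLE_BR = re.compile(r'<br class="calibre1" />\s+<br class="calibre1" />')

# Step 2 pattern: the same double <br>, but only between two word characters.
WORD_DOUBLE_BR = re.compile(
    r'(\w)<br class="calibre1" />\s+<br class="calibre1" />(\w)'
)


def count_double_br(html: str) -> int:
    """Count matches first, without changing anything."""
    return len(DOUBLE_BR.findall(html))


def mark_paragraphs(html: str, css_class: str = "body") -> str:
    """Replace only the word/word case; css_class is a placeholder name."""
    return WORD_DOUBLE_BR.sub(rf'\1</p> <p class="{css_class}">\2', html)


if __name__ == "__main__":
    sample = 'end<br class="calibre1" /> <br class="calibre1" />Next'
    print(count_double_br(sample))
    print(mark_paragraphs(sample))
```

A double `<br>` that follows punctuation (for example `end.` instead of `end`) is counted but deliberately left alone by `mark_paragraphs`, which is why Step 3 calls for additional, separately targeted searches.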
#4
Connoisseur
Posts: 61
Karma: 12096
Join Date: Sep 2010
Location: Tasmania
Device: Sony PRS 650
Hi JustinD,
No idea whether theducks' method works, but this is what I'd use on a copy of the file:
Go to Code View in Sigil. In the Find/Replace box, check the following: Match Case, Minimal Matching, Regular Expression, All.
In the Find box, copy and paste this: Code:
([^.:\?\!"”'’0-9])<br class="calibre1" />
In the Replace box (here "space" means a literal space character): Code:
\1space
Set Look in to All HTML Files. Hit Replace All, then take a look in the Book View.
Back to Code View. In the Find box: Code:
<br class="calibre1" />
In the Replace box: Code:
<p>
Hit Replace All, then take a look in the Book View. The reason for switching to and fro between Code View and Book View is that Sigil cleverly fixes up any incomplete tagging when you do this.
I can't be sure this will work as a full solution because your sample is quite brief, so you may need to manually adjust any glitches.
Last edited by Faster; 03-19-2011 at 02:13 PM.
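The same two passes can be tried in Python before touching the book. This is a rough sketch under the assumption that the file content is already loaded into a string; the function name `join_then_split` is made up for the example, and, unlike Sigil, nothing here closes the `<p>` tags for you.

```python
import re

BR = '<br class="calibre1" />'

# Pass 1 pattern: a <br> whose preceding character is NOT sentence-ending
# punctuation, a quote mark, or a digit. Such a <br> is taken to be a
# wrapped line rather than the end of a paragraph.
WRAPPED_LINE = re.compile(r'([^.:?!"”\'’0-9])' + re.escape(BR))


def join_then_split(html: str) -> str:
    """Two-pass cleanup (hypothetical helper, for illustration only)."""
    # Pass 1: keep the captured character and put a space where the <br> was.
    html = WRAPPED_LINE.sub(r'\1 ', html)
    # Pass 2: every <br> that is left is treated as a paragraph start.
    # The <p> tags are left unclosed here; an editor has to repair that.
    return html.replace(BR, '<p>')


if __name__ == "__main__":
    sample = ('rock of Krasnegar<br class="calibre1" />had stood, resolute '
              'and eternal.<br class="calibre1" />Throughout long nights')
    print(join_then_split(sample))
```

Wrapped lines that happen to end in a full stop, a quote mark, or a digit will still be turned into paragraph breaks, so the output is a starting point for manual cleanup rather than a finished result.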
#5
Member
Posts: 16
Karma: 10
Join Date: Mar 2011
Device: iPad mini, iPhone, Kindle
Faster, your answer was perfect! Thanks also to theducks, but I found Faster's answer easier to execute, and it worked very well. It turned a really annoyingly formatted book into something very readable.
I have another question. In another book I had a problem with <p class="calibre2">, so I ran a regex that I saw somewhere:
Find: ([a-z])</p>\s+<p class="calibre2">
Replace: \1
This worked great, but now I have no paragraphs of any real note. What would be an easy way to recreate some paragraphs? I know I cannot recover the original breaks, and I am not hung up on the breaks being the same; I just want the text to be easier to read. Any thoughts?
#6
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Why not just replace <p class="calibre2"> with a plain <p> and style <p> to your liking in the stylesheet?
#7
Connoisseur
Posts: 61
Karma: 12096
Join Date: Sep 2010
Location: Tasmania
Device: Sony PRS 650
Hi again. First, the expression I gave you can be cleaned up by removing the backslashes within the character class. I realized this two minutes after posting, but I knew it wouldn't do any harm. Here's the change:
Code:
([^.:\?\!"”'’0-9])<br class="calibre1" />
becomes
Code:
([^.:?!"”'’0-9])<br class="calibre1" />
Toxaris, I think JustinD has performed the regex and lost the original, so he wants to know how to deal with the outcome.
My first suggestion would be to download the original file again if possible and follow Toxaris' suggestion. If that's not an option, you would need to decide at which punctuation characters you want to create a paragraph break. For example, if you want the breaks where there is a full stop followed by a space:
Find: Code:
(\.)space
Replace: Code:
\1<p>
Code:
"That's my dog." he said.
Example find: Code:
([.!?])space
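As a rough illustration of the last find expression, the Python sketch below inserts a `<p>` after every full stop, exclamation mark, or question mark that is followed by a space. The function name `break_at_sentence_ends` is invented for the example, and turning every sentence into its own paragraph is only one possible choice.

```python
import re

# A sentence-ending mark followed by a single space.
SENTENCE_END = re.compile(r'([.!?]) ')


def break_at_sentence_ends(html: str) -> str:
    """Keep the punctuation and replace the following space with <p>."""
    return SENTENCE_END.sub(r'\1<p>', html)


if __name__ == "__main__":
    print(break_at_sentence_ends('It rained. We left! Why? Nobody knew.'))
    # The closing quote sits between the full stop and the space, so this
    # line of dialogue is left in one piece.
    print(break_at_sentence_ends('"That\'s my dog." he said.'))
```

Because the pattern needs the space to come directly after the punctuation mark, a sentence that ends inside quotation marks is not split at the quote.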