04-06-2014, 09:20 PM   #1
eschwartz
Saved Search examples

If you have useful saved searches you'd like to share, post them in this thread.

For example: generic rules that fix common problems, or just anything clever and cool that you're proud of and want others to admire.



NOTE: To make it easier to read, it would be nice if all Search & Replace fields were wrapped in the
[CODE]content goes here[/CODE]
tags.

You can also export a saved search as a .json file and upload it here in a zipped folder.

Moderator Notice
This thread has been made a sticky, and unlike most other sticky threads, this one is open to all who have a useful saved Search/Replace they wish to share. Do not use this thread to ask any questions. Start a new thread. Posts that don't belong here will be deleted or moved, but you are encouraged to post if you have something to share.

Please add a descriptive title to each post and explain what your Saved Search accomplishes.

Last edited by DoctorOhh; 04-08-2014 at 01:14 AM. Reason: added some formatting/sharing guidelines

04-07-2014, 07:14 AM   #2
mrmikel
Finding and joining broken paragraphs

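([a-z])</p>whatever else is in the middle here<p>([a-z])

Here "whatever else is in the middle here" stands for whatever markup separates the two paragraphs in your book.

Replace with \1 \2 (the first group, a space, then the second group).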

With case sensitive ticked.

It doesn't catch absolutely everything, but it's very quick to use.

Potshots from people who actually know regex are welcome. I just guess and see if it works!
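For anyone who wants to try this outside the editor, here's a minimal Python sketch. It assumes the middle part is just whitespace followed by an opening <p> tag with optional attributes (the `\s*<p[^>]*>` stand-in is a hypothetical choice, not part of the original search). Python's `re.sub` accepts the same `\1 \2` replacement, and its matching is case sensitive by default.

```python
import re

# Hypothetical concrete "middle": whitespace, then an opening <p> with any attributes.
BROKEN_PARA = re.compile(r"([a-z])</p>\s*<p[^>]*>([a-z])")


def join_broken_paragraphs(html: str) -> str:
    """Join a paragraph ending in a lowercase letter to one starting with a lowercase letter."""
    return BROKEN_PARA.sub(r"\1 \2", html)


if __name__ == "__main__":
    print(join_broken_paragraphs('<p>the cat</p>\n<p class="x">sat on the mat.</p>'))
```

Paragraphs that end with punctuation or start with a capital are left alone, which is what keeps the search fairly safe.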

04-07-2014, 07:18 AM   #3
mrmikel
Uncapitalized letters after period and quote

\.” [a-z]

Case sensitive ticked.

There's no easy replace in this case, but at least you can find them. Remove the quote from the search to find uncapitalized sentences that don't follow a closing quote.
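Since nothing is replaced here, a review aid is just a search that reports each hit and where it is. This is a minimal Python version with a hypothetical `find_uncapitalized` helper; like the editor search, it is case sensitive.

```python
import re

# Period, closing curly quote, space, lowercase letter.
AFTER_QUOTE = re.compile(r"\.” [a-z]")
# Same search with the quote removed: period, space, lowercase letter.
AFTER_PERIOD = re.compile(r"\. [a-z]")


def find_uncapitalized(text: str, pattern=AFTER_QUOTE) -> list:
    """Return (offset, matched text) pairs for manual review; nothing is changed."""
    return [(m.start(), m.group()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for offset, hit in find_uncapitalized("“I’m done.” she said. “Fine.” Then he left."):
        print(offset, repr(hit))
```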

04-07-2014, 04:17 PM   #4
arspr
Preventing line wraps around dashes in Spanish dialogues

Because dashes are wrap points in HTML, dialogue in Spanish ebooks can look terrible.

Example in one line:
Code:
—Bla, Bla, Bla, —John said—. More bla, bla, bla.
Wrong:
Code:
—Bla, Bla, Bla, —John said
—. More bla, bla, bla.
Wrong:
Code:
—Bla, Bla, Bla, —
John said—. More 
bla, bla, bla.
Right:
Code:
—Bla, Bla, Bla, —John 
said—. More bla, bla, bla.

The two searches below wrap each dash and its partner word in a <span> with a specified class (in my example, <span class="nw">).

Then add the following CSS rule for that class:
Code:
.nw { white-space: nowrap;}
and you will have prevented the bad line wrapping in Spanish books.

First S&R
Search:
Code:
\x20(—|–|&mdash;|&ndash;)([^ <]+)( |</p>|</div>)
Replace:
Code:
\x20<span class="nw">\1\2</span>\3
Second S&R
Search:
Code:
\x20([^ >]+)(—|–|&mdash;|&ndash;)(\.|\.\.\.|,|;|:|…|&hellip;)?\x20
Replace:
Code:
\x20<span class="nw">\1\2\3</span>\x20
Additional usage notes
  • Yes, you need both S&Rs, and in that order.
  • Don't forget to add the CSS rule, or the spans will do nothing.
  • As you can see, they look for dashes and only dashes (as Unicode characters or as named entities). Some horribly formatted books use minus signs, which these searches won't catch.
  • The Case sensitive and Dot all settings are probably irrelevant, but I keep them OFF.
  • Because of the [^ <]+ and [^ >]+ parts of the searches, they are completely safe to use. That is, they won't catch and destroy code like:
    Code:
    —Bla, Bla, Bla, —<b>John</b> <i>said</i>—. More bla, bla, bla.
    They will just ignore it. You will never get something wrong like:
    Code:
    —Bla, Bla, Bla, <span class="nw">—<b>John</span></b> <i><span class="nw">said</i>—.</span> More bla, bla, bla.
    You'll have to fix cases like this manually.
  • Using them where dashes are used as sentence or word separators is also safe:
    Code:
    First sentence—Second sentence.
    This situation, pretty common in English books, is also ignored.
  • As hinted in another thread, I've used \x20 for the leading and trailing spaces in the regexes to make them clearly visible.
  • Obviously there's no point in adding a <span> around the very first starting dash and word, and these searches don't do that.
  • A strange situation I've run into once or twice: if the stylesheet applies some CSS directly to <span> tags, it will also apply to the newly created spans. I once ran into a
    Code:
    span {font-size: 1.3em;}
    which I had to override with
    Code:
    .nw {font-size: 1em; white-space: nowrap;}
    while leaving it in effect where it was originally applied.
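To check the two searches on a sample before running them on a whole book, here's a minimal Python sketch that applies them in the required order with `re.sub`. The patterns are copied from the post; the replacements use a literal space where the post has `\x20`, since Python's replacement templates don't accept that escape. The function name is hypothetical, and the CSS rule still has to be added to the stylesheet separately.

```python
import re

# Dash alternatives used by both searches: Unicode em/en dash or named entity.
DASH = r"(—|–|&mdash;|&ndash;)"

# First S&R: a space, an opening dash and the word glued to it (" —John ").
FIRST_FIND = r"\x20" + DASH + r"([^ <]+)( |</p>|</div>)"
FIRST_REPL = r' <span class="nw">\1\2</span>\3'

# Second S&R: space, word, closing dash, optional punctuation, space (" said—. ").
SECOND_FIND = r"\x20([^ >]+)" + DASH + r"(\.|\.\.\.|,|;|:|…|&hellip;)?\x20"
# An unmatched optional group 3 is substituted as an empty string.
SECOND_REPL = r' <span class="nw">\1\2\3</span> '


def protect_dialogue_dashes(html: str) -> str:
    """Run the first search, then the second; the order matters."""
    html = re.sub(FIRST_FIND, FIRST_REPL, html)
    return re.sub(SECOND_FIND, SECOND_REPL, html)


if __name__ == "__main__":
    print(protect_dialogue_dashes("<p>—Bla, Bla, Bla, —John said—. More bla, bla, bla.</p>"))
```

The same sketch also shows the safety notes in action: markup right next to a dash, and dashes used as separators without spaces, come through unchanged.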

Last edited by arspr; 04-07-2014 at 04:22 PM.

05-21-2014, 10:46 AM   #5
Zajora
I have created a fair number of regex fixes. I revise them every so often, so I'll probably edit this post when I do. I'd use code tags, but they take up too much room. If a match should be replaced with a space, the Replace line says "space". If the Replace line is empty, then (logically) the match should be replaced with nothing.

Of course, there is no guarantee any of these will work properly. I always check several matches before doing Replace All, since there are a TON of ways an eBook's formatting can wreck these regexes.

Scenario: Apostrophes have been replaced with double quotes
Match: (?<=\w)(“|”)(?=\w)
Replace:

Scenario: There is a linebreak in the middle of a character's dialogue
Match: (?<=“[^”]*)</p>\s*<p[^>]*>(?!“)
Replace: space

Scenario: A tag closes, is followed by zero or more spaces or newlines, is then reopened (with attributes) and is then followed by a lowercase letter
Match: </(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>(?=[a-z])
Match: (?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>
Replace: space
Notes: The second one is an alternate, which I think is better, but I'm not 100% sure it covers all the cases of the former.
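A minimal Python sketch of both tag-merging patterns, handy for comparing what each one catches on a sample. The patterns are copied from the post (their lookbehind is fixed-width, so Python's standard `re` module handles them); the helper name and sample markup are hypothetical.

```python
import re

# First pattern: merge only when the reopened tag is followed by a lowercase letter.
MERGE_BEFORE_LOWER = re.compile(r"</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>(?=[a-z])")
# Alternate: merge unless the closing tag follows end punctuation, a quote, '>' or '*'.
MERGE_UNLESS_PUNCT = re.compile(r'(?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>')


def merge_split_tags(html: str, pattern=MERGE_BEFORE_LOWER) -> str:
    """Replace a close-then-reopen of the same tag with a single space."""
    return pattern.sub(" ", html)


if __name__ == "__main__":
    sample = '<p class="calibre1">The cat</p>\n<p class="calibre1">sat down.</p>'
    print(merge_split_tags(sample))
    print(merge_split_tags(sample, MERGE_UNLESS_PUNCT))
```

Note that both patterns require the reopened tag to carry attributes (the space plus `[^/>]+`), so a bare `<p>` is never merged.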

Scenario: "LL" Ligatures have been replaced with a single "L".
Match: (l (?=(y|s|ed|ey|ion|en|ar|ars|er|ow|et|owed|enge|age |enging|ected|egal|ections|ect|apse|ular|op|owing| ocks|ied|ier|ies|ing|ingly|ered|icit|est)(\W)))|(l (?![(–<-])(?=\W))|(?<=’)l(?=\W)|(?<= (wi|du|a|we|te|sma|ca|sti|fu|fa|chi|sha|wa|pha|se| bi|ha|ki|pu|ce|ba|ski|hi|fi|fe|he|ro|ta|i|sme|bri| sta|we))l(?=\W)
Replace: ll
Notes: This regex doesn't really work that well, but it's faster than doing it manually. I would recommend using the spellcheck afterwards and catching the most common ones. This regex is actually a bunch of individual ones chained together by ORs (|) so it's easier to see what's doing what.

Scenario: More than 1 space in a row
Match: (?<=\S) {2,}(?=\S)
Replace: space
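A quick Python check of the multiple-spaces pattern. Because both lookarounds require a non-space character, leading indentation and trailing spaces are left alone; the helper name is hypothetical.

```python
import re

# Two or more spaces with a non-whitespace character on each side.
MULTI_SPACE = re.compile(r"(?<=\S) {2,}(?=\S)")


def collapse_spaces(text: str) -> str:
    """Collapse interior runs of spaces to a single space."""
    return MULTI_SPACE.sub(" ", text)


if __name__ == "__main__":
    print(repr(collapse_spaces("  Hello   world.  Bye")))
```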

Scenario: There are tags (which may be nested) that are either empty or just have a number in them
Match: (<[^/>]*>)+\s*\d*\s*(</[^>]*>)+
Replace:
Notes: This may remove things you'd like to keep, such as scene breaks/whitespace or chapter links.
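A minimal Python sketch of the empty-tag pattern, including a case that shows the caveat in the note: an empty scene-break paragraph is removed too. The helper name and sample markup are hypothetical.

```python
import re

# One or more opening tags, optional whitespace and digits, then one or more closing tags.
EMPTY_OR_NUMBER_TAGS = re.compile(r"(<[^/>]*>)+\s*\d*\s*(</[^>]*>)+")


def strip_empty_tags(html: str) -> str:
    """Delete (possibly nested) tags that are empty or contain only a number."""
    return EMPTY_OR_NUMBER_TAGS.sub("", html)


if __name__ == "__main__":
    print(strip_empty_tags('<p class="x"><span> 12 </span></p><p>Text</p>'))
    # Caveat from the note: an empty scene-break paragraph disappears as well.
    print(strip_empty_tags('<p>Before</p><p class="sb"> </p><p>After</p>'))
```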

Scenario: There's a linebreak or spaces before a closing tag
Match: (?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>
Replace:
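As posted, the Match for this scenario is identical to the alternate pattern in the tag-merging scenario, which targets a close-and-reopen of the same tag rather than whitespace before a closing tag. A pattern aimed at the stated scenario might look like the hypothetical one below. It is limited to a few block-level tags (an assumption, not from the post) so that a meaningful space before an inline tag such as `</b>` survives.

```python
import re

# Hypothetical: whitespace (spaces, tabs, newlines) right before </p>, </div> or </h1>..</h9>.
SPACE_BEFORE_CLOSE = re.compile(r"\s+(?=</(?:p|div|h\d)>)")


def trim_before_closing_tags(html: str) -> str:
    """Remove whitespace that directly precedes a block-level closing tag."""
    return SPACE_BEFORE_CLOSE.sub("", html)


if __name__ == "__main__":
    print(repr(trim_before_closing_tags("<p>Hello \n</p>")))
```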

For future use:

Scenario:
Match:
Replace:

Last edited by Zajora; 05-21-2014 at 10:51 AM.

05-22-2014, 08:03 PM   #6
Section8
I have a Nook, and the only real regexes I've written fix stylesheets to work around its margin bug: if "publisher defaults" are disabled, the Nook doesn't handle the CSS "margin" shorthand. I've been using these to convert all four forms of "margin" to the equivalent margin-top, margin-right, margin-bottom and margin-left declarations. These were written for Sigil, but I *think* they work in the calibre editor.

First: find margin:
Find: margin *:

Convert margin: a (single value):
Find: margin *: *([^\s;]+)(\s*(;|}))
Replace: margin-top: \1; margin-right: \1; margin-bottom: \1; margin-left: \1\2

Convert margin: a b (2 values):
Find: margin *: *([^\s;]+) +([^\s;]+)([\s]*(;|}))
Replace: margin-top: \1; margin-right: \2; margin-bottom: \1; margin-left: \2\3

Convert margin: a b c (3 values):
Find: margin *: *([^\s;]+) +([^\s;]+) +([^\s;]+)([\s]*(;|}))
Replace: margin-top: \1; margin-right: \2; margin-bottom: \3; margin-left: \2\4

Convert margin: a b c d (4 values):
Find: margin *: *([^\s;]+) +([^\s;]+) +([^\s;]+) +([^\s;]+)(\s*(;|}))
Replace: margin-top: \1; margin-right: \2; margin-bottom: \3; margin-left: \4\5
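To test the four conversions outside Sigil or calibre, here's a Python sketch that applies them in sequence with `re.sub`. Each pattern only matches a declaration with exactly its own number of values, so the order shouldn't matter. One limitation worth knowing: a declaration such as `margin: 0 !important;` is treated as two values. The names are hypothetical.

```python
import re

# The post's Find/Replace pairs for 1, 2, 3 and 4 values.
MARGIN_RULES = [
    (r"margin *: *([^\s;]+)(\s*(;|}))",
     r"margin-top: \1; margin-right: \1; margin-bottom: \1; margin-left: \1\2"),
    (r"margin *: *([^\s;]+) +([^\s;]+)([\s]*(;|}))",
     r"margin-top: \1; margin-right: \2; margin-bottom: \1; margin-left: \2\3"),
    (r"margin *: *([^\s;]+) +([^\s;]+) +([^\s;]+)([\s]*(;|}))",
     r"margin-top: \1; margin-right: \2; margin-bottom: \3; margin-left: \2\4"),
    (r"margin *: *([^\s;]+) +([^\s;]+) +([^\s;]+) +([^\s;]+)(\s*(;|}))",
     r"margin-top: \1; margin-right: \2; margin-bottom: \3; margin-left: \4\5"),
]


def expand_margins(css: str) -> str:
    """Rewrite every margin shorthand into the four longhand properties."""
    for find, replace in MARGIN_RULES:
        css = re.sub(find, replace, css)
    return css


if __name__ == "__main__":
    print(expand_margins("p { margin: 1em 2em; }"))
```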

06-19-2014, 02:17 PM   #7
user743
Switch script links to HTML links
Code:
<script>	AddIndex\("(.+?)", (".+?"), ".+?"\); </script>
<a href=\2>\1</a>
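The first line is the search, the second the replacement. Regex mode, not case sensitive, Dot all.
If the script uses single quotes, change the double quotes in the search to single quotes.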
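A minimal Python sketch of this search, with `re.IGNORECASE` and `re.DOTALL` standing in for the "not case sensitive" and "dot all" settings. It assumes the whitespace after `<script>` is a single tab, as it appears in the post, and the replacement closes the anchor with `</a>`. The helper name and sample are hypothetical.

```python
import re

# Group 1: link text; group 2: the quoted target, quotes included, reused as the href.
# \t assumes the whitespace after <script> in the post is a tab.
SCRIPT_LINK = re.compile(
    r'<script>\tAddIndex\("(.+?)", (".+?"), ".+?"\); </script>',
    re.IGNORECASE | re.DOTALL,
)


def script_links_to_anchors(html: str) -> str:
    """Turn AddIndex(...) script calls into plain <a> links."""
    return SCRIPT_LINK.sub(r"<a href=\2>\1</a>", html)


if __name__ == "__main__":
    print(script_links_to_anchors('<script>\tAddIndex("Chapter 1", "ch01.html", "c1"); </script>'))
```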

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.