Old 07-27-2016, 10:01 AM   #496
Toxaris
Wizard
Posts: 4,517
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
Quote:
Originally Posted by ReaderRabbit View Post
OK, here is a simple question for ya. In Sigil (0.7.4), I have a book where there is no separation between sentences. I am using this to find them: ([a-z])([\.\,\?\!])([A-Z])
which works perfectly. But what do I use in replace to move the new sentence over one space? There are over 3,500 matches and I don't want to insert a space manually for that many errors. Any suggestions?
How about: \1\2 \3
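For anyone who wants to try this outside Sigil, here is a minimal sketch in Python. The function name and the sample sentence are made up for illustration; only the search pattern and the `\1\2 \3` replacement come from the posts above.

```python
import re

# Search pattern from the question: lowercase letter, punctuation, capital letter.
PATTERN = re.compile(r"([a-z])([.,?!])([A-Z])")


def space_sentences(text):
    # \1\2 \3 puts the three captured pieces back, with one space before the capital.
    return PATTERN.sub(r"\1\2 \3", text)


print(space_sentences("It rained.We stayed in,Tom said.Why?Because."))
# It rained. We stayed in, Tom said. Why? Because.
```

The backslashes inside the character class are dropped here because `.`, `,`, `?` and `!` need no escaping inside `[...]`; the escaped form matches the same characters.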
Old 07-27-2016, 10:28 AM   #497
theducks
Well trained by Cats
Posts: 23,073
Karma: 24012262
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: K4NT, Galaxy Tab A, Kobo Aura2
That might miss sentences that start or end with quotes.
Code:
 ([a-z])([.,?!]"?)("?[A-Z])
(I only show straight quotes. 0 or 1)
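A rough Python sketch of the quote-aware variant, using `"?` so that zero or one straight quote is allowed on each side. The function name and sample text are invented for the example. Note one limit of this approach: a straight quote sitting between two sentences is always kept with the first sentence, because the pattern cannot tell an opening quote from a closing one.

```python
import re

# Punctuation may carry a closing straight quote; the capital may carry an opening one.
QUOTED = re.compile(r'([a-z])([.,?!]"?)("?[A-Z])')


def space_sentences_quoted(text):
    # Same \1\2 \3 replacement as before; the quote travels with its group.
    return QUOTED.sub(r"\1\2 \3", text)


print(space_sentences_quoted('She said "stop."He left.Fine.'))
# She said "stop." He left. Fine.
```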
Old 07-27-2016, 10:40 AM   #498
eschwartz
Ex-Helpdesk Junkie
Posts: 19,274
Karma: 83106403
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by Psymon View Post
Hope it's okay for a veritable Regex newbie to post a query in this thread -- I'm only just beginning to learn about this stuff, but with any luck it'll eventually start sinking in.
Ah, that is exactly what this thread is for.
Certainly we don't expect the people who already know the answer to ask questions...

Quote:
I seem to have developed an affinity for doing up electronic versions of "ye olde bookes" -- for example, right now I'm doing up several Shakespeare plays in the original Elizabethan English, endeavouring to give it somewhat of the "look and feel" of early typographic styles, complete with use of the long-ess (i.e. "ſ", the character that looks like an "f" but without the crossbar, and is actually an "s"). As with the unusual use of the "u" and "v" characters in early typography, where an "ſ" is used instead of "s" has to do with placement within a word, rather than the "sound" of the character or anything else like that.

Very often when I find digital transcriptions of these early texts, they've kept the "u" and "v" oddities, but for some reason have changed all the long-esses to just "s" instead -- and so I have to change them back. The rule for when this is supposed to occur is actually fairly simple (although not all early printers/typographers followed this, but the vast majority did): virtually every instance of "s" should be changed to "ſ" unless it falls at the end of the word, then it remains as "s."

So to fix my texts up, I've been searching for every instance of "s" and then changing it to "ſ" -- which right away causes all my HTML code to need to be fixed up first, because things like "css," "class," "span," etc. get screwed up in the process -- and then I do another series of searches, looking for instances of "ſ" (long-ess) plus a "." or "," or ":" or ";" or "?" or "!" or ")" or "[space]" or "[apostrophe -- curly or otherwise]", plus "<" should there be a closing </i> or </p> tag or something, i.e. wherever it might occur at the end of a word, and then changing it back to "s" again.

It's not that big a deal, actually, I can "correct" the long-esses in a whole book in, like, 5 or 10 minutes or so, but it would be totally cool to just whiz it off with one, single regex search, of course.

Oh, and it would have to be case-sensitive, of course -- all instances of upper-case "S" remain as "S."

[snip -- same thing with other character substitutions]
Case-sensitivity is a setting in the S&R box.

Using the power of lookaround zero-length assertions and word boundary zero-length assertions, the following regex will find a character-that-is-not-at-the-end-of-a-word (in this case "s") that is not inside HTML tags:

Find:
Code:
(?<=>[^<]*)s\B(?=[^>]*<)
Replace: (you guessed this one already, right?)
Code:
ſ

Explanation:
  1. Just check for a tag closing character ">", followed by zero or more characters-that-aren't-a-tag-opener-"<"... wrapped in a lookbehind, so you don't clutter up the actual match.
  2. Followed by the literal character you are looking for, in this case "s", followed by a negated word boundary zero-length assertion "\B" (so the "s" is not the last letter of a word).
  3. Followed by zero or more characters-that-aren't-a-tag-closer-">" followed by a tag opener "<"... again wrapped in a lookahead, so you don't clutter up the actual match.
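The idea can be tried out in Python with one change: Python's `re` module only accepts fixed-width lookbehinds, so this sketch drops the lookbehind and relies on a single lookahead that stops at either angle bracket. It assumes the text itself never contains a literal "<" or ">" and that every run of text is followed by a tag. The function name and sample markup are invented for the example.

```python
import re

# An "s" that is not the last letter of a word (\B), and whose next angle
# bracket is "<", which means the match sits in text rather than inside a tag.
LONG_S = re.compile(r"s\B(?=[^<>]*<)")


def to_long_s(html):
    # Case-sensitive by default, so a capital "S" is left alone.
    return LONG_S.sub("ſ", html)


print(to_long_s('<p class="first">Sir, such stuff as dreams</p>'))
# <p class="first">Sir, ſuch ſtuff as dreams</p>
```

The "s" characters in `class` and `first` are skipped because the next angle bracket after them is ">", and the final "s" in "as" and "dreams" is skipped by `\B`.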

Last edited by eschwartz; 07-27-2016 at 10:53 AM.
Old 07-27-2016, 10:43 AM   #499
ReaderRabbit
Member
Posts: 24
Karma: 10
Join Date: Mar 2011
Location: Colorado
Device: Cruz Tablet
Quote:
Originally Posted by Toxaris View Post
How about: \1\2 \3
Wonderful! Worked perfectly
Old 08-20-2016, 10:57 AM   #500
Leonatus
Guru
Posts: 660
Karma: 7077424
Join Date: Mar 2013
Location: Berlin, Germany
Device: Kobo Touch
Hi all Regex cracks!
Is there a way to remove by one expression all anchor tags in an epub with the following syntax:
Code:
<a name="pagexx" title="yy" id="pagexx"></a>
where xx stands for the page number, and yy for diverse abbreviations of former issuers.

Maybe it's not even that difficult, but it's too much for my poor old brain.

Thanks in advance!
Old 08-20-2016, 11:27 AM   #501
theducks
Well trained by Cats
Posts: 23,073
Karma: 24012262
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: K4NT, Galaxy Tab A, Kobo Aura2
Quote:
Originally Posted by Leonatus View Post
Hi all Regex cracks!
Is there a way to remove by one expression all anchor tags in an epub with the following syntax:
Code:
<a name="pagexx" title="yy" id="pagexx"></a>
where xx stands for the page number, and yy for diverse abbreviations of former issuers.

Maybe it's not even that difficult, but it's too much for my poor old brain.

Thanks in advance!
Yes.
Select an example and press Ctrl-F (this also puts the selection in Find).
Right-click in the Find box and choose Tokenize.

Replace should be either blank or a space.
You could also do it the other way:
replace each set of numbers in the Find box with \d+ (one or more digits, i.e. an integer).
Old 08-20-2016, 11:37 AM   #502
Turtle91
A Hairy Wizard
Posts: 1,761
Karma: 11819190
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone X/6/iPad 1,2 & Air/Surface Pro/Kindle PW
^^^ What theducks said.
eg:
find: <a name="page\d+" title="\d+" id="page\d+"></a>
replace: blank
if what you are replacing is JUST numbers

-or-

find: <a name="page(.*?)" title="(.*?)" id="page(.*?)"></a>
replace: blank
if what you are replacing can include letters or symbols.
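A quick way to see the difference between the two patterns is to run them over a sample line in Python. The sample markup and variable names below are invented; the two patterns are the ones given above.

```python
import re

# Matches only when page number and title are plain digits.
DIGITS_ONLY = re.compile(r'<a name="page\d+" title="\d+" id="page\d+"></a>')
# Matches any attribute values, including letters and slashes.
ANY_VALUE = re.compile(r'<a name="page(.*?)" title="(.*?)" id="page(.*?)"></a>')

numeric = '<p><a name="page12" title="7" id="page12"></a>Text</p>'
lettered = '<p><a name="page12" title="A/B" id="page12"></a>Text</p>'

print(DIGITS_ONLY.sub("", numeric))   # <p>Text</p>
print(DIGITS_ONLY.sub("", lettered))  # unchanged, because "A/B" is not digits
print(ANY_VALUE.sub("", lettered))    # <p>Text</p>
```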
Old 08-20-2016, 11:37 AM   #503
Leonatus
Guru
Posts: 660
Karma: 7077424
Join Date: Mar 2013
Location: Berlin, Germany
Device: Kobo Touch
Thank you! It works as long as the issuer names are identical. But in fact there are several names, and it should be possible to catch them all. (Some names are separated by slashes, by the way.)
This refers to theducks' answer.

Last edited by Leonatus; 08-20-2016 at 11:40 AM.
Old 08-20-2016, 11:41 AM   #504
Turtle91
A Hairy Wizard
Posts: 1,761
Karma: 11819190
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone X/6/iPad 1,2 & Air/Surface Pro/Kindle PW
Ooops - ninjad you Leonatus!
Old 08-20-2016, 12:22 PM   #505
Doitsu
Wizard
Posts: 4,516
Karma: 14553849
Join Date: Dec 2010
Device: Kindle PW2
@Leonatus:

Use the following quick and dirty regex:

Code:
<a name=".*?" title=".*?" id=".*?"></a>
Old 08-20-2016, 12:28 PM   #506
Leonatus
Guru
Posts: 660
Karma: 7077424
Join Date: Mar 2013
Location: Berlin, Germany
Device: Kobo Touch
Thank you all! Works like a charm! Fantastic!

(But I made the mistake of leaving the "dot all" and "minimal match" boxes checked, which wiped out large parts of the text. So anyone who wants to use this, take care!)
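The text loss can be reproduced in Python, with some assumptions. This sketch assumes the "minimal match" box flips greediness the way PCRE's ungreedy option does, so that `.*?` ends up behaving like a greedy `.*`. Python has no such switch, so the greedy form is written out directly. The `[^"]*` variant is not from the thread; it is a more defensive alternative that assumes attribute values never contain a double quote. The sample markup is invented.

```python
import re

html = (
    '<p><a name="page1" title="X" id="page1"></a>First page.</p>\n'
    '<p><a name="page2" title="Y/Z" id="page2"></a>Second page.</p>'
)

# Greedy ".*" with DOTALL: a single match runs from the first anchor to the last.
greedy = re.compile(r'<a name=".*" title=".*" id=".*"></a>', re.DOTALL)
# [^"]* cannot cross a quote, so each match stays inside one tag.
bounded = re.compile(r'<a name="[^"]*" title="[^"]*" id="[^"]*"></a>')

lost = greedy.sub("", html)
kept = bounded.sub("", html)

print(lost)  # <p>Second page.</p>
print(kept)  # both paragraphs survive, minus the anchors
```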

Last edited by Leonatus; 08-20-2016 at 12:40 PM.
Old 09-16-2016, 02:26 PM   #507
Psymon
Chief Bohemian Misfit
Posts: 571
Karma: 427486
Join Date: May 2013
Device: iPad, ADE
Hey, folks -- I am trying to learn/do this regex stuff on my own (however slowly)! I'm stumped on something that I would think should be fairly easy, though.

In my book, I've got almost 300 paragraphs that start off with a dropcap, with this being an example of how those paragraphs begin...

Code:
<span class="initial">H</span>onourable
What I want to do is make that first word in smallcaps, and so the code in this latter example would then be...

Code:
<span class="initial">H</span><span class="smallcaps">ONOURABLE</span>
So basically what I want to do is convert the case of that first word to uppercase and then wrap that smallcaps span around the relevant part of the word.

For my regex search I initially came up with this...

<span class=\"initial\">(.+?)</span>([^>]*)\s

...and for replace this...

<span class="initial">\1</span><span class="smallcaps">\U\2\E</span>

...(and in this latter there's an invisible space there that I suppose you won't "see" in this post -- but it would be there in my S&R, of course).

For the life of me, though, that \s won't stop at the first space, that is, after the first word -- it selects the entire paragraph up to the last space in the paragraph! -- and it's also possible that there might actually be not a space, but a comma (or other punctuation) instead, and I'd like that closing span (for my smallcaps) to come before that.

I've searched around the 'net trying to find the solution to this, but just can't seem to find it -- every "answer" that I find on other sites and try just doesn't seem to work.

Thanks in advance, if anyone can help!

(PS. I'm not sure if my "replace" code is correct either, actually -- although I never got that far with figuring this out!)

Last edited by Psymon; 09-16-2016 at 02:35 PM.
Old 09-16-2016, 03:58 PM   #508
DiapDealer
Grand Sorcerer
Posts: 20,126
Karma: 103978634
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Psymon View Post
For my regex search I initially came up with this...

<span class=\"initial\">(.+?)</span>([^>]*)\s
Try this instead:
Code:
<span class="initial">(.+?)</span>(\w*)
You don't need to escape the quotation marks in your search criteria. The \w will match only word characters, which means it will stop before any punctuation that might occur (an apostrophe in the word you want to smallcap will trip this up).
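To see why the original search ran to the end of the paragraph, here is a small Python comparison. `[^>]*` is greedy: it grabs everything up to the next ">" and then backs up only as far as the last whitespace, whereas `\w*` stops at the first non-word character. The sample paragraph and variable names are made up for the demonstration.

```python
import re

html = '<span class="initial">H</span>onourable sir, I write to you.</p>'

original = re.compile(r'<span class="initial">(.+?)</span>([^>]*)\s')
suggested = re.compile(r'<span class="initial">(.+?)</span>(\w*)')

# Second capture group of each pattern on the same input.
greedy_capture = original.search(html).group(2)
word_capture = suggested.search(html).group(2)

print(greedy_capture)  # onourable sir, I write to
print(word_capture)    # onourable
```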

If any unicode characters can be expected, you may want to make the \w unicode-aware with the (*UCP) command.
Code:
(*UCP)<span class="initial">(.+?)</span>(\w*)
The (.+?) part can match more than you intend. If one letter is all that's ever expected, I'd probably use (\w) instead.
Code:
(*UCP)<span class="initial">(\w)</span>(\w*)
If an opening quote may be in the raised|dropped cap as well, then explicitly include it (making it optional of course):
Code:
(*UCP)<span class="initial">(“?\w)</span>(\w*)
Probably gonna play hell on one-letter drop|raised-cap words, too ("I" and "A"), though.

Quote:
Originally Posted by Psymon View Post
...and for replace this...

<span class="initial">\1</span><span class="smallcaps">\U\2\E</span>
The replace should work fine as is.

To eliminate the issue of one-letter word drop/smallcaps, I'd probably do something like this:
FIND:
Code:
(*UCP)<span class="initial">“?\w</span>\K(\w+)
REPLACE:
Code:
<span class="smallcaps">\U\1\E</span>
EDIT: None of the optional regex search options should be checked (other than maybe the "wrap" option) for any of my examples, by the way.
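For comparison, here is how the same transformation might look in Python. It is only a sketch: Python's `re` has no `\K`, no `(*UCP)` and no `\U...\E` in replacements, so the dropcap span is captured and put back, and the uppercasing is done in a replacement function. `\w` is already Unicode-aware for Python strings. The sketch uses `\w+` so that one-letter words ("I", "A") are left untouched rather than getting an empty span. The function name and sample markup are hypothetical.

```python
import re

# Group 1: the dropcap span (optionally starting with an opening curly quote).
# Group 2: the rest of the first word, at least one word character.
DROPCAP = re.compile(r'(<span class="initial">“?\w</span>)(\w+)')


def smallcap_first_word(html):
    def wrap(match):
        rest = match.group(2).upper()
        return f'{match.group(1)}<span class="smallcaps">{rest}</span>'

    return DROPCAP.sub(wrap, html)


print(smallcap_first_word('<p><span class="initial">H</span>onourable, sir</p>'))
# <p><span class="initial">H</span><span class="smallcaps">ONOURABLE</span>, sir</p>
```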

Last edited by DiapDealer; 09-16-2016 at 04:07 PM.
Old 09-16-2016, 04:39 PM   #509
Psymon
Chief Bohemian Misfit
Posts: 571
Karma: 427486
Join Date: May 2013
Device: iPad, ADE
Quote:
Originally Posted by DiapDealer View Post
Try this instead:
<big snip>

Thank you so, so much, DiapDealer! That did indeed seem to do the trick! I know I did have at least one (maybe more) one-letter opening words, but I'll find out eventually if anything went funny there -- once my book is done, I'll be going through the entire thing page-by-page (several times, in different orientations, etc.) to look for any weirdness going on anywhere.

In the meantime, though, that does seem to have done the trick! And thank you so much, too, for your detailed explanation of everything -- I'll study that more closely as well, and do my best to learn from it!

Old 09-16-2016, 05:23 PM   #510
DiapDealer
Grand Sorcerer
Posts: 20,126
Karma: 103978634
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Glad to help. Good luck!