Quote:
Code:
<p class="let">([A-Z])(\w{0,20}\s)((\w{0,20}\s){3}) |
@mmat No, getting strange results..
@Perkin No, same as above. Sending the source odt as well |
I've found a difference between replacing once and replacing all, using both my and mmat1's replace strings: when doing a step-by-step replace, the last space ends up inside the span, but when doing a replace all, the last space ends up outside it.
Just posting a report in the 0.5.3 release thread. |
Quote:
|
Using your epub, concatenating the two let1/2 styles and removing the now extraneous </span>, I get an odd result as well, so it's not the actual regex, it's the css :D
Looking into it. @mmat1, tidy is off, and even if it were on it shouldn't alter that anyway. |
Quote:
@perkin:You're right, Sigil shouldn't do this, but sometimes it does unexpected things... |
Until the css is sorted, you can just use mmat1's solution with(out) the combined let1/2 styles.
Code:
<p class="let">([A-Z])(\w{0,20}\s)((\w{0,20}\s){3}) |
Quote:
Code:
<p class="let">([A-Z])([^ ]{0,20}\s)(([^ ]{0,20}\s){3}) |
And I've noticed that because the spans are combined, the measurements are now relative to the new font size, which is 4.6em, rather than to the default font size.
|
First, thanks to the three of you for your quick and efficient help.
Quote:
I will study it and check Perkin's measurements. :) Does any of you have an idea how to run several regexes one after another, automatically (regexes not related to each other like these two)? Is there a tool that can do it? |
Right, create a new style in the css (or adapt your existing one)
Code:
.let3{
Then using
Find:
Code:
<p class="let">([A-Z])([^ ]{0,20}\s)(([^ ]{0,20}\s){3})
Replace:
Code:
<p class="let"><span class="let3">\1</span><span class="smcpTypeV">\2\3</span> |
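For anyone who wants to try this find/replace pair outside Sigil, here is a minimal Python sketch. It assumes Python's `re` module treats this particular pattern the same way Sigil's engine does; the function name `wrap_dropcap` and the sample paragraph are made up for illustration.

```python
import re

# Find and replace strings copied from the post above.
FIND = r'<p class="let">([A-Z])([^ ]{0,20}\s)(([^ ]{0,20}\s){3})'
REPLACE = (
    r'<p class="let"><span class="let3">\1</span>'
    r'<span class="smcpTypeV">\2\3</span>'
)

def wrap_dropcap(html: str) -> str:
    """Wrap the first capital in let3 and the rest of the first four words in smcpTypeV."""
    return re.sub(FIND, REPLACE, html)

# Hypothetical sample paragraph.
sample = '<p class="let">Once upon a time there was a king.</p>'
print(wrap_dropcap(sample))
```

Note that each `\s` is part of a captured group, so the space after the fourth word lands inside the second span.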
Quote:
It's not as complicated as it first looks, once you get to using it. If you do a sequence, that runs multiple s&r's over any files etc. They also do a few other brilliant regex products (testing/building/explaining) and a texteditor (which is my preferred editor). |
@Perkin
After successive refinements, this does look better. I will try it tomorrow. I go straight from odt to EPUB. I just tweak the EPUB a little after converting, but most of the work is done without directly touching any html file. I am a Linux user, but I now see which kind of tool can do it. My idea was to add to our existing EPUB converting program a kind of editable super macro (the user could insert or modify any regex inside). But if the user needed Power Grep to use it, that would defeat the purpose. On the other hand, if Power Grep can help me prepare this kind of super macro, which could later be used without it as a kind of program, then yes, it would be worthwhile. |
Quote:
So I may mention the command-line oriented unix-tools sed or awk, which are available for windows as well. With awk you can do nearly anything, but you'll have to build some skills first... |
Quote:
|
Quote:
And it was not meant as any comment on your skills; if it was understood that way, I must apologize. |
Quote:
Not having used awk, I just thought it was another grep-style program using regex. Again, my fault. (My brain is only half working at the moment because of medication, at least that's what I'm blaming it on :) I think I used up today's lucidity quota earlier, on the actual regex problem and the css adjustment.) |
Quote:
I hope you'll be better soon. |
I tried your solutions: both are working.
I would rather stay with mmat's, though, because with the second one I found it a little trickier to fine-tune the dropcap position. BTW, a good part of my limited knowledge of dropcaps comes from here. Thanks again and take care. NB: I'll have a look at "awk". |
I have a few documents in MS Word that I'm going to convert to ebooks. The documents have a lot of endnotes, and as usual, Word puts out a lot of junk when saving as html. I am very new to regex, and was wondering if I could get help. I want to search on:
Code:
<a href="#_edn1" name="_ednref1" title=""><span class="MsoEndnoteReference"><span class="MsoEndnoteReference"><b><span style="font-size:8.0pt;font-family:"Times New Roman","serif";color:black">[1]</span></b></span></span></a>
and replace it with:
Code:
<a href="#_edn1" name="_ednref1" title=""><sup>[1]</sup></a>
Thanks in advance. |
Quote:
Find:
Code:
<a href="#_edn(\d+)" name="_ednref(\d+)" title=""><span class="MsoEndnoteReference"><span class="MsoEndnoteReference"><b><span style="font-size:8.0pt;font-family:"Times New Roman","serif";color:black">\[(\d+)\]</span></b></span></span></a>
Replace:
Code:
<a href="#_edn\1" id="_ednref\2" title=""><sup>[\3]</sup></a>
Find:
Code:
<a href="#_edn(\d+)" name="_ednref\d+" title=""><span class="MsoEndnoteReference"><span class="MsoEndnoteReference"><b><span style="font-size:8.0pt;font-family:"Times New Roman","serif";color:black">\[\d+\]</span></b></span></span></a>
Replace:
Code:
<a href="#_edn\1" id="_ednref\1" title=""><sup>[\1]</sup></a>
EDIT: The above is all based on the assumption that the <b>, <span>, and font-family/size markup is identical in all of the original endnote instances. You'd need to make judicious use of (.*?) if not. (And I had a mistake in the first edition of this post, which I have corrected.) |
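As a quick check of the single-capture-group version, here is a rough Python sketch. It assumes Python's `re` module handles this pattern like Sigil's engine, and that every endnote anchor uses exactly the Word markup quoted in the question; the helper `word_endnote` exists only to build test input.

```python
import re

# Second Find/Replace pair from the post above (one capture group, reused three times).
FIND = (
    r'<a href="#_edn(\d+)" name="_ednref\d+" title="">'
    r'<span class="MsoEndnoteReference"><span class="MsoEndnoteReference"><b>'
    r'<span style="font-size:8.0pt;font-family:"Times New Roman","serif";color:black">'
    r'\[\d+\]</span></b></span></span></a>'
)
REPLACE = r'<a href="#_edn\1" id="_ednref\1" title=""><sup>[\1]</sup></a>'

def word_endnote(n: int) -> str:
    """Build the Word-style endnote anchor for note n (hypothetical test helper)."""
    return (
        f'<a href="#_edn{n}" name="_ednref{n}" title="">'
        '<span class="MsoEndnoteReference"><span class="MsoEndnoteReference"><b>'
        '<span style="font-size:8.0pt;font-family:"Times New Roman","serif";color:black">'
        f'[{n}]</span></b></span></span></a>'
    )

def clean_endnotes(html: str) -> str:
    """Collapse every matching Word endnote anchor to a plain <sup> link."""
    return re.sub(FIND, REPLACE, html)

print(clean_endnotes("Some text" + word_endnote(1) + " more text" + word_endnote(12)))
```

If the inner markup varies between notes, the pattern simply will not match those notes and they are left untouched, which makes it easy to spot the ones that need a looser pattern.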
Thank you sir!!
Changing name= to id= is one of the first S&R I do on a document. You would think that Word would have changed over by now. |
Specifying space character in replace field?
With the old regex engine, I could use '\x20' to specify a space in the replacement pattern, but that no longer works in the current version.
Other than using a literal space, how do I specify a space character in the replace field? (I don't want to use a literal space, because I often save my s/r patterns in a development notes file, and they're hard to see in plain text.) |
You could use
Code:
&#32;
Edit: I think you might only be able to use that if the replacement is part of the text, not inside a tag. |
Quote:
After lots of experimenting, I discovered that I could use
Code:
\U \EG |
Hi
It's just a small question. To select letters intended to become dropcaps, I use this part of a regex: ([A-Z]) However, I realize this does not select the accented capitals that exist in French (like É, À, Ô and so on). Of course, I can just drop their accents. But if I wish to make a dropcap out of an accented capital, what would the code be? (.) is a catch-all. Do you have something better? |
Quote:
The dash just means a range. Without a dash, the class matches any one of the listed characters. You can use both, as I have. |
Quote:
Code:
\p{Lu} |
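Both suggestions can be tried outside Sigil. The sketch below uses an explicit character class, because to my knowledge Python's standard `re` module does not accept `\p{Lu}` (the third-party `regex` package does). The class shown is my own guess at a reasonable set for French, not the one posted in the thread, and `first_capital` is a made-up helper name.

```python
import re

# ASCII capitals, plus the Latin-1 accented capitals (À-Ö and Ø-Þ, which
# skips the multiplication sign between them), plus Œ and Ÿ.
# This class is an assumption for French text, not taken from the thread.
DROPCAP = re.compile(r'<p class="let">([A-ZÀ-ÖØ-ÞŒŸ])')

def first_capital(html: str):
    """Return the leading capital of a 'let' paragraph, or None if there is none."""
    match = DROPCAP.match(html)
    return match.group(1) if match else None

for line in ('<p class="let">École', '<p class="let">À la', '<p class="let">été'):
    print(line, '->', first_capital(line))
```

The explicit class keeps the pattern portable between engines, at the cost of having to list any capital outside those ranges by hand.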
@DiapDealer, theducks
Thanks very much for your replies. As this regex is intended to be used for French texts, I will use theducks' proposal. I just did not know one could add letters this way as I did not see any example of it. |
Quote:
|
@DiapDealer
I did not see your reply in time. It really needed your explanation. Yes, of course, this is also a very convenient solution. I have noted it. Thanks again. |
Change Chapter text to Heading
How can I change in Sigil all the occurrences of "Chapter" like the following example:
Quote:
...or even "1", "2", "3",... with Quote:
Edit: Never mind I think I found the solution in JeremyR's post. Many thanks, JeremyR :2thumbsup |
You don't say what the original Chapter One looks like in code view. Just the text isn't sufficient to make sure the find/replace is correct.
Assuming you have
Code:
<p>Chapter SOMETHING</p>
and want
Code:
<h1>Chapter SOMETHING</h1>
Code:
Find: (?sU)<p>Chapter (.*)</p> |
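Here is a minimal Python sketch of the same idea, for testing on a scrap of html before running it in Sigil. Two things are assumptions on my part: the replace string `<h1>Chapter \1</h1>` (only the Find survives in the post above), and the translation of `(?sU)`: Python has no ungreedy flag, so the quantifier is written lazily as `.*?` with `re.DOTALL` standing in for the `s` option.

```python
import re

# (?s) = dot matches newline, (?U) = ungreedy in PCRE.
# Python equivalent: re.DOTALL plus a lazy quantifier.
FIND = re.compile(r'<p>Chapter (.*?)</p>', re.DOTALL)
REPLACE = r'<h1>Chapter \1</h1>'  # assumed replace string, not from the post

def promote_chapters(html: str) -> str:
    """Turn <p>Chapter ...</p> paragraphs into <h1> headings."""
    return FIND.sub(REPLACE, html)

# Hypothetical sample input.
sample = '<p>Chapter One</p>\n<p>Some text.</p>\n<p>Chapter Two</p>'
print(promote_chapters(sample))
```

The lazy match matters: a greedy `.*` with dot-matches-newline would run from the first "Chapter" to the last `</p>` in the file.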
Quote:
In Code View it is Quote:
Quote:
Don't know if I'm doing this right though :) I have more books with the same issue. I'll try with your code next time. Many thanks. |
Successive Find and Replace
I wish to clean an html text which suffers from recurrent mistakes from an OCR engine (Cuneiform). When I meet one of the mistakes, I make a replacement and note it. After some pages I have met most of the mistakes, and now I intend to build a regex combining as many as 15 successive simple search-and-replace operations like the following two:
A@ → à
B@ → ç
I do not know how to perform these 15 F&Rs within a single regex. Suppose I would like to build it for the two above, what should I write? Note: I already use utf8 for the whole text. |
Quote:
|
OK. Thanks for your answer. I will try to find another solution
|
You could create a simple sed script with one line for each character that you need to fix. E.g.
Code:
s/A@/à/g
Code:
sed -f fix.sed -i *.html |
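For those without sed at hand, the same one-rule-per-line approach can be sketched in Python. This is only an illustration under a few assumptions: the fixes are plain literal strings (no regex needed), the files are UTF-8, and the names `FIXES`, `apply_fixes` and `fix_file` are made up here.

```python
from pathlib import Path

# Ordered list of (wrong, right) pairs; these two come from the question above.
# Add one pair per recurring OCR mistake.
FIXES = [
    ("A@", "à"),
    ("B@", "ç"),
]

def apply_fixes(text: str, fixes=FIXES) -> str:
    """Apply every literal replacement in order and return the cleaned text."""
    for wrong, right in fixes:
        text = text.replace(wrong, right)
    return text

def fix_file(path: Path) -> None:
    """Rewrite one UTF-8 file in place with all fixes applied."""
    path.write_text(apply_fixes(path.read_text(encoding="utf-8")), encoding="utf-8")

print(apply_fixes("voilA@ le garB@on"))

# To process every html file in the current folder (work on a copy first):
# for html_file in Path(".").glob("*.html"):
#     fix_file(html_file)
```

Because the pairs are applied in order, a later rule sees the output of earlier ones, just as successive lines in a sed script do.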
@Doitsu
Wow!! It's working very well! Thanks a lot!! What does BOM mean? |
Sorry, I was only thinking in terms of the F&R regex feature of Sigil. :o
|