Are there instances of a hyphen after a period that you do not want to replace? If there aren't, you can just replace all ".-" with ".¬" (where I use ¬ for the non-breaking hyphen), escaping the period if needed.
|
I'm sure the regex gurus will come up with a much more efficient expression, but I'd simply search for a capital letter and a period, followed by a hyphen and another capital letter and a period:
Find: ([[:upper:]]\.)-([[:upper:]]\.)
Replace: \1‑\2
This should work in Sigil and any other editor with PCRE support. |
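For anyone scripting this outside Sigil, here is a minimal Python sketch of the same find/replace. It assumes the text uses an ordinary hyphen-minus, as in "J.-C."; Python's `re` module has no POSIX classes, so `[A-Z]` stands in for `[[:upper:]]`, and the helper name `protect_abbrev_hyphens` is made up for illustration.

```python
import re

# [A-Z] replaces [[:upper:]] because Python's re has no POSIX classes.
# It is ASCII only, so an accented capital such as "É." will not match.
ABBREV_HYPHEN = re.compile(r'([A-Z]\.)-([A-Z]\.)')

NB_HYPHEN = '\u2011'  # U+2011 NON-BREAKING HYPHEN


def protect_abbrev_hyphens(text):
    """Turn 'J.-C.' into 'J.‑C.' so a line cannot break after 'J.'."""
    # \1 and \2 put the captured letter+period pairs back around the new hyphen.
    return ABBREV_HYPHEN.sub(r'\1' + NB_HYPHEN + r'\2', text)


print(protect_abbrev_hyphens('en 52 av. J.-C., puis'))
```

Lowercase pairs such as "e.-g." are left alone, which matches the caution above about not every ".-" deserving the same treatment.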
Hi
As with many things, I gather experience book after book. After preparing a history book, I realized that using a regular hyphen in J.-C. (70 occurrences in one book) was NOT a nice idea. I have no idea how many words of this kind I may find, and I am really not sure that all occurrences of .- deserve the same treatment. That's why I first thought of fixing them one by one. But in fact, there does not seem to be a very big risk in trying your solutions, so I will try them. Thanks for them. :) And enjoy a Merry Christmas. |
Well, try searching for ".-" first and see which occurrences you find. With any luck you'll see they all want to be non-breaking, or you may see a pattern (like Doitsu's suggestion) and find some typos ;)
|
Found myself parsing messy HTML today, removing empty <p> tags, or <p> tags containing &nbsp;, or <p><i></i></p>, <p><b> </b></p>, etc., so that I could space the paragraphs consistently in CSS. Inspired by this thread, I thought I'd share the snippet in case anyone has a use for it.
I realize it could probably be more concise, and I wouldn't just blindly replace all, but it seems to do the job. It removes <p> tags that may also contain <b>, <i> or <span>, and that have no content, one or more spaces, &nbsp;, or a <br>, <br/> or <br />. Code:
<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p> |
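The same pattern can be run from a Python script, for example over files extracted from an EPUB. This is a sketch, not a tested tool: it assumes the middle alternative is `&nbsp;` (what the forum appears to have rendered as a blank), and `remove_empty_paragraphs` is a hypothetical helper name.

```python
import re

# The pattern from the post, with &nbsp; as the middle alternative:
#   <p ...>                  the paragraph opening tag
#   ((<\w+[^>/]*>)+)?        optional opening tags such as <b>, <i>, <span ...>
#   <br>, <br/>, <br />, &nbsp; or only whitespace
#   ((</\w+[^>]*>)+)?        optional closing tags, then </p>
EMPTY_P = re.compile(
    r'<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>'
)


def remove_empty_paragraphs(html):
    return EMPTY_P.sub('', html)


sample = ('<p>Text</p><p></p><p><i></i></p><p class="x"><br /></p>'
          '<p>&nbsp;</p><p><b> </b></p><p><i>kept</i></p>')
print(remove_empty_paragraphs(sample))
```

Paragraphs with real text, even inside <i> or <b>, do not match, but as noted above it is still safer to step through the matches than to replace all blindly.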
Simple question about quotes
I have two very basic questions. First, about finding straight quotes (") and replacing them with curved opening quotes when they are followed by a capital letter ([A-Z]), then with the different curved closing quotes when they follow punctuation, maybe [," or ." or ?" or !"].
And here's why. . . I eventually want to find paragraphs with broken quotes: paragraphs that have a “ (opening quote) but not a ” (closing quote). Does this make sense? So this is a two-part question. Thanks so much . . . I know you brainiacs will have a solution. :thumbsup: I am using Sigil v0.6.2. |
Quote:
Good luck :D How do you tell where the missing quote belongs? "That's nice," she said. "I'll take it from here." 'Smarty' can handle most straight=>curly conversions, but I don't think it comes with a Ouija board plugin :rofl: |
There would be an opening quote in one paragraph and the closing quote in the next paragraph. I want to bring the two paragraphs together. I also prefer the curved quotes to the straight quotes in books.
|
Quote:
Smarty (used in Calibre and possibly available elsewhere) can then do the curly conversion. |
This is simply something that regex doesn't lend itself very well to. Heuristic algorithms are better suited for this job (but can still fall short of being perfect).
|
OK, I will look into making changes in Calibre, though I would not know how to distinguish between opening and closing quotes when straight quotes are used in the original text. I thought regex would probably be good at that.
My thanks to theducks and DiapDealer |
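DiapDealer's point is that opening vs. closing is a heuristic decision. For the curious, a much cruder heuristic than Smarty's can be sketched in Python: a straight quote at the start of a line or after whitespace or "(" is treated as opening, and every other one as closing. It assumes plain text (on raw HTML it would also curl the quotes around attribute values), and `curl_double_quotes` is a hypothetical name.

```python
import re


def curl_double_quotes(text):
    """Crude opening/closing guess for straight double quotes in plain text."""
    # A quote at line start (re.MULTILINE makes ^ match there) or after
    # whitespace or "(" becomes an opening quote; \1 keeps that character.
    text = re.sub(r'(^|[\s(])"', '\\1\u201c', text, flags=re.MULTILINE)
    # Every straight quote still left is assumed to be a closing quote.
    return text.replace('"', '\u201d')


print(curl_double_quotes('"That\'s nice," she said. "I\'ll take it from here."'))
```

It gets things wrong with a quote directly after a dash or slash, or a quote used as an inch mark, which is why a dedicated tool plus a manual check is still the safer route.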
Quote:
|
I'm sorry; I have gone through the thread but couldn't find anything that would work for my needs. I've read the wiki article and tried many different combinations in vain, but I'm obviously doing something wrong.
I would appreciate a little help with this: I need a simple match that grabs this tag, whatever characters are in the id value:
<h2 class="story_title" id="(whatever)">
Many, many thanks!
----------Edited------------
I found one in the "Saved Searches" in Sigil, touched it up a bit, and it did the job:
(?sU)<h2([^>]*>.*) |
Quote:
Code:
<h2 class="story_title" id="(.+?)"> |
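To see what the `(.+?)` group actually captures, the same pattern can be tried in Python; the sample HTML and the name `STORY_TITLE` are made up for illustration.

```python
import re

# (.+?) is lazy: it stops at the first '">' after id=", so each match
# captures just one id value instead of running on to a later tag.
STORY_TITLE = re.compile(r'<h2 class="story_title" id="(.+?)">')

html = ('<h2 class="story_title" id="ch01">One</h2>'
        '<h2 class="story_title" id="ch02">Two</h2>')
print(STORY_TITLE.findall(html))
```

In Sigil the captured id is then available as \1 in the Replace field.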
Quote:
Though, I'm not sure I understand this (not English-wise, but just what it's all about :)): Quote:
Thank you very much! |
Quote:
1) you are not in regex mode
2) you have accidentally included a leading or trailing space in the search selection <<< I choose this one :D
The ( ) delineate the captured (stuff :D ) text. |
Could anybody help me with this case?
Here is the code from the EPUB: Quote:
I tried to use: (*[a-z])</p> <p class="calibre2">([a-z]*) <-- it did not work. It said that no match was found. |
This should work, but I'd still go through the file and replace them one at a time. It'll match a </p> that is not preceded by a non-alphanumeric character (like ?, ., !, etc.).
Code:
find: |
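The find expression in that reply did not survive, so this is only a sketch of the idea it describes, not the original pattern: match a </p> whose preceding character is a letter or digit, plus the following <p class="calibre2"> (class name taken from the question), and replace both with a space. `join_split_paragraphs` is a hypothetical name.

```python
import re

# The lookbehind (?<=[A-Za-z0-9]) only checks the character before </p>,
# so it stays in place. Paragraphs ending in ".", "?", "!" etc. never match.
SPLIT_P = re.compile(r'(?<=[A-Za-z0-9])</p>\s*<p class="calibre2">')


def join_split_paragraphs(html):
    # Replace the closing tag, whitespace and next opening tag with one space.
    return SPLIT_P.sub(' ', html)


print(join_split_paragraphs(
    '<p class="calibre2">He walked</p>\n<p class="calibre2">into the room.</p>'))
```

A paragraph that legitimately ends without punctuation (a heading, a poem line) would also be joined, which is why replacing one at a time is the safer habit.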
Quote:
Excuse me, but it didn't work for me. No matches found .... |
Quote:
It matches in Sigil; make sure you have regex mode turned on. |
theducks,
it was #1 :smack:. What a shame! :D Thank you! |
Find & fix quote in split paragraphs
In Sigil this expression has been helpful:
(“[^”\r\n]*)</p>\s+<p class="calibre.">
Replace with: \1 followed by a single space.
This identifies paragraphs where an opening smart quote is not matched by a closing smart quote and joins that paragraph with the next one. It's not foolproof, but it saves a lot of time. I use calibre conversion to switch straight quotes to smart quotes: it's under "Look & Feel"; check "Smarten punctuation". It's easier to fix its mistakes than to find and fix 'em all. Good luck! |
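The same find/replace as a Python sketch, handy for checking what it does on a sample before running it in Sigil. The helper name `join_unclosed_quotes` is hypothetical; the pattern and the "\1 plus a space" replacement are the ones from the post.

```python
import re

# An opening “ with no closing ” before a line break, then </p>, whitespace
# and the next <p class="calibreN">. If several paragraphs share one line,
# [^”\r\n]* can run across more than one of them, so review the matches.
UNCLOSED = re.compile(r'(“[^”\r\n]*)</p>\s+<p class="calibre.">')


def join_unclosed_quotes(html):
    # r'\1 ' keeps the quoted text and adds the space that joins the halves.
    return UNCLOSED.sub(r'\1 ', html)


print(join_unclosed_quotes(
    '<p class="calibre1">“Hello, she said</p>\n<p class="calibre1">and left.”</p>'))
```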
Quote:
In calibre, you can use the Modify ePub plugin, which can smarten punctuation without a full conversion. |
I've read previous posts but my problem is either not covered in them or I simply missed it.
How can I, in the Replace field in Sigil, refer to part of the search regex given in the Find field? For example, the text "theAmerican" should be changed to "the American". The Find field is easy, "[a-z][A-Z]", but "[a-z] [A-Z]" does not work in the Replace field because Sigil inserts the regex text literally instead of keeping the lowercase and uppercase letters, whatever they are. I have almost no knowledge of regular expressions; please help me with this. |
Quote:
Find: ([a-z])([A-Z])
Replace: \1 \2
For more information, search for "backreferences". |
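The backreference idea is the same in most regex flavors; here it is in Python's `re.sub` as a quick way to experiment (the function name is hypothetical).

```python
import re


def split_lower_upper(text):
    # ([a-z]) and ([A-Z]) capture the two letters; \1 and \2 in the
    # replacement put back whatever letters were matched, with a space between.
    return re.sub(r'([a-z])([A-Z])', r'\1 \2', text)


print(split_lower_upper('He met theAmerican at noon.'))
```

Note that it also splits names such as "McDonald" or "iPhone", so it is worth stepping through the matches rather than replacing all.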
It works perfectly, thank you so much, Doitsu!
|
I'm off topic here, as this is about a *.CBR and otherwise has nothing to do with Sigil, but the recent post reminded me I meant to inquire.
I've got a series of images with a page number added at the end, but the existing page numbering is a disaster. Is there a way to strip only the trailing numbers, without removing numbers from other places in the filename? Ultimately I want my output to look like:
Terminator 2 -- ch14 pg023.jpg
with leading zeros as placeholders to force the proper viewing order.
I've tried [0-9,3] to find the page numbers, but that removes all of the numbers in the example filename above. If I try appending $, I get no matches. I know it has to be something I am doing wrong. Adding page numbers back in is a straight %03d replacement, which I've been doing as a second step after stripping the old ones (it's a total renumber; nothing can be saved).
PS: My apologies if this message needs to be moved, but I wasn't sure where else in the forums it might be more relevant. |
I take it you're extracting, using a proper re-name util, then re-packing.
Search: (.*pg)\d{1,3}(\.jpg)
(stores everything up to and including 'pg', discards the following digits, stores the extension)
Replace: \1<whatever inserts the counter>\2
So if ? is the number-counter character: \1???\2
(Depending on the regex flavor, you may need $ instead of \ for group references.) |
Perkin,
Thanks for understanding what I'm doing despite my leaving the extract/fix/repack process unmentioned. :) I did, however, give you a bad example: that was the output, not the input. The thing is, I cannot use "pg", as that is something I am adding. Basically, the input files are variously named but have numbers at the end. So a more realistic input page might be:
Terminator 2 23.jpg
which I want to rename to:
Terminator 2 -- ch14 pg023.jpg
There are other naming issues, but I've managed to handle them; perhaps not optimally, but they get the job done. I just can't seem to isolate the numbers at the end of the filename and strip them. To the best of my knowledge, my bulk renamer uses Python-flavored regex. I haven't tried your code yet, but I will. |
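Why the appended $ found nothing: these names end with ".jpg", not with the page number, so the digits are never at the end of the string. A lookahead can require ".jpg" plus end-of-string after the digits without consuming it. A minimal Python sketch, assuming Python-style regex as described above (untested in any particular rename tool; `strip_trailing_number` is a hypothetical name):

```python
import re

# \s*\d+ : the page number and any spaces before it
# (?=\.jpg$) : must be followed by ".jpg" and then the end of the name;
#              the lookahead checks this without removing the extension
TRAILING_NUMBER = re.compile(r'\s*\d+(?=\.jpg$)', re.IGNORECASE)


def strip_trailing_number(filename):
    return TRAILING_NUMBER.sub('', filename)


print(strip_trailing_number('Terminator 2 23.jpg'))
```

The "2" in the title survives because it is followed by a space, not by ".jpg". In a rename tool the Find would be the pattern and the Replace left empty.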
--edit: not sure if this helps; I'm unfamiliar with the 'extract/fix/repack process'.
I think you'd need to do a couple of passes to turn 1.jpg into 001.jpg, 11.jpg into 011.jpg, etc. Code:
(.*?\s)(\d)(\.jpg) |
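If the renaming is scripted in Python rather than done in a rename tool, the padding can be done in one pass, because `re.sub` accepts a function as the replacement. This is a sketch; `zero_pad_page` and its `width` parameter are hypothetical.

```python
import re


def zero_pad_page(filename, width=3):
    """Pad the trailing page number: 'Terminator 2 7.jpg' -> 'Terminator 2 007.jpg'."""
    # The function receives each match; zfill pads the digits to `width`,
    # and numbers that are already long enough are left unchanged.
    return re.sub(
        r'(\d+)(\.jpg)$',
        lambda m: m.group(1).zfill(width) + m.group(2),
        filename,
    )


print(zero_pad_page('Terminator 2 7.jpg'))
```

It assumes every file has a page number right before ".jpg"; a name like "Terminator 2.jpg" would have its title digit padded instead.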
@Sabardeyn, can you give a few more (differing?) example filenames, along with what you would like each one mapped to, so we can see what's consistent and what isn't?
What's the name of your batch rename app/script? I can then scan through its docs and try to work out what the correct replace would be. |
Joining Paragraphs when opening and closing quotes are not in same paragraph
Quote:
Thank you, Muskrat. This answers my question from 1/13/13 perfectly! :) Step 1: I go into Calibre and change straight quotes to curly quotes. Step 2: I open the book in Sigil and use your regex suggestion, and it works perfectly. At first it didn't work; then I checked whether I had accidentally copied the blank space after your find expression, and I had. I removed the blank space and it worked ;) I ♥ brainiacs! |
Quote:
Is there a way to find the reverse situation, i.e. paragraphs that have a closing quote but no opening quote at the beginning of the paragraph? |
Quote:
Code:
<p[^>]*>(?<!")(\w.+?")</p> |
I couldn't get that to work at all and was about to give up when I realized you didn't use the curly smart quotes. Once I changed it to smart quotes, it worked somewhat, but it also picks up any paragraph that doesn't start with a quote. So it would pick up paragraphs like this:
Pamela shuddered. “We’ve been making ourselves polite to a murderess.”
And there are usually far too many sentences of that type; I don't want to read through over 500 of them checking for an opening quote buried further in. |
Sorry, this is such an obvious question and is probably answered somewhere but I didn't find it.
What would be the best way to find and eliminate page numbers such as:
He glanced 190</p> <p class="calibre1">up at the big clock
Quote:
Code:
\d+</p>\s+<p class=".*?"> |
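The same expression as a Python sketch (hypothetical helper name), to show what a replace-with-nothing does to the example above: the number, the paragraph break and the next opening tag all go, and the sentence is rejoined.

```python
import re

# Digits right before </p>, the whitespace after it, and the next <p class="...">.
# .*? is lazy, so it stops at the first closing quote of the class value.
PAGE_BREAK = re.compile(r'\d+</p>\s+<p class=".*?">')


def remove_inline_page_numbers(html):
    return PAGE_BREAK.sub('', html)


print(remove_inline_page_numbers(
    '<p class="calibre1">He glanced 190</p> <p class="calibre1">up at the big clock</p>'))
```

It would also join a paragraph that genuinely ends in a number with no punctuation after it, so check the matches before replacing all.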
Quote:
Code:
find: (<p[^>]*>)(?:\s+)?([^“]+?”)(?:\s+)?(</p>) |
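A single regex struggles to separate "no opening quote anywhere before this closing quote" from "no opening quote at the very start". As an alternative to a pure regex, here is a per-paragraph check in Python; it assumes curly quotes and <p> paragraphs, and `closing_without_opening` is a hypothetical name. It reports a paragraph only when a ” appears with no unmatched “ before it, so a line like the Pamela example is not flagged.

```python
import re

# Lazy (.*?) stops at the first </p>, so each match is one paragraph body.
PARAGRAPH = re.compile(r'<p[^>]*>(.*?)</p>', re.DOTALL)


def closing_without_opening(html):
    """Return paragraph bodies containing a ” with no open “ before it."""
    hits = []
    for m in PARAGRAPH.finditer(html):
        body = m.group(1)
        depth = 0  # number of “ seen so far that are still waiting for a ”
        for ch in body:
            if ch == '“':
                depth += 1
            elif ch == '”':
                if depth == 0:
                    hits.append(body)
                    break
                depth -= 1
    return hits


html = ('<p>Pamela shuddered. “We’ve been making ourselves polite to a murderess.”</p>'
        '<p>and then she left.”</p>')
print(closing_without_opening(html))
```

Only the second paragraph is reported; those are the candidates to join with the paragraph before them.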
Quote:
What about page numbers like this:
<p class="calibre1">200</p>
I used to be able to find them with the 'Wildcard' search and replace. I am using version 0.6.2 of Sigil. Where has that feature gone? I ♥ brainiacs |
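Paragraphs that contain nothing but a number can be matched with a short regex instead of a wildcard. A Python sketch (hypothetical helper name); the same pattern should also work as a Find in Sigil's regex mode with an empty Replace, though that is untested here.

```python
import re

# A whole paragraph whose only content is digits (optionally surrounded by
# whitespace), plus any whitespace after it, e.g. <p class="calibre1">200</p>
PAGE_ONLY = re.compile(r'<p[^>]*>\s*\d+\s*</p>\s*')


def remove_page_number_paragraphs(html):
    return PAGE_ONLY.sub('', html)


print(remove_page_number_paragraphs(
    '<p class="calibre1">He left.</p>\n<p class="calibre1">200</p>\n'
    '<p class="calibre1">Next day.</p>'))
```

It would also delete a chapter number that sits alone in a <p>, so step through the matches first.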
Quote:
|
Powered by: vBulletin
Copyright ©2000 - 3.8.5, Jelsoft Enterprises Ltd.
MobileRead.com is a privately owned, operated and funded community.