Old 06-19-2012, 08:25 AM   #76
DiapDealer
Quote:
Originally Posted by roger64 View Post
Successive Find and Replace

I wish to clean an html text which suffers from recurrent mistakes from an OCR engine (Cuneiform).

When I meet one of the mistakes, I make a replacement and note it. After some pages, I have met most of the mistakes, and now I intend to build a regex that chains as many as 15 successive simple search-and-replace operations like the following two.
A@ → à
B@ → ç
I do not know how to perform these 15 F&R within a single regex. Suppose I would like to build it for the two above, what should I write?

Note: I already use UTF-8 for the whole text.
I'm not sure what you're asking for is feasible. What you've described is something that would be more suited to an external program/algorithm (or a plugin) rather than one single Regular Expression. Finding all 15 with one expression wouldn't be the hard part... replacement based on "if/then" logic is where it would fall apart.
Old 06-19-2012, 10:07 AM   #77
roger64
OK. Thanks for your answer. I will try to find another solution.
Old 06-19-2012, 10:47 AM   #78
Doitsu
You could create a simple sed script with one line for each character that you need to fix. E.g.

Code:
s/A@/à/g
s/B@/ç/g
Then simply save the lines as a utf8 text file (without BOM), e.g. fix.sed, and execute it with sed:

Code:
sed -f fix.sed -i *.html
(Note that this will overwrite the original files.)
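For anyone without sed at hand, the same idea (one substitution per line, applied in order) can be sketched in Python. This is only an illustration, not what was run in this thread: the `FIXES` table and the `apply_fixes` name are made up for the example, and only the two pairs from the question are filled in.

```python
# Hypothetical table of OCR fixes, one (wrong, right) pair per entry.
# Only the two pairs from the original question are listed here.
FIXES = [
    ("A@", "à"),
    ("B@", "ç"),
]


def apply_fixes(text, fixes=FIXES):
    """Apply each literal replacement in order, like successive s/.../.../g lines."""
    for wrong, right in fixes:
        text = text.replace(wrong, right)
    return text


print(apply_fixes("voilA@ un garB@on"))
```

Unlike `sed -i`, this sketch only transforms a string. Reading and writing the .html files (as UTF-8) is left out, so nothing gets overwritten.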
Old 06-19-2012, 11:07 AM   #79
roger64
@Doitsu

Wow!! It's working very well! Thanks a lot!!
What does BOM mean?

Last edited by roger64; 06-19-2012 at 11:26 AM. Reason: success
Old 06-19-2012, 11:09 AM   #80
DiapDealer
Sorry, I was only thinking in terms of the F&R regex feature of Sigil.
Old 06-19-2012, 11:27 AM   #81
roger64
Quote:
Originally Posted by DiapDealer View Post
Sorry, I was only thinking in terms of the F&R regex feature of Sigil.
No need to be sorry, so was I.
Old 06-19-2012, 11:28 AM   #82
Doitsu
Quote:
Originally Posted by roger64 View Post
What does BOM mean?
BOM = byte order mark.

At least the Windows GNU sed port requires that both the .html files and the sed script be utf8 files without byte order marks. AFAIK, .html files created by Sigil are automatically saved without BOMs. I.e. you only have to make sure that the sed script doesn't have one either.
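If you are not sure whether your sed script has a BOM, a short Python sketch can check for it. The UTF-8 byte order mark is the three bytes EF BB BF, which Python exposes as `codecs.BOM_UTF8`. The function names below are invented for the example.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Simulate a sed script that an editor saved with a BOM in front.
script = codecs.BOM_UTF8 + "s/A@/à/g\n".encode("utf-8")
print(has_utf8_bom(script))
print(strip_utf8_bom(script).decode("utf-8"))
```

For a real file you would read it in binary mode (`open(path, "rb")`) and pass the bytes to these helpers.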

Quote:
Originally Posted by DiapDealer View Post
Sorry, I was only thinking in terms of the F&R regex feature of Sigil.
Every now and then you may want to widen your horizon.
But you are of course right, Sigil doesn't do sed.

That's when even rudimentary sed or Perl skills come in handy.
Old 06-19-2012, 11:43 AM   #83
DiapDealer
Quote:
Originally Posted by Doitsu View Post
Every now and then you may want to widen your horizon.
But I suffer from acute agoraphobia.
Old 06-19-2012, 03:00 PM   #84
PeterT
Quote:
Originally Posted by roger64 View Post
@Doitsu

Wow!! It's working very well! Thanks a lot!!
What does BOM mean?
Byte Order Mark
Old 06-20-2012, 04:53 AM   #85
roger64
Thanks all for the lesson.
Old 06-22-2012, 07:05 PM   #86
soulafein
Hi! I'm looking for an expression that erases "- " but not " - ".
(example: sim- ple, not: word - word).
Could somebody help me??
Old 06-22-2012, 07:37 PM   #87
theducks
Quote:
Originally Posted by soulafein View Post
Hi! I'm looking for an expression that erases "- " but not " - ".
(example: sim- ple, not: word - word).
Could somebody help me??
search: ([a-z])-([a-z])

replace: \1\2

This only matches when the hyphen is surrounded by lowercase letters, BUT it also catches legitimately hyphenated words.
Old 06-22-2012, 07:48 PM   #88
DiapDealer
Quote:
Originally Posted by soulafein View Post
Hi! I'm looking for an expression that erases "- " but not " - ".
(example: sim- ple, not: word - word).
Could somebody help me??
There's no real way of knowing that only complete words are on either side of the hyphen, but strictly in keeping with what you asked...

Find: (?<!\s)-\s Or: \w\K-\s
Replace: <empty/blank>

Please test first, and keep in mind that there are many situations in normal written text where what you're looking for will (and should) occur. I certainly wouldn't suggest using "Replace all", but it may help you narrow down the occurrences enough that you can sign off on each and every replacement.
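A quick way to try the first pattern before touching a book is a throwaway Python sketch. Python's `re` module accepts the `(?<!\s)-\s` lookbehind but not `\K`, so only the first variant is shown. The function name is made up, and the sample sentence is adapted from the question.

```python
import re

# A hyphen followed by whitespace, but NOT preceded by whitespace.
BROKEN_HYPHEN = re.compile(r"(?<!\s)-\s")


def join_broken_words(text):
    """Delete 'hyphen + whitespace' only where the hyphen directly follows a non-space."""
    return BROKEN_HYPHEN.sub("", text)


print(join_broken_words("sim- ple, not: word - word"))
```

Note that a phrase such as "pre- and post-war" would come out as "preand post-war", which is exactly why stepping through the matches one by one is safer than replacing everything at once.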

Last edited by DiapDealer; 06-22-2012 at 08:45 PM.
Old 06-22-2012, 07:55 PM   #89
goldilocks
Help! I am clueless about regex. I have a Word document I saved as HTML Filtered (sure didn't seem to filter much!). I imported it into Calibre and converted to ePub. Between MSO and Calibre I ended up with over 41,000 rows in the CSS. Every paragraph has its own class. Examples:
<p class="MsoNormal79"><span class="calibre14">
<p class="MsoNormal80"><span class="calibre20">
<p class="MsoNormal81"><span class="calibre20">
<p class="MsoNormal82"><span class="calibre17">

I want them all to say:
<p class="paragraphtext">

Can I put something in find to replace them all at once?

Karen
Old 06-22-2012, 09:07 PM   #90
DiapDealer
You could very well end up with a disaster if you're not careful. I would start with the paragraphs first as spans can get a bit hairy.

If you're absolutely sure that you want to change everything that has a class name of "MsoNormalXX" (X being numerals) to "paragraphtext", then:

Find: <p class="MsoNormal\d+">
Replace: <p class="paragraphtext">

Make sure you have good backups in case things don't turn out the way you've planned.
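To see what that Find/Replace pair does before running it on the real book, here is a small Python sketch that uses the same pattern on one of the sample lines from the question. The function name is invented for the example, and the spans are deliberately left alone.

```python
import re

# Same pattern as the Find field: MsoNormal followed by one or more digits.
MSO_PARAGRAPH = re.compile(r'<p class="MsoNormal\d+">')


def normalize_paragraphs(html):
    """Rewrite every numbered MsoNormal paragraph tag to the single target class."""
    return MSO_PARAGRAPH.sub('<p class="paragraphtext">', html)


sample = '<p class="MsoNormal79"><span class="calibre14">'
print(normalize_paragraphs(sample))
```

Because the pattern requires at least one digit, a plain `<p class="MsoNormal">` would not be changed. Whether that is what you want depends on the document.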