Old 06-19-2012, 08:25 AM   #76
DiapDealer
Quote:
Originally Posted by roger64
Successive Find and Replace

I wish to clean an HTML text which suffers from recurrent mistakes from an OCR engine (Cuneiform).

When I meet one of the mistakes, I make a replacement and note it. After some pages, I have met most of the mistakes, and now I intend to build a regex combining as many as 15 successive simple search-and-replace operations like the following two.
A@ → à
B@ → ç
I do not know how to perform these 15 F&R within a simple regex. Suppose I would like to build it for the two above, what should I write?

Note: I already use UTF-8 for the whole text.

I'm not sure what you're asking for is feasible. What you've described is better suited to an external program/algorithm (or a plugin) than to one single regular expression. Finding all 15 with one expression wouldn't be the hard part... replacement based on "if/then" logic is where it would fall apart.
Old 06-19-2012, 10:07 AM   #77
roger64
OK. Thanks for your answer. I will try to find another solution.
Old 06-19-2012, 10:47 AM   #78
Doitsu
You could create a simple sed script with one line for each character that you need to fix. E.g.

Code:
s/A@/à/g
s/B@/ç/g
Then simply save the lines as a utf8 text file (without BOM), e.g. fix.sed, and execute it with sed:

Code:
sed -f fix.sed -i *.html
(Note that -i edits the files in place, so this will overwrite the original files. With GNU sed, -i.bak keeps a backup copy of each original.)
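
For readers without sed, the same table-driven fix can be sketched in Python. This is only an illustration, not something that was run in this thread: the table holds just the two pairs from the question, and the names FIXES, PATTERN and fix_ocr are made up for the example. A single alternation finds every garbled sequence and a dictionary lookup picks the replacement, which is the "if/then" step that a plain Find & Replace lacks.

```python
import re

# Hypothetical correction table: only the two pairs from the question.
FIXES = {"A@": "à", "B@": "ç"}

# One alternation that matches any key; re.escape guards special characters.
# Longer keys go first so a short key cannot shadow a longer one.
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(FIXES, key=len, reverse=True))
)

def fix_ocr(text: str) -> str:
    """Replace every garbled sequence with its correction in a single pass."""
    return PATTERN.sub(lambda m: FIXES[m.group(0)], text)

print(fix_ocr("voilA@ un garB@on"))
```

Adding the remaining corrections would only mean adding entries to the table; the pattern is rebuilt from its keys.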
Old 06-19-2012, 11:07 AM   #79
roger64
@Doitsu

Wow!! It's working very well! Thanks a lot!!
What does BOM mean?

Last edited by roger64; 06-19-2012 at 11:26 AM. Reason: success
Old 06-19-2012, 11:09 AM   #80
DiapDealer
Sorry, I was only thinking in terms of the F&R regex feature of Sigil.
Old 06-19-2012, 11:27 AM   #81
roger64
Quote:
Originally Posted by DiapDealer
Sorry, I was only thinking in terms of the F&R regex feature of Sigil.
No sorry, me too
Old 06-19-2012, 11:28 AM   #82
Doitsu
Quote:
Originally Posted by roger64
What does BOM mean?
BOM = byte order mark.

At least the Windows GNU sed port requires that both the .html files and the sed script be utf8 files without byte order marks. AFAIK, .html files created by Sigil are automatically saved without BOMs. I.e. you only have to make sure that the sed script doesn't have one either.
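
In UTF-8, a byte order mark is the three-byte sequence EF BB BF at the very start of a file. As a rough sketch of how one could check a sed script for it and drop it, here is a small Python helper. The function name strip_utf8_bom is invented for this example, and the sample bytes merely stand in for the contents of fix.sed.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return data[len(codecs.BOM_UTF8):]
    return data

# Stand-in for a script that an editor saved with a BOM.
script = codecs.BOM_UTF8 + "s/A@/à/g\n".encode("utf-8")
print(strip_utf8_bom(script))
```

To apply this to a real file, one would read it in binary mode, pass the bytes through the helper, and write the result back.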

Quote:
Originally Posted by DiapDealer
Sorry, I was only thinking in terms of the F&R regex feature of Sigil.
Every now and then you may want to widen your horizon.
But you are of course right, Sigil doesn't do sed.

That's when even rudimentary sed or Perl skills come in handy.
Old 06-19-2012, 11:43 AM   #83
DiapDealer
Quote:
Originally Posted by Doitsu
Every now and then you may want to widen your horizon.
But I suffer from acute agoraphobia.
Old 06-19-2012, 03:00 PM   #84
PeterT
Quote:
Originally Posted by roger64
@Doitsu

Wow!! It's working very well! Thanks a lot!!
What does BOM mean?
Byte Order Mark
Old 06-20-2012, 04:53 AM   #85
roger64
Thanks all for the lesson.
Old 06-22-2012, 07:05 PM   #86
soulafein
Hi! I'm looking for an expression that erases "- " but not " - ".
(example: sim- ple, not: word - word).
Could somebody help me??
Old 06-22-2012, 07:37 PM   #87
theducks
Quote:
Originally Posted by soulafein
Hi! I'm looking for an expression that erases "- " but not " - ".
(example: sim- ple, not: word - word).
Could somebody help me??
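search: ([a-z])-([a-z])

replace: \1\2

This matches only a hyphen directly between two lowercase letters, BUT it also catches legitimately hyphenated words. To match the "- " case from the question (sim- ple), add the space to the search: ([a-z])- ([a-z])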
Old 06-22-2012, 07:48 PM   #88
DiapDealer
Quote:
Originally Posted by soulafein
Hi! I'm looking for an expression that erases "- " but not " - ".
(example: sim- ple, not: word - word).
Could somebody help me??
There's no real way of knowing that only complete words are on either side of the hyphen, but strictly in keeping with what you asked...

Find: (?<!\s)-\s
Or: \w\K-\s
Replace: <empty/blank>

Please test first, and do keep in mind that there are many situations in normal written text where what you're looking for will (and should) occur. I certainly wouldn't suggest using "Replace all", but it may help you narrow down the occurrences enough that you can sign off on each and every replacement.
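
To try the lookbehind form outside Sigil before touching a book, something like the following Python sketch could be used. It is an illustration only: the sample sentence and the name BROKEN_HYPHEN are made up, and only the (?<!\s) variant is shown because Python's built-in re module does not support \K. The loop lists each hit with some context, in the spirit of reviewing every replacement rather than trusting "Replace all".

```python
import re

# Hyphen + whitespace, but only when the hyphen is not preceded by whitespace.
BROKEN_HYPHEN = re.compile(r"(?<!\s)-\s")

sample = "A sim- ple test, word - word."

for m in BROKEN_HYPHEN.finditer(sample):
    # Show each hit with a little context so it can be reviewed by hand.
    print(m.start(), repr(sample[max(0, m.start() - 5):m.end() + 5]))

# Remove the matches; the spaced " - " is left alone.
print(BROKEN_HYPHEN.sub("", sample))
```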

Last edited by DiapDealer; 06-22-2012 at 08:45 PM.
Old 06-22-2012, 07:55 PM   #89
goldilocks
Help! I am clueless about regex. I have a Word document I saved as HTML Filtered (sure didn't seem to filter much!). I imported it into Calibre and converted to ePub. Between MSO and Calibre I ended up with over 41,000 rows in the CSS. Every paragraph has its own class. Examples:
<p class="MsoNormal79"><span class="calibre14">
<p class="MsoNormal80"><span class="calibre20">
<p class="MsoNormal81"><span class="calibre20">
<p class="MsoNormal82"><span class="calibre17">

I want them all to say:
<p class="paragraphtext">

Can I put something in find to replace them all at once?

Karen
Old 06-22-2012, 09:07 PM   #90
DiapDealer
You could very well end up with a disaster if you're not careful. I would start with the paragraphs first as spans can get a bit hairy.

If you're absolutely sure that you want to change everything that has a class name of "MsoNormalXX" (X being numerals) to "paragraphtext", then:

Find: <p class="MsoNormal\d+">
Replace: <p class="paragraphtext">

Make sure you have good backups in case things don't turn out the way you've planned.
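
As a way to rehearse this Find/Replace on a copy of the markup first, here is a minimal Python sketch. It is an assumption-laden illustration, not part of Sigil: the html string reuses two of the sample lines from the question, and re.subn is used so that the number of replacements can be checked before committing to anything.

```python
import re

# Two of the sample lines from the question, standing in for a chapter file.
html = '''<p class="MsoNormal79"><span class="calibre14">
<p class="MsoNormal80"><span class="calibre20">'''

# \d+ matches the one or more digits that follow "MsoNormal".
fixed, count = re.subn(
    r'<p class="MsoNormal\d+">', '<p class="paragraphtext">', html
)

print(count)   # how many paragraph tags were rewritten
print(fixed)   # the span classes are deliberately left untouched
```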