MobileRead Forums > E-Book Formats > Workshop

Old 01-25-2021, 03:48 AM   #1
Ghitulescu
Fanatic
Posts: 563
Karma: 403106
Join Date: Aug 2014
Device: PRS-T1
Regular expression for removing blanks between letters

Some OCR software converts a letterspaced word into a run of characters separated by blank spaces, like S w i t z e r l a n d. In some cases these can be fixed by hand (for instance, when only important concepts are spaced out); however, when entire paragraphs across the whole book are spaced out, an automated method would be very helpful.
I tried [a-z;A-Z].[a-z;A-Z]. and similar patterns, but these only find the places; they do not put the correct letters back in the replacement.

I could not find any relevant thread, which I hope does not suggest there is no solution to this

Thank you for any hint or solution
Old 01-25-2021, 06:25 AM   #2
JSWolf
Resident Curmudgeon
Posts: 74,512
Karma: 129668758
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
The problem will be the genuine spaces between words. Remove every space and you end up combining whole words. I don't know how you would tell whether a space sits between letters inside a word or between two words. You'll just have to fix this by hand.
Old 01-25-2021, 08:23 AM   #3
Ghitulescu
Well, essentially there are very, very few one-letter words like "a".
This is why I used a regex with two spaces.

The only practical solution is to use pairs of letters separated and preceded or followed by another space.
I hoped for a nice solution.
Old 01-25-2021, 08:37 AM   #4
JSWolf
Quote:
Originally Posted by Ghitulescu View Post
Well, essentially there are very, very few one-letter words like "a".
This is why I used a regex with two spaces.

The only practical solution is to use pairs of letters separated and preceded or followed by another space.
I hoped for a nice solution.
That won't work either without error. By hand is your only reliable solution. But what you can do is use regex to try to find the problem words. I would go with letter space letter space letter space to find 3 letters with spaces. Then you can fix the words by hand. You could also try letter space letter space and see how that goes.

Last edited by JSWolf; 01-25-2021 at 08:40 AM.
Old 01-26-2021, 08:09 PM   #5
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Ghitulescu View Post
Some OCR software converts a letterspaced word into a run of characters separated by blank spaces, like S w i t z e r l a n d. In some cases these can be fixed by hand (for instance, when only important concepts are spaced out); however, when entire paragraphs across the whole book are spaced out, an automated method would be very helpful.
I would tackle this in multiple passes.

But as JSWolf has stated, you have to be extremely careful not to combine letters/words that shouldn't be combined. Quite often, books will have things like "Person B", "Project X", or "time y".

Example Sentence

Let's take this as an example:

Code:
<p>A decent example of S w i t z e r l a n d that I found within a G e r m a n example.</p>
Step 1

Replace the space between two single letters with a temporary character, like '+' or '¬'.

BUT, only join single letters that are NOT "A", "a", or "I", since those are real one-letter words:

Search: \b([B-HJ-Zb-z]) ([B-HJ-Zb-z])\b
Replace: \1+\2

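Run Replace All until it reports 0 replacements (twice for this example): each match consumes both of its letters, so a single pass only joins every other pair. After that, you'll get:

Code: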
<p>A decent example of S+w+i+t+z+e+r+l a n+d that I found within a G+e+r+m a n example.</p>
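
Outside Sigil or Calibre, Step 1 could be sketched in Python roughly as follows. This is only an illustration: the helper name `mark_pairs` is made up here, and the loop is there because regex matches cannot overlap, so a single pass joins only every other pair of letters.

```python
import re

# Step 1 pattern from the post: two single letters (not A, a, or I)
# separated by exactly one space.
STEP1 = re.compile(r"\b([B-HJ-Zb-z]) ([B-HJ-Zb-z])\b")

def mark_pairs(text: str) -> str:
    """Repeat the Step 1 replacement until it makes no more changes."""
    while True:
        text, count = STEP1.subn(r"\1+\2", text)
        if count == 0:
            return text

sample = ("<p>A decent example of S w i t z e r l a n d that I found "
          "within a G e r m a n example.</p>")
print(mark_pairs(sample))
```

The "l a n" and "m a n" gaps are left alone on purpose, because "a" is excluded from the letter class; Step 2 deals with those.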


Step 2

Then you want to match the "A", "a", or "I" between two already connected letters:

Search: (\+\w) ([aAI]) (\w)\b
Replace: \1+\2+\3

Code:
<p>A decent example of S+w+i+t+z+e+r+l+a+n+d that I found within a G+e+r+m+a+n example.</p>
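
The same Step 2 search can be tried in Python. This is a minimal sketch; the function name `join_one_letter_words` is an assumption for illustration, and the pattern is the one given above.

```python
import re

# Step 2 pattern from the post: an "a", "A", or "I" sitting between an
# already-joined letter ("+x") and a following single letter.
STEP2 = re.compile(r"(\+\w) ([aAI]) (\w)\b")

def join_one_letter_words(text: str) -> str:
    return STEP2.sub(r"\1+\2+\3", text)

after_step1 = ("<p>A decent example of S+w+i+t+z+e+r+l a n+d that I found "
               "within a G+e+r+m a n example.</p>")
print(join_one_letter_words(after_step1))
```

Because the pattern requires a leading "+", it only fires next to letters that Step 1 has already joined, which is what keeps an ordinary "a" in running text untouched.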


Those 2 Regexes should get you 95%+ of the way there.

From there, you have to manually check/correct. (Apostrophes, accents, emphasized words that start with 'a', or other odd cases.)

Step 3

Once you've completed everything, you replace the temporary '+' with a blank. That will merge the words together:

Search: \+
Replace: ***LEAVE THIS COMPLETELY BLANK***

Code:
<p>A decent example of Switzerland that I found within a German example.</p>
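
Put together, Steps 1 to 3 might look like the sketch below. The function name `fix_letterspacing` and the `marker` parameter are hypothetical. The check at the top is an added precaution, not part of the steps above: Step 3 deletes every marker, so a '+' that was already in the book would be lost too.

```python
import re

def fix_letterspacing(text: str, marker: str = "+") -> str:
    """Run Steps 1-3 with a temporary marker (assumed to be a non-word character)."""
    if marker in text:
        raise ValueError(f"marker {marker!r} already occurs in the text; pick another")
    m = re.escape(marker)
    step1 = re.compile(r"\b([B-HJ-Zb-z]) ([B-HJ-Zb-z])\b")
    step2 = re.compile(rf"({m}\w) ([aAI]) (\w)\b")

    # Step 1: repeat, because matches cannot overlap.
    while True:
        text, count = step1.subn(lambda mo: mo.group(1) + marker + mo.group(2), text)
        if count == 0:
            break
    # Step 2: pull "a", "A", "I" into words that are already joined.
    text = step2.sub(lambda mo: marker.join(mo.groups()), text)
    # Step 3: drop the temporary marker.
    return text.replace(marker, "")

sample = ("<p>A decent example of S w i t z e r l a n d that I found "
          "within a G e r m a n example.</p>")
print(fix_letterspacing(sample))
```

In practice the manual check would still happen between Step 2 and Step 3, while the markers are visible.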
Step 3 (Alternate)

Or, if you wanted to keep the emphasis, you can do something like this:

First replace "1 letter + plus sign + 1 letter" with a span:

Search: (\w)\+(\w)
Replace: <span class="emph">\1\2</span>

Code:
<p>A decent example of <span class="emph">Sw</span>+<span class="emph">it</span>+<span class="emph">ze</span>+<span class="emph">rl</span>+<span class="emph">an</span>+d that I found within a <span class="emph">Ge</span>+<span class="emph">rm</span>+<span class="emph">an</span> example.</p>


Then tackle the dangling single letters at the end (the "+d" in Switzerland):

Search: <span class="emph">(\w+)</span>\+(\w)
Replace: <span class="emph">\1\2</span>

Code:
<p>A decent example of <span class="emph">Sw</span>+<span class="emph">it</span>+<span class="emph">ze</span>+<span class="emph">rl</span>+<span class="emph">and</span> that I found within a <span class="emph">Ge</span>+<span class="emph">rm</span>+<span class="emph">an</span> example.</p>


Then keep merging the "emph spans followed by a plus sign" by running this until there's 0 replacements left:

Search: <span class="emph">(\w+)</span>\+<span class="emph">
Replace: <span class="emph">\1

Code:
<p>A decent example of <span class="emph">Switzerland</span> that I found within a <span class="emph">German</span> example.</p>
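
If scripting is an option, the three span-merging searches can be collapsed into a single pass with a replacement function. This is a swapped-in technique, not the method above; the name `wrap_marked_words` is made up for the sketch, and it assumes the text has already been through Steps 1 and 2.

```python
import re

# A whole marked word: single characters joined by '+', e.g. "G+e+r+m+a+n".
MARKED_WORD = re.compile(r"\b\w(?:\+\w)+\b")

def wrap_marked_words(text: str) -> str:
    def wrap(match: re.Match) -> str:
        word = match.group(0).replace("+", "")
        return f'<span class="emph">{word}</span>'
    return MARKED_WORD.sub(wrap, text)

marked = ("<p>A decent example of S+w+i+t+z+e+r+l+a+n+d that I found "
          "within a G+e+r+m+a+n example.</p>")
print(wrap_marked_words(marked))
```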
Then, I highly recommend running DiapDealer's "TagMechanic" (Sigil) or "Diap's Editing Toolbag" (Calibre) to flip those <span>s into <em>.

I wrote step-by-step instructions last year in "How do I change italic <i> shortcut to use <em> instead?".

This will ultimately get you the final outcome you want:

Code:
<p>A decent example of <em>Switzerland</em> that I found within a <em>German</em> example.</p>

Last edited by Tex2002ans; 01-26-2021 at 08:42 PM.
Old 01-27-2021, 04:06 AM   #6
Ghitulescu
Quote:
Originally Posted by Tex2002ans View Post
Search: \b([B-HJ-Zb-z]) ([B-HJ-Zb-z])\b
Replace: \1+\2
Thank you,
I have extended the query to fit 4-letter words (the nice ones) and the manual work was substantially reduced, although some still remains.
Old 01-27-2021, 06:20 PM   #7
Tex2002ans
Quote:
Originally Posted by Ghitulescu View Post
I have extended the query to fit 4-letter words (the nice ones) and the manual work was substantially reduced, although some still remains.
Can you show a before/after example of what you mean?

Quote:
Originally Posted by Ghitulescu View Post
Thank you,


And for future info, the "gap between letters" is called letterspacing. It's sometimes used as emphasis instead of bold/italics, and can be replicated using CSS:

[Attached image: Letterspacing.Emphasis.png]

HTML:

Code:
<p>As much mud in the streets as if the waters had but newly retired from the face of the earth, and it would not be wonderful to meet a <em>Megalosaurus</em>, forty feet long or so, waddling like an elephantine lizard up <em>Holborn Hill</em>.</p>
CSS:

Code:
em {
	font-style: normal;
	font-weight: bold;
	letter-spacing: .2em;
}
For more detailed info, also see:

Especially all the posts in those two threads, we went into extremely detailed discussions about differences between italics/emphasis, bold/strong, plus different methods of application.
Old 01-29-2021, 08:15 AM   #8
Ghitulescu
Code:
\b([A-Za-z]) ([A-Za-z]) ([A-Za-z]) ([A-Za-z])\b
as most words are 4+ letters long. Also, this being a foreign language, excluding I (first person) would have been counterproductive: lots of foreign glyphs are OCRed as I (for instance ïìîı, because they are longer than i), and l is also read as I in sans-serif fonts.

I could live with a handful of 3-letter long "escapees"

I know it is called letterspacing, but using that term would have forced me to rewrite the sentence once again. I tried to use simple words.
The OCR, however, inserts spaces.
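
The four-letter search above could be generalised along these lines in Python. This is a sketch under assumptions: `join_long_runs` is a made-up name, `[^\W\d_]` is used as a stand-in for "any letter" so that accented letters are included, and the minimum run length of 4 simply mirrors the pattern above.

```python
import re

LETTER = r"[^\W\d_]"  # word characters minus digits and underscore, accents included
# One letter followed by at least three more " letter" groups: runs of 4+ letters.
RUN4 = re.compile(rf"\b{LETTER}(?: {LETTER}){{3,}}\b")

def join_long_runs(text: str) -> str:
    return RUN4.sub(lambda m: m.group(0).replace(" ", ""), text)

print(join_long_runs("Die S c h w e i z ist s c h ö n, o n a bus."))
# Die Schweiz ist schön, o n a bus.
```

Shorter runs such as "o n a" are left for manual review, which matches the "handful of escapees" trade-off described above. Two letterspaced words separated by a single space would still be merged into one.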
Old 01-29-2021, 08:35 AM   #10
JSWolf
Quote:
Originally Posted by Ghitulescu View Post
Code:
\b([A-Za-z]) ([A-Za-z]) ([A-Za-z]) ([A-Za-z])\b
as most words are 4+ letters long. Also, this being a foreign language, excluding I (first person) would have been counterproductive: lots of foreign glyphs are OCRed as I (for instance ïìîı, because they are longer than i), and l is also read as I in sans-serif fonts.

I could live with a handful of 3-letter long "escapees"

I know it is called letterspacing, but using that term would have forced me to rewrite the sentence once again. I tried to use simple words.
The OCR, however, inserts spaces.
Again do it by hand. What if you have something like "o n a bus"? You would end up with "ona bus"

You cannot regex this away. You have to do it by hand because you will combine letters/words you do not want to.

Use the regex for searching. But do the fixing by hand.
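
A search-only pass of this kind can be sketched in Python: it lists every candidate run with its line number and the merged form, and changes nothing. The names `CANDIDATE` and `find_candidates` are made up for illustration.

```python
import re

# Two or more single letters separated by single spaces.
CANDIDATE = re.compile(r"\b[^\W\d_](?: [^\W\d_])+\b")

def find_candidates(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, run as found, run with spaces removed)."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in CANDIDATE.finditer(line):
            hits.append((lineno, m.group(0), m.group(0).replace(" ", "")))
    return hits

page = "We sat o n a bus.\nA trip to S w i t z e r l a n d."
for hit in find_candidates(page):
    print(hit)
```

On this sample the list contains both the genuine "Switzerland" and the bogus "ona", so a person decides which suggestions to accept.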
Old 01-29-2021, 09:19 AM   #11
Ghitulescu
Lucky me: where there are real blanks (whitespace between words), the OCR inserts a double space.

Yes, handwork is needed.
Old 01-29-2021, 06:56 PM   #12
Tex2002ans
Quote:
Originally Posted by Ghitulescu View Post
as most words are 4+ letters long.
And if you give more real-life examples, then the regex can be made more robust.

I tested both steps on the examples I gave, and it works perfectly fine on any 2+ single letters (not "a", "A", or "I") next to each other.

Quote:
Originally Posted by Ghitulescu View Post
Lucky me: where there are real blanks (whitespace between words), the OCR inserts a double space.
Good to hear. If those two-spaces only occur in the letterspaced words... then your life is easier, no fancy regex needed!
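
One possible reading of that situation, sketched in Python: inside a letterspaced run, a single space separates letters and a double space separates words. That reading is an assumption, as is the helper name `collapse_runs`; the real OCR output may differ.

```python
import re

# A run of single characters separated by one space (letter gap)
# or two spaces (assumed to be a real word gap).
RUN = re.compile(r"\b\w(?: {1,2}\w\b)+")

def collapse_runs(text: str) -> str:
    def fix(match: re.Match) -> str:
        words = match.group(0).split("  ")             # double space = word boundary
        return " ".join(w.replace(" ", "") for w in words)
    return RUN.sub(fix, text)

print(collapse_runs("He ate S w i s s  c h e e s e today."))
# He ate Swiss cheese today.
```

A one-letter word directly in front of a run with only a single space ("a S w i s s") would still be swallowed, so a manual check remains necessary.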

And can I ask:

Which OCR are you using?

Can you share an example page or something from this specific book?

I'd be interested in taking a look.

Quote:
Originally Posted by Ghitulescu View Post
Also, this being a foreign language, excluding I (first person) would have been counterproductive: lots of foreign glyphs are OCRed as I (for instance ïìîı, because they are longer than i), and l is also read as I in sans-serif fonts.
Which language? I was assuming English and no accents.

Yes, of course, different languages are going to have their own little single-letter-word quirks...

Like in Spanish, you'd want to avoid 'y' (since that = "and").

But then you would just swap out the [aAI] regex with a [yY] (or equivalent).

Accents, similar situation. You'll just have to make much uglier and harder-to-understand regex.
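
As a rough illustration of swapping the single-letter words per language, the Step 1 pattern can be built from a parameter. Everything named here (`build_step1`, `mark_pairs`, the `"yY"` argument) is an assumption for the sketch; `[^\W\d_]` stands in for "any letter, accents included", and a negative lookahead excludes the one-letter words.

```python
import re

def build_step1(single_letter_words: str) -> re.Pattern:
    """Step 1 style pattern that never joins the given one-letter words."""
    skip = re.escape(single_letter_words)
    letter = rf"(?![{skip}])[^\W\d_]" if skip else r"[^\W\d_]"
    return re.compile(rf"\b({letter}) ({letter})\b")

def mark_pairs(text: str, pattern: re.Pattern, marker: str = "+") -> str:
    # Repeat until stable, because matches cannot overlap.
    while True:
        text, count = pattern.subn(lambda m: m.group(1) + marker + m.group(2), text)
        if count == 0:
            return text

spanish = build_step1("yY")
print(mark_pairs("p a í s y m a r", spanish))
# p+a+í+s y m+a+r
```

The character class in Step 2 would be swapped the same way, with the same risk of gluing neighbouring words together.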

Quote:
Originally Posted by Ghitulescu View Post
I know it is called letterspacing, but using that term would have forced me to rewrite the sentence once again. I tried to use simple words.
Yep, it's just helpful when someone searches for a solution in the future for "how do I fix a gap between letters?".

Quote:
Originally Posted by JSWolf View Post
Again do it by hand. What if you have something like "o n a bus"? You would end up with "ona bus"

You cannot regex this away. You have to do it by hand because you will combine letters/words you do not want to.

Use the regex for searching. But do the fixing by hand.
You can use Regex to do the vast bulk of the corrections, then manually fix the edge cases.

Better/faster to do:
  • 95% correct with a 2-step regex.
  • 5% manually find/correct/fix.

than:
  • 100% manually fix.

And as usual, I've been pondering on how to get Spellcheck Lists to help you solve this issue more efficiently.

Instead of using a '+' or '¬', it might be better to use a period:

Code:
<p>A decent example of S.w.i.t.z.e.r.l.a.n.d that I found within a G.e.r.m.a.n example.</p>
This allows you to spot all of them easily in Sigil's or Calibre's Spellcheck Lists:

[Attached images: Spellcheck.List.-.Letterspacing.Fix.png, Spellcheck.List.-.Letterspacing.Fix.2.png]

All merged words right there in a simple list.

The period will bring a few other minor issues (like "a.m." or "p.m."), but the amount of time you'll save is massive.
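
For anyone working outside the Spellcheck Lists, a comparable list can be pulled out with a short Python sketch. The names `DOTTED` and `dotted_words` are made up here, and the sample sentence is the earlier one with "at 5 p.m." added to show the collision.

```python
import re

# Single characters joined by periods, e.g. "G.e.r.m.a.n".
DOTTED = re.compile(r"\b\w(?:\.\w)+\b")

def dotted_words(text: str) -> list[str]:
    """Sorted, de-duplicated list of period-joined runs."""
    return sorted(set(DOTTED.findall(text)))

sample = ("<p>A decent example of S.w.i.t.z.e.r.l.a.n.d that I found at 5 p.m. "
          "within a G.e.r.m.a.n example.</p>")
print(dotted_words(sample))
# ['G.e.r.m.a.n', 'S.w.i.t.z.e.r.l.a.n.d', 'p.m']
```

The "p.m" entry is the kind of false positive that has to be skipped by hand.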

Last edited by Tex2002ans; 01-29-2021 at 07:13 PM.
Old 01-29-2021, 08:51 PM   #13
JSWolf
The spellcheck lists is a very good idea.
Old 01-30-2021, 03:31 AM   #14
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by Tex2002ans View Post
The period will bring a few other minor issues (like "a.m." or "p.m."), but the amount of time you'll save is massive.
Maybe convert all periods to ¬ before dealing with the spaces (and back to periods after)? [That's what you use protecting groups for in chemistry ]
Old 01-30-2021, 04:44 AM   #15
Tex2002ans
Quote:
Originally Posted by Jellby View Post
Maybe convert all periods to ¬ before dealing with the spaces (and back to periods after)?
Yes yes, or any sort of similarly "rare" character would work.

Although I'm thinking a mass replace of . may cause the book to explode if you forget to flip back!

Because . is probably commonly used outside of text (within class names, filenames, etc.):

Code:
<link href="../Styles/Style0001¬css" type="text/css" rel="stylesheet"/>
<a href="http://www¬mobileread¬com">
<img src="image¬jpg" />
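
One way to avoid that breakage, sketched in Python, is to swap periods only in the text between tags and leave the tags themselves alone. This is a naive illustration: `protect_periods` and `restore_periods` are made-up names, and splitting on `<...>` assumes no stray `<` or `>` in text or attribute values and ignores comments and CDATA.

```python
import re

TAG = re.compile(r"(<[^>]*>)")  # capturing group keeps the tags in the split result

def protect_periods(html: str, placeholder: str = "¬") -> str:
    """Replace '.' with the placeholder in text nodes only."""
    parts = TAG.split(html)
    return "".join(
        part if part.startswith("<") else part.replace(".", placeholder)
        for part in parts
    )

def restore_periods(html: str, placeholder: str = "¬") -> str:
    # Assumes the placeholder did not occur in the book beforehand.
    return html.replace(placeholder, ".")

page = '<a href="http://www.mobileread.com">Back at 5 p.m.</a>'
print(protect_periods(page))
# <a href="http://www.mobileread.com">Back at 5 p¬m¬</a>
```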
Quote:
Originally Posted by Jellby View Post
[That's what you use protecting groups for in chemistry ]
I knew ¬ is "not" in Logic.

Or "a gun":

· ¬<(o.o<)

Didn't know the chemistry usage though. Looks like I have something new to read about.

Last edited by Tex2002ans; 01-30-2021 at 04:48 AM.
Tags
blank characters, epub 2, regular expression

