Old 06-08-2013, 05:01 AM   #241
abeonis
eBook DIYer
Posts: 111
Karma: 10
Join Date: Oct 2012
Location: Europe
Device: K4, KF HD 8.9, Readium
Quote:
Originally Posted by DiapDealer View Post
Code:
\p{Lu}
Will catch all upper-case letters (including unicode characters), if that's what you're looking for. Add parentheses to make it a capture group if desired, of course.
I didn't know this one. I have this problem twice over, as I write in French and Spanish (two sets of accented characters) and want to use the same regex.
  • The first method I used was the looooonnnnnngggggggg enumeration of all possible characters in both languages.
  • Then I started using the interval [€-˙], which contains only the accented characters, and added it to other intervals.
Your method is still more elegant.
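
For anyone who wants to check outside Sigil which characters a class like \p{Lu} would cover, here is a minimal Python sketch. It is not how Sigil's regex engine works internally; it only uses the standard unicodedata module to pick out characters whose Unicode general category is Lu. The function name and the sample text are made up for illustration.

```python
import unicodedata


def uppercase_letters(text):
    """Return the characters of `text` whose Unicode general category is Lu.

    The text is normalized to NFC first so that an accented capital stored
    as base letter + combining accent is treated as one character.
    """
    normalized = unicodedata.normalize("NFC", text)
    return [ch for ch in normalized if unicodedata.category(ch) == "Lu"]


if __name__ == "__main__":
    # Hypothetical sample mixing French and Spanish capitals.
    sample = "\u00c9cole \u00d1and\u00fa \u00c1ngel \u0152uvre \u00e9t\u00e9"
    print(uppercase_letters(sample))
```

The point is the same as with \p{Lu}: no hand-written list of accented capitals is needed, so one check serves both languages.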

Last edited by abeonis; 06-08-2013 at 05:19 AM. Reason: I speak english as a spanish cow
Old 06-11-2013, 06:22 AM   #242
mzmm
Groupie
Posts: 162
Karma: 86115
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
Quote:
Originally Posted by abeonis View Post
I didn't know this one. I have this problem twice over, as I write in French and Spanish (two sets of accented characters) and want to use the same regex.
  • The first method I used was the looooonnnnnngggggggg enumeration of all possible characters in both languages.
  • Then I started using the interval [€-˙], which contains only the accented characters, and added it to other intervals.
Your method is still more elegant.
i also needed this periodically, good to have it brought up again.

a question for anyone who might know: my main tool is a text editor with a python regex interpreter. regular-expressions.info says that

"The only significant features missing from Python's regex syntax are atomic grouping, possessive quantifiers and Unicode properties."

am i right to understand that this pattern \p{L} falls under the category of unicode properties, and is not supported by python interpreters?

as an aside, i realize it's been mentioned in other threads, but it really would be great to have a forum for regex on MR, or a place to organize regex snippets to avoid having to look through this monster thread.
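
On the Python question, a small sketch under the assumption that the editor really uses Python's standard re module: there, the class [^\W\d_] is a common stand-in for \p{L}. It is only an approximation (it can let through a few numeric characters that are not decimal digits), and the helper name below is hypothetical. The third-party regex module on PyPI accepts \p{L} directly, if installing it is an option.

```python
import re

# "Word character that is neither a decimal digit nor an underscore".
# With str patterns in Python 3, \w and \d are Unicode-aware by default,
# so this roughly corresponds to "any letter" (an approximation of \p{L}).
LETTER = re.compile(r"[^\W\d_]")


def letters(text):
    """Return every character of `text` matched by the LETTER class."""
    return LETTER.findall(text)


if __name__ == "__main__":
    print("".join(letters("ni\u00f1o 42 fa\u00e7ade_x")))
```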
Old 06-11-2013, 08:01 AM   #243
Doitsu
Wizard
Posts: 2,029
Karma: 4836606
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by mzmm View Post
am i right to understand that this pattern \p{L} falls under the category of unicode properties, and is not supported by python interpreters?
I don't know which Unicode properties they referred to, but you could easily find out whether \p{L} is supported by the regex engine of your editor by actually using it.

For example, the following regex, which works in Sigil, will find Greek text (because of its simple design, it will also find runs of two or more spaces).

[\p{Greek} ]{2,}

To test it, just copy any Greek text (e.g. μὴ μοῦ τοὺς κύκλους τάραττε.) into a text file and use the above regex. If your editor's regex engine supports Unicode properties (\p{Greek} here, and most likely \p{L} too), it should find the complete phrase (and the space before it).
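
If the editor's engine turns out not to know \p{Greek}, the same test can be approximated in plain Python. The sketch below is an assumption-laden stand-in, not the Sigil behaviour itself: it treats a character as Greek when its Unicode name starts with "GREEK", which is close to, but not identical with, the Greek script property. The function names are invented for this example.

```python
import unicodedata


def is_greek(ch):
    """Rough test: does the character's Unicode name start with 'GREEK'?"""
    return unicodedata.name(ch, "").startswith("GREEK")


def greek_runs(text, min_len=2):
    """Return runs of at least `min_len` Greek letters and/or spaces."""
    text = unicodedata.normalize("NFC", text)
    runs, current = [], []
    # The trailing "\0" is a sentinel that flushes the last run.
    for ch in text + "\0":
        if ch == " " or is_greek(ch):
            current.append(ch)
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    return runs


if __name__ == "__main__":
    print(greek_runs("abc \u03bc\u03b7 \u03bc\u03bf\u03c5."))
```

Like the regex, this also reports double spaces between non-Greek words, and it picks up the space in front of a Greek phrase.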
Old 06-11-2013, 09:51 AM   #244
Leonatus
Addict
Posts: 228
Karma: 157352
Join Date: Mar 2013
Location: Berlin, Germany
Device: Kobo Touch
A special problem; I don't really know whether it can be solved with regex:
Because the text is justified in Sigil, more or less ugly white spaces appear between some words in a paragraph. As I like this setting anyway, I would accept that. But is it possible to keep together at least those characters that really should not be separated, such as single quotation marks (represented here: › verrückt ‹) and the word that follows or precedes them?

The problem is that the spaces don't appear as such in Code View or Book View, and the preview also shows the text correctly. Only on the device do marks and words sometimes become separated. As far as I can see, Find/Replace cannot detect this, because it is not set by a style or the like but is a consequence of justifying the text.

Thankful for any suggestions.
Old 06-11-2013, 10:01 AM   #245
theducks
Grand Sorcerer
Posts: 15,059
Karma: 5939999
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by Leonatus View Post
A special problem; I don't really know whether it can be solved with regex:
Because the text is justified in Sigil, more or less ugly white spaces appear between some words in a paragraph. As I like this setting anyway, I would accept that. But is it possible to keep together at least those characters that really should not be separated, such as single quotation marks (represented here: › verrückt ‹) and the word that follows or precedes them?

The problem is that the spaces don't appear as such in Code View or Book View, and the preview also shows the text correctly. Only on the device do marks and words sometimes become separated. As far as I can see, Find/Replace cannot detect this, because it is not set by a style or the like but is a consequence of justifying the text.

Thankful for any suggestions.
If I understand correctly, you are saying your device is breaking (padding) between characters?
Code:
"Nice to know becomes " Nice   to   know
when viewed (BV or device?)?
Old 06-11-2013, 10:11 AM   #246
Doitsu
Quote:
Originally Posted by Leonatus View Post
But is it possible to keep together at least those characters that really should not be separated, such as single quotation marks (represented here: › verrückt ‹) and the word that follows or precedes them?
Non-breaking spaces should work: Insert > Special Character > nbsp

You might also occasionally see extra spaces if the value of the paragraph's text-align property is set to justify.

Try changing it to left to see if it makes a difference. E.g.

Code:
<p style="text-align: left;">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
If it works, change the corresponding paragraph style in the stylesheet to:

Code:
p { text-align: left; }

Last edited by Doitsu; 06-11-2013 at 10:31 AM.
Old 06-11-2013, 03:47 PM   #247
Leonatus
Quote:
Originally Posted by theducks View Post
If I understand correctly, you are saying your device is breaking (padding) between characters?
Code:
"Nice to know becomes " Nice   to   know
when viewed (BV or device?)?
Exactly, except that, strangely enough, "normal" (double) quotation marks are not affected (as far as I can see), only single ones, and (I don't know if it matters) only the type I showed (›...‹). But on the device it looks just as you demonstrated.

Quote:
Originally Posted by Doitsu View Post
Non-breaking spaces should work: Insert > Special Character > nbsp
I shall try this. But with this option, would I have to look at each case where the issue occurs? Is there no regex for it?
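
A regex can do the bulk replacement, at least where the source really contains an ordinary space between the mark and the word. The Python sketch below is only an illustration under that assumption (if the source has no space at all, there is nothing for it to replace). It follows the German orientation shown in the post, with › opening and ‹ closing; the function name is hypothetical.

```python
import re

NBSP = "\u00a0"   # non-breaking space
OPEN = "\u203a"   # the opening mark as used in the post
CLOSE = "\u2039"  # the closing mark as used in the post


def glue_single_guillemets(text):
    """Replace ordinary spaces between a single guillemet and its word
    with one non-breaking space, leaving all other spaces untouched."""
    # Space(s) directly after the opening mark, followed by a non-space.
    text = re.sub(OPEN + r" +(?=\S)", OPEN + NBSP, text)
    # Space(s) directly before the closing mark, preceded by a non-space.
    text = re.sub(r"(?<=\S) +" + CLOSE, NBSP + CLOSE, text)
    return text


if __name__ == "__main__":
    sample = "Er ist \u203a verr\u00fcckt \u2039 geworden."
    print(repr(glue_single_guillemets(sample)))
```

In an XHTML file the replacement would normally be written as the entity &nbsp; rather than the raw character.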

Quote:
Originally Posted by Doitsu View Post
You might also occasionally see extra spaces if the value of the paragraph's text-align property is set to justify.

Try changing it to left to see if it makes a difference. E.g.

Code:
<p style="text-align: left;">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
If it works, change the corresponding paragraph style in the stylesheet to:

Code:
p { text-align: left; }
Well, as I said, I would in principle prefer the text justified (and would rather accept issues of lesser importance).

I have a Kobo Touch, and I always try to set the text alignment in Sigil rather than through the reader's internal options. Perhaps I should check whether there are any changes.

BTW: I'm using the Kobo Touch Extended plugin, but not its "Soft Hyphenate" option, because in my case it undoes any kind of text justification.

Anyway, thanks a lot! I appreciate it!

Last edited by Leonatus; 06-11-2013 at 03:51 PM.
Old 06-11-2013, 03:57 PM   #248
Leonatus
I just noticed that in the stylesheet the text justification does not appear as text-align: but as "display: block;". Does this matter?
Old 06-11-2013, 07:12 PM   #249
mzmm
Quote:
Originally Posted by Doitsu View Post
I don't know which Unicode properties they referred to, but you could easily find out whether \p{L} is supported by the regex engine of your editor by actually using it.
i mean, i tried the regex out before posting, and my text editor didn't find it. i was actually asking because i was more wondering if it was a deficiency with my text editor or a limitation of the engine (i really have no idea how a regex engine works, but i'm learning).

anyway, did some research, turns out i was wrong about the interpreter (i'm using sublime text 2 which, by the way, is really, really great, and it uses Boost, which i guess is a 'flavour'(?) of Perl).

boost does support this kind of expression but the syntax is slightly different. linking to the docs in case anyone else uses ST and wants the reference:


http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
Old 06-11-2013, 10:26 PM   #250
theducks
Quote:
Originally Posted by Leonatus View Post
I just noticed that in the stylesheet the text justification does not appear as text-align: but as "display: block;". Does this matter?
Display: block is more of a container (box), not a text style.

If you don't want big spaces, then you can't use 'Justify', PERIOD, as stretching the spaces is how it makes the text fill the line (there is letter spacing, but many readers don't support that, so AVOID).

&nbsp; is typically used to keep ONE word pair from splitting (it forces the spaces elsewhere). You can't just say DON'T split between ALL words and expect Justify not to barf all over.
Old 06-12-2013, 04:44 AM   #251
Leonatus
Quote:
Originally Posted by theducks View Post
&nbsp; is typically used to keep ONE word pair from splitting (it forces the spaces elsewhere). You can't just say DON'T split between ALL words and expect Justify not to barf all over.

I see. Spaces seem to be almost inevitable with the 'justify' setting (and without 'busy' hyphenation). I would accept that as far as spaces between words are concerned. But punctuation marks should stay close to the word they belong to. So my question was only about the strange behaviour of the ›...‹ marks.

However, I fetched the book afresh, formatted it again in Sigil, loaded it into calibre and sent it to the device. The result: there are still some of these issues, but far fewer, and where the spaces appear, they are much smaller than before. I don't know what caused it. I hope to resolve the remaining problems with &nbsp;.

But, I admit, there are also spaces missing after punctuation marks.
Old 06-12-2013, 09:39 AM   #252
PeterT
Taking a break; Fed up
Posts: 7,182
Karma: 45264785
Join Date: Nov 2007
Location: Toronto
Device: Wife: Touch, Arc, Vox Me: Nexus 7, Glo
@Leonatus: I see your device is Kobo. Which calibre driver ARE you using? Kobo OR Kobo Extended?

If Kobo Extended, I'd suggest you try sending it with the basic Kobo driver and see if that makes a difference.
Old 06-12-2013, 12:00 PM   #253
Leonatus
Quote:
Originally Posted by PeterT View Post
If Kobo Extended, I'd suggest you try sending it with the basic Kobo driver and see if that makes a difference.
I am indeed using the Extended driver and am, at present, largely satisfied with it (and I guess I shall be even more so after the release of FW 2.6.1, because I love the header and footer showing title and chapter!). It's only a pity that the hyphenation option often doesn't work properly.

So far I am happy with the newly converted book, but out of curiosity I'll follow your suggestion.
Old 06-12-2013, 12:51 PM   #254
Leonatus
Hm, I've just come back from testing, and it's even odder:
After switching on the standard driver (and of course switching off the Extended one), I first fetched the book again, converted it and, just to try it, applied the Hyphenate-This! plugin. Then I sent it to the device.
Result: the issues described above vanished, but instead there are many, many punctuation marks of every kind (. , ; ! and even quotation marks) that don't stay immediately after the last word of a line but stand alone at the start of the next line. Very ugly!

I removed the book from both the device and the library, then fetched and converted it again, this time without using the Hyphenate-This! plugin.
Result: The same!

My God! What is this? Until now, I considered the basic driver a 'last resort' (no offence!) in case the Extended driver didn't work properly. But this time the Extended driver works more consistently. Nevertheless, my impression is that it's me who did something wrong.
Old 06-12-2013, 04:31 PM   #255
PeterT
But what happens with the standard driver and NO hyphenate-this?

Each and every plugin is adding an extra degree of complexity.