06-16-2012, 05:14 PM | #76 |
Not who you think I am...
Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory
|
It should, of course, be optional -- and consistent, no mixing. Often, in the past, I have had the impression that Sigil is abstracting extended characters somehow, which helped make regex unstable.
I prefer the named entities, myself -- I like to easily distinguish between ' and ‘/’, for instance, or hyphen-–-—, •-·, etc. Of course, these are characters, but they are visually difficult to tell apart. Regex is no more difficult for entities than for characters; I don't have to open Character Map or type ALT-NUMPAD codes, so it might even be easier. I guess my take is that, for me, if it's not on the keyboard, it should be an entity. And there are even a few on the keyboard that make life easier for me as entities (>, <, ', ˜, &#96;, etc.). [The forum is eating the numeric entity for the grave accent (backtick), which sadly has no named entity.]
This comes in large part from dealing with badly-formed source files, and slowly working via regex to get them consistent throughout. The named entities are emphatically expressive of content, not leaving it up to visual interpretation on my part. Also, generally, the ereaders have a method of expressing most entities -- but the characters are more problematic, leading to ugly replacements or errors.
My 2 ¢
Aloha,
Last edited by capidamonte; 06-16-2012 at 05:24 PM. Reason: grave accent eaten by forum |
06-16-2012, 05:49 PM | #77 | |
Grand Sorcerer
Posts: 27,572
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
I also use a lot of the Unicode regex classes: \p{P} doesn't know what HTML entities are and won't match them. Neither will \p{Pd} or my favorite... \p{Po}. My custom-tailored regexps are polluted with Unicode classes like that.
I guess I don't understand why this even has to be an issue. People should be able to make their own decision with regard to entity vs. character. That's the way 0.5.3 works for me: if I enter the mdash entity it stays an entity... if I enter the mdash character it stays a character. Beautiful.
Last edited by DiapDealer; 06-16-2012 at 09:30 PM. |
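For anyone following along, here is a rough sketch of why a Unicode class cannot "see" a dash that is written as an entity. Python's built-in `re` module does not support `\p{...}`, so this emulates the class with `unicodedata.category`; the function name and sample strings are made up for illustration, and Sigil's own regex engine is not involved.

```python
import unicodedata


def positions_in_category(text: str, prefix: str) -> list[int]:
    """Return indexes of characters whose Unicode general category starts with prefix."""
    return [i for i, ch in enumerate(text)
            if unicodedata.category(ch).startswith(prefix)]


as_character = "wait\u2014what"   # literal em dash (U+2014)
as_entity = "wait&mdash;what"     # the same dash written as a named entity

# Pd (dash punctuation) finds the literal dash but nothing in the entity form.
print(positions_in_category(as_character, "Pd"))  # [4]
print(positions_in_category(as_entity, "Pd"))     # []

# Po (other punctuation) only sees the '&' and ';' that delimit the entity.
print(positions_in_category(as_entity, "Po"))     # [4, 10]
```

In other words, the class is applied to the literal source text, so an entity is just an ampersand, some letters, and a semicolon as far as the regex is concerned.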
|
06-16-2012, 08:32 PM | #78 |
Not who you think I am...
Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory
|
More or less agreed, pal. I have a full set, myself. I could probably stand to learn more Unicode regex, honestly.
I think I jumped in here because I'm always afraid that things are going to go away that I use. I find that a lot of folks prefer to think about stuff that I prefer to just perceive, like named entities. I suspect that you perceive the characters themselves more clearly than I do. Back to regularly scheduled discussion. Aloha, |
06-16-2012, 09:40 PM | #79 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Regarding entities: since the 0.4 series, Sigil has replaced the em dash, en dash, and soft hyphen (shy) characters with entities. All other entities used to be replaced with Unicode characters because of how BV (Book View) worked. Now the only automatic replacement is em, en, and shy. Everything else is left as is.
These three were chosen for specific reasons. The em and en dashes look so similar that entities make them easier to tell apart. As for shy, you can't see it, so you don't know whether it's there or not.
Also, a new beta will be available once I get the Unicode filename saving ironed out. Minizip is not very easy to understand.
Last edited by user_none; 06-16-2012 at 09:43 PM. |
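A minimal sketch of the behaviour described in this post, assuming a plain string-replacement pass over the source. Sigil itself is written in C++ with Qt, so the names `AUTO_ENTITIES` and `apply_auto_entities` are purely illustrative and not Sigil's API.

```python
# (character, entity) pairs for the three automatic replacements described above.
AUTO_ENTITIES = (
    ("\u00ad", "&shy;"),    # soft hyphen
    ("\u2014", "&mdash;"),  # em dash
    ("\u2013", "&ndash;"),  # en dash
)


def apply_auto_entities(source: str) -> str:
    """Replace only the three listed characters; leave everything else as is."""
    for char, entity in AUTO_ENTITIES:
        source = source.replace(char, entity)
    return source


sample = "pages 3\u20135 \u2014 co\u00adoperate &nbsp; caf\u00e9"
print(apply_auto_entities(sample))
# pages 3&ndash;5 &mdash; co&shy;operate &nbsp; café
```

Note that the existing `&nbsp;` entity and the accented character both pass through untouched, which matches the "everything else is left as is" rule.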
06-17-2012, 03:21 AM | #80 | |
Zealot
Posts: 114
Karma: 5246
Join Date: Jul 2010
Device: none
|
Quote:
Code:
--- sigil-0.5.0/src/Sigil/ResourceObjects/HTMLResource.cpp.orig 2012-02-02 04:00:34.000000000 +0200
+++ sigil-0.5.0/src/Sigil/ResourceObjects/HTMLResource.cpp 2012-02-02 06:43:11.293174051 +0200
@@ -473,8 +473,8 @@
     QString newsource = source;
 
     newsource = newsource.replace( QString::fromUtf8( "\u00ad" ), "&shy;" );
-    newsource = newsource.replace( QString::fromUtf8( "\u2014" ), "&mdash;" );
-    newsource = newsource.replace( QString::fromUtf8( "\u2013" ), "&ndash;" );
+    newsource = newsource.replace( "&mdash;", QString::fromUtf8( "\u2014" ) );
+    newsource = newsource.replace( "&ndash;", QString::fromUtf8( "\u2013" ) );
 
     return newsource;
 }
|
06-17-2012, 04:09 AM | #81 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I had created an issue (316) which is now reported as "fixed", but maybe it can be reopened.
In any case, is also something you probably want as an entity. |
06-17-2012, 05:34 AM | #82 | ||
Grand Sorcerer
Posts: 27,572
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Either way, I still think entity vs. character should ultimately be an end-user decision. Just two cents' worth of whatever.
Quote:
EDIT: Works a treat! I chose to leave the source "as is" with the exception of the shy, zwsp, zwnj, zwj, and thinsp characters. I make sure those are all converted to some sort of visible entity.
Last edited by DiapDealer; 06-17-2012 at 09:05 AM. Reason: typo |
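One way the "make the invisible ones visible" choice could be sketched, assuming a translation table built from Python's standard `html.entities` data with a numeric fallback where HTML 4 defines no name (zwsp is the case here). The helper names are hypothetical and this is not the patch or setting the poster actually used.

```python
from html.entities import codepoint2name

# Code points for the characters named in the post: shy, zwsp, zwnj, zwj, thinsp.
HARD_TO_SEE = (0x00AD, 0x200B, 0x200C, 0x200D, 0x2009)


def entity_for(codepoint: int) -> str:
    """Prefer the HTML 4 named entity; fall back to a decimal numeric entity."""
    name = codepoint2name.get(codepoint)
    return f"&{name};" if name else f"&#{codepoint};"


# str.translate accepts a mapping from code point to replacement string.
VISIBLE_TABLE = {cp: entity_for(cp) for cp in HARD_TO_SEE}


def reveal_invisibles(source: str) -> str:
    """Convert only the hard-to-see characters; leave all other text alone."""
    return source.translate(VISIBLE_TABLE)


print(reveal_invisibles("a\u00adb\u200bc\u200cd\u200de\u2009f"))
# a&shy;b&#8203;c&zwnj;d&zwj;e&thinsp;f
```

Dashes, quotes, and accented letters are not in the table, so they stay exactly as the user typed them.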
||
06-17-2012, 07:41 AM | #83 |
Member
Posts: 23
Karma: 10
Join Date: Apr 2011
Device: none
|
Hello user_none and meme, do you think that something can be done about the French spell-checking problem for version 0.6? If one uses ' (straight apostrophes), spell check works properly, but as soon as one uses ’ (curly apostrophes), spell check reports false positives. Cheers.
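To make the report concrete, here is a small sketch of one possible workaround: normalizing the curly apostrophe (U+2019) to a straight one before the dictionary lookup. This is only an assumption about how such a fix could look; the tiny word set stands in for a real French dictionary, and none of these names come from Sigil or its spell-checking library.

```python
CURLY_APOSTROPHE = "\u2019"

# Toy word list standing in for a real French dictionary (hypothetical).
FRENCH_WORDS = {"aujourd'hui", "l'heure", "bonjour"}


def normalize_apostrophes(word: str) -> str:
    """Map the typographic apostrophe to the straight one used in the word list."""
    return word.replace(CURLY_APOSTROPHE, "'")


def is_misspelled(word: str, words: set[str] = FRENCH_WORDS) -> bool:
    """Look the word up only after apostrophe normalization and lowercasing."""
    return normalize_apostrophes(word).lower() not in words


print(is_misspelled("aujourd\u2019hui"))  # False: curly form is accepted
print(is_misspelled("aujourdhui"))        # True: genuinely misspelled
```

The source text keeps its curly apostrophes; only the copy handed to the lookup is normalized.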
|
06-17-2012, 09:06 AM | #84 | |
Grand Sorcerer
Posts: 5,585
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
IMHO, entities for otherwise invisible characters are fine, but mandatory entities for dashes are not, since those who use em dashes and en dashes usually can tell them apart from each other and hyphens. |
|
06-17-2012, 09:21 AM | #85 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Sigil 4.0 beta
I have now tried the 4.0 beta on three machines running Win 7. This version does not load HTML files. The OS reports that the program has failed.
Has anyone else tried Win 7? Have I done something dumb? |
06-17-2012, 10:14 AM | #86 |
Sigil developer
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
When you say load HTML files - how are you loading them - Open, Add Existing, drag and drop, etc.?
|
06-17-2012, 10:18 AM | #87 | |
Sigil developer
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
Quote:
|
|
06-17-2012, 10:21 AM | #88 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
|
06-17-2012, 10:51 AM | #89 |
Grand Sorcerer
Posts: 5,585
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
The beta works fine on 32-bit XP machines. Were your Windows 7 machines all 64-bit systems?
|
06-17-2012, 11:15 AM | #90 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
|
|