#76
Not who you think I am...
Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory

It should, of course, be optional, and consistent: no mixing. In the past I have often had the impression that Sigil was abstracting extended characters somehow, which made regex unreliable.

I prefer the named entities, myself. I like to be able to easily distinguish between ' and ‘/’, for instance, or hyphen, – and —, • and ·, etc. Of course there are characters for all of these, but they are visually hard to tell apart. Regex is no more difficult for entities than for characters, and I don't have to open Character Map or type ALT+numpad codes, so it might even be easier. My take is that, for me, if it's not on the keyboard, it should be an entity. There are even a few characters on the keyboard that are easier for me as entities (>, <, ', ˜, `&#96;` [the forum keeps eating the numeric entity for the grave accent (backtick), which sadly has no named entity], etc.).

This comes in large part from dealing with badly formed source files and slowly working through them with regex until they are consistent throughout. Named entities state the content explicitly instead of leaving it up to visual interpretation on my part. Also, e-readers generally have a way of expressing most entities, but the characters are more problematic, leading to ugly substitutions or errors.

My 2 ¢.

Aloha,

Last edited by capidamonte; 06-16-2012 at 06:24 PM. Reason: grave accent eaten by forum
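The rule of thumb above ("if it's not on the keyboard, it should be an entity") can be applied mechanically. The following is a minimal sketch, assuming UTF-8 input and treating "on the keyboard" as plain ASCII. It emits decimal numeric references such as `&#8212;` rather than named ones, because a full named-entity table is beyond a short example. `decodeUtf8` and `toNumericEntities` are hypothetical helpers, not Sigil functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i] and
// advance i past it. Assumes well-formed UTF-8; continuation bytes are
// not validated.
std::uint32_t decodeUtf8(const std::string& s, std::size_t& i) {
    const unsigned char lead = static_cast<unsigned char>(s[i++]);
    std::uint32_t cp = lead;
    int extra = 0;
    if (lead >= 0xF0)      { cp = lead & 0x07; extra = 3; }
    else if (lead >= 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if (lead >= 0xC0) { cp = lead & 0x1F; extra = 1; }
    for (int k = 0; k < extra && i < s.size(); ++k, ++i) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
    }
    return cp;
}

// Hypothetical helper: keep ASCII as-is and rewrite every other code
// point as a decimal numeric character reference such as &#8212;.
std::string toNumericEntities(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const std::uint32_t cp = decodeUtf8(utf8, i);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            out += "&#" + std::to_string(cp) + ";";
        }
    }
    return out;
}
```

Plain ASCII passes through untouched, while an em dash comes out as `&#8212;`. Switching to named entities would mean consulting a lookup table first and falling back to the numeric form only for characters with no name (such as the grave accent mentioned above, if it were not already ASCII).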

#77
Grand Sorcerer
Posts: 28,891
Karma: 207182180
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD

I also use a lot of the Unicode regex classes: \p{P} doesn't know what HTML entities are and won't match them. Neither will \p{Pd} or my favorite, \p{Po}. My custom-tailored regexps are polluted with Unicode classes like that.

I guess I don't understand why this even has to be an issue. People should be able to make their own decision about entity vs. character. That's the way 0.5.3 works for me: if I enter the mdash entity it stays an entity, and if I enter the mdash character it stays a character. Beautiful.

Last edited by DiapDealer; 06-16-2012 at 10:30 PM.
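The point about Unicode classes (they match the character, never its entity spelling) can be shown in C++ with one caveat: `std::regex` does not support `\p{...}` property classes, so this sketch approximates `\p{Pd}` with an explicit bracket of three dash code points. That bracket is an assumption and covers far fewer characters than the real class; `containsDashPunctuation` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hand-picked stand-in for \p{Pd}: U+2010 hyphen, U+2013 en dash,
// U+2014 em dash. std::regex has no Unicode property classes.
bool containsDashPunctuation(const std::wstring& text) {
    static const std::wregex dash(L"[\u2010\u2013\u2014]");
    return std::regex_search(text, dash);
}
```

With this, `L"one\u2014two"` matches but `L"one&mdash;two"` does not: to the regex, the entity is just ordinary ASCII letters and punctuation, which is why mixing entities into a source breaks character-class patterns.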

#78
Not who you think I am...
Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory

More or less agreed, pal. I have a full set myself. I could probably stand to learn more Unicode regex, honestly.

I think I jumped in here because I'm always afraid that things I use are going to go away. I find that a lot of folks prefer to think about things that I prefer to just perceive, like named entities. I suspect that you perceive the characters themselves more clearly than I do. Back to the regularly scheduled discussion.

Aloha,

#79
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR

In regard to entities: since the 0.4 series, Sigil has replaced em dash, en dash, and soft hyphen (shy) characters with entities. All other entities used to be replaced with Unicode characters because of how Book View (BV) worked. Now the only automatic replacement is em, en, and shy; everything else is left as is.

Those three were chosen for specific reasons. Em and en dashes look so similar that the entities make them easier to tell apart. As for shy, you can't see it, so you don't know whether it's there or not.

Also, a new beta will be available once I get Unicode filename saving ironed out. Minizip is not very easy to understand.

Last edited by user_none; 06-16-2012 at 10:43 PM.
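The 0.5.x behaviour described above amounts to a fixed three-entry substitution table. Here is a minimal sketch using `std::string` on UTF-8 text; the function names are hypothetical and this is not Sigil's code (Sigil itself works on `QString`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Replace every occurrence of `from` (must be non-empty) with `to`.
void replaceAll(std::string& text, const std::string& from, const std::string& to) {
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
}

// Hypothetical sketch of the rule: only em dash, en dash and soft
// hyphen become named entities; all other text is left as typed.
std::string dashesAndShyToEntities(std::string source) {
    static const std::pair<std::string, std::string> kRules[] = {
        {"\u2014", "&mdash;"},
        {"\u2013", "&ndash;"},
        {"\u00ad", "&shy;"},
    };
    for (const auto& rule : kRules) {
        replaceAll(source, rule.first, rule.second);
    }
    return source;
}
```

Anything outside the table, such as é or an ellipsis character, passes through unchanged, which matches the "everything else is left as is" behaviour.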

#80
Zealot
Posts: 114
Karma: 5246
Join Date: Jul 2010
Device: none

Quote:

Code:
```diff
--- sigil-0.5.0/src/Sigil/ResourceObjects/HTMLResource.cpp.orig	2012-02-02 04:00:34.000000000 +0200
+++ sigil-0.5.0/src/Sigil/ResourceObjects/HTMLResource.cpp	2012-02-02 06:43:11.293174051 +0200
@@ -473,8 +473,8 @@
     QString newsource = source;
 
     newsource = newsource.replace( QString::fromUtf8( "\u00ad" ), "&shy;" );
-    newsource = newsource.replace( QString::fromUtf8( "\u2014" ), "&mdash;" );
-    newsource = newsource.replace( QString::fromUtf8( "\u2013" ), "&ndash;" );
+    newsource = newsource.replace( "&mdash;", QString::fromUtf8( "\u2014" ) );
+    newsource = newsource.replace( "&ndash;", QString::fromUtf8( "\u2013" ) );
 
     return newsource;
 }
```
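Assuming the forum rendered the entity strings in this patch as the characters themselves (so the replacement targets were `&shy;`, `&mdash;` and `&ndash;`), the patched function can be restated with `std::string`. This is an illustrative sketch with hypothetical names, not the patch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every occurrence of `from` (must be non-empty) with `to`.
void replaceAll(std::string& text, const std::string& from, const std::string& to) {
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
}

// Hypothetical restatement of the patched behaviour: soft hyphens are
// still written as &shy;, but dash entities become characters again.
std::string patchedConversion(std::string source) {
    replaceAll(source, "\u00ad", "&shy;");    // context line, unchanged by the patch
    replaceAll(source, "&mdash;", "\u2014");  // previously \u2014 -> &mdash;
    replaceAll(source, "&ndash;", "\u2013");  // previously \u2013 -> &ndash;
    return source;
}
```

Note the asymmetry: the patch only flips the direction for the two dashes, so an invisible soft hyphen is still forced into a visible entity.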

#81
frumious Bandersnatch
Posts: 7,570
Karma: 20150435
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura

I created an issue (316) which is now reported as "fixed", but maybe it can be reopened.

In any case, `&nbsp;` is also something you probably want as an entity.

#82
Grand Sorcerer
Posts: 28,891
Karma: 207182180
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD

Either way, I still think entity vs. character should ultimately be an end-user decision. Just two cents' worth of whatever.

EDIT: Works a treat! I chose to leave the source "as is" with the exception of the shy, zwsp, zwnj, zwj, and thinsp characters. I make sure those are all converted to some sort of visible entity.

Last edited by DiapDealer; 06-17-2012 at 10:05 AM. Reason: typo
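The "leave the source as is, but make the invisible characters visible" setup described above could look like the following sketch. Assumptions: UTF-8 input, XHTML 1.x named entities, and a numeric reference for zero width space, which has no named entity in XHTML 1.x (HTML5 later added `&ZeroWidthSpace;`). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Replace every occurrence of `from` (must be non-empty) with `to`.
void replaceAll(std::string& text, const std::string& from, const std::string& to) {
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
}

// Convert only the five invisible characters to entities; everything
// else in the source is left exactly as typed.
std::string invisiblesToEntities(std::string source) {
    static const std::pair<std::string, std::string> kRules[] = {
        {"\u00ad", "&shy;"},     // soft hyphen
        {"\u200b", "&#8203;"},   // zero width space (no XHTML 1.x name)
        {"\u200c", "&zwnj;"},    // zero width non-joiner
        {"\u200d", "&zwj;"},     // zero width joiner
        {"\u2009", "&thinsp;"},  // thin space
    };
    for (const auto& rule : kRules) {
        replaceAll(source, rule.first, rule.second);
    }
    return source;
}
```

Ordinary spaces and visible punctuation such as dashes are untouched, so the user's own choice of entity vs. character survives for everything outside this list.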

#83
Member
Posts: 23
Karma: 10
Join Date: Apr 2011
Device: none

Hello user_none and meme, do you think something can be done about the French spell-checking problem for version 0.6? If one uses ' (straight apostrophe), spell check works properly, but as soon as one uses ’ (curly apostrophe), spell check reports false positives.

Cheers.
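For the French apostrophe problem, one possible approach (an assumption, not how Sigil's spell checker works) is to normalise the typographic apostrophe U+2019 to the straight apostrophe only for the dictionary lookup, leaving the document text untouched. In this sketch `Dictionary`, `lookupForm` and `isKnownWord` are hypothetical; `Dictionary` stands in for a word list built with straight apostrophes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a spell-check word list whose entries use
// the straight apostrophe (U+0027).
using Dictionary = std::unordered_set<std::string>;

// Build the lookup key: every U+2019 (right single quotation mark)
// becomes U+0027. Only this copy changes; the document is left as typed.
std::string lookupForm(const std::string& word) {
    static const std::string curly = "\u2019";
    std::string key = word;
    std::size_t pos = 0;
    while ((pos = key.find(curly, pos)) != std::string::npos) {
        key.replace(pos, curly.size(), "'");
        pos += 1;  // skip past the single-byte replacement
    }
    return key;
}

bool isKnownWord(const Dictionary& dict, const std::string& word) {
    return dict.count(lookupForm(word)) > 0;
}
```

A real checker would also have to decide how to split elided forms such as l’ or qu’ before lookup; that tokenisation step is left out here.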

#84
Grand Sorcerer
Posts: 5,763
Karma: 24088559
Join Date: Dec 2010
Device: Kindle PW2

IMHO, entities for otherwise invisible characters are fine, but mandatory entities for dashes are not, since those who use em dashes and en dashes can usually tell them apart from each other and from hyphens.

#85
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle

Sigil 4.0 beta

I have now tried the 4.0 beta on three machines running Win 7. This version does not load HTML files; the OS reports that the program has failed.

Has anyone else tried it on Win 7? Have I done something dumb?

#86
Sigil developer
Posts: 1,274
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch

When you say "load HTML files", how are you loading them: Open, Add Existing, drag and drop, etc.?

#87
Sigil developer
Posts: 1,274
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch

#88
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle

#89
Grand Sorcerer
Posts: 5,763
Karma: 24088559
Join Date: Dec 2010
Device: Kindle PW2

The beta works fine on 32-bit XP machines. Were your Windows 7 machines all 64-bit systems?

#90
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle