|  01-31-2012, 05:06 PM | #1 | 
| Berti            Posts: 1,197 Karma: 4985964 Join Date: Jan 2012 Location: Zischebattem Device: Acer Lumiread | 
				
				How to get rid of HTML Character representations
			 
			
			When importing from an HTML source, the text often contains entities for national characters (what's the correct term?), e.g. "&Auml;" represents the German character "Ä".
			- Sigil's spell checker can't handle these entities (correctly spelled words are marked red).
			- The TOC generator seems to ignore them (<h3>So ein Ärger</h3> will result in a TOC entry "So ein rger").
			Should Sigil translate them automatically? Is there an easy way to translate them (aside from search and replace, one by one)? | 
|   |   | 
|  01-31-2012, 05:31 PM | #2 | 
| Grand Sorcerer            Posts: 28,862 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Are you saying:
			Code: <h3>So ein Ärger</h3>
			Or are you saying:
			Code: <h3>So ein &Auml;rger</h3>
			The first would surprise me, but the second example makes perfect sense. You can't have HTML entities (&Auml;) in the toc.ncx file. You should either replace the HTML entities in the headings that Sigil uses for TOC generation with their Unicode equivalents, or add a title attribute that uses the correct Unicode character:
			Code: <h3 title="So ein Ärger">So ein &Auml;rger</h3>
			Or just fix the errors manually in the toc.ncx file after Sigil generates one. Last edited by DiapDealer; 01-31-2012 at 05:34 PM. | 
|   |   | 
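One way to do the bulk replacement suggested above, outside of Sigil, is a small script. This is only a sketch under stated assumptions: the function name `entities_to_unicode` is hypothetical, the input is the XHTML source of a chapter as a string, and the five XML-predefined entities are deliberately left alone because they are needed to keep the markup well-formed.

```python
import re
from html.entities import name2codepoint

# Entities that XML itself defines; converting these would break the markup.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entities such as &Auml; or &nbsp; (not numeric ones like &#196;).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_unicode(markup: str) -> str:
    """Replace named HTML entities with the Unicode characters they stand for."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave markup-significant or unknown names as-is
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(replace, markup)


if __name__ == "__main__":
    print(entities_to_unicode("<h3>So ein &Auml;rger</h3>"))
```

Note that this sketch also turns `&nbsp;` into the literal U+00A0 character, which is hard to see in an editor; add `"nbsp"` to the excluded set if the entity should stay visible in the source.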
|  01-31-2012, 06:04 PM | #3 | 
| ♫            Posts: 661 Karma: 506380 Join Date: Aug 2010 Location: Germany Device: Kobo Aura / PB Lux 2 / Bookeen Frontlight / Kobo Mini / Nook Color | 
			
			Both <h3>So ein Ärger</h3> and <h3>So ein &Auml;rger</h3> work fine on my (native German) computer, for both the normal text and the TOC entry. And the Sigil spell checker does not complain either. mmat, I guess something is wrong with your settings... | 
|   |   | 
|  01-31-2012, 07:13 PM | #4 | 
| Wizard            Posts: 2,592 Karma: 4290425 Join Date: Jun 2009 Location: Foristell, Missouri, USA Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3, Kobo Libra 2 | 
			
			In a similar instance, I'm getting a lot of &nbsp; instead of a space. I'll go through a page, and the spaces I added while typing end up as &nbsp; in the code instead. It's kinda random.
		 | 
|   |   | 
|  01-31-2012, 07:23 PM | #5 | 
| Sigil & calibre developer            Posts: 2,487 Karma: 1063785 Join Date: Jan 2009 Location: Florida, USA Device: Nook STR | 
			
			Go to the about screen and post what it says for the loaded Qt version.
		 | 
|   |   | 
|  01-31-2012, 08:19 PM | #6 | 
| Wizard            Posts: 2,592 Karma: 4290425 Join Date: Jun 2009 Location: Foristell, Missouri, USA Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3, Kobo Libra 2 | 
			
			0.5.0 with Qt 4.7.4 here
		 | 
|   |   | 
|  01-31-2012, 08:26 PM | #7 | |
| Well trained by Cats            Posts: 31,240 Karma: 61360164 Join Date: Aug 2009 Location: The Central Coast of California Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A | Quote: 
  A spacebar in CV gets a space. Been that way from at least V3 onwards. XP SP3 | |
|   |   | 
|  01-31-2012, 09:27 PM | #8 | |
| Wizard            Posts: 2,592 Karma: 4290425 Join Date: Jun 2009 Location: Foristell, Missouri, USA Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3, Kobo Libra 2 | Quote: 
 I've been editing this one book. The original book is 288 pages and 28 chapters. I've got each chapter as its own file, and when I start a chapter it has no &nbsp;, and when I'm done with the chapter, it'll have anywhere from 150 to 450 of them. | |
|   |   | 
|  01-31-2012, 09:57 PM | #9 | |
| Well trained by Cats            Posts: 31,240 Karma: 61360164 Join Date: Aug 2009 Location: The Central Coast of California Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A | Quote: 
  The problem is that there are times and places where an NBSP is wanted. A bulk replace kills those too. | |
|   |   | 
|  01-31-2012, 10:09 PM | #10 | 
| Wizard            Posts: 2,592 Karma: 4290425 Join Date: Jun 2009 Location: Foristell, Missouri, USA Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3, Kobo Libra 2 | 
			
			Yeah, but the times Sigil inserts it unasked outnumber the times it is actually wanted by so much that it is easier to bulk replace them all and add the wanted ones back in by hand.
		 | 
|   |   | 
|  02-01-2012, 04:03 AM | #11 | 
| Guru            Posts: 657 Karma: 64171 Join Date: Sep 2010 Location: Kent, England, Sol 3, ZZ9 plural Z Alpha Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin) | 
			
			Because of this nbsp problem, if I'm going to be editing a file in BV, I go into CV and bulk replace the &nbsp; with xzxzxz, then when finished editing, do the reverse, which works out easier than not doing it and adding them again by hand.
		 | 
|   |   | 
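The placeholder round trip described in the post above can be sketched in a few lines. The helper names `protect_nbsp` and `restore_nbsp` are hypothetical, and the step in the middle (bulk replacing any `&nbsp;` that appeared during editing with a plain space) is an assumption based on the earlier posts, not something this poster spelled out.

```python
PLACEHOLDER = "xzxzxz"  # marker string from the post; must not occur in the book text


def protect_nbsp(markup: str) -> str:
    """Swap every wanted &nbsp; for the placeholder before editing."""
    if PLACEHOLDER in markup:
        raise ValueError("placeholder already occurs in the text; pick another one")
    return markup.replace("&nbsp;", PLACEHOLDER)


def restore_nbsp(markup: str) -> str:
    """Turn the placeholder back into &nbsp; after editing."""
    return markup.replace(PLACEHOLDER, "&nbsp;")


if __name__ == "__main__":
    original = "<p>Dr.&nbsp;Watson ging 5&nbsp;km.</p>"
    protected = protect_nbsp(original)

    # Simulated editing session that leaves a stray &nbsp; behind.
    edited = protected.replace(" ging ", "&nbsp;ging ")

    # Assumed cleanup step: every &nbsp; present now is unwanted.
    cleaned = edited.replace("&nbsp;", " ")

    print(restore_nbsp(cleaned))
```

The check for an existing placeholder is there because a marker that already occurs in the text would be turned into `&nbsp;` on the way back.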
|  02-01-2012, 08:58 AM | #12 | 
| Wizard            Posts: 2,592 Karma: 4290425 Join Date: Jun 2009 Location: Foristell, Missouri, USA Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3, Kobo Libra 2 | 
			
			I do something similar. I was just saying the other for sake of argument.
		 | 
|   |   | 
|  02-04-2012, 09:56 AM | #13 | 
| Berti            Posts: 1,197 Karma: 4985964 Join Date: Jan 2012 Location: Zischebattem Device: Acer Lumiread | 
			
			Hello, I have an update.
			First: DiapDealer is right, I gave the wrong code. Only <h3>So ein &Auml;rger</h3> will result in a wrong TOC entry (sorry for this mistake).
			Second: This only happens if the TOC is built while the code view of the editor is active. In book mode, I've never seen this.
			Third: It doesn't always happen.
			My major problem is not having some misspelled TOC entries, or having a &nbsp; somewhere. The umlaut entities are in the middle of the words, and the spell checker (which is really good) is quite useless while they are present. Qt 4.7.4 | 
|   |   | 