01-31-2012, 05:06 PM | #1 |
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
|
How to get rid of HTML Character representations
When importing from an HTML source, there are often some (what's the correct term?) national characters in the text, e.g. "&Auml;" represents the German character "Ä".
- Sigil's spell checker can't handle these characters (correctly spelled words are marked red)
- the toc-generator seems to ignore them (<h3>So ein Ärger</h3> will result in a toc-entry "So ein rger")
Should Sigil translate them automatically? Is there an easy way to translate them (aside from search and replace, one by one)? |
01-31-2012, 05:31 PM | #2 |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Are you saying:
Code:
<h3>So ein Ärger</h3>
Or are you saying that:
Code:
<h3>So ein &Auml;rger</h3>
The first would surprise me, but the second example makes perfect sense. You can't have HTML entities (&Auml;) in the toc.ncx file. You should either replace the HTML entities (that will be used in Sigil's toc generation) with their Unicode equivalents or add a title attribute that uses the correct Unicode character:
Code:
<h3 title="So ein Ärger">So ein &Auml;rger</h3>
Or just fix the errors manually in the toc.ncx file after Sigil generates one.
Last edited by DiapDealer; 01-31-2012 at 05:34 PM. |
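For the "replace the HTML entities with their Unicode equivalents" option, a script can do the bulk work outside Sigil. What follows is only a minimal sketch, not anything Sigil ships: the function name `entities_to_unicode` and the `keep` parameter are made up for illustration, and it assumes the chapter markup is available as a plain string. It uses Python's standard `html.entities.name2codepoint` table, leaves the XML-reserved entities escaped so the XHTML stays well-formed, and by default (an assumption on my part) leaves `&nbsp;` alone because a non-breaking space is sometimes intentional.

```python
import re
from html.entities import name2codepoint

# Entities that must stay escaped, otherwise the XHTML is no longer well-formed.
XML_RESERVED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entities such as &Auml; (numeric references are not touched).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_unicode(markup, keep=frozenset({"nbsp"})):
    """Replace named HTML entities with the Unicode characters they stand for.

    Hypothetical helper: entities listed in `keep`, the XML-reserved ones,
    and unknown names are returned unchanged.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_RESERVED or name in keep or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(replace, markup)


if __name__ == "__main__":
    print(entities_to_unicode("<h3>So ein &Auml;rger</h3>"))
```

Passing `keep=frozenset()` would also turn `&nbsp;` into the real U+00A0 character. The file has to be saved as UTF-8 afterwards for the converted characters to survive.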
01-31-2012, 06:04 PM | #3 |
♫
Posts: 660
Karma: 506380
Join Date: Aug 2010
Location: Germany
Device: Kobo Aura / PB Lux 2 / Bookeen Frontlight / Kobo Mini / Nook Color
|
Both
<h3>So ein Ärger</h3> and <h3>So ein &Auml;rger</h3> work fine on my (native German) computer, for both the normal text and the TOC entry. And the Sigil spell checker does not complain either. mmat, I guess something is wrong with your settings... |
01-31-2012, 07:13 PM | #4 |
Wizard
Posts: 2,549
Karma: 3799999
Join Date: Jun 2009
Location: O'Fallon, Missouri, USA
Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3
|
In a similar instance, I'm getting a lot of &nbsp; instead of a space. I'll go through a page, and the spaces I added while typing end up as &nbsp; in the code instead. It's kinda random.
|
01-31-2012, 07:23 PM | #5 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Go to the about screen and post what it says for the loaded Qt version.
|
01-31-2012, 08:19 PM | #6 |
Wizard
Posts: 2,549
Karma: 3799999
Join Date: Jun 2009
Location: O'Fallon, Missouri, USA
Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3
|
0.5.0 with QT4.7.4 here
|
01-31-2012, 08:26 PM | #7 | |
Well trained by Cats
Posts: 29,812
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
A spacebar in CV gets a space. Been that way from at least V3 onwards. XP SP3 |
|
01-31-2012, 09:27 PM | #8 | |
Wizard
Posts: 2,549
Karma: 3799999
Join Date: Jun 2009
Location: O'Fallon, Missouri, USA
Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3
|
Quote:
I've been editing this one book. The original book is 288 pages and 28 chapters. I've got each chapter as its own file, and when I start each chapter it has no &nbsp;, and when I'm done with the chapter, it'll have anywhere from 150 to 450 of them. |
|
01-31-2012, 09:57 PM | #9 | |
Well trained by Cats
Posts: 29,812
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
The problem is that there are times and places where an NBSP is wanted. A bulk replace kills those too. |
|
01-31-2012, 10:09 PM | #10 |
Wizard
Posts: 2,549
Karma: 3799999
Join Date: Jun 2009
Location: O'Fallon, Missouri, USA
Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3
|
Yeah, but the number of times that is the case is so far outnumbered by the times that Sigil inserts it that it is easier to bulk replace and add them back in by hand.
|
02-01-2012, 04:03 AM | #11 |
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Because of this nbsp problem, if I'm going to be editing a file in BV, I go into CV and bulk replace the &nbsp; with xzxzxz, then when finished editing, do the reverse, which works out easier than not doing it and adding them again by hand.
|
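The placeholder trick described above (swap the wanted `&nbsp;` for `xzxzxz` before editing, swap back afterwards) can be sketched as a script for anyone who prefers to run it over the chapter files rather than use the Find & Replace dialog. This is only an illustration under assumptions: the function names are invented, the markup is assumed to be a plain string, and the step that turns the leftover stray `&nbsp;` into ordinary spaces is my reading of the bulk replace mentioned earlier in the thread, not something the poster spelled out.

```python
NBSP_ENTITY = "&nbsp;"
PLACEHOLDER = "xzxzxz"  # the marker used in the post; any string absent from the book works


def protect_nbsp(markup, placeholder=PLACEHOLDER):
    """Before editing: hide every intentional &nbsp; behind the placeholder."""
    if placeholder in markup:
        # Restoring would otherwise create &nbsp; where none was intended.
        raise ValueError("placeholder already occurs in the text")
    return markup.replace(NBSP_ENTITY, placeholder)


def restore_nbsp(markup, placeholder=PLACEHOLDER):
    """After editing: turn the placeholder back into &nbsp;."""
    return markup.replace(placeholder, NBSP_ENTITY)


def finish_editing(markup, placeholder=PLACEHOLDER):
    """Assumed clean-up order: any &nbsp; still present was added during
    editing, so it becomes a plain space; then the protected ones return."""
    without_strays = markup.replace(NBSP_ENTITY, " ")
    return restore_nbsp(without_strays, placeholder)


if __name__ == "__main__":
    protected = protect_nbsp("<p>10&nbsp;km</p>")
    edited = protected.replace("</p>", " more&nbsp;text</p>")  # simulated stray entity
    print(finish_editing(edited))
```

The check in `protect_nbsp` is there because a placeholder that already occurs in the book would be silently turned into `&nbsp;` on the way back.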
02-01-2012, 08:58 AM | #12 |
Wizard
Posts: 2,549
Karma: 3799999
Join Date: Jun 2009
Location: O'Fallon, Missouri, USA
Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3
|
I do something similar. I was just saying the other for sake of argument.
|
02-04-2012, 09:56 AM | #13 |
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
|
Hello, I've an update
First: DiapDealer is right, I gave the wrong code; only <h3>So ein &Auml;rger</h3> will result in a wrong toc-entry (sorry for this mistake).
Second: this only happens if the toc is built while the code view of the editor is active. In book mode, I've never seen it.
Third: it doesn't always happen.
My major problem is not having some misspelled toc-entries, or having a &nbsp; somewhere. The "Umlauts" are in the middle of the word, and the spellchecker (which is really good) is quite useless while they are present.
Qt 4.7.4 |
|