MobileRead Forums > E-Book Software > Sigil

Old 03-08-2010, 02:23 PM   #1
paulpeer
Zealot
Posts: 147
Karma: 56
Join Date: Dec 2009
Location: Antwerpen
Device: iPhone, Sony PRS-505, EPUBreader
Encoding declaration in OPF and TOC?

I've made a lot of books with Sigil, and I sometimes import them into Calibre. After importing, accented characters in the TOC and in the metadata are often displayed incorrectly (e.g. Chinese characters instead of Latin ones).
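
One way Latin text can turn into Chinese-looking characters is when UTF-8 bytes are decoded as UTF-16: each pair of bytes is read as a single code unit, which often lands in the CJK or Hangul ranges. The Python sketch below illustrates that mechanism. It is an assumption about the likely cause, not something confirmed in this thread, and the sample string is made up.

```python
# Illustration only: UTF-8 bytes misread as UTF-16 (little-endian).
# The sample text is hypothetical; it has an even number of UTF-8 bytes
# so the UTF-16 decode does not fail on a dangling byte.
text = "Cafés"
raw = text.encode("utf-8")          # 6 bytes: the "é" takes two bytes

garbled = raw.decode("utf-16-le")   # every 2 bytes become one code unit
print(garbled)                      # three non-Latin characters

correct = raw.decode("utf-8")       # the right codec round-trips the text
print(correct)
```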

About the first problem (TOC), I wrote to the programmer of Calibre, and his response was: "Bug fixed: When decoding NCX toc files, if no encoding is declared and detection has less than 100% confidence, assume UTF-8."

So I understand that the TOC should have an encoding declaration. Could Sigil add one automatically? As I understand it, Sigil produces valid UTF-8 but doesn't declare it.

I also wrote to Calibre about the second problem (errors in the metadata), and the answer was similar: stick an encoding declaration in the OPF.

Hence my similar question: Can Sigil add an encoding declaration to the OPF?

Thanks!
Old 03-08-2010, 02:55 PM   #2
Valloric
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
Quote:
Originally Posted by paulpeer View Post
Hence my similar question: Can Sigil add an encoding declaration to the OPF?
The XML standard states that an XML document without an "encoding" attribute in the XML declaration is encoded in either UTF-8 or UTF-16. If it is encoded in UTF-16, then it MUST have a Byte Order Mark. Therefore, if it doesn't have an encoding attribute and it doesn't have a BOM, it must be UTF-8.

In plain English, UTF-8 is the default character encoding for XML. I thought everyone knew that.

But I'll add the attribute, it can't hurt.

EDIT: And here's the source. Just invert the negatives.

Quote:
... it is a fatal error [...] for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.
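
The rule in the quoted passage can be sketched in a few lines of Python. This is a simplified illustration, not code from Sigil or calibre: the function name `xml_default_encoding` is made up, it ignores UTF-32 and any external encoding information (such as an HTTP header), and it only looks for a declaration in ASCII-compatible input.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of a leading XML declaration.
_DECL_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def xml_default_encoding(data: bytes) -> str:
    """Return the encoding an XML parser should assume for `data`.

    Hypothetical helper: BOM first, then the declaration, else UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECL_RE.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: the XML spec leaves only UTF-8.
    return "utf-8"
```

For a file that starts with `<?xml version="1.0"?>` and has no BOM, this returns `"utf-8"`, which is the case discussed in this thread.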

Last edited by Valloric; 03-08-2010 at 03:11 PM.
Old 03-08-2010, 03:18 PM   #3
paulpeer
Quote:
Originally Posted by Valloric View Post
In plain English, UTF-8 is the default character encoding for XML. I thought everyone knew that.
I thought so too, but when a famous programmer tells me I have to declare that my books are UTF-8, I start to doubt ;-)
Quote:
Originally Posted by Valloric View Post
But I'll add the attribute, it can't hurt.
Thanks! It will save a lot of trouble in many cases.
Old 03-08-2010, 03:21 PM   #4
kovidgoyal
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Oh, if only everyone knew that, and no one produced XML files in an encoding other than UTF-8 with no encoding declaration.
Old 03-08-2010, 03:32 PM   #5
Valloric
Quote:
Originally Posted by kovidgoyal View Post
Oh, if only everyone knew that, and no one produced XML files in an encoding other than UTF-8 with no encoding declaration.
Kovid, you should have said that this was causing you problems. It's no problem to add the encoding attribute.

But you really should fall back to the standard when byte stream fingerprinting isn't 100% sure of the encoding.
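
The fallback policy discussed here (trust an explicit declaration, trust detection only when it is fully confident, otherwise assume UTF-8) could look roughly like this in Python. It is a sketch, not calibre's actual code: `pick_encoding` and the `detect` callback, which returns a guessed encoding and a confidence between 0 and 1, are hypothetical names.

```python
from typing import Callable, Optional, Tuple

# A detector takes raw bytes and returns (encoding_guess, confidence 0.0-1.0).
Detector = Callable[[bytes], Tuple[str, float]]

def pick_encoding(declared: Optional[str], data: bytes, detect: Detector) -> str:
    """Choose the encoding used to decode an XML file (hypothetical helper)."""
    if declared:
        # An explicit encoding declaration always wins.
        return declared
    guess, confidence = detect(data)
    if confidence >= 1.0:
        # Detection is completely sure, so use its answer.
        return guess
    # Anything less than certainty: fall back to the XML default.
    return "utf-8"
```

With this policy, a detector that reports 80% confidence in some other encoding is overridden, and the file is decoded as UTF-8.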
Old 03-08-2010, 03:39 PM   #6
Valloric
This is now in trunk.
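
For readers who generate OPF files with their own scripts, an explicit encoding declaration can be written with Python's standard library along these lines. This is not Sigil's implementation. The `build_opf` helper is hypothetical, the output is a bare fragment rather than a complete, valid OPF, and the `xml_declaration` argument assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

def build_opf(title: str) -> bytes:
    """Serialize a minimal OPF skeleton with an explicit UTF-8 declaration."""
    ET.register_namespace("", OPF_NS)    # OPF as the default namespace
    ET.register_namespace("dc", DC_NS)   # Dublin Core as "dc:"
    package = ET.Element(f"{{{OPF_NS}}}package", {"version": "2.0"})
    metadata = ET.SubElement(package, f"{{{OPF_NS}}}metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    # xml_declaration=True emits the <?xml ... encoding=...?> line up front.
    return ET.tostring(package, encoding="utf-8", xml_declaration=True)

opf_bytes = build_opf("Café")
print(opf_bytes.decode("utf-8"))
```

The first line of the output is the XML declaration naming UTF-8, so a reader that checks for a declaration does not have to guess.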
Old 03-08-2010, 03:47 PM   #7
paulpeer
Quote:
Originally Posted by Valloric View Post
This is now in trunk.
You're marvellous, guys!
Old 03-08-2010, 03:48 PM   #8
kovidgoyal
Quote:
Originally Posted by Valloric View Post
Kovid, you should have said that this was causing you problems. It's no problem to add the encoding attribute.

But you really should fall back to the standard when byte stream fingerprinting isn't 100% sure of the encoding.
The problem is that byte stream fingerprinting is almost never a hundred percent certain.