#16 | |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
Quote:
I put great emphasis on preserving the user's original code. |
#17 | |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Quote:
But it has the possibility to introduce errors. I've attached 2 tiny epubs I made to demonstrate this. The first ('Sigil test Original.epub') was deliberately created using ANSI encoding in Notepad++ and (correctly) specifies ISO-8859-1 encoding in the XML declaration. The accented 'é' appears correctly in both ADE and Calibre's epub reader.

The second ('Sigil test opened in Sigil.epub') is the same file, simply opened in Sigil and immediately saved without any editing. The 'é' has now become a '?' in ADE and Calibre, because Sigil assumed that the encoding was UTF-8, disregarding the encoding specified in the file, and changed the encoding attribute in the declaration. I don't know what you'd call this, but I'd say that was a significant change in the code.

I don't think it's something that necessarily needs to be fixed, and as I said before, this behaviour can be used to fix sloppy mistakes without the user needing to know much about what they're doing. But it is something that Sigil users need to be aware of - Sigil rigidly assumes that all the text it processes is UTF-8, and any edits need to be made with that in mind. For Western languages this isn't an issue, and in fact the use of UTF-8 should be encouraged - there's no reason for people to be using ancient ANSI encoding in epubs. But it might be a problem for those who need to use UTF-16.

Sigil also strips out metadata elements in the body text XHTML that are irrelevant. Again, not a big problem for most users, though if you have a workflow that uses custom metadata fields it's something you really need to know about. If you look at the HTML inside the two epubs you'll see that's happened here: the custom metadata has been stripped.
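The 'é' to '?' symptom can be reproduced outside any epub tool. The snippet below is a minimal sketch of the general mechanism only (bytes written as ISO-8859-1 being read as UTF-8); it does not claim to mirror Sigil's actual code path, and the sample word and variable names are made up for illustration.

```python
# Illustrative only: reproduces the symptom, not Sigil's internals.
original = "café"

# In ISO-8859-1 the 'é' is stored as the single byte 0xE9.
latin1_bytes = original.encode("iso-8859-1")

# Read as UTF-8, 0xE9 is not a valid sequence on its own. With
# errors="replace" it becomes U+FFFD, which many readers render as '?'.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file actually declares round-trips cleanly.
correct = latin1_bytes.decode("iso-8859-1")

print(repr(misread))
print(repr(correct))
```

Once the replacement character has been written back to disk as UTF-8, the original byte is gone, which is why saving without editing is enough to damage the text.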
#18 | |||
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
Quote:
Quote:
But bugs are always possible. I'll check what's going on with this file and report back.

In general, you should report any problems on the tracker so they get scheduled and fixed. You're not doing anyone any favors (least of all yourself) by not reporting bugs. I've heard that people feel reporting bugs or missing features is "undue criticism"; that couldn't be further from the truth. The more bug reports, the better.

Quote:
#19 | ||
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
Quote:
Try to directly load the XHTML file from the first epub. The accented "e" is preserved, since the encoding is correctly detected and the file converted to UTF-8, just like I said. If you load the epub file, the accented "e" becomes a question mark. Why? Because what you have is an ISO-8859-1 encoded XHTML file inside an epub file, and that's against the epub specification. The only encodings the epub specification allows for XHTML files are UTF-8 and UTF-16. You are not allowed to use anything else (like ISO-8859-1).

Quote:
But I'm going to change that. I'm going to perform the same encoding detection analysis on XHTML files in epubs as I do when an (X)HTML file is loaded directly. Why? Because someone not familiar with the epub spec will do the same thing you did and expect everything to work. Sigil should be able to detect this error and correct it, as it can for markup. And it will, from the next version onwards.

EDIT: This is now in trunk.

Last edited by Valloric; 01-27-2010 at 09:28 AM.
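For readers wondering what "encoding detection" on an XHTML file might involve, here is a rough sketch, assuming a simple strategy of checking for a byte-order mark first and then the encoding named in the XML declaration. The function names `detect_encoding` and `to_utf8` are hypothetical, and this is not Sigil's implementation; it also does not handle UTF-16 files that lack a BOM.

```python
import codecs
import re

# Matches the encoding named in an XML declaration at the start of the file.
_DECL_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def detect_encoding(raw: bytes) -> str:
    """Guess the encoding: BOM first, then the XML declaration, else UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECL_RE.match(raw)
    if match:
        return match.group(1).decode("ascii")
    return "utf-8"


def to_utf8(raw: bytes) -> bytes:
    """Decode with the detected encoding, then re-encode as UTF-8.

    The declaration is rewritten so that it matches the new encoding.
    """
    text = raw.decode(detect_encoding(raw))
    text = re.sub(
        r'(<\?xml[^>]*?encoding\s*=\s*["\'])[^"\']+(["\'])',
        r"\1utf-8\2",
        text,
        count=1,
    )
    return text.encode("utf-8")
```

Applied to a file that declares ISO-8859-1, `to_utf8` would keep the accented characters intact and update the declaration, rather than misreading the bytes as UTF-8.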
#20 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Just calm down, Valloric. This is getting ridiculous. As far as I'm concerned, this behaviour is a feature, and I presented it as such.
If you want to fix the issue with stripping metadata, fine, but I'd regard that as a very low priority. And yes, you're right about the spec: the file passed epubcheck 1.0.4, but that version contains a bug that means it doesn't spot encoding errors (which has been fixed, but not released, doh).

I just did a quick test using an epub with some XHTML encoded in (and specified as) UTF-16 (displayed fine in ADE), and Sigil just stripped all the text and output a UTF-8 file. So yeah, that probably needs to be fixed, and if I'd confirmed the UTF-16 problem before now I'd certainly have listed it on your issue tracker.

In the circumstances of the issue which this thread concerns, Sigil's assumption that the contents of an ePub are UTF-8 is a useful attribute that makes it easy to fix an apparently common problem. Both my points were correct, but you've chosen to reply in an overly aggressive manner. I know how easy it is to get worked up over personal projects, but frankly I didn't expect to get attacked for recommending Sigil as a tool to solve people's problems. I regard this thread as closed; I won't be looking at it again.
#21 | ||
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
Quote:
But I've now reread my responses a few times and I still can't see any "aggression". I can't agree here.

Quote:
#22 |
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2010
Location: Hampshire, United Kingdom
Device: Sony Reader PRS505
|
Hi guys,
Thank you for all your comments and assistance. It is very much appreciated. However, these replies just go to illustrate, in my mind at least, that we still have a long way to go before e-books are a reliable medium suitable for anyone.

If I've understood the problem correctly, the issue seems to be with the encoding, so presumably no matter where I buy the book from it will have the same encoding errors, which stem from the master copy put out by the publisher? I would imagine that most people who buy e-books just want to read them. Having to dismantle the files, download and install various other programs to overcome the DRM and correct the errors, and then reassemble the files seems vastly overcomplicated. The publisher should simply acknowledge their errors and supply the customer with a correctly formatted file.

Waterstones, the online bookstore I bought the novel from, were hopeless when it came to providing technical support. In fact, after several weeks of waiting for a reply, all they did was reimburse my money without a single word of explanation. So I still have no idea if it would be safe to download again, either from them or from another website. Contrast this with how easy it is to exchange a book in a real shop, which is done in minutes.

Macmillan clearly have a quality control problem if they can release books where every page has encoding errors. I think my next step will be to write to Macmillan, explain the situation and see what their response will be (if any). I will then return to this thread and update it with their reply.

Kind regards
Mark