03-27-2014, 02:55 AM   #1
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
About UTF-16 parsing mistake

Hi

I ran a curious and fairly reproducible experiment using the broken EPUB I presented to you yesterday. As I am more careful today, I'll let you decide whether this is an Editor bug or an advanced feature...

I unchecked this box in the Editor preferences (screenshot 1). This means, if I understand correctly, that no UTF-16 character will be created to replace a named entity like nbsp (of which there are 611 in the book).

Then I modified one word in chapter 2 with the Editor (changed one word, then changed it back) and saved the file.

1. - If I open this same chapter 2 file with the Editor, there is no reading problem, but if I check the book, the Editor now reports an error for this modified chapter: "Parsing failed: Document labelled UTF-16 but has UTF-8 content, line 1..." (scr 4 - far right).

2. - Opening this file with Sigil 0.7.4, things are even gloomier: Sigil gives a warning (scr 2). Looking at the files, I observed that the DOCTYPE and the nbsp have, logically, been preserved, but Sigil declares the modified chapter 2 file unreadable without giving any reason (in fact it is unreadable because it is declared as UTF-16). If I try to open the chapter 2 xhtml file, it looks a little like Chinese, but written by me (scr 3).
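
To illustrate the mismatch described in the two points above, here is a minimal Python sketch of how one might detect a file whose XML declaration says UTF-16 while its bytes are really UTF-8. This is not the check calibre or Sigil actually performs; the function names (`declared_encoding`, `looks_like_utf8`, `has_encoding_mismatch`) are hypothetical and only the standard library is used.

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration in raw bytes.
XML_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')

UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if absent."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    match = XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes carry no UTF-16 BOM and decode cleanly as UTF-8."""
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def has_encoding_mismatch(data: bytes) -> bool:
    """True if the file is labelled UTF-16 but its bytes are really UTF-8."""
    declared = declared_encoding(data)
    return (
        declared is not None
        and declared.startswith("utf-16")
        and looks_like_utf8(data)
    )
```

A file that is genuinely UTF-16 is not flagged: its declaration cannot be read as ASCII bytes, so `declared_encoding` returns `None` for it.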

Changing UTF-16 to UTF-8 in the declaration solves all problems for both editors.
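
That manual fix can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `relabel_as_utf8` is hypothetical, it touches nothing but the encoding label in the XML declaration, and it refuses to relabel bytes that are not valid UTF-8.

```python
import re

# Encoding pseudo-attribute inside an XML declaration at the start of the file.
DECL_ENCODING = re.compile(
    rb'^(\s*<\?xml[^>]*?encoding=["\'])utf-16(?:le|be)?(["\'])',
    re.IGNORECASE,
)


def relabel_as_utf8(data: bytes) -> bytes:
    """Rewrite a UTF-16 label to UTF-8, but only if the bytes are valid UTF-8."""
    # Raises UnicodeDecodeError if the content is not actually UTF-8,
    # so a genuinely UTF-16 file is never mislabelled.
    data.decode("utf-8")
    return DECL_ENCODING.sub(rb"\g<1>UTF-8\g<2>", data, count=1)
```

The rest of the file, including the DOCTYPE and every nbsp entity, passes through unchanged, and running the function on an already correct file returns it as-is.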

If the Editor cannot parse, if Sigil is bewildered by this change, then why do it?

Proposal. When the user unchecks the preferences checkbox mentioned above (scr 1), not only should the nbsp and DOCTYPE be preserved as they are now, but the file should also stay declared as UTF-8.
Attached Thumbnails
Préférences Editor.png
Sigil 0.7.4. report.png
Sigil 0.7.4 - sweet UTF-16.png
Editor report.png

Last edited by roger64; 03-27-2014 at 03:25 AM.
03-27-2014, 03:00 AM   #2
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That option only controls replacement of entities as they are typed. The editor never uses UTF-16 or generates UTF-16 encoded files, ever. The editor always outputs UTF-8.
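
The distinction between entity replacement and file encoding can be shown with a small Python sketch. It is an illustration only, not the editor's code; `expand_entities` is a hypothetical helper built on the standard library's `html.unescape`.

```python
import html


def expand_entities(text: str) -> str:
    """Replace named entities such as &nbsp; with the characters they stand for."""
    return html.unescape(text)


with_entity = "Lettres&nbsp;de Saint-Arnaud"
with_char = expand_entities(with_entity)

# The entity and the character are two spellings of the same code point, U+00A0.
print(repr(with_char))

# Encoding is a separate step: the same character becomes different bytes
# depending on the encoding chosen when the file is written.
print(with_char.encode("utf-8")[7:9])         # the two bytes for U+00A0 in UTF-8
print(with_char.encode("utf-16-le")[14:16])   # the two bytes for U+00A0 in UTF-16LE
```

Whether nbsp is kept as an entity or turned into a character has no bearing on which encoding the file is saved in. Note that `html.unescape` also expands `&amp;` and `&lt;`, so it is not safe to run over a whole XHTML file.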
03-27-2014, 03:31 AM   #3
roger64
Wizard
Quote:
Originally Posted by kovidgoyal View Post
That option only controls replacement of entities as they are typed. The editor never uses UTF-16 or generates UTF-16 encoded files, ever. The editor always outputs UTF-8.
OK. I found the reason: a wrongly selected UTF-16 preference in writer2xhtml (scr), which Sigil, again, blissfully corrects automatically. On the other hand, the Editor has sharper reporting than Flightcrew or Epubcheck.

So since the Editor outputs UTF-8, and using this option maintains DOCTYPE and nbsp (in this particular checkbox case), there should be no reading problems anymore with Sigil. Excellent.

Sorry for this. It's my mistake again. One day my EPUB will be unbroken... I learnt another thing today.

And to conclude, here is a final (but not perfect) EPUB with subsetted fonts and 'traditional' nbsp. All other reported mistakes have been removed. It goes to and from Sigil without problems.
(Sigil was used specifically for splitting two chapters.)
Attached Thumbnails
writer2xhtml.png
Attached Files
Lettres de Saint-Arnaud - 1.epub (771.5 KB)

Last edited by roger64; 03-27-2014 at 06:08 AM.
03-27-2014, 09:34 AM   #4
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Good Info, but...

Quote:
Originally Posted by roger64 View Post
OK. I found the reason
One way to get your post seemingly ignored is to completely rewrite a post 2.5 hours after you initially post it. No one is pinged by the board that you updated a post. It is possible Kovid read your initial post and will never read your completely rewritten post because there is no mechanism to let anyone know you completely rewrote the post. It is best to just write an additional new post. That way folks following the conversation will have an opportunity to read what you learned.
03-27-2014, 09:49 AM   #5
roger64
Wizard
@DoctorOhh

Thank you for following the thread.

The main point of my message is that I found the reason for this behaviour. The answer was given quickly, which made a second reply unnecessary.

Then I added some minor edits and, much later, finally posted the "unbroken" EPUB. I put it in the same message because I did not wish to disturb people with a new post. But I agree that I may be wrong about it.
03-27-2014, 09:56 AM   #6
DoctorOhh
US Navy, Retired
Quote:
Originally Posted by roger64 View Post
The main point of my message is that I found the reason for this behaviour. The answer was given quickly, which made a second reply unnecessary.
Maybe I was the only one who read your initial post directly after you posted it. When I reopened my browser later I was still on the thread, realized the post was completely different, and drew my own hasty conclusions.

Thanks for providing updated info.