08-15-2013, 09:00 PM | #46 | |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Quote:
This in itself is a major issue with HTML5. Since it doesn't define a doctype (not required by the spec) you can't use named entities. In the case of HTML5 a set of named entities is required to be supported (the same entities required by XML). Other entities not being defined will cause issues. Basically, if you use named entities (other than the ones supported by XML) you must have them declared. They're typically declared via the doctype. Hence the need in this case for a fully valid doctype for the nbsp in the file to parse correctly. |
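A minimal sketch of the point above, assuming Python's standard-library XML parser as a stand-in for any strict XHTML parser (Sigil itself does not work this way, and the helper name `parses` is made up for illustration). It shows that a named entity outside XML's predefined set only parses once something declares it, while a numeric reference needs no declaration:

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# &amp; is one of the five entities predefined by XML, so no DTD is needed.
print(parses("<p>Tom &amp; Jerry</p>"))

# &nbsp; is not predefined; with no declaration the parser rejects it.
print(parses("<p>a&nbsp;b</p>"))

# Declaring nbsp in the DOCTYPE's internal subset makes the same text parse.
print(parses('<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'))

# A numeric character reference never needs a declaration.
print(parses("<p>a&#160;b</p>"))
```

A real XHTML 1.1 DOCTYPE pulls the declarations in from an external DTD rather than an internal subset; the inline declaration here is only to keep the example self-contained.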
|
08-15-2013, 10:11 PM | #47 |
Grand Sorcerer
Posts: 27,903
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
That's sort of my point. Bob's test ePub didn't use any named entities (with the exception of &amp;, which is supported), only the Unicode non-breaking space character. So since the ePub didn't contain any unsupported entities, the DOCTYPE should have been unnecessary. The file really only needed the DOCTYPE after Sigil changed the Unicode characters to named entities upon opening (to avoid Qt changing it to a normal space). That's the part that feels a bit wonky to me. Not that I know how else it could be handled to preserve the intent of the non-breaking space character when cleaning is turned off, mind you.
|
08-16-2013, 01:57 AM | #48 |
creator of calibre
Posts: 44,351
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@user_none: I'm assuming there's some reason you can't just use numeric entities?
|
08-16-2013, 04:59 AM | #49 |
Sigil developer
Posts: 1,274
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
The nbsp named entity was used as it was commonly known and consistent, and, well, it didn't seem to be an issue.
If we change the code so that the character is converted to the numeric entity instead of the nbsp named entity when loading the file, then we also need to change the code that converts the character to the nbsp entity when you switch back from Book View (otherwise you'll just get an error if you have no DOCTYPE defined). This should work. But now anyone who was using a DOCTYPE and nbsp entities could have some of those nbsp entities turned into numeric entities (e.g. on pages they edit in Book View), leaving a mix of named and numeric entities. Alternatively, we change all nbsp entities into their numeric form everywhere in the code (loading, empty paragraphs, insert special character, etc.). That might affect quite a few find & replace statements and quite a few people, so if such a change were made it would probably wait until a major release. |
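To make the second option concrete, here is a rough sketch in Python of normalising everything to the numeric form. This is not Sigil's code (Sigil is C++/Qt); the function name and the idea of doing it as one pass over the page source are assumptions for illustration only:

```python
NBSP_CHAR = "\u00a0"      # the literal Unicode non-breaking space
NBSP_NAMED = "&nbsp;"     # named entity, needs a DTD in XHTML
NBSP_NUMERIC = "&#160;"   # numeric reference, needs no declaration


def nbsp_to_numeric(source: str) -> str:
    """Rewrite literal U+00A0 and &nbsp; as &#160; (hypothetical helper)."""
    return source.replace(NBSP_CHAR, NBSP_NUMERIC).replace(NBSP_NAMED, NBSP_NUMERIC)


print(nbsp_to_numeric("<p>a\u00a0b&nbsp;c</p>"))  # prints <p>a&#160;b&#160;c</p>
```

Running the same pass on load and on every switch back from Book View would avoid the mix of named and numeric forms, at the cost of rewriting nbsp entities that users typed deliberately.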
08-16-2013, 11:03 PM | #50 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Consistency and expectations, mostly. Other characters are auto-converted to entities (not just non-breaking spaces) and named entities are used for those. Also, people know what nbsp does while many people don't know what #160 is.
|
08-16-2013, 11:42 PM | #51 |
creator of calibre
Posts: 44,351
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Another idea, replace the nbsp with
Code:
<span style="white-space:pre"> </span> |
08-17-2013, 03:47 AM | #52 | |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
The proposed code above is way too long and awkward. We use regex to insert nbsp or their numeric equivalents, and I am pretty sure none (pun not intended) in France would ever consider this. Last edited by roger64; 08-17-2013 at 04:52 AM. |
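For anyone wondering what such a regex might look like, here is a guess in Python. It assumes the usual French convention of a non-breaking space before ; : ! and ?, which the post does not spell out, and the function name is invented for the example:

```python
import re

# Assumption: an ordinary space directly before ; : ! ? should be non-breaking.
_SPACE_BEFORE_PUNCT = re.compile(r" (?=[;:!?])")


def protect_punctuation(text: str, nbsp: str = "&#160;") -> str:
    """Replace the space before double punctuation with nbsp (hypothetical helper)."""
    return _SPACE_BEFORE_PUNCT.sub(nbsp, text)


print(protect_punctuation("Quoi ? Non : oui !"))
print(protect_punctuation("Quoi ?", nbsp="&nbsp;"))
```

This is meant for running text only; applied blindly to a whole file it would also touch things like spaced-out CSS declarations in style attributes.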
|
08-17-2013, 08:56 AM | #53 |
Well trained by Cats
Posts: 30,377
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
There is a reason for named entities: they are easy to remember (at least the common ones). No lookup chart needed. With number only
Is this (HTML5) number-only requirement a result of the mindset of those who make frequent use of web authoring tools? |
08-17-2013, 09:31 AM | #54 |
creator of calibre
Posts: 44,351
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
HTML 5 works fine with named entities in all browsers/readers. Technically, the XHTML variant of HTML 5 is not valid without a DTD in the presence of named entities. However, the only things that actually don't work with named entities are the various pointless validity checking tools like epubcheck.
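As a small illustration of the HTML side of this, assuming Python's standard `html` module is a fair stand-in for how an HTML (not XML) consumer treats entities: the names are resolved from a built-in table, so no DTD or DOCTYPE is involved. The wrapper name `decode_html_text` is made up for the example:

```python
import html
from html.entities import name2codepoint


def decode_html_text(text: str) -> str:
    """Resolve character references using HTML rules (hypothetical wrapper)."""
    return html.unescape(text)


# nbsp comes from the built-in entity table; nothing has to be declared.
print(name2codepoint["nbsp"])                 # prints 160
print(repr(decode_html_text("a&nbsp;b")))     # prints 'a\xa0b'
print(repr(decode_html_text("a&#160;b")))     # prints 'a\xa0b'
```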
|
08-17-2013, 10:14 AM | #55 | |
Well trained by Cats
Posts: 30,377
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
|
|
08-17-2013, 05:17 PM | #56 | |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Well, they may be pointless, but some of us are seriously stuck with them. I have to suffer through Lulu's epubcheck 1.x or some crap like that (which constantly rejects the DC "Creator," in case anyone here has a client going that route), and almost every variant along the way since then, depending on which distributor goes to which retailer and how "tecchy" their staffs are. Add this to Smashwords' latest boggles (which I don't understand; don't they use your API?) and it's enough to make any real bookmaker weep. Do we all need ePUBcheck? Yes, for myriad reasons, but still... however pointless you find them, many of us can't ignore them. H |
|
08-17-2013, 08:10 PM | #57 | |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Quote:
I've changed to using numeric entities for nbsp in Sigil so the issue won't exist in the next release. |
|
08-17-2013, 11:15 PM | #58 |
creator of calibre
Posts: 44,351
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@Hitch: I am well aware of the idiocy of the various epub retailers that insist on using epubcheck. I keep getting bug reports about it, after all. The fact that retailers have blindly jumped on the epubcheck bandwagon does not change the fact of its pointlessness.
@user_none: Really, you have html files with named entities that work in HTML 4 that don't work when you add an HTML 5 doctype? I would be very interested in seeing an example. |
08-19-2013, 05:07 AM | #59 | |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Hitch |
|
09-11-2013, 09:45 PM | #60 | |
Grand Sorcerer
Posts: 6,111
Karma: 34000001
Join Date: Mar 2008
Device: KPW1, KA1
|
Quote:
On some older files, I get errors with regard to nbsp, image links, stylesheet links, and sometimes other links, where 0.72 just happily opened the file, saved it, and it worked everywhere: ADE, Calibre, and on my reader. Basically, almost every file I open has errors, especially if it was converted to EPUB by Calibre. Often even books that are bought as EPUBs have (many) errors. (I mostly open them to remove whitespace between paragraphs and make indents smaller before sticking them into Calibre.) If there really are errors that Sigil can fix reliably, such as missing DOCTYPEs, would it be possible to suppress the error messages and just fix the files silently, and maybe add something like [Fixed] after the file name in the address bar to mark that it was fixed? Last edited by Katsunami; 09-11-2013 at 09:50 PM. |
|
|