Old 08-03-2020, 08:19 AM   #1
mtck
Junior Member
mtck began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Aug 2020
Device: Kindle
ASCII or HTML

Hi,
I'm compiling my first ePub to submit to Kindle.

I used the website word2cleanhtml.com, which converted curly quotes to HTML.

Later, in Sigil, I used the Mend & Prettify all HTML files tool, which stripped the HTML for punctuation and quotes.

I was under the impression that HTML should be used for most, if not all such punctuation. And that, if it's not HTML, then such characters would be in ASCII format.

I don't mean to question Sigil's methods, but will my ePub be OK to submit to Kindle like this?

At a glance, the only HTML formatting is for paragraphs, headings... There is no html for any punctuation or 'special' characters (not that there are many special characters).

Apologies for the newbie question. I did search the forum, and read the 'NotJohn Guide...' but if there's a clear answer I couldn't find it.

Sincerely appreciate any clarification here!
[Attached screenshot: Screenshot 2020-08-03 at 13.14.57.png]
Old 08-03-2020, 09:07 AM   #2
KevinH
Sigil Developer
KevinH ought to be getting tired of karma fortunes by now.
 
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
A few things:
- Epub xhtml files support full unicode for text content. There is no restriction to ascii for text content at all.

- Many people use what are called numeric or named entities to encode special characters (things like smart quotes, non-breaking spaces, etc.), but this is not required.
Unless you tell Sigil via its Preserve Entities preference settings to keep them, Sigil will simply use the actual unicode character.

Nothing is lost. If you want the entities to be put back, just add the entities you want to use to Sigil's Preserve Entities list in Preferences and run Mend again.

Note named entities are not permitted in epub3 which requires numeric entities if you decide to keep them.

If "entities" are what you mean by special characters, there is no need to use them for Kindle or for epub, but they do make some special white space chars easier to see.
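
To illustrate why nothing is lost when entities are replaced by the actual characters, here is a minimal Python sketch. It uses the standard library's `html.unescape`, not Sigil's own code, and the sample sentence is made up for the example. The named form, the numeric form, and the literal Unicode form all decode to the same text.

```python
import html

# The same sentence written three ways: named entities, numeric entities,
# and literal Unicode characters (curly quotes plus a non-breaking space).
named = "&ldquo;Hello&rdquo;&nbsp;world"
numeric = "&#8220;Hello&#8221;&#160;world"
literal = "\u201cHello\u201d\u00a0world"

# html.unescape replaces every entity with the character it stands for.
decoded_named = html.unescape(named)
decoded_numeric = html.unescape(numeric)

print(decoded_named == literal)    # both entity forms decode to the
print(decoded_numeric == literal)  # same Unicode string
```

A reading system sees the same characters either way; the entity is only a different spelling in the source file.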
Old 08-03-2020, 10:09 AM   #3
mtck
Thanks KevinH, much appreciated.

To correct my terminology (for my own sake, and hopefully for the benefit of others):

Unicode is a superset of ASCII. I should have referred to Unicode instead of ASCII. ePub 2.0.1 requires Unicode UTF-8 or UTF-16. (Source)

Yes, entity is what I should have said, not 'special character':

"An HTML entity is a piece of text ("string") that begins with an ampersand (&) and ends with a semicolon (;). Entities are frequently used to display reserved characters (which would otherwise be interpreted as HTML code), and invisible characters (like non-breaking spaces). You can also use them in place of other characters that are difficult to type with a standard keyboard." (Source)

One thing I don't quite understand:

Quote:
Note named entities are not permitted in epub3 which requires numeric entities if you decide to keep them.
Is it right to say:

Named entities are HTML, and Numbered entities are Unicode, as listed here?

Again, sincere thanks, much appreciated.
Old 08-03-2020, 10:50 AM   #4
KevinH
Named entities use a mnemonic character string instead of a numeric code. So the following is a named entity (ignore the spaces).

& n b s p ;

The equivalent numeric entity can be written in decimal or hexadecimal notation as follows:

& # 1 6 0 ;

or

& # x A 0 ;


The only named entities allowed in html5/epub3 are the original xml entities.

& a m p ;

& l t ;

& g t ;

and two others (& q u o t ; and & a p o s ;).

All others must be in numeric form.
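
As a rough illustration of the three spellings, the Python sketch below builds the named, decimal, and hexadecimal forms for a single character. The helper name `entity_forms` is made up for this example, and the name lookup uses the standard library's HTML 4 table (`html.entities.codepoint2name`), so a character without an HTML 4 name gets `None`.

```python
from html.entities import codepoint2name


def entity_forms(ch):
    """Return the named (if any), decimal, and hex entity for one character."""
    cp = ord(ch)                     # Unicode code point, e.g. 160
    name = codepoint2name.get(cp)    # None when HTML 4 defines no name
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{cp};",
        "hex": f"&#x{cp:X};",        # lowercase x, as XML requires
    }


# Non-breaking space and left curly quote.
for ch in ("\u00a0", "\u201c"):
    print(entity_forms(ch))
```

The decimal and hex forms are just the character's Unicode code point written in base 10 or base 16, which is why every character has a numeric entity even when it has no name.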
Old 08-03-2020, 11:19 AM   #5
mtck
Fantastic, many thanks.
I didn't realise HTML entities could be written in HEX. And I'll need to spend some time getting my head around the following:

According to this website (possibly a useful resource, ePub aside!), the choices for character codes are:
- Unicode
- HTML Code
- HTML Entity (so this is different to HTML code)
- HEX code
- CSS Code

I've just realised that the website word2cleanhtml.com has a box to tick where it says "Replace non-ascii with HTML entities", but it's actually replacing them with HTML code.

& # 8220 ;

Who knew!?

I'm up and running anyway. Thanks so much for your responses.
Old 08-03-2020, 11:25 AM   #6
KevinH
Not quite. Their "html code" is actually a numeric entity and their "entity" is actually a "named entity". For the XML form of html5 used by epub3, no named entities are allowed apart from the predefined xml ones. Numeric entities are allowed but not needed.

Under epub2, both named and numeric entities are allowed.

The file itself should be utf-8 encoded but Sigil handles that conversion for you in both ways.
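
One way to see the difference is to feed small fragments to a plain XML parser, which knows only the predefined XML entities. This is a sketch using Python's standard `xml.etree.ElementTree`; it only checks XML well-formedness and is not an epub validator. The helper name `parses_as_xml` and the sample fragments are made up for the example.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment):
    """Return True if the fragment is well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = [
    "<p>a&amp;b</p>",    # predefined XML entity
    "<p>a&#160;b</p>",   # decimal numeric entity
    "<p>a&#xA0;b</p>",   # hexadecimal numeric entity (lowercase x)
    "<p>a&nbsp;b</p>",   # HTML named entity, undefined in plain XML
    "<p>a&#XA0;b</p>",   # uppercase X is not valid in XML
]

for fragment in samples:
    print(fragment, parses_as_xml(fragment))
```

Under epub2 the XHTML 1.1 DTD declares the HTML named entities, which is why they are accepted there; with no such DTD in play, a plain XML parser rejects them.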
Old 08-03-2020, 01:22 PM   #7
exaltedwombat
Guru
exaltedwombat ought to be getting tired of karma fortunes by now.
 
Posts: 878
Karma: 2457540
Join Date: Nov 2011
Device: none
Your code example looks OK. Are you saying special characters or formatting have been stripped out?

I suspect you might want to check your use of punctuation though. There's a couple of things there which just MIGHT be ok had you shown us the whole paragraph for context, but I'm afraid they probably aren't.
Old 08-03-2020, 05:58 PM   #8
hobnail
Running with scissors
hobnail ought to be getting tired of karma fortunes by now.
 
Posts: 1,552
Karma: 14325282
Join Date: Nov 2019
Device: none
Quote:
Originally Posted by exaltedwombat
Your code example looks OK. Are you saying special characters or formatting have been stripped out?
I'm guessing by "stripped out" he means that the left curly quotes and right curly quotes were originally named entities (more likely, is my guess) or numeric entities, and Sigil's mend and prettify converted them to the Unicode characters. When he says punctuation he may mean dashes, ellipses, and such; I have Sigil preserve those so I can tell what's what.
Old 08-04-2020, 05:48 AM   #9
mtck
Apologies, my saying Sigil 'stripped' the entities isn't fair on Sigil.

To clarify:

Sigil correctly converted the HTML numeric entities for curly quotes, and more, into standard (Unicode, I guess) characters.


Word2cleanhtml.com's "Replace non-ascii with HTML entities" option had previously converted curly quotes into HTML numeric entities.
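
For anyone curious what such a "replace non-ascii" step amounts to, here is a small Python sketch. It is not how word2cleanhtml.com or Sigil is implemented; it only uses Python's built-in `xmlcharrefreplace` error handler to produce a similar result, then reverses it with `html.unescape`. The sample text is made up.

```python
import html

# Sample text with left and right curly quotes (U+201C, U+201D).
text = "\u201cWho knew!?\u201d"

# Encoding to ASCII with xmlcharrefreplace turns every non-ASCII character
# into a decimal numeric entity and leaves ASCII characters untouched.
as_entities = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)

# Going the other way restores the original characters.
roundtrip = html.unescape(as_entities)
print(roundtrip == text)
```

The two steps are inverses of each other, so converting in either direction does not change the text a reader ends up seeing.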
Old 08-13-2020, 06:31 AM   #10
Notjohn
mostly an observer
Notjohn ought to be getting tired of karma fortunes by now.
 
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
I've uploaded twenty or so books using the route you describe (Word to Word2Clean to Sigil) and never had a problem. (Yes, it was surprising when Sigil began rendering quotes as quotes, but neither on Amazon nor any other bookseller have I ever encountered a problem. Perhaps I should add that I use epub2.)