MobileRead Forums

Tex2002ans · 11-06-2019, 03:16 AM

Quote:

Originally Posted by JSWolf

I have read 11/22/63. I know why that was done. It was done to simulate the way it was done in the pBook. The thing is, 34 complete fonts were embedded. S&S should have subset all of those fonts. That would have saved a lot of space.

In the version I had, they were subset (~44 KB each). Not as bad as it could've been, but still extra bloat.
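Subsetting works by keeping only the glyphs for characters the book actually uses, so the first step is collecting that character set. This is a rough Python sketch. It assumes the XHTML files are already read into strings, and used_characters is a hypothetical name. The regex tag-stripping is crude: it also picks up text inside style or title elements. That only makes a subset slightly larger, not broken.

```python
import html
import re

# Crude tag matcher: good enough for collecting characters, not a real parser.
TAG = re.compile(r"<[^>]+>")

def used_characters(xhtml_docs):
    """Return every non-whitespace character in the given XHTML strings, sorted."""
    chars = set()
    for doc in xhtml_docs:
        # Drop markup, then turn entities such as &#233; into real characters.
        text = html.unescape(TAG.sub("", doc))
        chars.update(ch for ch in text if not ch.isspace())
    return "".join(sorted(chars))

docs = ["<p>I went to the caf&#233;.</p>", "<p>Caf\u00e9!</p>"]
print(used_characters(docs))
```

The resulting string is the kind of input a font subsetter (such as fontTools' pyftsubset) works from, so each embedded font keeps only those glyphs.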

Quote:

Originally Posted by JSWolf

But the way Penguin does it is 100% useless. It's just an advert.

Yes, a font just for advertisements... that's definitely overboard (and then when they forget to subset...).

Similarly, I've run across books with embedded fonts for those preview chapters of other/next books.
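To spot fonts like these, such as an advert-only font or one used only by a preview chapter, it helps to list every font file in the EPUB along with its size. This is a minimal Python sketch using only the standard library's zipfile module, since an EPUB is a ZIP container. The helper name list_embedded_fonts is hypothetical. It only looks at file extensions, not at which fonts the CSS actually references.

```python
import zipfile

# Extensions treated as font files (an assumption; adjust for your books).
FONT_EXTS = (".ttf", ".otf", ".woff", ".woff2")

def list_embedded_fonts(epub_path):
    """Return (name, uncompressed size in bytes) for each font file, largest first."""
    with zipfile.ZipFile(epub_path) as zf:
        fonts = [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.lower().endswith(FONT_EXTS)
        ]
    return sorted(fonts, key=lambda item: item[1], reverse=True)
```

Usage would be something like: for name, size in list_embedded_fonts("book.epub"): print(size, name). A font that is large but only referenced by the ad page or preview chapter stands out quickly.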

Quote:

Originally Posted by JSWolf

There are times when some extended characters are used and yes, it's better to make sure they are available. But why do some eBooks have a span around these characters? That's just silly.

I think it might be a leftover from conversion, or the tool treating accented characters as a "foreign" character.

For example, let's say the original source document (DOCX, InDesign, etc.) has:

Code:

I went to the caf<span lang="fr">é</span>.

In the print book, it looks "normal"... but when it's exported to ebook, let's say their workflow strips the lang attribute:

Code:

I went to the caf<span>é</span>.

And lots of times it comes from copying/pasting from other sources. For example, a designer might copy/paste HTML from a website into InDesign, and all this other cruft gets carried over. They wipe it and reformat it in InDesign, but some of the hidden junk is still there.
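Cleaning up leftovers like that bare span around the é can be scripted. This is a minimal Python sketch. It assumes the markup is simple enough for a regex, and unwrap_bare_spans is a hypothetical name. It only unwraps span elements that have no attributes and no nested tags. Spans that still carry lang, class, etc. are left alone.

```python
import re

# A <span> with no attributes whose content contains no further tags.
BARE_SPAN = re.compile(r"<span>([^<]*)</span>")

def unwrap_bare_spans(xhtml):
    """Replace <span>text</span> with just text; attribute-bearing spans are kept."""
    return BARE_SPAN.sub(r"\1", xhtml)

print(unwrap_bare_spans('I went to the caf<span>é</span>.'))
# -> I went to the café.
print(unwrap_bare_spans('I went to the caf<span lang="fr">é</span>.'))
# -> unchanged: the span still carries lang="fr"
```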

I was scratching my head over a similar "bug" for years. One of the clients I work for exports from InDesign, and every so often I would get very strange Arabic-Indic digits:

InDesign EPUB Export:

Quote:

For more information, see Book title (١٩٣٢, Publisher), p. 123.

Correct:

Quote:

For more information, see Book title (1932, Publisher), p. 123.

You couldn't search the InDesign file for the string "١٩٣٢"; those characters didn't exist in it!

It turns out that whatever source they copied/pasted from occasionally had tiny random sections of text marked as lang="ar-SA".

So in PDF, it exports and looks fine.
In the PDF tags, it exports and looks fine (English).
In other formats, it exports and looks fine...
In EPUB, InDesign automatically converted the 1932 into Arabic-Indic digits, because it thought that text was written in Arabic.
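A quick check for this in the exported EPUB is to search the text for the Arabic-Indic digit range (U+0660 to U+0669). Where the surrounding text is supposed to be English, those digits can then be mapped back to ASCII. This is a minimal Python sketch with hypothetical function names. It deliberately ignores the Extended Arabic-Indic (Persian) digits at U+06F0 to U+06F9, and it should not be run blindly over passages that really are in Arabic.

```python
import re

# Arabic-Indic digits U+0660..U+0669, as in the "١٩٣٢" example above.
ARABIC_INDIC = "".join(chr(cp) for cp in range(0x0660, 0x066A))
TO_ASCII = str.maketrans(ARABIC_INDIC, "0123456789")
RUN = re.compile("[\u0660-\u0669]+")

def find_arabic_indic(text):
    """Return (offset, run) for every run of Arabic-Indic digits in text."""
    return [(m.start(), m.group()) for m in RUN.finditer(text)]

def to_ascii_digits(text):
    """Map each Arabic-Indic digit to its ASCII counterpart."""
    return text.translate(TO_ASCII)

sample = "For more information, see Book title (١٩٣٢, Publisher), p. 123."
print(find_arabic_indic(sample))
print(to_ascii_digits(sample))
```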

I think some of this might also get introduced when the main OS/program language is different from the book's language. Let's say you send your book over to a French typesetter who has their InDesign set to French. Well, some of that crap might sneak in, even if they try their best to change everything to English. (I recently saw this in a homeschooling book I read for book club...)

Sometimes you run spellcheck, make a few corrections, and in the backend, the program "helpfully" adds lang information to that specific word.
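Since the culprit is usually a stray lang (or xml:lang) attribute that nobody can see in the layout, it's worth listing every one whose language differs from the book's. This is a minimal Python sketch using the standard library's html.parser, and find_stray_lang is a hypothetical name. It compares only the primary language subtag, so "en-US" counts as English and "ar-SA" does not.

```python
from html.parser import HTMLParser

class StrayLangFinder(HTMLParser):
    """Record elements whose lang/xml:lang primary subtag differs from the book's."""

    def __init__(self, book_lang):
        super().__init__()
        self.book_lang = book_lang.lower()
        self.hits = []  # (tag, lang value, line number)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("lang", "xml:lang") and value:
                primary = value.lower().split("-")[0]  # "ar-SA" -> "ar"
                if primary != self.book_lang:
                    self.hits.append((tag, value, self.getpos()[0]))

def find_stray_lang(xhtml, book_lang="en"):
    finder = StrayLangFinder(book_lang)
    finder.feed(xhtml)
    finder.close()
    return finder.hits

sample = ('<p>see <span lang="ar-SA">1932</span>, the '
          'caf<span lang="fr">é</span>, and <span lang="en-US">color</span></p>')
for tag, lang, line in find_stray_lang(sample):
    print(f"line {line}: <{tag} lang={lang}>")
```

Anything it reports, like the ar-SA span here, is a candidate for dropping the attribute, unless that passage really is in another language.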

Quote:

Originally Posted by JSWolf

Role, epub:type, figure, section and other stuff that's actually not needed is what I get rid of.

That stuff's mainly there for accessibility: it tells screen readers and reading systems what each part of the book is.