07-21-2022, 03:11 PM   #249
Tex2002ans
Quote:
Originally Posted by bookman156
Yes, the Chinese characters plus spaces. Unicode revisions don't matter here [...]. Then the GREP search styles characters in those ranges with a Chinese font.
Looking a bit closer, it looks like ~10,000 new CJK characters have been added to Unicode since then.

(~5,700 in Unicode 8.0 + ~5,000 in Unicode 13.0.)

(And ~4,200 more CJK characters are going to be added in Unicode 15.0, which will be coming out later this year.)

That \x{} code-point method would fail if its ranges don't cover all those new characters.

Whereas \p{Han} would match all of them, as long as the program supports the latest Unicode version.
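To make the range problem concrete, here's a small Python sketch. Python's built-in `re` has no \p{Han}, so it uses explicit code-point ranges, the same idea as \x{...} in a GREP. The "old" class is an assumption for illustration: it only covers CJK Unified Ideographs Extension A (U+3400–U+4DBF) and the main CJK Unified Ideographs block (U+4E00–U+9FFF). It then tests one character from Extension E (U+2B820, added in Unicode 8.0) and one from Extension G (U+30000, added in Unicode 13.0).

```python
import re

# Hypothetical "old" range-based class: Extension A + the main CJK Unified Ideographs block.
OLD_CJK = re.compile("[\u3400-\u4dbf\u4e00-\u9fff]")

def unmatched(text: str, pattern: re.Pattern) -> list[str]:
    """Return the code points (as U+XXXX) in text that the pattern fails to match."""
    return [f"U+{ord(ch):04X}" for ch in text if not pattern.match(ch)]

# One basic ideograph, one from Extension E, one from Extension G (all are Han characters).
sample = "\u4e2d" + chr(0x2B820) + chr(0x30000)
print(unmatched(sample, OLD_CJK))
# ['U+2B820', 'U+30000']
```

Every character in `sample` is Han, but the fixed ranges silently miss the two newer ones. With a script-aware engine (for example the third-party Python `regex` module, which accepts `\p{Han}`), the pattern doesn't need updating by hand; it's only as current as the Unicode data the engine ships with.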

Quote:
Originally Posted by bookman156
I haven't yet exported an InDesign file with Chinese as EPUB, but the Chinese would have a span name [...]. Not sure if InDesign would put the span around individual characters or phrases.
I'm unsure too, but if it's anything like what I've seen, it'll be ugly! lol.

But if you want to make your life easier...

Make sure you create a Character Style.

You can:
  • Give it an easy name, like "chinese".
  • Assign it a CJK font.
  • Mark it as "Chinese" language.
    • If exporting to PDF (or HTML), this is very important.

That will make it much easier to convert to clean HTML <span>s with classes.

(InDesign also has this great thing called "Style Mapping" which is an enormous help too... if you use your Styles properly!)
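If InDesign does end up wrapping individual characters instead of phrases, one cleanup step is merging back-to-back spans that share the same class. A minimal Python sketch, assuming the Character Style comes out as `class="chinese"` (hypothetical; the real class name depends on your export settings) and that the spans contain plain text only:

```python
import re

# A <span> with a class, immediately followed by another <span> with the same class.
# Only handles spans whose content is plain text (no nested tags).
ADJACENT_SAME_CLASS = re.compile(r'<span class="([^"]*)">([^<]*)</span><span class="\1">')

def merge_adjacent_spans(html: str) -> str:
    """Collapse runs of adjacent same-class spans into one span."""
    previous = None
    while previous != html:  # repeat until a pass makes no changes
        previous = html
        # Drop the closing tag and the duplicate opening tag between the pieces.
        html = ADJACENT_SAME_CLASS.sub(r'<span class="\1">\2', html)
    return html

exported = '<p>Tea <span class="chinese">茶</span><span class="chinese">叶</span> leaves</p>'
print(merge_adjacent_spans(exported))
# <p>Tea <span class="chinese">茶叶</span> leaves</p>
```

Regex is fine for quick, predictable exports like this; for anything with nested markup, a real HTML parser is the safer choice.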

- - - - -

Complete Side Note: One really ugly thing I just learned in Microsoft Word.

If you type a link, like:

Code:
http://www.example.com/
Then come back to it at a later date and insert some text in the middle:

Code:
http://www.exa123mple.com/
Internally, Word splits it into 3 chunks:

  • http://www.exa
  • 123
  • mple.com/

and points all 3 pieces to the same exact URL.

So instead of this in your HTML:

Code:
<a href="http://www.exa123mple.com/">http://www.exa123mple.com/</a>
you would have this:

Code:
<a href="http://www.exa123mple.com/">http://www.exa</a><a href="http://www.exa123mple.com/">123</a><a href="http://www.exa123mple.com/">mple.com/</a>
(LibreOffice recently fixed this for 7.5.)
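If you're stuck cleaning up HTML like that, the pieces can be glued back together. A minimal Python sketch, assuming the split anchors are back-to-back, carry an identical href, and contain plain text only (as in the Word output above); the function name is just for illustration:

```python
import re

# Two <a> elements back to back, the second with the exact same href as the first.
# Only matches anchors whose content is plain text (no nested tags).
ADJACENT_SAME_HREF = re.compile(r'<a href="([^"]*)">([^<]*)</a><a href="\1">')

def merge_split_links(html: str) -> str:
    """Join runs of adjacent anchors that point to the same URL."""
    previous = None
    while previous != html:  # repeat until a pass makes no changes
        previous = html
        # Drop the closing </a> and the duplicate opening tag between the pieces.
        html = ADJACENT_SAME_HREF.sub(r'<a href="\1">\2', html)
    return html

word_output = (
    '<a href="http://www.exa123mple.com/">http://www.exa</a>'
    '<a href="http://www.exa123mple.com/">123</a>'
    '<a href="http://www.exa123mple.com/">mple.com/</a>'
)
print(merge_split_links(word_output))
# <a href="http://www.exa123mple.com/">http://www.exa123mple.com/</a>
```

The loop is needed because each pass only joins non-overlapping pairs, so a 3-piece link takes two passes.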

I'm betting InDesign has all sorts of mess like that too.

And this could explain some of the really disastrous documents I've gotten, where there are millions of overlapping <span>s which all seem to be the same code.
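For the nested variety of that mess, a similar regex pass can unwrap a span sitting directly inside another span with the same class. Again just a sketch: the class name `c1` is made up, and it only handles the simple case where the inner span holds plain text and is the outer span's only content.

```python
import re

# A <span class="X"> whose only content is another <span class="X"> around plain text.
NESTED_SAME_CLASS = re.compile(r'<span class="([^"]*)"><span class="\1">([^<]*)</span></span>')

def unwrap_nested_spans(html: str) -> str:
    """Collapse same-class spans nested directly inside each other."""
    previous = None
    while previous != html:  # each pass removes one level of nesting
        previous = html
        html = NESTED_SAME_CLASS.sub(r'<span class="\1">\2</span>', html)
    return html

messy = '<span class="c1"><span class="c1"><span class="c1">Text</span></span></span>'
print(unwrap_nested_spans(messy))
# <span class="c1">Text</span>
```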

Last edited by Tex2002ans; 07-21-2022 at 03:18 PM.