Quote:
Originally Posted by bookman156
Yes, the Chinese characters plus spaces. Unicode revisions don't matter here [...]. Then the GREP search styles characters in those ranges with a Chinese font.
Looking a bit closer, it looks like ~10,000 new CJK characters have been added to Unicode since then.
(~5,700 in Unicode 8.0 + ~5,000 in Unicode 13.0.)
(And ~4,200 more CJK characters are going to be added in Unicode 15.0, which will be coming out later this year.)
That \x{} numbers method would fail if its ranges don't cover all those new cases, whereas \p{Han} would detect all of the characters, as long as the program understands the latest Unicode.
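Here's a rough sketch of the difference, in Python rather than InDesign GREP. The range below is just the basic CJK Unified Ideographs block, picked as an example of a hard-coded list (not necessarily the ranges in the search being discussed). Python's built-in re module doesn't understand \p{Han}, so looks_like_han() is a made-up stand-in that asks the Unicode database for the character's name. (The third-party "regex" module does accept \p{Han}, if you have it installed.)

```python
import re
import unicodedata

# Hard-coded range: only the basic "CJK Unified Ideographs" block.
BASIC_BLOCK = re.compile('[\u4E00-\u9FFF]')


def looks_like_han(ch):
    # Rough stand-in for \p{Han} (hypothetical helper, not a real API):
    # unified ideographs are named "CJK UNIFIED IDEOGRAPH-XXXX".
    # Characters unknown to this Python's Unicode tables get the '' default.
    return unicodedata.name(ch, '').startswith('CJK UNIFIED IDEOGRAPH')


old_char = '\u4E2D'      # an ideograph from the basic block
new_char = '\U00030000'  # first code point of Extension G (Unicode 13.0)

print(bool(BASIC_BLOCK.search(old_char)))  # the range catches the old character
print(bool(BASIC_BLOCK.search(new_char)))  # ...but misses the Extension G one
print(looks_like_han(old_char))
print(looks_like_han(new_char))  # depends on the Unicode version Python ships with
```

The last line is the "as long as the program understands the latest Unicode" part: the property-style check only knows about characters that its Unicode tables know about.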
Quote:
Originally Posted by bookman156
I haven't yet exported an InDesign file with Chinese as EPUB, but the Chinese would have a span name [...]. Not sure if InDesign would put the span around individual characters or phrases.
I'm unsure too, but if it's anything like what I've seen, it'll be ugly! lol.
But if you want to make your life easier...
Make sure you create a Character Style.
You can:
- Give it an easy name, like "chinese".
- Assign it a CJK font.
- Mark it as "Chinese" language. (If exporting to PDF or HTML, this is very important.)
That will make it much easier to convert to clean HTML <span>+classes.
(InDesign also has this great thing called "Style Mapping" which is an enormous help too... if you use your Styles properly!)
- - - - -
Complete Side Note: One really ugly thing I just learned in Microsoft Word.
If you type a link, like:
Code:
http://www.example.com/
Then come back to it at a later date and add text between:
Code:
http://www.exa123mple.com/
in the internals, Word splits it into 3 chunks:
- http://www.exa
- 123
- mple.com/
and points all 3 pieces to the same exact URL.
So instead of this in your HTML:
Code:
<a href="http://www.exa123mple.com/">http://www.exa123mple.com/</a>
you would have this:
Code:
<a href="http://www.exa123mple.com/">http://www.exa</a><a href="http://www.exa123mple.com/">123</a><a href="http://www.exa123mple.com/">mple.com/</a>
(LibreOffice recently fixed this for 7.5.)
I'm betting InDesign has all sorts of mess like that too.
And this could explain some of the really disastrous documents I've gotten, where there are millions of overlapping <span>s which all seem to be the same code.
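If you get stuck cleaning up the split links from the Word example, here's a minimal sketch in Python. It assumes the simplest case only: the <a> tags sit directly next to each other, have nothing but an href attribute, and contain plain text. The function name merge_adjacent_links is made up for this example. Real exports are usually messier, so treat it as a starting point, not a finished tool.

```python
import re

# Two neighboring links with the identical href and plain-text content.
# \1 forces the second href to match the first one exactly.
ADJACENT = re.compile(r'<a href="([^"]*)">([^<]*)</a><a href="\1">')


def merge_adjacent_links(html):
    # Each pass glues one neighboring pair together, so repeat until
    # nothing changes (3 chunks need 2 passes).
    previous = None
    while previous != html:
        previous = html
        html = ADJACENT.sub(
            lambda m: '<a href="{}">{}'.format(m.group(1), m.group(2)),
            html,
        )
    return html


split = (
    '<a href="http://www.exa123mple.com/">http://www.exa</a>'
    '<a href="http://www.exa123mple.com/">123</a>'
    '<a href="http://www.exa123mple.com/">mple.com/</a>'
)
print(merge_adjacent_links(split))
```

Links that point to different URLs are left alone, since the second href has to match the first.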