06-10-2021, 01:53 PM   #4
Tex2002ans
Quote:
Originally Posted by Greg Anos
Is there a hex guide for defining these character sets (as hex strings)? (Like em dashes being defined as a 3 hex character string.)
Why, exactly, are you trying to use hex codes instead of just using the actual character?

In EPUB, the only special entity you have to worry about is the Non-Breaking Space (&nbsp; or &#160;).

Everything else can use the actual Unicode characters:

— = Em Dash

There's no need to clutter your code with an entity such as &mdash; or &#8212;.
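
If a file is already full of entities, a small script can swap them for the real characters. Here is a minimal Python sketch, not a finished tool: the function name `entities_to_unicode` is made up for this example, it leaves the markup escapes (&amp;, &lt;, &gt;, &quot;, &apos;) alone, and it assumes you want every non-breaking space written as &#160;. It also runs over the whole string blindly, so check the result before trusting it on a real book.

```python
import html
import re

# Escapes that must stay escaped so the XHTML remains well-formed.
KEEP = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"}

# Named (&mdash;), decimal (&#8212;) and hex (&#x2014;) entities.
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def entities_to_unicode(markup: str) -> str:
    """Replace entities with the actual characters, except markup escapes and NBSP."""

    def replace(match: re.Match) -> str:
        entity = match.group(0)
        if entity in KEEP:
            return entity
        char = html.unescape(entity)
        # Unknown or multi-character results: leave the entity untouched.
        if char == entity or len(char) != 1:
            return entity
        # Numeric forms of markup characters (e.g. &#38;) also stay escaped.
        if char in "&<>\"'":
            return entity
        # Keep the non-breaking space visible as a numeric entity.
        if char == "\u00a0":
            return "&#160;"
        return char

    return ENTITY.sub(replace, markup)


print(entities_to_unicode("fa&ccedil;ade &mdash; ni&#241;os&nbsp;&amp; more"))
```

The non-breaking space stays as an entity on purpose: in most editors it looks identical to a normal space, so the numeric form keeps it from being lost by accident.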

Quote:
Originally Posted by Greg Anos
I am doing a long project of scanning and converting to ePub 30 years of a specialty hobby journal (pro bono publico).
Great. What tools are you using to scan + OCR?

Quote:
Originally Posted by Greg Anos
I need to use the occasional non-English character (letter with tilde, umlaut, and French letter characters).
Only because the OCR isn't recognizing these characters?

OCR outputs:
  • facade
  • ninos

but your actual article says:
  • façade
  • niños

Usually, if you enable the proper OCR languages, these accented characters will be recognized.

Side Note: I wrote a bit about OCR + German/Spanish/French accents in "Abbyy Finereader 15 gothic/Fraktur Altdeutsch/Oldgerman" (Post #5).

For example, with only English enabled, the OCR recognizes just A-Z, while Spanish will recognize A-Z + a few more:
  • ÁÉÍÑÓÚÜáéíñóúü

So if you have a 98% English book with 2% Spanish names/words, you'd tell OCR this is an English AND Spanish book. This would catch all the little accents on the ñ and á and é.
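
To get a rough idea of which languages to enable, you can check a proofread sample for letters that plain English doesn't cover. This is only a toy Python sketch of that idea: real OCR engines use dictionaries and trained models, not just alphabets, and the names `ENGLISH_LETTERS`, `SPANISH_EXTRAS`, and `letters_outside` are made up for this example (the Spanish set is the list from above).

```python
import string

# English alone: A-Z and a-z.
ENGLISH_LETTERS = set(string.ascii_letters)

# The extra letters Spanish adds (list taken from the post above).
SPANISH_EXTRAS = set("ÁÉÍÑÓÚÜáéíñóúü")


def letters_outside(text: str, alphabet: set) -> list:
    """Return the sorted letters in text that the given alphabet does not cover."""
    return sorted({ch for ch in text if ch.isalpha() and ch not in alphabet})


sample = "The façade of the niños' school"

# English only: both accented letters are missing.
print(letters_outside(sample, ENGLISH_LETTERS))                   # ['ç', 'ñ']

# English + Spanish: ñ is covered, ç still is not (so add French too).
print(letters_outside(sample, ENGLISH_LETTERS | SPANISH_EXTRAS))  # ['ç']
```

Any letter still left over after adding a language is a hint that another OCR language needs to be enabled.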

Last edited by Tex2002ans; 06-10-2021 at 01:59 PM.