MobileRead Forums - View Single Post - pacify.py (Text reformatter / RTF extractor)

ekaser · 09-03-2009, 12:18 PM

Quote:

Originally Posted by ahi

Next consideration: how to architect this for internationalization.

Maybe have the main program contain a "skeleton" of all processing functions, which would then (based on command line options and/or imported metadata) in turn call language-specific versions of the processing functions on the fly at runtime?

FixParagraphs(text) would load FixParagraphs.hu.py or FixParagraphs.en.py depending on language and do an eval("FixParagraphs_"+curlang+"(text)")

Truly, a nasty problem.

Formatting of the text is DEFINITELY something that can vary greatly from language to language. For example, in English, a question always has a question mark at the END of the sentence. In Spanish, there's an inverted mark (¿) at the START of the question and a regular one at the END.
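To make that difference concrete, here is a rough Python sketch of a per-language check on question punctuation. The function name `question_is_well_formed` and the language codes are made up for illustration and are not part of pacify.py; it also assumes one sentence per call.

```python
def question_is_well_formed(sentence, lang):
    """Return True if the sentence has the question marks its language expects.

    Hypothetical helper: only 'en' and 'es' get special treatment here.
    """
    s = sentence.strip()
    # Both English and Spanish close a question with '?'.
    if not s.endswith("?"):
        return False
    if lang == "es":
        # Spanish also needs an opening inverted mark somewhere before the end.
        return "\u00bf" in s[:-1]
    return True
```

Even this tiny rule needs a language switch, which hints at how quickly a single shared rule set would fill up with special cases.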

I'm afraid I'm not multi-lingual aware/talented enough to help much with this one. MY thought would be that you'd almost need completely separate "reformatting modules" for each language, as trying to come up with any kind of scripting or rule-based structure seems incredibly complex and difficult. You almost have to custom-create it for each language. But then, there are other folks MUCH smarter and more educated in that subject than me!
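For what it's worth, the "skeleton plus separate modules" idea could be sketched without eval() by keeping a lookup table of per-language functions. Everything below is an assumption for illustration: the registry name `PARAGRAPH_FIXERS`, the `register` decorator, and the placeholder rules are not taken from pacify.py, and a real Hungarian or English module would carry its own rules.

```python
# Registry mapping a language code to that language's paragraph fixer.
PARAGRAPH_FIXERS = {}


def register(lang):
    """Decorator that files a function under a language code."""
    def wrap(func):
        PARAGRAPH_FIXERS[lang] = func
        return func
    return wrap


@register("en")
def fix_paragraphs_en(text):
    # Placeholder rule: collapse runs of whitespace into single spaces.
    return " ".join(text.split())


@register("hu")
def fix_paragraphs_hu(text):
    # Placeholder: a real Hungarian module would apply its own rules here.
    return " ".join(text.split())


def fix_paragraphs(text, lang, default="en"):
    """Skeleton function: look up the language-specific version and call it."""
    fixer = PARAGRAPH_FIXERS.get(lang, PARAGRAPH_FIXERS[default])
    return fixer(text)
```

Calling fix_paragraphs(text, "hu") then runs the Hungarian version, and an unknown language code falls back to the default. If each language lived in its own file, importlib.import_module could load it by name at runtime, though file names like FixParagraphs.hu.py would probably need renaming, since a dot in a module name is read as a package separator.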