MobileRead Forums - View Single Post - Best workflow for data->database->epub?

Tex2002ans · 12-13-2014, 11:19 PM

Quote:

Originally Posted by clemens14

Basically a bigger picture on the left and smaller pictures on the right, various ways of reading the ideogram, the meanings and various compound words that use the ideogram in question, together with their meanings

Now, you are saying the word "images", but do you really mean CHARACTERS?

For example, here is a site that I found that lists many kanji characters in UTF-8:

http://www.rikai.com/library/kanjita....unicode.shtml

Or here is a list of something similar to what you want in HTML (kanji, unicode codepoint, Henshall number, meanings):

http://www.aule-browser.com/kanji/he...y-unicode.html

Each Kanji on that site appears to be marked up like this:

Code:

<span class="kanji">傍</span>
<span class="UCS">&nbsp;&nbsp;&nbsp;508D&nbsp;&nbsp;</span>
<span class="kid">1815&nbsp;&nbsp;</span>
<span class="m1">bystander</span>
<span class="mngs">&nbsp;&nbsp;&middot;&nbsp;&nbsp;side, besides, while, nearby, 3rd person</span><br />
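
If you wanted to pull that markup into your own database, a rough sketch in Python could look like the one below. It only assumes that every entry follows the span layout shown above; the names `KanjiLineParser`, `parse_entry` and `SAMPLE` are made up for illustration, and I have not run it against the real site.

Code:

```python
from html.parser import HTMLParser

# One entry, copied from the listing above.
SAMPLE = """<span class="kanji">傍</span>
<span class="UCS">&nbsp;&nbsp;&nbsp;508D&nbsp;&nbsp;</span>
<span class="kid">1815&nbsp;&nbsp;</span>
<span class="m1">bystander</span>
<span class="mngs">&nbsp;&nbsp;&middot;&nbsp;&nbsp;side, besides, while, nearby, 3rd person</span><br />"""


class KanjiLineParser(HTMLParser):
    """Collects the text of each <span>, keyed by its class attribute."""

    def __init__(self):
        super().__init__()  # entities such as &nbsp; are decoded by default
        self.record = {}
        self._current = None  # class of the span we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None

    def handle_data(self, data):
        if self._current:
            # Drop the padding: spaces, no-break spaces and the middle dot.
            cleaned = data.strip(" \xa0\u00b7\n\t")
            self.record[self._current] = self.record.get(self._current, "") + cleaned


def parse_entry(html):
    parser = KanjiLineParser()
    parser.feed(html)
    parser.close()
    return parser.record


print(parse_entry(SAMPLE))
```

Each entry comes back as a plain dict keyed by the span class, so it can go straight into whatever database or spreadsheet you are using.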

For the EPUB code itself, I personally would avoid a hideous nest of tables (tables will most likely break when fonts get extremely large). So I would just do something along these lines for each Kanji:

Code:

<div class="whole">
	<p class="kanji">傍</p>
	<p class="altKanji">侀 侁 侂</p>
	<p class="mainmeaning">bystander</p>
	<p class="altmeaning">side, besides, while, nearby, 3rd person</p>
	<p class="thoroughexplanation">Blah blah blah, blah blah blah, this Kanji was used from the time period of ABCD-WXYZ.</p>
	<p class="thoroughexplanation">This is commonly used in business terms.</p>
	<p class="examplesentence">"This is an example sentence with this word."</p>
</div>
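
To give an idea of how a block like that could be generated from the database instead of typed by hand, here is a minimal sketch. The dictionary keys (`alt_kanji`, `main_meaning`, `alt_meanings`, `explanations`, `examples`) and the function name `render_entry` are my own invention; only the CSS class names come from the markup above.

Code:

```python
from html import escape


def render_entry(entry):
    """Build one <div class="whole"> block from a dict describing a kanji."""
    lines = ['<div class="whole">']

    def add(css_class, text):
        # escape() keeps stray &, < and > in the data from breaking the markup
        lines.append(f'\t<p class="{css_class}">{escape(text, quote=False)}</p>')

    add("kanji", entry["kanji"])
    if entry.get("alt_kanji"):
        add("altKanji", " ".join(entry["alt_kanji"]))
    add("mainmeaning", entry["main_meaning"])
    if entry.get("alt_meanings"):
        add("altmeaning", ", ".join(entry["alt_meanings"]))
    for paragraph in entry.get("explanations", []):
        add("thoroughexplanation", paragraph)
    for sentence in entry.get("examples", []):
        add("examplesentence", sentence)

    lines.append("</div>")
    return "\n".join(lines)


example = {
    "kanji": "傍",
    "alt_kanji": ["侀", "侁", "侂"],
    "main_meaning": "bystander",
    "alt_meanings": ["side", "besides", "while", "nearby", "3rd person"],
    "explanations": ["This is commonly used in business terms."],
    "examples": ['"This is an example sentence with this word."'],
}
print(render_entry(example))
```

Optional fields are simply skipped when they are missing, so sparse entries still produce valid blocks.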

Using the actual Unicode characters (stored as UTF-8) and embedding a font that covers them (for example, Droid Sans Fallback, the CJK fallback font used in Android) lets the text scale to any size with zero loss, which is vastly superior to using a GIF/PNG of each character:

侀侁侂侃侄侅來侇侈侉侊例侌侍侎侏
偐偑偒偓偔偕偖偗偘偙做偛停偝偞偟
劐劑劒劓劔劕劖劗劘劙劚力劜劝办功
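
If your data only stores the hex code point (like the 508D in the listing earlier), turning it into the real character is a one-liner. The sketch below is just an illustration; the helper names `describe`, `from_hex` and `code_point_row` are made up, and `code_point_row` assumes you want a run of consecutive code points.

Code:

```python
def describe(char):
    """Return (code point label, UTF-8 bytes as hex) for one character."""
    code_point = f"U+{ord(char):04X}"
    utf8_hex = " ".join(f"{byte:02X}" for byte in char.encode("utf-8"))
    return code_point, utf8_hex


def from_hex(code_point_hex):
    """Turn a hex code point such as '508D' into the character itself."""
    return chr(int(code_point_hex, 16))


def code_point_row(start, length=16):
    """Build a string of `length` consecutive characters starting at `start`."""
    return "".join(chr(cp) for cp in range(start, start + length))


print(describe("傍"))    # ('U+508D', 'E5 82 8D')
print(from_hex("508D"))  # 傍
print(code_point_row(0x4F80))
```

The code point is what identifies the character; the UTF-8 bytes are just how it ends up stored in the HTML file.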

I would probably split each Kanji into its own file, and do my own organizing/combining elsewhere.

For example, if I then wanted to create one giant HTML file of all the characters dealing with "numbers", I could write a small outside program that says: merge the HTML files for:

一 (one), 二 (two), 三 (three), 四 (four), 五 (five), 六 (six), 七 (seven), 八 (eight), 九 (nine), 十 (ten).
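
A rough sketch of such an outside program is below. It assumes (my assumption, nothing more) that each Kanji fragment lives in its own file named after its hex code point, for example `4E00.html` for 一, inside one folder. The names `entry_path`, `merge_entries` and `PAGE_TEMPLATE` are hypothetical, and the wrapper page is a bare-bones one, not a validated EPUB XHTML file.

Code:

```python
from html import escape
from pathlib import Path

# The ten number characters from the list above, in reading order.
NUMBERS = "一二三四五六七八九十"

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""


def entry_path(char, source_dir):
    """Assumed naming scheme: one file per character, e.g. 4E00.html."""
    return Path(source_dir) / f"{ord(char):04X}.html"


def merge_entries(chars, source_dir, title):
    """Concatenate the per-character fragments, in order, into one page."""
    fragments = []
    for char in chars:
        path = entry_path(char, source_dir)
        fragments.append(path.read_text(encoding="utf-8").strip())
    return PAGE_TEMPLATE.format(title=escape(title), body="\n".join(fragments))
```

You would call it with something like `merge_entries(NUMBERS, "kanji", "Numbers")` and write the result out as UTF-8. Changing the theme is then just a matter of passing in a different string of characters.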

Quote:

Originally Posted by Hitch

What about something in XML? Wouldn't that work for him? Put all the data into XML, and then use an XSLT to transform it?

That depends on how much coding clemens14 wants to do, how familiar he is with XSLT, and, again, on what sort of stuff he actually means to DO with the database.
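
To give a feel for the XML route without writing real XSLT, here is a small stand-in in Python. Note that this is NOT XSLT: Python's standard library has no XSLT processor, so the sketch uses `xml.etree.ElementTree` to do the same kind of XML-in, HTML-out transform. The XML layout (`kanjilist`, `entry`, `meaning`, the `char` and `primary` attributes) is invented for the example.

Code:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the element and attribute names are made up.
SAMPLE_XML = """<kanjilist>
  <entry char="傍">
    <meaning primary="yes">bystander</meaning>
    <meaning>side</meaning>
    <meaning>nearby</meaning>
  </entry>
</kanjilist>"""


def transform(xml_text):
    """Turn every <entry> into one <div class="whole"> block (as a string)."""
    root = ET.fromstring(xml_text)
    blocks = []
    for entry in root.iter("entry"):
        meanings = entry.findall("meaning")
        main = [m.text for m in meanings if m.get("primary") == "yes"]
        alt = [m.text for m in meanings if m.get("primary") != "yes"]

        div = ET.Element("div", {"class": "whole"})
        ET.SubElement(div, "p", {"class": "kanji"}).text = entry.get("char")
        if main:
            ET.SubElement(div, "p", {"class": "mainmeaning"}).text = main[0]
        if alt:
            ET.SubElement(div, "p", {"class": "altmeaning"}).text = ", ".join(alt)
        blocks.append(ET.tostring(div, encoding="unicode"))
    return blocks


for block in transform(SAMPLE_XML):
    print(block)
```

An XSLT stylesheet would express the same mapping declaratively; which one is less work depends on what you already know.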

If the entire book was just to match the look of the images linked in Post #1, I don't see a problem with merging individual HTML files together... If you wanted to do crazy cross-references + other madness... that might be a different story.

Sadly, I don't know enough about Kanji to know exactly how best to organize this... all I know is Unicode code points.

Are these books organized in some sort of "alphabetical" order? Or do they organize by themes (numbers, weather, business, etc. etc.)?