Best workflow for data->database->epub?

clemens14 · 12-07-2014, 01:17 PM

Hello,

I am planning a free epub (and perhaps a related website) on Japanese writing, with a lot of detailed information for each ideogram taken from various books. I would like to collect all the information in a consistent way, create a database, and then easily put the information together into an epub with a proper layout (and perhaps also a website), with an entry for each ideogram (perhaps also adding specific tags for the website).

I was wondering if you could give me advice on how to do it in a way that I can minimize copying and pasting the information many times, since I would like to dedicate most of the time to the research proper.

Specifically, I would like to know how to construct a "simple" database (including pictures of the various versions of the ideograms) that can be exported easily and go more or less directly into the epub, with a "factsheet" for each ideogram (whether each one fits on one page depends on the reader).

The layout I have in mind is quite standard (for East Asian languages); you can get an idea from these pictures:
http://myweb.facstaff.wwu.edu/yusa/basickanji.shtml
http://www.neilsattin.com/wp-content...anjiuni001.JPG (that's not me)
Basically a bigger picture on the left and smaller pictures on the right, various ways of reading the ideogram, the meanings and various compound words that use the ideogram in question, together with their meanings

Thank you very much in advance for your kind advice.

Cheers,

Clemens

odedta · 12-09-2014, 12:45 PM

Well, the most popular language today is PHP and the most popular content management system (CMS) is WordPress. However, what you're asking for is specific. Do you have any knowledge of coding at all?

I recently created a project called Social DRM where people can upload their epubs and get some stuff embedded in them. It is about 80% complete from the initial prototype and, as far as I could understand from your message, has some of the code you're looking for.

If you elaborate more on this, maybe I can help you out.

clemens14 · 12-10-2014, 10:00 AM

Hi,

thank you for your kind reply.
Indeed the situation is very specific.
The main idea is that I want to go through various books, collecting data (both in text format and image format) related to ideograms, and put it together in a coherent way. This is why I was thinking about a database, which doesn't necessarily have to be online. The idea is to have a well-defined input mask, so that the data is uniform.
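A minimal sketch of what such a database could look like, assuming SQLite through Python's built-in `sqlite3` module (neither is named in this thread). Every table and column name here is hypothetical, and the split of readings into `on` and `kun` is only an illustration of how a constraint can play the role of an "input mask":

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS kanji (
    id       INTEGER PRIMARY KEY,
    glyph    TEXT NOT NULL UNIQUE,   -- the character itself
    meaning  TEXT NOT NULL           -- main meaning
);
CREATE TABLE IF NOT EXISTS reading (
    kanji_id INTEGER NOT NULL REFERENCES kanji(id),
    kind     TEXT NOT NULL CHECK (kind IN ('on', 'kun')),
    text     TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS image (
    kanji_id INTEGER NOT NULL REFERENCES kanji(id),
    style    TEXT NOT NULL,          -- label for the version shown in the scan
    path     TEXT NOT NULL           -- path to the image file, not the image bytes
);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn

def add_kanji(conn, glyph, meaning):
    """Insert one character and return its row id."""
    cur = conn.execute(
        "INSERT INTO kanji (glyph, meaning) VALUES (?, ?)", (glyph, meaning)
    )
    conn.commit()
    return cur.lastrowid
```

The `NOT NULL`, `UNIQUE` and `CHECK` constraints reject incomplete or inconsistent rows at input time, which is one way to keep several thousand entries uniform. Storing image paths rather than the images themselves keeps the database small and lets the same files be copied into the epub or the website.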

Then, from the data inserted in the database, I would like to make a digital text that can be easily formatted (CSS?).
I was thinking about epub, since I have worked with it quite a bit, although not with the latest standards, but other formats would be fine as well.

With the same data, I would also like to make a website (this reinforces my idea that perhaps epub is the ideal format to consider for the ebook).

I would like to accomplish this in the smoothest way possible, i.e. without having to insert the data and then copy and paste the various pieces of information many times (for example once for the database, once when making the epub and once more for the website).

I know a little bit of coding, or rather, I can follow and adapt code that others have written, although my knowledge is very limited. I have used a bit of PHP (written by others) to batch-fix (mainly footnote problems) various epubs made with InDesign 5.5, since it didn't export too well to epub, and then fixed some more with Sigil, mainly by modifying the CSS files.

If you wish you can PM me, but it is also fine to have the discussion public if you prefer.

Thank you again,

Clemens

P.S. Another option that was suggested to me was MultiMarkdown, making one "factsheet" for each ideogram considered, but I am a bit worried about the images and the internal references, so I thought perhaps a proper database might be more solid (I am talking about 5,6 thousand ideograms).

Hitch · 12-13-2014, 06:08 PM

Hey, guys:

What about something in XML? Wouldn't that work for him? Put all the data into XML, and then use an XSLT to transform it?

Or...can anyone think of a way to slap it into a DB, and export THAT into XML? From which he could then use an XSLT to get to ePUB?
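One possible shape for the "DB, then export THAT into XML" step, as a rough sketch only. It assumes an SQLite database with a hypothetical `kanji` table and uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules; the element names (`kanjilist`, `entry`, `glyph`, `meaning`) are made up for illustration:

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_xml(conn):
    """Return the rows of a hypothetical `kanji` table as one XML string."""
    root = ET.Element("kanjilist")
    rows = conn.execute("SELECT glyph, meaning FROM kanji ORDER BY id")
    for glyph, meaning in rows:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "glyph").text = glyph
        ET.SubElement(entry, "meaning").text = meaning
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(root, encoding="unicode")

# Demo with a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kanji (id INTEGER PRIMARY KEY, glyph TEXT, meaning TEXT)")
conn.execute("INSERT INTO kanji (glyph, meaning) VALUES (?, ?)", ("傍", "bystander"))
xml_text = export_xml(conn)
```

The resulting XML could then be handed to an XSLT processor (for example `xsltproc` on Debian) with a stylesheet that produces the XHTML pages for the ePUB.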

Thoughts, gang?

Hitch

Tex2002ans · 12-13-2014, 11:19 PM

Quote:

Originally Posted by clemens14

Basically a bigger picture on the left and smaller pictures on the right, various ways of reading the ideogram, the meanings and various compound words that use the ideogram in question, together with their meanings

Now, you are saying the word "images", but do you really mean CHARACTERS?

For example, here is a site that I found that lists many kanji characters in UTF-8:

http://www.rikai.com/library/kanjita....unicode.shtml

Or here is a list of something similar to what you want in HTML (kanji, unicode codepoint, Henshall number, meanings):

http://www.aule-browser.com/kanji/he...y-unicode.html

Each Kanji on that site looked to be split like this:

Code:

<span class="kanji">傍</span>
<span class="UCS">&nbsp;&nbsp;&nbsp;508D&nbsp;&nbsp;</span>
<span class="kid">1815&nbsp;&nbsp;</span>
<span class="m1">bystander</span>
<span class="mngs">&nbsp;&nbsp;&middot;&nbsp;&nbsp;side, besides, while, nearby, 3rd person</span><br />

For the EPUB code itself, I personally would avoid a hideous nest of tables (tables will most likely break when fonts get extremely large). So I would just do something along these lines for each Kanji:

Code:

<div class="whole">
	<p class="kanji">傍</p>
	<p class="altKanji">侀 侁 侂</p>
	<p class="mainmeaning">bystander</p>
	<p class="altmeaning">side, besides, while, nearby, 3rd person</p>
	<p class="thoroughexplanation">Blah blah blah, blah blah blah, this Kanji was used from the time period of ABCD-WXYZ.</p>
	<p class="thoroughexplanation">This is commonly used in business terms.</p>
	<p class="examplesentence">"This is an example sentence with this word."</p>
</div>
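If the data lives in a database, a small script could generate this markup instead of it being typed by hand. Here is a sketch in Python, assuming each record arrives as a plain dict; the key names (`kanji`, `alt_kanji`, `main_meaning`, `alt_meanings`, `explanations`) are hypothetical, while the class names are the ones from the block above:

```python
from html import escape

def render_entry(entry):
    """Render one kanji record (a plain dict) as a <div class="whole"> block."""
    lines = ['<div class="whole">']
    lines.append(f'\t<p class="kanji">{escape(entry["kanji"])}</p>')
    if entry.get("alt_kanji"):
        lines.append(f'\t<p class="altKanji">{escape(" ".join(entry["alt_kanji"]))}</p>')
    lines.append(f'\t<p class="mainmeaning">{escape(entry["main_meaning"])}</p>')
    if entry.get("alt_meanings"):
        lines.append(f'\t<p class="altmeaning">{escape(", ".join(entry["alt_meanings"]))}</p>')
    # One paragraph per explanation, in the order given
    for para in entry.get("explanations", []):
        lines.append(f'\t<p class="thoroughexplanation">{escape(para)}</p>')
    lines.append("</div>")
    return "\n".join(lines)

print(render_entry({
    "kanji": "傍",
    "main_meaning": "bystander",
    "alt_meanings": ["side", "besides", "while", "nearby", "3rd person"],
}))
```

`escape` keeps characters such as `<` and `&` in the text from breaking the markup, and optional fields are simply skipped when a record does not have them.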

Using the actual Unicode characters (encoded as UTF-8), and then embedding a Unicode font (for example, Droid Sans Fallback, a font used in Android), will then allow you to scale to any size with zero loss (vastly superior to using GIFs/PNGs of each character):

侀侁侂侃侄侅來侇侈侉侊例侌侍侎侏
偐偑偒偓偔偕偖偗偘偙做偛停偝偞偟
劐劑劒劓劔劕劖劗劘劙劚力劜劝办功

I would probably split each Kanji into its own file, and do my own organizing/combining elsewhere.

For example, if I then wanted to create a giant HTML file of all of the words dealing with "numbers", I would just be able to create an outside program, which would say: merge the HTML files for:

一 (one), 二 (two), 三 (three), 四 (four), 五 (five), 六 (six), 七 (seven), 八 (eight), 九 (nine), 十 (ten).
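A rough sketch of such an "outside program" in Python. It assumes, purely for illustration, that each kanji fragment is saved in a file named after its hexadecimal codepoint (so 一 would be `4E00.html`); the function name and the bare-bones page wrapper are also hypothetical:

```python
from pathlib import Path

def merge_fragments(folder, glyphs, title="Kanji"):
    """Concatenate the per-kanji HTML fragments for `glyphs`, in the given order."""
    parts = []
    for glyph in glyphs:
        # Assumed naming scheme: one file per character, named by codepoint
        path = Path(folder) / f"{ord(glyph):04X}.html"
        parts.append(path.read_text(encoding="utf-8"))
    body = "\n".join(parts)
    return (
        "<html>\n<head>\n"
        '<meta charset="utf-8"/>\n'
        f"<title>{title}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )

# Usage, once the fragment files exist in a folder called "fragments":
# page = merge_fragments("fragments", "一二三四五六七八九十", title="Numbers")
# Path("numbers.html").write_text(page, encoding="utf-8")
```

Because the order comes from the list of characters passed in, the same fragments can be regrouped by theme, by stroke count, or in any other order without touching the individual files.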

Quote:

Originally Posted by Hitch

What about something in XML? Wouldn't that work for him? Put all the data into XML, and then use an XSLT to transform it?

That depends on how much coding clemens14 is willing to do and how familiar he is with XSLT, and it depends again on what he actually means to DO with the database.

If the entire book was just to match the look of the images linked in Post #1, I don't see a problem with merging individual HTML files together... If you wanted to do crazy cross-references + other madness... that might be a different story.

Sadly, I don't know enough about Kanji to know how best to organize this... all I know is Unicode codepoints.

Are these books organized in some sort of "alphabetical" order? Or do they organize by themes (numbers, weather, business, etc. etc.)?

clemens14 · 12-14-2014, 05:08 PM

Thank you for the replies and the links.
By images, I actually meant images, in the sense that I would use the Unicode characters (and their variants), but I would also like to add scanned versions of some calligraphic styles.
I didn't know about Droid Sans Fallback, but it looks like an interesting font to embed.
XML+XSLT sounds interesting, although I am not too familiar with it (I understand the concept, but have never worked with it directly). How should it be implemented (and specifically with what software)? I am on Debian, but Windows 7 is also fine.
I chatted with Odedta about the problem (the need for a way to input the data, both text and images, in a coherent way and then produce both an epub and a website). He suggested a MySQL database plus a CMS (WordPress) that allows the needed data to be inserted (through some sort of form/input mask) and offers export capabilities through plugins, and he is kindly looking into that path.
Any idea of what could be used in this regard?

Cheers,

Clemens

Freehunter · 12-14-2014, 10:02 PM

I just stumbled on a couple of sites you might find useful while trying to interpret some characters in a different forum. The first interprets Unicode:

http://www.fileformat.info/info/unic...57fa/index.htm

The second includes meanings and the strokes used to write the character:

http://www.jisho.org/kanji/details/%E5%9F%BA

Thought they might be useful for reference. Good luck with your project.
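The two links refer to the same character in two different ways: the first by its Unicode codepoint (`57fa`), the second by its UTF-8 bytes in percent-encoded form (`%E5%9F%BA`). A short Python sketch of how the two relate, using only the standard library (the variable names are arbitrary):

```python
from urllib.parse import quote, unquote

glyph = unquote("%E5%9F%BA")        # decode the percent-encoded UTF-8 bytes -> "基"
codepoint = f"{ord(glyph):04x}"     # Unicode codepoint in hex -> "57fa"
utf8_bytes = glyph.encode("utf-8")  # the three bytes b"\xe5\x9f\xba"
round_trip = quote(glyph)           # back to the URL form -> "%E5%9F%BA"

print(glyph, codepoint, utf8_bytes.hex(), round_trip)
```

This is handy when building a database of characters: the codepoint makes a stable key or file name, while the percent-encoded form is what ends up in links to online dictionaries.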

clemens14 · 12-20-2014, 06:55 AM

Thank you for the links.

