12-07-2014, 01:17 PM   #1
clemens14
Enthusiast
Posts: 26
Karma: 10
Join Date: Jul 2010
Device: none
Best workflow for data->database->epub?

Hello,

I am planning a free epub (and perhaps a related website) on Japanese writing, with a lot of detailed information for each ideogram taken from various books. I would like to collect all the information in a consistent way, store it in a database, and then easily assemble it into an epub with a proper layout (and perhaps also a website), with one entry per ideogram and possibly specific tags for the website.

I was wondering if you could advise me on how to do this while minimizing repeated copying and pasting, since I would like to dedicate most of my time to the research proper.

Specifically: how can I construct a "simple" database (including pictures of the various versions of the ideograms) that can be exported more or less directly into the epub, as a "factsheet" for each ideogram? (Whether each factsheet fits on one page depends on the reader.)

The layout I have in mind is quite standard (for East Asian languages), and you can get an idea of it from these pictures:
http://myweb.facstaff.wwu.edu/yusa/basickanji.shtml
http://www.neilsattin.com/wp-content...anjiuni001.JPG (that's not me)
Basically: a bigger picture on the left and smaller pictures on the right, the various ways of reading the ideogram, its meanings, and various compound words that use the ideogram in question, together with their meanings.

Thank you very much in advance for your kind advice.

Cheers,

Clemens

Last edited by clemens14; 12-07-2014 at 01:19 PM.

12-09-2014, 12:45 PM   #2
odedta
Addict
Posts: 398
Karma: 96448
Join Date: Dec 2013
Device: iPad
Well, the most popular server-side web language today is PHP, and the most popular content management system (CMS) is WordPress. However, what you're asking for is quite specific. Do you have any knowledge of coding whatsoever?

I recently created a project called Social DRM, where people can upload their epubs and have some data embedded in them. It is about 80% complete from the initial prototype and, as far as I can understand from your message, it has some of the code you're looking for.

If you elaborate more on this, maybe I can help you out.

12-10-2014, 10:00 AM   #3
clemens14
Enthusiast
Posts: 26
Karma: 10
Join Date: Jul 2010
Device: none
Hi,

Thank you for your kind reply.
Indeed, the situation is very specific.
The main idea is that I want to go through various books, collect data (both text and images) related to ideograms, and put it together in a coherent way. This is why I was thinking about a database, which doesn't necessarily have to be online. The idea is to have a well-defined input mask, so that the data is uniform.
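
A minimal sketch of the "well-defined input mask" idea, assuming an offline SQLite file accessed through Python's standard `sqlite3` module. The table and column names (`kanji`, `image`, `character`, `style`, `path`) and the image path are hypothetical, chosen only to mirror the kinds of data mentioned in this thread. The constraints play the role of the input mask: malformed rows are rejected at insert time.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from any existing tool.
SCHEMA = """
CREATE TABLE kanji (
    id        INTEGER PRIMARY KEY,
    character TEXT NOT NULL UNIQUE CHECK (length(character) = 1),
    meaning   TEXT NOT NULL
);
CREATE TABLE image (
    kanji_id INTEGER NOT NULL REFERENCES kanji(id),
    style    TEXT NOT NULL,  -- e.g. the name of a calligraphic style
    path     TEXT NOT NULL   -- scans stay on disk; only the path is stored
);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and install the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn

def add_kanji(conn, character, meaning):
    """Insert one ideogram and return its row id."""
    cur = conn.execute(
        "INSERT INTO kanji (character, meaning) VALUES (?, ?)",
        (character, meaning),
    )
    return cur.lastrowid

def add_image(conn, kanji_id, style, path):
    """Attach a scanned image (by file path) to an existing ideogram."""
    conn.execute(
        "INSERT INTO image (kanji_id, style, path) VALUES (?, ?, ?)",
        (kanji_id, style, path),
    )

conn = open_db()
kid = add_kanji(conn, "傍", "bystander")
add_image(conn, kid, "scan", "images/508D-scan.png")  # hypothetical file name
```

Inserting a two-character string, or the same character twice, raises `sqlite3.IntegrityError`, so inconsistent entries are caught when the data is entered rather than when the book is built.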

Then, from the data inserted in the database, I would like to make a digital text that can be easily formatted (CSS?).
I was thinking about epub, since I have worked with it quite a bit, although not with the latest standards, but other formats would be fine as well.

With the same data, I would also like to make a website (this reinforces my idea that perhaps epub is the ideal format to consider for the ebook).

I would like to accomplish this in the smoothest way possible, i.e., without having to insert the data once and then copy and paste it around many times (for example once for the database, once when making the epub, and once more for the website).

I know a little bit of coding, or rather, I can follow and adapt code that others have written, although my knowledge is very limited. I have used a bit of PHP (written by others) to batch-fix various epubs made with InDesign 5.5 (mainly footnote problems), since it didn't export too well to epub, and then fixed some more with Sigil, mainly by modifying the CSS files.

If you wish you can PM me, but it is also fine to have the discussion public if you prefer.

Thank you again,

Clemens

P.S. Another option that was suggested to me was MultiMarkdown, making one "factsheet" for each ideogram considered, but I am a bit worried about the images and the internal references, so I thought perhaps a proper database might be more solid (I am talking about 5,6 thousand ideograms).

12-13-2014, 06:08 PM   #4
Hitch
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Hey, guys:

What about something in XML? Wouldn't that work for him? Put all the data into XML, and then use an XSLT to transform it?

Or...can anyone think of a way to slap it into a DB, and export THAT into XML? From which he could then use an XSLT to get to ePUB?
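
A rough sketch of the "slap it into a DB, and export THAT into XML" step, assuming SQLite and Python's standard library. The `kanji` table and the element names (`kanjilist`, `entry`, `character`, `meaning`) are hypothetical. The XSLT step itself is not shown: Python's standard library has no XSLT processor, so that part would need an external tool such as `xsltproc` or the third-party `lxml` package.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_xml(conn):
    """Serialize every row of the (hypothetical) kanji table as one XML string."""
    root = ET.Element("kanjilist")
    rows = conn.execute("SELECT character, meaning FROM kanji ORDER BY id")
    for character, meaning in rows:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "character").text = character
        ET.SubElement(entry, "meaning").text = meaning
    # encoding="unicode" returns a str and keeps the kanji as literal characters
    return ET.tostring(root, encoding="unicode")

# Tiny in-memory database, just enough to exercise the export.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kanji (id INTEGER PRIMARY KEY, character TEXT, meaning TEXT)"
)
conn.executemany(
    "INSERT INTO kanji (character, meaning) VALUES (?, ?)",
    [("一", "one"), ("二", "two")],
)
xml_text = export_xml(conn)
print(xml_text)
```

The resulting XML is the single source that a stylesheet (or any other script) could turn into both the ePUB pages and the website pages.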

Thoughts, gang?

Hitch

12-13-2014, 11:19 PM   #5
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by clemens14
Basically a bigger picture on the left and smaller pictures on the right, various ways of reading the ideogram, the meanings and various compound words that use the ideogram in question, together with their meanings
Now, you are saying the word "images", but do you really mean CHARACTERS?

For example, here is a site that I found that lists many kanji characters in UTF-8:

http://www.rikai.com/library/kanjita....unicode.shtml

Or here is a list of something similar to what you want in HTML (kanji, unicode codepoint, Henshall number, meanings):

http://www.aule-browser.com/kanji/he...y-unicode.html

Each Kanji on that site looked to be split like this:

Code:
<span class="kanji">傍</span>
<span class="UCS">&nbsp;&nbsp;&nbsp;508D&nbsp;&nbsp;</span>
<span class="kid">1815&nbsp;&nbsp;</span>
<span class="m1">bystander</span>
<span class="mngs">&nbsp;&nbsp;&middot;&nbsp;&nbsp;side, besides, while, nearby, 3rd person</span><br />
For the EPUB code itself, I personally would avoid a hideous nest of tables (tables will most likely break when fonts get extremely large). So I would just do something along these lines for each Kanji:

Code:
<div class="whole">
	<p class="kanji">傍</p>
	<p class="altKanji">侀 侁 侂</p>
	<p class="mainmeaning">bystander</p>
	<p class="altmeaning">side, besides, while, nearby, 3rd person</p>
	<p class="thoroughexplanation">Blah blah blah, blah blah blah, this Kanji was used from the time period of ABCD-WXYZ.</p>
	<p class="thoroughexplanation">This is commonly used in business terms.</p>
	<p class="examplesentence">"This is an example sentence with this word."</p>
</div>
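
If the data lives in a database rather than in hand-written files, markup like the block above could be generated instead of typed. A minimal sketch, assuming Python; the function name `render_entry` and its parameters are hypothetical, and only some of the classes from the example are covered.

```python
from html import escape

def render_entry(kanji, main_meaning, alt_meanings=(), examples=()):
    """Build one <div class="whole"> block using the class names shown above."""
    lines = [
        '<div class="whole">',
        f'\t<p class="kanji">{escape(kanji)}</p>',
        f'\t<p class="mainmeaning">{escape(main_meaning)}</p>',
    ]
    if alt_meanings:
        joined = ", ".join(alt_meanings)
        lines.append(f'\t<p class="altmeaning">{escape(joined)}</p>')
    for sentence in examples:
        lines.append(f'\t<p class="examplesentence">{escape(sentence)}</p>')
    lines.append("</div>")
    return "\n".join(lines)

html_block = render_entry(
    "傍",
    "bystander",
    alt_meanings=["side", "besides", "while", "nearby", "3rd person"],
)
print(html_block)
```

Escaping every field with `html.escape` keeps stray `<` or `&` characters in the source data from breaking the XHTML that an EPUB requires.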
Using the actual Unicode characters (encoded as UTF-8), and then embedding a Unicode font (for example, Droid Sans Fallback, a font used in Android), will allow you to scale to any size with zero loss, which is vastly superior to using GIFs/PNGs of each character:

侀 侁 侂 侃 侄 侅 來 侇 侈 侉 侊 例 侌 侍 侎 侏
偐 偑 偒 偓 偔 偕 偖 偗 偘 偙 做 偛 停 偝 偞 偟
劐 劑 劒 劓 劔 劕 劖 劗 劘 劙 劚 力 劜 劝 办 功
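
For reference, a small Python sketch of how a character relates to its Unicode code point, which is the hexadecimal number shown in the `UCS` field of the listing quoted earlier. The helper names `codepoint` and `from_codepoint` are made up for this illustration.

```python
def codepoint(ch):
    """Return the Unicode code point of one character as uppercase hex."""
    return f"{ord(ch):04X}"

def from_codepoint(hex_digits):
    """Return the character for a hexadecimal code point such as '508D'."""
    return chr(int(hex_digits, 16))

print(codepoint("傍"))             # 508D, the value in the UCS field above
print(from_codepoint("508D"))      # 傍
print(len("傍".encode("utf-8")))   # 3: UTF-8 stores this code point in three bytes
```

The code point identifies the character; UTF-8 is only the byte encoding used to store it in the file.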

I would probably split each Kanji into its own file, and do my own organizing/combining elsewhere.

For example, if I then wanted to create a giant HTML file of all of the words dealing with "numbers", I could just write an outside program, which would say: merge the HTML files for:

一 (one), 二 (two), 三 (three), 四 (four), 五 (five), 六 (six), 七 (seven), 八 (eight), 九 (nine), 十 (ten).
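
The "outside program" described above could look roughly like this in Python. This is only a sketch under assumptions: the function name `merge_entries` is hypothetical, and so is the naming convention of one fragment per kanji, named after its hexadecimal code point (for example `4E00.html`).

```python
import tempfile
from html import escape
from pathlib import Path

def merge_entries(entry_dir, characters, out_path, title="Kanji"):
    """Concatenate per-kanji HTML fragments into one XHTML page, in the given order."""
    entry_dir = Path(entry_dir)
    parts = []
    for ch in characters:
        # Hypothetical convention: fragment file named after the code point.
        fragment = entry_dir / f"{ord(ch):04X}.html"
        parts.append(fragment.read_text(encoding="utf-8"))
    page = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n" + "\n".join(parts) + "\n</body>\n</html>\n"
    )
    Path(out_path).write_text(page, encoding="utf-8")
    return page

# Demo in a temporary directory: write three fragments, then merge them.
with tempfile.TemporaryDirectory() as tmp:
    for ch, meaning in [("一", "one"), ("二", "two"), ("三", "three")]:
        body = (
            '<div class="whole">\n'
            f'\t<p class="kanji">{ch}</p>\n'
            f'\t<p class="mainmeaning">{meaning}</p>\n'
            "</div>"
        )
        (Path(tmp) / f"{ord(ch):04X}.html").write_text(body, encoding="utf-8")
    page = merge_entries(tmp, "一二三", Path(tmp) / "numbers.xhtml", title="Numbers")

print(page)
```

Because the order comes from the `characters` argument, the same fragments can be regrouped by theme, by stroke count, or in any other order without touching the individual files.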

Quote:
Originally Posted by Hitch
What about something in XML? Wouldn't that work for him? Put all the data into XML, and then use an XSLT to transform it?
That depends on how much coding clemens14 is comfortable with, how familiar he is with XSLT, and, again, on what sort of stuff he actually means to DO with the database.

If the entire book was just to match the look of the images linked in Post #1, I don't see a problem with merging individual HTML files together... If you wanted to do crazy cross-references + other madness... that might be a different story.

Sadly, I don't know enough about Kanji to know how best this could be organized... all I know is Unicode codepoints.

Are these books organized in some sort of "alphabetical" order? Or do they organize by themes (numbers, weather, business, etc. etc.)?

12-14-2014, 05:08 PM   #6
clemens14
Enthusiast
Posts: 26
Karma: 10
Join Date: Jul 2010
Device: none
Thank you for the replies and the links.
By images, I actually did mean images: I would use the Unicode characters (and their variants), but I would also like to add scanned versions of some calligraphic styles.
I didn't know about Droid Sans Fallback, but it looks like an interesting font to embed.
XML+XSLT sounds interesting, although I am not too familiar with it (I understand the concept, but have never worked with it directly). How should it be implemented (and specifically, with what software)? I am on Debian, but Windows 7 is also fine.
I chatted with odedta about the problem (the need for a way to input the data, both text and images, in a coherent way and then produce both an epub and a website). He suggested a MySQL database plus a CMS (WordPress), which would allow the data to be entered through some sort of form/input mask and exported through plugins, and he is kindly looking into that path.
Any idea of what could be used in this regard?

Cheers,

Clemens

12-14-2014, 10:02 PM   #7
Freehunter
Connoisseur
Posts: 68
Karma: 786508
Join Date: Aug 2014
Location: Great Lakes
Device: K4PC, PW2, HD7, calibre
I just stumbled on a couple of sites you might find useful while trying to interpret some characters in a different forum. The first interprets Unicode:

http://www.fileformat.info/info/unic...57fa/index.htm

The second includes meanings and the strokes used to create the character:

http://www.jisho.org/kanji/details/%E5%9F%BA

Thought they might be useful for reference. Good luck with your project.

12-20-2014, 06:55 AM   #8
clemens14
Enthusiast
Posts: 26
Karma: 10
Join Date: Jul 2010
Device: none
Thank you for the links.
Tags
database, epub creation, workflow


All times are GMT -4.