View Single Post
Old 03-18-2013, 04:07 PM   #1
SBT
Fanatic
SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.SBT ought to be getting tired of karma fortunes by now.
 
SBT's Avatar
 
Posts: 501
Karma: 560981
Join Date: Sep 2010
Location: Norway
Device: prs-t1, phone/Cool Reader, tablet/BlueFire, Nook Simple
Making P.Clark library database

I believe the cardinal virtues of a programmer are held to be laziness, impatience, and hubris. I suppose according to those criteria I may consider myself qualified to that epithet; I cannot be bothered to search through the Patricia Clark Library for books because of the limited search criteria, I don't want to wait till the green-lettered masters improve things, and how hard can it be to make a half-decent database anyway?
So I've downloaded the current status of the library, and set to work (re)learning SQL. To make it at all probable I will achieve my goal, I've limited myself to mobi and epub, and try to extract the following information about the books from the posting head:
  • Author(s)
  • Title
  • Language
  • Version
  • Date
I have some vague concept about what database normalization is, and try to implement it.
It seemed fitting to start with Patricia Clark's opus, the meta-data have been cleansed and had rudimentary QA, and are being imported and linked in the database.
I use MySQL.
覧覧覧覧
Update: Just to prove I'm not bluffing, I attach a snapshot of the database. Still very pre-alpha and messy. Also attached: A forlorn hope that somebody who actually knows about databases will give it a glance.
Most important lesson learnt so far: Stuffing information into a database is much easier than prying it out again.

20.03.13:New snapshot. Shaping up. Included the output of the half-dozen most productive contributors. So all you have to do now to get a list of titles in German of authors whose name includes "Doyle" is to write
Code:
select name from titles 
where id in 
(select title_id from 
books inner join book_author 
on book_author.book_id=books.id 
and book_author.author_id in 
(select id from authors where surname like '%Doyle') 
and books.language_id=(select id from languages where name='german'));
Hmm. Well, some small sacrifices in usability had to be made in order to make the search functionality more powerful
------
24.03: New snapshot. All entries as of a week ago are included. I attach the text file I use as a starting point; it's spreadsheet-friendly.
table books is the main table, one row per book, link_id refers to the id on the mobileread web site. titles are stored in the titles table, linked through books.title_id <-> titles.id. authors and books can have many-to-many relations (ditto languages), therefore the table book_author lists book_ids with corresponding author_ids, both fields non-unique.
Attached Files
File Type: gz mobileread.mysql.20130324.gz (520.9 KB, 62 views)
File Type: gz mobileread.txt.20130324.gz (256.3 KB, 64 views)

Last edited by SBT; 03-24-2013 at 06:26 PM. Reason: attached new db snapshot
SBT is offline   Reply With Quote