MobileRead Forums - View Single Post

SBT · 03-18-2013, 04:07 PM

I believe the cardinal virtues of a programmer are held to be laziness, impatience, and hubris. I suppose according to those criteria I may consider myself qualified to that epithet; I cannot be bothered to search through the Patricia Clark Library for books because of the limited search criteria, I don't want to wait till the green-lettered masters improve things, and how hard can it be to make a half-decent database anyway?
So I've downloaded the current status of the library, and set to work (re)learning SQL. To make it at all probable I will achieve my goal, I've limited myself to mobi and epub, and try to extract the following information about the books from the posting head:

Author(s)
Title
Language
Version
Date

I have some vague concept about what database normalization is, and try to implement it.
It seemed fitting to start with Patricia Clark's opus, the meta-data have been cleansed and had rudimentary QA, and are being imported and linked in the database.
I use MySQL.
————————
Update: Just to prove I'm not bluffing, I attach a snapshot of the database. Still very pre-alpha and messy. Also attached: A forlorn hope that somebody who actually knows about databases will give it a glance.
Most important lesson learnt so far: Stuffing information into a database is much easier than prying it out again.
–––
20.03.13:New snapshot. Shaping up. Included the output of the half-dozen most productive contributors. So all you have to do now to get a list of titles in German of authors whose name includes "Doyle" is to write

Code:

select name from titles 
where id in 
(select title_id from 
books inner join book_author 
on book_author.book_id=books.id 
and book_author.author_id in 
(select id from authors where surname like '%Doyle') 
and books.language_id=(select id from languages where name='german'));

Hmm. Well, some small sacrifices in usability had to be made in order to make the search functionality more powerful

------
24.03: New snapshot. All entries as of a week ago are included. I attach the text file I use as a starting point; it's spreadsheet-friendly.
table books is the main table, one row per book, link_id refers to the id on the mobileread web site. titles are stored in the titles table, linked through books.title_id <-> titles.id. authors and books can have many-to-many relations (ditto languages), therefore the table book_author lists book_ids with corresponding author_ids, both fields non-unique.