Old 12-16-2013, 07:08 PM   #1
devils_add
Member
Posts: 13
Karma: 10
Join Date: Sep 2013
Device: none
Database Fork

Hi,
I am in the early stages of planning a fork of the Calibre e-book database.
The first part is the redesign of database storage/organization. The way I envision it, each book record, currently an "<author>\<Title>\" folder, will become a single zip archive with an almost random numeric name, <some number>.zip.
Inside the file I will have the ebooks and other data files associated with it.

The second part is the data itself. I am thinking of moving it to almost html style formatting, like this:
<book>
<file>filename</file>
<format>
<format:1>pdf</format:1>
<format:2>djvu</format:2>
</format>
<title>Some Title</title>
<author>
<author:1>first middle last</author:1>
<author:2>first middle last</author:2>
</author>
</book>
That way it will be easier to add functionality in the future, and easy to keep it backward and forward compatible just by ignoring unknown parts.
This will also allow for nesting tags and nesting other titles and for future expansion of functionality.
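The "ignore unknown parts" idea can be sketched in Python. This is only an illustration under stated assumptions, not calibre code: the record uses repeated `<format>` and `<author>` elements instead of the numbered `<format:1>` style, because many XML parsers treat the colon as a namespace separator and will not accept it as written. The `<rating>` tag and the `parse_book` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; <rating> stands in for a tag added by a newer version.
RECORD = """
<book>
  <file>filename</file>
  <format>pdf</format>
  <format>djvu</format>
  <title>Some Title</title>
  <author>first middle last</author>
  <author>first middle last</author>
  <rating>5</rating>
</book>
"""

# Tags this (older) reader understands; anything else is skipped.
KNOWN_LIST_TAGS = {"format", "author"}
KNOWN_SINGLE_TAGS = {"file", "title"}


def parse_book(xml_text):
    """Return a dict of the known fields, silently ignoring unknown tags."""
    root = ET.fromstring(xml_text)
    book = {tag: [] for tag in KNOWN_LIST_TAGS}
    for child in root:
        text = (child.text or "").strip()
        if child.tag in KNOWN_LIST_TAGS:
            book[child.tag].append(text)
        elif child.tag in KNOWN_SINGLE_TAGS:
            book[child.tag] = text
        # Unknown tags fall through untouched.
    return book


book = parse_book(RECORD)
print(book["title"], book["format"])
```

An older reader simply never sees `<rating>`, while a newer one could add it to its known tags without changing the file format.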
Old 12-17-2013, 03:46 AM   #2
aleyx
Addict
Posts: 245
Karma: 20386
Join Date: Sep 2010
Location: France
Device: Cybook Diva
Hm. I don't quite understand.

Are you trying to develop an alternate, drop-in replacement for the metadata.db + filesystem that Calibre uses for storage?
Old 12-17-2013, 01:56 PM   #3
devils_add
Quote:
Originally Posted by aleyx View Post
Hm. I don't quite understand.

Are you trying to develop an alternate, drop-in replacement for the metadata.db + filesystem that Calibre uses for storage?
To some extent, yes. The reason is that with the correct redesign of the filesystem, Calibre will be able to organize almost everything. Therefore, it will need a more robust metadata.db. In addition, the reason for archiving is that it will make it easy to transfer items between libraries without having to worry that you will import them wrong and have to edit the metadata again, as everything will be inside that archive.
Old 12-17-2013, 02:24 PM   #4
aleyx
Ah.

You do realize that with a segmented XML database (what you call "almost html style formatting") hidden away in .zip files, performance will take a pummelling not seen since Wile E. Coyote was still trying to get himself a side serving of roasted roadrunner?

See, changing the filesystem hierarchy is one thing. In the end, it's just strings. But getting away from an RDBMS? That is not, I repeat, _not_, something you want to do.
Old 12-17-2013, 03:49 PM   #5
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by devils_add View Post
To some extent, yes. The reason is that with the correct redesign of the filesystem, Calibre will be able to organize almost everything. Therefore, it will need a more robust metadata.db. In addition, the reason for archiving is that it will make it easy to transfer items between libraries without having to worry that you will import them wrong and have to edit the metadata again, as everything will be inside that archive.
I'm pretty sure we already have that with the currently working system.
Old 12-17-2013, 04:28 PM   #6
aleyx
I think he wants to make it the main (and only) database. Now I'm no DBA, but it has me very scared.

Because you see, devils_add, if your DB is scattered into thousands of little XML files inside thousands of .zip files, then you'll have to open all of those .zip files and read all of those XML files every time you want to do anything, like, say, list titles. If you want to _search_, it's even worse, because then you'll have to open it all up again, _then_ make cross-references for basically every single field of every single XML file.

That's pure insanity. There's a reason DBMSs have been around since the '70s. It's because they _work_.

Now XML/OPF files have their use, but database queries ain't it.

As someone who once had to convert an old, OLD flat-file-based DB to Access (which is still not a real database but less wrong), I beg you: spare yourself the pain.
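To make the cost difference concrete, here is a rough Python sketch of both approaches. It is a toy built on assumptions, not calibre's actual schema: the `metadata.xml` member name, the one-column `books` table, and both helper functions are invented for illustration.

```python
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def titles_from_zips(library_dir):
    """Scattered layout: every listing opens and parses every archive."""
    titles = []
    for zip_path in sorted(Path(library_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            root = ET.fromstring(archive.read("metadata.xml"))
            titles.append(root.findtext("title"))
    return sorted(titles)


def titles_from_sqlite(conn):
    """RDBMS layout: one query against one file."""
    rows = conn.execute("SELECT title FROM books ORDER BY title")
    return [row[0] for row in rows]


# Build the same three-book toy library in both layouts.
sample_titles = ["Book A", "Book B", "Book C"]

library_dir = tempfile.mkdtemp()
for number, title in enumerate(sample_titles, start=1):
    with zipfile.ZipFile(Path(library_dir) / f"{number}.zip", "w") as archive:
        archive.writestr("metadata.xml", f"<book><title>{title}</title></book>")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT)")
conn.executemany("INSERT INTO books VALUES (?)", [(t,) for t in sample_titles])

zip_titles = titles_from_zips(library_dir)
db_titles = titles_from_sqlite(conn)
print(zip_titles == db_titles)
```

Both functions return the same list. The difference is that the first one touches one archive per book on every listing, while the second touches a single database file regardless of library size.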
Old 12-17-2013, 04:42 PM   #7
eschwartz
We already have XML backups saved with the book. As backups, which is where metadata XML belongs. Why on earth should the database be replaced to use this instead, purely for the purpose of fixing an imaginary problem?

What do you think databases were invented for anyway?
Old 12-17-2013, 05:46 PM   #8
devils_add
Guys, you are forgetting about the DMG file extension in Apple. Where everything the program needs to run is inside that file (which is an archive). So, what I am proposing is to have just the metadata file associated with the record, and the record itself, inside the archive, which will be added into the main database. The only time the archive is written to is when files are added or when metadata is changed, and all other times it is opened is to extract a needed file to read it or to send it to the device.
Therefore, the main database file will be outside as it is right now. Also, you can have the main database link to virtual library databases for different, incompatible formats. In addition, this will allow for creating a single database for everything, with different iterations of the front end, so that you can have all your collections managed by just one database.
Sorry, for wordiness.

Also, with the archive architecture, you can keep some PDF books broken up by chapter, and combine them on the fly as requested (resources permitting), so you don't have to download the full book, but just the chapters you need.
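The per-book archive described in this post might look roughly like this in Python. This is a guess at the intended layout, not something taken from calibre: the archive name `1042.zip`, the `metadata.xml` member, and the helper names are all hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_book(archive_path, metadata_xml, formats):
    """Write one book record: its metadata plus every format, in one zip."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("metadata.xml", metadata_xml)
        for name, data in formats.items():
            archive.writestr(name, data)


def extract_format(archive_path, name, dest_dir):
    """Pull a single format out, e.g. to read it or send it to a device."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.extract(name, path=dest_dir)


work_dir = Path(tempfile.mkdtemp())
archive_path = work_dir / "1042.zip"  # hypothetical "almost random" number

pack_book(
    archive_path,
    "<book><title>Some Title</title></book>",
    {"book.pdf": b"%PDF placeholder", "book.djvu": b"DJVU placeholder"},
)

extracted = extract_format(archive_path, "book.pdf", work_dir / "out")
print(extracted)
```

One design consequence worth noting: Python's `zipfile` can append members but cannot replace one in place, so every metadata edit in this layout means rewriting the book's archive.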

Last edited by devils_add; 12-17-2013 at 05:48 PM.
Old 12-17-2013, 06:17 PM   #9
BetterRed
null operator (he/him)
Posts: 20,457
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by devils_add View Post
Calibre will be able to organize almost everything. Therefore, it will need a more robust metadata.db.
Calibre can already do that to some degree

Calibre could be more general purpose in a user-friendly sense, if the user could define the labels for the Author/Book Title entities to whatever suits their purpose - e.g. Architect/Building; Software Package/Program; Director/Movie Name; Producer/Game etc.

If one could also add a third entity into the hierarchy then that would probably be enough to cover 90% of potential uses.

==================

I'm puzzled by what you mean by a 'more robust metadata.db'.

I've been using calibre for 2-3 years, and I use it one way or another on most days of the week. It has never crashed, I've never had to rebuild a database nor have I ever had to reinstall calibre.

I wish I could say the same for some other programs that I use - e.g. web browsers, editors, IDEs, photo and music library managers - even the file manager I use crashes at least once a week.

The only time performance has been an issue was with a custom column based on a union of 4 other custom columns, each of which was a list of names. I intuited at the time that I was pushing the edge of the envelope, so I had a Plan B for what to do when the envelope tore.

====================

I'm also intrigued as to how you would envisage a multi-user, server-based implementation of your schema on different server platforms.

BR

Addendum: @devils_add - I missed seeing your most recent post before I posted this, vagaries of phone interruptus

Last edited by BetterRed; 12-17-2013 at 11:36 PM. Reason: addenda
Old 12-18-2013, 04:40 AM   #10
aleyx
Quote:
Originally Posted by devils_add View Post
Guys, you are forgetting about the DMG file extension in Apple.
Dude, you're forgetting the concept of platform-agnosticism. I can install Calibre on pretty much anything. You're suggesting that only MacOS is worthy of Calibre?

Quote:
Originally Posted by devils_add View Post
Where everything the program needs to run is inside that file (which is an archive). So, what I am proposing is to have just the metadata file associated with the record, and the record itself, inside the archive, which will be added into the main database.
Soooo... Everything in one file? That's even _worse_. To access your bit of database (Hah! bit! Get it? ^_^), you'll have to look into a big proprietary file, then into a small .zip, then into an XML file? And that big proprietary file can be as small as a few dozen MB for a few books, up to several GB for sizeable libraries? My own library is ~950MB and I only have about 1500 books. There's libraries out there with tens of thousands. I/O will be a nightmare.

Quote:
Originally Posted by devils_add View Post
The only time the archive is written to is when files are added or when metadata is changed,
Which is basically every time you use Calibre.

Quote:
Originally Posted by devils_add View Post
and all other times it is opened is to extract a needed file to read it or to send it to the device. Therefore, the main database file will be outside as it is right now. Also, you can have the main database link to virtual library databases for different, incompatible formats. In addition, this will allow for creating a single database for everything, with different iterations of the front end, so that you can have all your collections managed by just one database.
Sorry, I really can't picture it. Is it one database, or databases with "links to virtual library databases"? What are you calling "database" in this context? The way you talk about it, I think you mean "one big Apple-format archive in which there's one .zip per book, and in each .zip there's all the formats and one XML file describing the metadata for the book".

That's not a database, that's a .tar backup.

If that's not it, you really really need to do some kind of ASCII art of your file hierarchy, because right now I'm in the dark.

Quote:
Originally Posted by devils_add View Post
Sorry, for wordiness.
I'm really sorry, but the problem here is not the words, it's the concept.

Quote:
Originally Posted by devils_add View Post
Also, with the archive architecture, you can keep some PDF books broken up by chapter, and combine them on the fly as requested (resources permitting), so you don't have to download the full book, but just the chapters you need.
First rule of an architecture rewrite: aim for feature parity, _then_ build upon it. Calibre's granularity doesn't go down to the chapter. That's why it can't do that. If it did, it would.

Also, the resources you may (I say _may_) save with that system are utterly dwarfed by the resources you'll use just to read your database.

There's no two ways of doing database-driven file management on consumer hardware. There's only one. There's one system, made of an RDBMS on one side and a filesystem on the other. Some of those files, if they're text files (as opposed to binary files) can be compressed, but that's as far as it can go.

I know that because I've already tried it all, ever since I first discovered databases twenty years ago. Your system? I made one like that, more or less, when I was 16. At the time it was a VB4-based management system for the fanfiction I downloaded from R.A.A.C. (ah, those were the times...), and I was trying to put Eyrie Productions' Undocumented Features into some sort of reading order with bookmarks, because even back then it was _huge_. It took me a few weeks before I scrapped the idea of a text-file-based DB and turned to Access (there was no SQLite in those far away times...). I had much better results.

So, learn from the mistakes of an old (well, 36-year-old) pro and use an RDBMS. That's what they're made for. They're good at it.
Old 12-18-2013, 04:56 AM   #11
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by devils_add View Post
To some extent, yes. The reason is that with the correct redesign of the filesystem, Calibre will be able to organize almost everything. Therefore, it will need a more robust metadata.db.
Whether I fully understand your goal or not makes no difference. When you're ready for folks to test it, I'll be happy to give it a try. Even if you decide that your idea wasn't the cat's pajamas, you may end up writing code that can be merged into the current codebase as new features or speed enhancements. Quality contributors are always welcome and you have to cut your teeth on the code somehow.

Quote:
Originally Posted by devils_add View Post
In addition, the reason for archiving is that it will make it easy to transfer items between libraries without having to worry that you will import them wrong and have to edit the metadata again, as everything will be inside that archive.
Calibre already has this capability in the Copy to library feature. There is no "worry that you will import it wrong" since it is a direct copy of the record from one library to another.

Good Luck with your fork.
Old 12-28-2013, 10:08 PM   #12
At_Libitum
Addict
Posts: 265
Karma: 724240
Join Date: Aug 2013
Device: KyBook
I can see where the idea comes from. Using one big container file with its own internal 'filesystem' was, and still is, common for a lot of games, and most of these games also got released on several platforms. So in that respect the idea is not that strange. The only difference here is the dynamic nature of a library compared to the static environment of game resources. You'd have to go in the direction of virtual HD files or something similar and let the current RDBMS use the container file to write to and read from, instead of trying to recreate the RDBMS. But, like physical HDs, virtual file systems tend to get fragmented the same way, with the same side effects. Which means reorganization is needed, which means needing at least as much free disk space as the size of the container file, preferably double that.

It may look like a good idea, but it has one helluva drawback. If something, however small, breaks in the container file, it's bye-bye library. At least in the current situation, all books stay intact. Which means you probably want to maintain some kind of parity system for repairs if worst comes to worst. In the end, the risk of some mishap occurring to a virtual file system is much higher than to a physical one. Files get damaged much more often than HDs.
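Detecting damage, as opposed to repairing it, needs only checksums. The sketch below is a hypothetical Python illustration, not an existing calibre feature: the file names and both helper functions are invented, and real parity-based repair is considerably more involved.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(library_dir):
    """Map each file name to the SHA-256 digest of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(library_dir).iterdir())
        if path.is_file()
    }


def find_damaged(library_dir, manifest):
    """Return names whose current digest no longer matches the manifest."""
    current = build_manifest(library_dir)
    return [name for name, digest in manifest.items() if current.get(name) != digest]


library_dir = Path(tempfile.mkdtemp())
(library_dir / "1.zip").write_bytes(b"first book")
(library_dir / "2.zip").write_bytes(b"second book")

manifest = build_manifest(library_dir)

# Simulate corruption of a single file.
(library_dir / "2.zip").write_bytes(b"second bo0k")

damaged = find_damaged(library_dir, manifest)
print(damaged)
```

With one file per book, a mismatch points at a single book. With one big container, the same check can only say that the whole library is suspect.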

Last edited by At_Libitum; 12-28-2013 at 10:19 PM.
Old 01-21-2014, 06:25 PM   #13
devils_add
Quote:
Originally Posted by At_Libitum View Post
I can see where the idea comes from. Using one big container file with its own internal 'filesystem' was, and still is, common for a lot of games, and most of these games also got released on several platforms. So in that respect the idea is not that strange. The only difference here is the dynamic nature of a library compared to the static environment of game resources. You'd have to go in the direction of virtual HD files or something similar and let the current RDBMS use the container file to write to and read from, instead of trying to recreate the RDBMS. But, like physical HDs, virtual file systems tend to get fragmented the same way, with the same side effects. Which means reorganization is needed, which means needing at least as much free disk space as the size of the container file, preferably double that.

It may look like a good idea, but it has one helluva drawback. If something, however small, breaks in the container file, it's bye-bye library. At least in the current situation, all books stay intact. Which means you probably want to maintain some kind of parity system for repairs if worst comes to worst. In the end, the risk of some mishap occurring to a virtual file system is much higher than to a physical one. Files get damaged much more often than HDs.
Actually, I am not proposing one big encapsulation of the whole database, just of the items in it. So if you look at the database as a tree and the data inside the final folder as a leaf, I am proposing to encapsulate that final folder (OK, I will probably add some file structure to it so that, for example, you could store a PDF book broken into chapters and combine it as needed, keep audio books in one place, or have a folder for the supplementary material that sometimes comes with a book).
The only thing I will have in a big file will be the general database, so that I don't have to re-scan everything. However, even that might not be true, as I am planning for the database to be locatable on different hard drives, not just different folders. Therefore, each location will have a local backup database, which the main database will load from and reference.
Old 01-22-2014, 03:20 AM   #14
aleyx
So, if I understand correctly, you have:
- One directory with as many .zip as you have books,
- One .zip with a "general database".

What's in the latter?