#1 |
Junior Member
Posts: 7
Karma: 10
Join Date: May 2014
Device: many
|
Is it possible: support another db backend?
Good day. I have some experience in Python. Before starting to learn materials on Calibre plugins I want to ask if the thing I want to create is possible to implement at all, without rewriting too much.
There is a free database of articles, published by the government. It currently contains 300,000 documents totaling 200 GB. Some have covers: the logo of the department they originate from. The format of the db is as follows: a text file with basic metadata (without covers, comments and so on) and a hundred 2 GB zip archives containing TXT and FB2 files. I want to browse the db with Calibre.
First, I tried a script that imports files directly into Calibre, but it got through barely 15% in 5 days of running 24/7. What's worse, this database is updated monthly, and I don't want to be constantly importing something. (And an extra 200 GB for a second copy of the data isn't a trifle, either.) I want to try to make Calibre understand this db in place, in read-only mode. I want searching, reading locally, sending to email and maybe ebooks.
Calibre developers, please tell me your opinion: is this possible to do with reasonable effort, or should I start my own application from scratch? Any advice on where to start would be appreciated. |
#2 |
Addict
Posts: 250
Karma: 20386
Join Date: Sep 2010
Location: France
Device: Bookeen Diva, Kobo Clara BW
|
Right. As much as I love Calibre, I don't think it's the right tool for this particular job. I'm afraid that your best bet is to make your own application.
If you have more experience in Python than in, say, PHP, you could start by looking at CherryPy, which is the Python-based web server framework I use for small to medium custom projects. There are others, of course. |
#3 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
I think processing 200 GB worth of information is the real problem; that is never going to be fast.
Plaintext will not make for a very fast database either. I'd recommend writing a script to determine when and where articles have been updated/added, then importing only the changes into calibre. After the first import there will be much less new material, and it will take less time. |
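A minimal sketch of such a change-detection script, in Python. The thread never shows the index file's layout, so this assumes a tab-separated file with the article id in the first column; `load_index` and `diff_indexes` are hypothetical helper names, and the sample rows are made up for illustration.

```python
import csv
import io


def load_index(text, id_column=0):
    """Map article id -> full row for a tab-separated index (assumed format)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[id_column]: row for row in reader if row}


def diff_indexes(old_text, new_text):
    """Return (added, changed, removed) article ids between two index snapshots."""
    old, new = load_index(old_text), load_index(new_text)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, changed, removed


# Made-up snapshots: last month's index and this month's index.
old_snapshot = (
    "1\tBudget report\tarchive_001.zip\n"
    "2\tRoad survey\tarchive_001.zip\n"
)
new_snapshot = (
    "1\tBudget report (revised)\tarchive_001.zip\n"
    "2\tRoad survey\tarchive_001.zip\n"
    "3\tCensus notes\tarchive_002.zip\n"
)

added, changed, removed = diff_indexes(old_snapshot, new_snapshot)
print("added:", added, "changed:", changed, "removed:", removed)
```

Only the ids in `added` and `changed` would then need to go through the import, which is what should keep the monthly update small.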
#4 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
If you do end up writing your own application, calibre is licensed under the GPL, so you can use its code wherever useful. calibre has a lot of mature code for its ebook-viewer.exe component that may save you lots of time, for instance.
#5 |
Junior Member
Posts: 7
Karma: 10
Join Date: May 2014
Device: many
|
Thank you for the feedback.
About plaintext being bad for metadata: sure, it can be converted to anything, as long as the process takes hours, not weeks. What I want to avoid is unpacking the original article archives. By the way, I tried to speed up Calibre importing at first. I found out that the process is HDD I/O bound, and that moving it to tmpfs speeds things up at least three times. Maybe there is a way to do a "fast" import of files that avoids the usual import procedure? That would be a workaround for the problem. |
#6 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
You could use an SSD, even just for the calibre db alone.
|
#7 |
Junior Member
Posts: 7
Karma: 10
Join Date: May 2014
Device: many
|
I do already; that doesn't help much. The best I could achieve is 1.5 files per second at the start, slowing down after that.
Setting aside the import variant and the "write my own app" variant (which I can always fall back to), there is one idea left. Maybe I can create some virtual device that pretends to be a reader with all these books on it? I can create a standalone script that would present the metadata in any form. If only I could teach Calibre to take the actual books (and, preferably, covers) from the archives… BTW, that format for text DBs is rather popular in the ex-USSR. Government publications, advertisements and books often come in that form. Support for it, plus the recent movement to ban Windows from many organizations (including schools), would make this addition to Calibre rather popular.
P.S. Instead of my own app I could also use LibreOffice Base plus a set of scripts. At least I hope so. Calibre, however, offers book conversion, e-reader support and other neat things. |
#8 |
Grand Sorcerer
Posts: 12,421
Karma: 8012664
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
One way to do this would be to build an application that constructs a complete calibre library from the database without using calibre. You would use the same schema that calibre uses along with compatible file naming conventions. Matching the schema would be made easier by starting with an empty calibre library.
There is no doubt that doing this would be a lot of work, but it is orders of magnitude less work than reinventing calibre. On the other hand, is calibre the right target? Perhaps an academic bibliography manager would be more appropriate? I suspect that generating BibTeX would be a lot easier and possibly more useful. |
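To give a sense of how little code the bibliography route might need, here is a rough Python sketch that turns one index record into a BibTeX entry. The field choice (`@misc` with title, author and year) and the `bibtex_entry` helper are assumptions for illustration. The thread does not list the fields the index contains, and values holding braces or other special characters would need escaping that this sketch omits.

```python
def bibtex_entry(key, title, authors, year):
    """Format one record as a BibTeX @misc entry (no escaping of special characters)."""
    fields = {
        "title": title,
        "author": " and ".join(authors),  # BibTeX separates authors with "and"
        "year": str(year),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


# Made-up record, for illustration only.
entry = bibtex_entry("doc0001", "Budget report", ["Ministry of Finance"], 2014)
print(entry)
```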
#9 |
null operator (he/him)
Posts: 21,708
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
I'm wondering if you can populate the library with content progressively.
A Calibre book folder does not have to contain a format file; some users use this feature as a means of recording 'books to get', or recording their paper books. Perhaps you could create a viable (albeit empty) library of 300,000 articles from the text file you mention, which I assume is an index that includes a reference to the archive in which each article is located.
Maybe that reference could be 'munged' into a file uri (file:\\\thearchives\archive_002.zip) and popped into a custom comments-like column that you display in the Book Details area (normally to the right of the 'book' list). Then you could click on it to open the archive as required. Most (all?) archive utilities will let you open an archive member from within an archive: they extract it to a temporary folder and hand that file off to the relevant program. And of course you could drag the article content (text file, fb2, cover) out of the archive and drop it onto the Book Details area (probably indirectly: archive->scratchpad->calibre).
You might be able to create that initial database using the Import List Plugin.
BR
Last edited by BetterRed; 05-11-2014 at 06:54 PM. |
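As a rough illustration of the 'munging' step, the Python sketch below turns an absolute POSIX-style archive path into a `file://` URI that could be stored in such a column. The `archive_uri` helper and the example paths are hypothetical, and Windows paths or network shares would need different handling.

```python
from urllib.parse import quote


def archive_uri(archive_path):
    """Build a file:// URI from an absolute POSIX-style path (hypothetical helper)."""
    if not archive_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    # quote() percent-encodes spaces and other unsafe characters but keeps "/"
    return "file://" + quote(archive_path)


print(archive_uri("/thearchives/archive_002.zip"))
```

Whether a click on such a link opens the archive in a useful way depends on the operating system and the installed archive utility, so it is worth trying with one record before generating 300,000 of them.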
#10 |
creator of calibre
Posts: 45,319
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You should definitely do this in two steps:
1) Create the empty records. Write a small Python script that does this using the calibre APIs, and run it with calibre-debug script.py
2) Write a script to transfer the book files. It should avoid the calibre APIs and use file renames (as opposed to file copies/moves) plus direct access to the data table in the calibre db to populate it with the entries.
This should be about a day's work, and should finish importing all 300K books in a few hours. Do it first with a few thousand books to get a sense of the performance and feasibility.
Sample code for (1)
Code:
    from calibre.library import db
    from calibre.ebooks.metadata.book.base import Metadata

    # One Metadata object per article: a title, then a list of authors
    books = [
        Metadata('title1', ['author1']),
        Metadata('title2', ['author2']),
        # ...
    ]
    cache = db('path to library folder').new_api
    for mi in books:
        cache.create_book_entry(mi, apply_import_tags=False)
However, running calibre with 300K entries is not going to be very performant. I suggest splitting up your archive into 5-10 libraries. |
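Step (2) above is described only in words, so here is a hedged Python sketch of what it might look like. It assumes that the `books` table stores each book's folder relative to the library root in a `path` column, and that the `data` table has `book`, `format`, `uncompressed_size` and `name` columns. That matches my understanding of calibre's schema, but check it against an empty library created by calibre before relying on it. The demo runs against a simplified stand-in database, not a real `metadata.db`, and the file stem used for `name` is a hypothetical choice rather than calibre's own naming rule.

```python
import os
import sqlite3
import tempfile


def attach_format(conn, library_dir, book_id, src_file, fmt):
    """Rename src_file into the book's folder and register it in the data table."""
    (rel_path,) = conn.execute(
        "SELECT path FROM books WHERE id=?", (book_id,)
    ).fetchone()
    book_dir = os.path.join(library_dir, rel_path)
    os.makedirs(book_dir, exist_ok=True)
    name = os.path.basename(rel_path)  # hypothetical stem, not calibre's naming rule
    dest = os.path.join(book_dir, f"{name}.{fmt.lower()}")
    os.replace(src_file, dest)  # a rename: no data is copied on the same filesystem
    conn.execute(
        "INSERT INTO data (book, format, uncompressed_size, name) VALUES (?, ?, ?, ?)",
        (book_id, fmt.upper(), os.path.getsize(dest), name),
    )
    return dest


# Demo against a simplified stand-in for metadata.db (assumed schema).
with tempfile.TemporaryDirectory() as lib:
    conn = sqlite3.connect(os.path.join(lib, "metadata.db"))
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, path TEXT)")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, book INTEGER, format TEXT, "
        "uncompressed_size INTEGER, name TEXT, UNIQUE(book, format))"
    )
    conn.execute("INSERT INTO books (id, path) VALUES (1, 'author1/title1 (1)')")
    src = os.path.join(lib, "incoming.fb2")
    with open(src, "w") as f:
        f.write("<FictionBook/>")
    dest = attach_format(conn, lib, 1, src, "fb2")
    conn.commit()
    rows = conn.execute(
        "SELECT book, format, uncompressed_size, name FROM data"
    ).fetchall()
    conn.close()

print(rows)
```

Run anything like this only while calibre is closed and on a backup copy of the library, since writing to the database directly bypasses calibre's own consistency checks.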
#11 |
Junior Member
Posts: 7
Karma: 10
Join Date: May 2014
Device: many
|
I guess I could use FUSE to present the archives as folders. But I will not be able to create metadata.opf for every book this way.
OK, I will read the manuals on db, try some experiments and be back in a few days. |
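For those experiments, it may help that Python's standard `zipfile` module can read a single member straight out of an archive without unpacking the rest, which is the same access pattern a FUSE layer would provide. The sketch below builds a tiny in-memory archive to stay self-contained. The member names and the `read_article` helper are made up, and the real archives' internal layout is not described in this thread.

```python
import io
import zipfile


def read_article(archive, member):
    """Return the bytes of one member without extracting the rest of the archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member)


# Build a small stand-in archive in memory (the real ones are about 2 GB each).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("0001.txt", "First article")
    zf.writestr("0002.fb2", "<FictionBook/>")
buf.seek(0)

text = read_article(buf, "0001.txt").decode("utf-8")
print(text)
```

With real archives, `archive` would be a file path instead of a `BytesIO` object. Zip files keep a central directory, so looking up one member should not require scanning the whole archive.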