Old 05-11-2014, 12:44 PM   #1
Barafu
Junior Member
Barafu began at the beginning.
 
Posts: 7
Karma: 10
Join Date: May 2014
Device: many
Is it possible: support another db backend?

Good day. I have some experience in Python. Before I start studying the material on Calibre plugins, I want to ask whether what I have in mind can be implemented at all without rewriting too much.
There is a free database of articles, published by the government. It currently contains 300,000 documents totaling 200 GB. Some have covers: the logo of the department they originate from. The database format is as follows: a text file with basic metadata (no covers, comments and so on) and about a hundred 2 GB zip archives containing TXT and FB2 files.
I want to browse the database with Calibre. First I tried a script that imports the files directly into Calibre, but it got through barely 15% in 5 days of running 24/7. What's worse, the database is updated monthly, and I don't want to be constantly importing something. (And an extra 200 GB for a second copy of the data isn't a trifle, either.)
I want to try to make Calibre understand this database in place, in read-only mode. I want searching, reading locally, sending to email and maybe ebooks. Calibre developers, please tell me your opinion: is this possible with reasonable effort, or should I start my own application from scratch? Any advice on where to start would be appreciated.
Old 05-11-2014, 02:52 PM   #2
aleyx
Addict
aleyx can self-interpret dreams as they happen.
 
Posts: 250
Karma: 20386
Join Date: Sep 2010
Location: France
Device: Bookeen Diva, Kobo Clara BW
Right. As much as I love Calibre, I don't think it's the right tool for this particular job. I'm afraid that your best bet is to make your own application.

If you have more experience in Python than, say, PHP, you could start by looking at CherryPy, which is the Python-based webserver framework I use for small to medium custom projects. There's others, of course.
Old 05-11-2014, 02:52 PM   #3
eschwartz
Ex-Helpdesk Junkie
eschwartz ought to be getting tired of karma fortunes by now.
 
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
I think processing 200 GB worth of information is the real problem; that is never going to be fast.

Plaintext will not make for a very fast database either.

I'd recommend writing a script to determine when and where articles have been updated/added, then importing the changes into calibre. After the first import, the new material will not be as much, and will take less time.
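
A minimal sketch of such a change-detection script, under loud assumptions: the thread never shows the index format, so the semicolon separator and "article id in the first field" are hypothetical, as are the function names. It compares last month's index with this month's and reports which article ids were added, changed or removed.

```python
import hashlib


def fingerprint(line):
    """Stable hash of one index line, used to notice edited records."""
    return hashlib.sha1(line.strip().encode("utf-8")).hexdigest()


def load_index(lines, sep=";"):
    """Map article id -> fingerprint.

    Assumption: the article id is the first `sep`-separated field.
    """
    index = {}
    for line in lines:
        if line.strip():
            article_id = line.split(sep, 1)[0].strip()
            index[article_id] = fingerprint(line)
    return index


def diff_indexes(old, new):
    """Return (added, changed, removed) article ids, each list sorted."""
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    removed = sorted(k for k in old if k not in new)
    return added, changed, removed


if __name__ == "__main__":
    last_month = ["1;Title A;Dept X", "2;Title B;Dept Y"]
    this_month = ["1;Title A;Dept X", "2;Title B (rev.);Dept Y", "3;Title C;Dept X"]
    print(diff_indexes(load_index(last_month), load_index(this_month)))
```

Only the ids in the first two lists would need to go through the import again.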
Old 05-11-2014, 02:58 PM   #4
eschwartz
Ex-Helpdesk Junkie
eschwartz ought to be getting tired of karma fortunes by now.
 
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
If you do end up writing your own application, calibre is licensed under the GPL, so you can use its code wherever useful. calibre has a lot of mature code for its ebook-viewer.exe component that may save you lots of time, for instance.
Old 05-11-2014, 03:35 PM   #5
Barafu
Junior Member
Barafu began at the beginning.
 
Posts: 7
Karma: 10
Join Date: May 2014
Device: many
Thank you for the feedback.
About plaintext being bad for metadata: sure, it can be converted to anything, as long as the process takes hours, not weeks. What I want to achieve is to avoid unpacking the original article archives.
By the way, at first I tried to speed up Calibre's importing. I found out that the process is HDD I/O bound and that moving it to tmpfs speeds things up at least three times. Maybe there is a way to do a "fast" import of files that avoids the usual import procedure? That would be a workaround for the problem.
Old 05-11-2014, 03:45 PM   #6
eschwartz
Ex-Helpdesk Junkie
eschwartz ought to be getting tired of karma fortunes by now.
 
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
You could use an SSD, even just for the calibre db alone.
Old 05-11-2014, 04:51 PM   #7
Barafu
Junior Member
Barafu began at the beginning.
 
Posts: 7
Karma: 10
Join Date: May 2014
Device: many
I do already; it doesn't help much. The best I could achieve is 1.5 files per second at the start, slowing down after that.
Setting aside the import variant and the "write my own app" variant (which I can always fall back on), there is one idea left. Maybe I can create some virtual device that pretends to be a reader with all these books on it?
I can create a standalone script that would present the metadata in any form. If only I could teach Calibre to take the actual books (and, preferably, covers) from the archives…
BTW, this format for text DBs is rather popular in the ex-USSR. Government publications, advertisements and books often come in that form. Support for it, plus the recent movement to ban Windows from many organizations (including schools), would make this addition to Calibre rather popular.
P.S. Instead of my own app I could use LibreOffice Base plus a set of scripts. At least I hope so. Calibre, however, offers book conversion, e-reader support and other neat things.
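
For the "take actual books from archives" part, a minimal sketch using only Python's standard zipfile module, which can read a single member without unpacking the rest of the archive. This is not a Calibre API, just the building block a standalone script could use; the function names and the member names in the demo are hypothetical.

```python
import io
import zipfile


def list_members(archive, suffixes=(".fb2", ".txt")):
    """List the article files in one archive (path or file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        return [n for n in zf.namelist() if n.lower().endswith(suffixes)]


def read_member(archive, member):
    """Return the bytes of one member; nothing is extracted to disk."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fh:
            return fh.read()


if __name__ == "__main__":
    # Build a tiny in-memory archive so the demo runs without the real data.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("dept_x/0001.fb2", b"<FictionBook/>")
        zf.writestr("dept_x/0002.txt", b"plain text article")
    print(list_members(buf))
    print(read_member(buf, "dept_x/0001.fb2"))
```

Opening a 2 GB zip only reads its central directory, so fetching one article this way should stay cheap, though that is worth measuring on the real archives.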
Old 05-11-2014, 05:25 PM   #8
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,421
Karma: 8012664
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
One way to do this would be to build an application that constructs a complete calibre library from the database without using calibre. You would use the same schema that calibre uses along with compatible file naming conventions. Matching the schema would be made easier by starting with an empty calibre library.
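
A minimal sketch of the first step, reading the schema out of an empty library so it can be matched. It assumes only that the library's metadata.db is an ordinary SQLite file; the function name is hypothetical, and it should be run on a copy while calibre is closed.

```python
import sqlite3


def dump_schema(db_path):
    """Return {table name: CREATE TABLE statement} for an SQLite file."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return dict(rows)


if __name__ == "__main__":
    import sys

    # Usage: python dump_schema.py /path/to/empty-library/metadata.db
    for table, sql in dump_schema(sys.argv[1]).items():
        print(table)
        print(sql)
        print()
```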

There is no doubt that doing this would be a lot of work, but it is orders of magnitude less work than reinventing calibre.

On the other hand, is calibre the right target? Perhaps an academic bibliography manager would be more appropriate? I suspect that generating BibTeX would be a lot easier and possibly more useful.
Old 05-11-2014, 06:48 PM   #9
BetterRed
null operator (he/him)
BetterRed ought to be getting tired of karma fortunes by now.
 
Posts: 21,708
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
I'm wondering if you can populate the library with content progressively.

A Calibre book folder does not have to contain a format file; some users use this feature as a means of recording 'books to get', or recording their paper books.

Perhaps you could create a viable (albeit empty) library of 300,000 articles from the text file you mention, which I assume is an index that includes a reference to the archive in which each article is located.

Maybe that reference could be 'munged' into a file URI (file:///thearchives/archive_002.zip) and popped into a custom comments-like column that you display in the Book Details area (normally to the right of the 'book' list). Then you could click on it to open the archive as required. Most (all?) archive utilities will let you open an archive member from within an archive: they extract it to a temporary folder and hand that file off to the relevant program.
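
A minimal sketch of that 'munging', assuming absolute POSIX-style archive paths and assuming the custom column is filled with an HTML link. The function name is hypothetical, and whether the link opens on click would need checking in calibre itself.

```python
from html import escape
from posixpath import basename
from urllib.parse import quote


def archive_link(archive_path, label=None):
    """Turn an absolute archive path into an HTML link with a file: URI."""
    if not archive_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    # quote() percent-encodes spaces and other unsafe characters, keeps '/'.
    uri = "file://" + quote(archive_path)
    text = label or basename(archive_path)
    return '<a href="{}">{}</a>'.format(escape(uri), escape(text))


if __name__ == "__main__":
    print(archive_link("/thearchives/archive_002.zip"))
```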

And of course you could drag the article (text file, fb2, cover) content out of the archive and drop onto the Book details (probably indirectly - archive->scratchpad->calibre)

You might be able to create that initial database using the Import List Plugin.

BR

Last edited by BetterRed; 05-11-2014 at 06:54 PM.
Old 05-11-2014, 10:24 PM   #10
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,319
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You definitely should do this in two steps:

1) Create the empty records. Write a small script in Python to do that, using calibre APIs, and run it with calibre-debug script.py

2) Write a script to transfer the book files that avoids the calibre APIs and uses file renames (as opposed to file copies/moves) plus direct access to the data table in the calibre db to populate it with the entries.

Should be about a day's work, and should finish importing all 300K books in a few hours. Do it first with a few thousand books to get a sense of the performance and feasibility.

Sample code for (1)
Code:
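# Run with: calibre-debug script.py
from calibre.library import db
from calibre.ebooks.metadata.book.base import Metadata

# One Metadata object per article: Metadata(title, list_of_authors)
books = [Metadata('title1', ['author1']), Metadata('title2', ['author2'])]  # ...
cache = db('path to library folder').new_api

for mi in books:
    # Create a record with no format files attached
    cache.create_book_entry(mi, apply_import_tags=False)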
For (2), you just need to create entries in the data table in metadata.db, which should be trivial, and rename the files into the calibre library using a naming scheme similar to the one calibre uses for its files.
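
A rough sketch of what (2) could look like for a single file. The column names of the data table (book, format, uncompressed_size, name) and the idea that each book's folder comes from the path column of the books table are assumptions to verify against an empty library's schema; attach_format and its parameters are hypothetical names. Run it only on a test library with calibre closed.

```python
import os
import sqlite3


def attach_format(con, library_dir, book_id, book_path, src_file, name, fmt):
    """Rename src_file into the book's folder and register it in `data`.

    book_path is the book's folder relative to the library (assumed to be
    what the books table stores); name is the file name without extension.
    """
    dest_dir = os.path.join(library_dir, book_path)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, "{}.{}".format(name, fmt.lower()))
    size = os.path.getsize(src_file)
    # A rename instead of a copy: cheap, but only within one filesystem.
    os.rename(src_file, dest)
    con.execute(
        "INSERT INTO data (book, format, uncompressed_size, name) "
        "VALUES (?, ?, ?, ?)",
        (book_id, fmt.upper(), size, name),
    )
    return dest


# Hypothetical usage, committing once after a whole batch:
#   con = sqlite3.connect("/path/to/library/metadata.db")
#   attach_format(con, "/path/to/library", 7, "Dept X/Title A (7)",
#                 "/staging/0007.fb2", "Title A - Dept X", "fb2")
#   con.commit()
```

Committing once per batch rather than once per file keeps the number of disk syncs down, which matters for an I/O-bound import.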

However, running calibre with 300K entries is not going to be very performant. I suggest splitting up your archive into 5-10 libraries.
Old 05-12-2014, 02:18 AM   #11
Barafu
Junior Member
Barafu began at the beginning.
 
Posts: 7
Karma: 10
Join Date: May 2014
Device: many
I guess I could use FUSE to present the archives as folders. But I will not be able to create metadata.opf for every book this way.
OK, I will read the manuals on db, try some experiments and be back in a few days.