
Go Back   MobileRead Forums > E-Book Readers > More E-Book Readers > iRex > iRex Developer's Corner

Old 06-16-2011, 05:33 AM   #1
Mackx
Guru
 
Posts: 999
Karma: 19985
Join Date: Dec 2008
Location: Netherlands
Device: iRex DR1000S
Handling a huge number of files on the SD Card

Some problems have been reported when there is a huge number of files on the SD card; indexing, for one, takes ages. Some people have up to 150 000 documents on their 32 GB SD card. This problem was discussed in a 'normal' iRex forum thread, but I think it should be continued here in the developer's corner.

I am currently thinking of using a global.db per directory and (re-)indexing it as soon as you (want to) show that directory (only local files/directories, no sub-directories). The advantage is that not much code needs to be changed; only an extra step to update/generate a global.db would be needed.
I am not sure how this behaves when a folder contains thousands of documents; only some experiments will show that. I do not want to start with indexing in the background right now, because it would surely make everything more complex.
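A minimal sketch of that per-directory, on-demand indexing step (the table and column names here are assumptions for illustration, not the real ermetadb schema):

```python
import os
import sqlite3

def reindex_directory(dir_path, db_path):
    """(Re)build a small index of the files directly inside dir_path.

    Hypothetical schema; the real global.db uses different tables and
    many more columns (title, author, thumbnails, ...).
    """
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS file_metadata (
                       filename TEXT PRIMARY KEY,
                       file_size INTEGER,
                       file_time_modified INTEGER)""")
    con.execute("DELETE FROM file_metadata")  # full rebuild on each visit
    with os.scandir(dir_path) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                con.execute("INSERT INTO file_metadata VALUES (?, ?, ?)",
                            (entry.name, st.st_size, int(st.st_mtime)))
    con.commit()
    con.close()
```

Whether a full rebuild like this stays fast with thousands of files in one folder is exactly the open question above.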

Some of the disadvantages: only the SD Card view would work properly, and search would not work.

See also this message for some comments.

Any comments?
Old 06-18-2011, 03:29 PM   #2
Mackx
Using a global.db file per directory would probably lead to too much wear on the SD card. I wanted to copy the global.db file from each directory to the root so that UDS (the file viewer) could add/update thumbnail/title/author. (See also the other thread.)

So the next step will be to see whether writing to global.db can be skipped (this will require more code changes).

The alternative proposal of Iņigo to write a completely new file manager is also a promising solution.
Old 06-19-2011, 03:38 AM   #3
Iņigo
Guru
 
Posts: 730
Karma: 72743
Join Date: Feb 2008
Location: Here or there
Device: iRex iLiad, iRex DR800S. K4NT. Kobo Aura, Aura One, Libra 2.
Quote:
Originally Posted by Mackx View Post
Using a global.db file per directory would probably lead to too much wear on the SD card. I wanted to copy the global.db file from each directory to the root so that UDS (the file viewer) could add/update thumbnail/title/author. (See also the other thread.)

So the next step will be to see whether writing to global.db can be skipped (this will require more code changes).

The alternative proposal of Iņigo to write a completely new file manager is also a promising solution.
Last night I was playing with my old Lua implementation of a file browser for the iLiad.
I didn't remember it... but it looks quite good, so I will port it to Vala and add some new features. But that will be in the coming weeks; first I have to finish the new DR800+, which I hope to release in a few days.
Old 06-20-2011, 05:50 AM   #4
Mackx
I have created an experiment where, in SD Card mode, files are read directly from the SD card and not from global.db. The good news is that basic browsing seems to work OK (although a lot still needs to be done, like sorting). On the emulator I created some extra folders with files to make sure that global.db was not used. Opening an 'old' file also worked OK; however, opening a 'new' file (one that was not in global.db) caused UDS to crash and show some error messages. Opening non-UDS files (xournal, notes) worked OK.
This problem will also occur when using a new file browser.
I was thinking of adding the files on-the-fly to global.db. A pruning mechanism (based on file_time_lastread) could then be applied to limit the size of global.db (to a few thousand entries?). The advantage would be that all views, search and current_page would still work for the most recently used documents.
There could even be an automatic mechanism that puts mdbindex and the SD-Card view in a special mode when the SD card contains more than a <threshold> number of files?
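The pruning idea could look roughly like this (a sketch; it assumes the file_metadata table and file_time_lastread column mentioned above, and the threshold is arbitrary):

```python
import sqlite3

def prune_global_db(db_path, keep=2000):
    """Delete all but the `keep` most recently read entries."""
    con = sqlite3.connect(db_path)
    con.execute("""DELETE FROM file_metadata
                   WHERE rowid NOT IN (
                       SELECT rowid FROM file_metadata
                       ORDER BY file_time_lastread DESC
                       LIMIT ?)""", (keep,))
    con.commit()
    con.execute("VACUUM")  # actually reclaim the freed pages
    con.close()
```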

There are also some upgrading (R1.7.1 -> R2.0) issues that need to be addressed. The mdbindex program is also used to convert the content of the databases (global.db, but also all metadata.db files) to the correct version: removing unwanted data, moving data between tables and adding new data.
Upgrading is always done with an official version of the firmware, so one has to limit the number of files on the SD card. Or is it easy to modify the original installation image with a modified mdbindex?
Old 06-21-2011, 04:12 AM   #5
yuri_b
Connoisseur
 
Posts: 71
Karma: 592
Join Date: Aug 2010
Device: irex dr800sg DR1000S
Hi

I also tried placing a lot of files (~3000 or more) on the SD card and indexing them, and:
1) Indexing takes hours.
2) It takes ages for the file viewer to read the list of files from the database, read the pictures, and sort them the way you ask.

I ended up with this solution:
1) If I use an SD card with a huge number of files, I disable automatic generation of thumbnails, browse the files on the SD card, and don't use /Books to see them all.
2) Otherwise I prefer to remove the files I don't need from the SD card.

I thought about enhancements to the file viewer:
1) I would like to be able to sort all files, so we need all files in one big database.
2) The file viewer spends a lot of time filling the quick jump table (the alphabet on the right side of the screen). We could add a user option to disable this feature.
3) I'm sure the SQL database works fast; the problem is in the file viewer. It could ask the database to return only the data for the current page, but then one would have to redesign the filling of, and jumping in, the jump table.
4) So a redesign of the file viewer (its integration with the SQL database) would speed up the listing of files.
5) Creating thumbnails for files: we could add another user command in the file viewer: index current directory.

Once more: an SQLite database can be huge, and large size alone will not degrade performance. Splitting the database into several will not save disk space, and handling several databases will be more difficult and take more time.

Sincerely
Yura

Last edited by yuri_b; 06-21-2011 at 04:20 AM.
Old 06-21-2011, 07:39 AM   #6
Mackx
Hi Yura,

Thanks for thinking along.
Quote:
Originally Posted by yuri_b View Post
I also tried placing a lot of files (~3000 or more) on the SD card and indexing them, and:
1) Indexing takes hours.
2) It takes ages for the file viewer to read the list of files from the database, read the pictures, and sort them the way you ask.
I assume you have generate-thumbnails enabled? Does it take much time every time you start the DR, or only after you have added a lot of documents?
For the viewing part, the DR only has a limited amount of memory available, around 50 MB (that is 0.05 GB :-). Information about all 'selected' documents, including the thumbnails, is stored in this memory. I am not sure whether the large memory usage causes the slowdown, or the amount of computation needed for that number of files.
Quote:
Originally Posted by yuri_b View Post
I ended up with this solution:
1) If I use an SD card with a huge number of files, I disable automatic generation of thumbnails, browse the files on the SD card, and don't use /Books to see them all.
2) Otherwise I prefer to remove the files I don't need from the SD card.
I think this is similar to Iņigo's approach.
Quote:
Originally Posted by yuri_b View Post
I thought about enhancements to the file viewer:
1) I would like to be able to sort all files, so we need all files in one big database.
The files are all in one big database (named global.db); the Books view should show all documents on your SD card. (But then, that view is unusable for you... :-( ) Or did you have something else in mind?
Quote:
Originally Posted by yuri_b View Post
2) The file viewer spends a lot of time filling the quick jump table (the alphabet on the right side of the screen). We could add a user option to disable this feature.
That could indeed be worth a try, to see how much time it saves.
Quote:
Originally Posted by yuri_b View Post
3) I'm sure the SQL database works fast; the problem is in the file viewer. It could ask the database to return only the data for the current page, but then one would have to redesign the filling of, and jumping in, the jump table.
I did some experiments a while ago; see this thread, messages #16 and #18, for more information. Sorting seems to take most of the time.
Quote:
Originally Posted by yuri_b View Post
4) So a redesign of the file viewer (its integration with the SQL database) would speed up the listing of files.
See the previous point: most time is spent on sorting, which is also needed if you want to extract 20 files out of 3000.
Quote:
Originally Posted by yuri_b View Post
5) Creating thumbnails for files: we could add another user command in the file viewer: index current directory.
This would then create temporary thumbnails? Storing them in global.db might create too large a file after some time.
Quote:
Originally Posted by yuri_b View Post
Once more: an SQLite database can be huge, and large size alone will not degrade performance. Splitting the database into several will not save disk space, and handling several databases will be more difficult and take more time.
Unfortunately UDS also writes to the global.db database, and for that program no source code is available. That would make it very difficult to split the data over several databases.

Thanks for your suggestions.

Marcel.
Old 06-22-2011, 01:20 AM   #7
yuri_b
Hi Marcel
I've read your messages.
These days I also work on embedded platforms (ARM/ARC) as a firmware engineer.
I have experience in building and porting the GNU compiler and in database development, and I wrote several programs that use SQLite (at a previous job as a GUI developer). The speed of SELECT with ORDER BY must be carefully rechecked; I don't believe it works that slowly, there has to be another reason.

Sincerely
Yuri

-----
P.S. Maybe in the coming days I will find some time to investigate this.

P.P.S. After looking at the global database creation code, I found only one index, on filename + directory, which means that all sorting is done through temporary files or in memory, which greatly reduces speed. The problem is not in SQL but rather in the usage of SQL. AFAIK we can add indexes to the default global.db and metadata.db templates, and even to an existing global.db, to speed up SELECT queries.

Last edited by yuri_b; 06-22-2011 at 03:17 AM.
Old 06-22-2011, 03:50 AM   #8
yuri_b
Hi

Following up on the P.P.S. of my previous post: I found that the sqlite database file lacks indexes on all the columns we sort by.

You can check, on your test suite, the difference between the unindexed table and the table with these indexes:

create index filename_index on file_metadata(filename,tag);
create index title_index on file_metadata(title,tag);
create index author_index on file_metadata(author,tag);
create index type_index on file_metadata(file_type,tag);
create index size_index on file_metadata(file_size,tag);
create index date_added_index on file_metadata(file_time_added,tag);
create index time_viewed_index on file_metadata(file_time_lastread,tag);


You need to apply these commands to global.db using the sqlite3 command-line tool.
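To see what the indexes buy you, compare the query plans before and after creating one (a self-contained sketch using Python's sqlite3 module and a miniature stand-in for the file_metadata table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE file_metadata (filename TEXT, title TEXT, tag TEXT)")
con.executemany("INSERT INTO file_metadata VALUES (?, ?, ?)",
                [("b.pdf", "Beta", "book"), ("a.pdf", "Alpha", "book")])

QUERY = "SELECT filename FROM file_metadata ORDER BY title, tag"

def plan(q):
    # last column of EXPLAIN QUERY PLAN output is the human-readable detail
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + q)]

before = plan(QUERY)  # sorts via a temporary B-tree
con.execute("CREATE INDEX title_index ON file_metadata(title, tag)")
after = plan(QUERY)   # walks title_index instead, no separate sort step
```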

BTW, my old global.db was ~37 MB for ~2000 files.
If I find time, I will measure the timings myself as well.

Sincerely
Yura

P.S. I also like to carry all my books with me on one SD card. The problem is the algorithm for finding the needed book, and the overall performance of the reader.

Last edited by yuri_b; 06-22-2011 at 05:14 AM. Reason: fixed create index scripts
Old 06-22-2011, 07:42 AM   #9
Mackx
I am not that good with databases, so what is the downside of adding all these indexes? Will they be updated on every change to the database?
I saw that there is already one index: CREATE UNIQUE INDEX file_metadata_i1 ON file_metadata (filename, directory_path). I assumed this would already speed up the normal Books view?

Did you do a VACUUM before measuring the size of global.db?
Old 06-22-2011, 07:49 AM   #10
Gertjan
ex-IRX developer
Posts: 158
Karma: 224
Join Date: Oct 2008
Device: Irex DR800S, DR1000S, iLiad
The system is slow with a huge number of files for a number of reasons:

1. Traversing a deep and large folder structure on FAT is slow. The indexer recurses over the complete SD card to find new and changed files. A possible speed-up is to use a faster file system (ext3?) for the SD card, and/or to have a separate interface to trigger re-indexing.
2. Extracting metadata (title, author, thumbnails) currently requires opening the file through UDS and its plugins. A possible speed-up is to use dedicated tool(s) to extract metadata from documents.
3. global.db, which holds the details of all files on the card, can grow to a serious size, which slows operations down. Again, a quicker file system may help, and limiting the (number of) thumbnails will too. As yuri_b said, indexing the database might as well.
4. Content Browser currently prepares a full list of the content for a category (Books, Images...). With a large list, preparing it per page will speed things up considerably.
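Point 4 boils down to asking the database for one screenful at a time instead of materializing the whole list; a minimal sketch (the page size and schema are assumptions):

```python
import sqlite3

PAGE_SIZE = 20  # items per Content Browser screen (assumed)

def fetch_page(con, page):
    """Fetch only the rows needed for one page of the listing."""
    return con.execute(
        "SELECT filename, title FROM file_metadata "
        "ORDER BY title LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE)).fetchall()
```

With the sort column indexed, this stays cheap for the page counts involved here; without an index, the database still has to sort everything before it can skip to the offset.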

Hope this helps a bit.
Old 06-22-2011, 09:19 AM   #11
yuri_b
Thanks Gertjan

So indexing, plus limiting the file viewer to fetching data for 3-4 pages, will help.
And of course a faster card will do magic.

For now I will not worry about indexing all files, because I can do that once overnight; but browsing books, when every page takes 2-5 minutes to open, drives me crazy.

1) As a new database is not created but merely copied from a template, we can just replace one template file and delete the old global.db, or tell users how to add the indexes to the database manually. Indexes are maintained automatically by the database; every insert/update will update the indexes, so you don't need to worry about that. AFAIK the default index will only speed up browsing by filename + directory without a tag (book, news or picture).

2) Fetching only a few pages will require a change in the file viewer (ctb). Not a difficult or big change. The only design problem I see is in the quick jump table: filling it, and jumping. How to fill it optimally without reading all the data, and how to calculate the page to jump to without reading all the data.
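For the jump-table problem, the database itself can supply per-letter row counts without the viewer reading every record; a running sum of those counts then gives the page each letter starts on (a sketch, with an assumed miniature schema):

```python
import sqlite3

def letter_counts(con):
    """Number of titles per initial letter, in alphabetical order."""
    return con.execute(
        "SELECT upper(substr(title, 1, 1)) AS letter, COUNT(*) "
        "FROM file_metadata GROUP BY letter ORDER BY letter").fetchall()

def jump_pages(con, page_size=20):
    """Map each initial letter to the 0-based page where it starts."""
    pages, total = {}, 0
    for letter, count in letter_counts(con):
        pages[letter] = total // page_size
        total += count
    return pages
```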

Sincerely
Yura

P.S. I didn't vacuum the database, but I never removed any books, only added them, so that is the real size.

Last edited by yuri_b; 06-22-2011 at 09:26 AM.
Old 06-22-2011, 01:19 PM   #12
Mackx
Quote:
Originally Posted by Mackx View Post
On the emulator I created some extra folders with files to make sure that global.db was not used. Opening an 'old' file also worked OK; however, opening a 'new' file (one that was not in global.db) caused UDS to crash and show some error messages. Opening non-UDS files (xournal, notes) worked OK.
This turned out to be a problem with the extra files I created (they were links to a non-existent file); adding a file to global.db is not required to open it.
Old 06-22-2011, 08:25 PM   #13
Iņigo
Not directly related to the core of this thread, but as I can't remember where I mentioned it before, I'll comment here.

I have over 4k books on my DR800; the global db is ~28.5 MB.
Some time ago mdbindex stopped automatically adding new covers to the DB.
The Books view takes over 30 secs to show the files the first time, 20 secs on subsequent times.
The Recently Added view takes ~9 secs with 15 files, the same with 50 files, and 7 secs on subsequent times.
I also noted that the device seems to enter suspend mode while waiting a long time for the books to show if no screen activity is produced, so I have to wake it by pressing the menu key.

Today I did some new tests, removing global.db and rebuilding it without images.
global.db is less than 2 MB.
The Books view takes no more than 2 or 3 secs to show the files.
Similar times for the Recently Added view.


These are some of my conclusions:
- the main culprit of the slowness is the quantity of images in the DB
- when indexing with covers there is a moment where mdbindex stops working. I've tested removing the files the indexer was analyzing at that moment, but it's the same. So I presume the problem is a lack of free memory. Would enabling a swap partition/file improve anything?
- does the DR DB engine cache some contents in memory?


A good but infeasible solution would be to use 2 different db files, one for metadata and another for images.
Or better: not using a db for images at all, but a hidden directory with the cover files: System/_covers/medium/id.png, System/_covers/small/id.png.
This solution doesn't seem difficult. I'll study it.
- change the code to avoid writing images to the DB when closing UDS, and write them to a file instead
- CTB would only load the covers needed for the current page

This would reduce the overall DB size in memory, and the slowness introduced by loading image files should not be noticeable...

Iņigo

Last edited by Iņigo; 06-22-2011 at 08:45 PM.
Old 06-23-2011, 07:46 AM   #14
Mackx
Quote:
Originally Posted by Iņigo View Post
Today I did some new tests, removing global.db and rebuilding it without images.
global.db is less than 2 MB.
The Books view takes no more than 2 or 3 secs to show the files.
Similar times for the Recently Added view.


These are some of my conclusions:
- the main culprit of the slowness is the quantity of images in the DB
- when indexing with covers there is a moment where mdbindex stops working. I've tested removing the files the indexer was analyzing at that moment, but it's the same. So I presume the problem is a lack of free memory. Would enabling a swap partition/file improve anything?
- does the DR DB engine cache some contents in memory?
I also had the impression that the thumbnails were the problem, and more specifically the memory usage. The DR has around 50 MB of free memory after it has started into Home (on my DR1000). When showing the Books view, information about all books, including thumbnails, is loaded into memory.
Quote:
Originally Posted by Iņigo View Post
A good but infeasible solution would be to use 2 different db files, one for metadata and another for images.
Or better: not using a db for images at all, but a hidden directory with the cover files: System/_covers/medium/id.png, System/_covers/small/id.png.
This solution doesn't seem difficult. I'll study it.
- change the code to avoid writing images to the DB when closing UDS, and write them to a file instead
- CTB would only load the covers needed for the current page

This would reduce the overall DB size in memory, and the slowness introduced by loading image files should not be noticeable...
This is indeed a good idea; I was also thinking of somehow not loading the thumbnails. Putting them in separate files would indeed be a good solution. I am not sure how many files can be in a FAT directory; should System/_covers/ contain a directory structure?
It is probably not that difficult to change ermetadb to store thumbnails in separate files instead of in global.db, and ctb to read thumbnails from file before showing them.
One problem would be how to delete the thumbnails when the documents are removed. This could probably be done by mdbindex when it removes entries from global.db.
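To keep any single FAT directory from growing too large, the cover store could be sharded by the start of the document id. A sketch (the System/_covers layout comes from Iņigo's proposal above; the two-character sharding and the id scheme are assumptions):

```python
import os

COVERS_ROOT = "System/_covers"  # layout from the proposal above

def cover_path(doc_id, size="medium"):
    """Bucket covers by the first two characters of the id so that no
    single FAT directory holds more than a fraction of the covers."""
    return os.path.join(COVERS_ROOT, size, doc_id[:2], doc_id + ".png")

def remove_cover(doc_id):
    """Hook for mdbindex: delete the cover files when it prunes the
    matching global.db entry."""
    for size in ("small", "medium"):
        try:
            os.remove(cover_path(doc_id, size))
        except FileNotFoundError:
            pass  # cover was never generated; nothing to clean up
```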

I have some time right now, so if I can help please tell me.

I am also busy working on a mechanism to limit the number of files in global.db, to help with the 150 000-files problem of Viasheslav. I think that omitting the thumbnails alone is not enough for that many files: extrapolating from 4k -> 2 MB would give 150k -> 75 MB of data without thumbnails.
Old 06-23-2011, 08:40 AM   #15
yuri_b
Hi

I remember how long it took Norton Commander to show the current directory when the number of files was >500.
With more than 500 books it is not a good idea to rely on the FAT file system; trust me, a database will find a file faster than the file system does, simply because that is a database's job.

There are also the limitations of FAT32:
a) The maximum possible number of clusters on a volume using the FAT32 file system is 268,435,445. http://support.microsoft.com/kb/184006
b) The cluster size on a 32 GB FAT32 volume is 16 KB, so every thumbnail will cost you no less than 16 KB. So 150 000 * 4 files will cost no less than 9.6 GB on FAT32.
http://support.microsoft.com/kb/192322
c) Every entry occupies 4 bytes in the FAT32 table (up to 8 MB). http://www.pcguide.com/ref/hdd/file/partFAT32-c.html
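The arithmetic behind point b), as a quick check (the x4 multiplier per document is taken from the estimate above):

```python
CLUSTER_KB = 16              # FAT32 cluster size on a 32 GB volume
files = 150_000 * 4          # documents plus per-document extras
min_kb = files * CLUSTER_KB  # each file occupies at least one cluster
# 600 000 files * 16 KB = 9 600 000 KB: a lower bound of ~9.6 GB
# occupied on disk, regardless of how small the files really are.
```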


Sincerely
Yura

P.S. A program that I wrote for home use, a film catalog, uses sqlite to store all the data for each film, including several pictures; the db file is now about 60 MB in size and it works like a charm (Win32).

Last edited by yuri_b; 06-23-2011 at 09:11 AM.