

Multi-directory search


daudi
04-15-2008, 11:25 AM
This is a spin-off from something else I'm playing with. The built-in iLiad search only searches the current directory and, if I'm not mistaken, only the title field, not the description.

This little script lets you search multiple directories and greps the manifest file, so you can put keywords in the description field and it will match on those.

There is another script (a contentlister entry) for deleting the results. Don't use the built-in delete, because it deletes the original files rather than the symlinks. I think Thomas posted a better way to do symlinks, and I'll try using that later.
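A minimal sketch of the pipeline described above, written for a plain POSIX sh such as busybox ash. It assumes each document lives in its own folder containing a manifest.xml and that the results folder holds symlinks to the matching folders; the function names, the numbered link names and the `results` layout are hypothetical, not taken from the actual script.

```shell
#!/bin/sh
# Hypothetical sketch: find every manifest.xml under a start directory,
# keep the ones whose text matches a grep pattern, and link each matching
# document folder into a results directory.
search_manifests() {
    startdir=$1   # where to start, e.g. /mnt/usb/
    pattern=$2    # grep basic regular expression
    results=$3    # directory that will hold the symlinks

    mkdir -p "$results"
    n=0
    # One path per line; IFS= and read -r keep spaces and backslashes intact.
    find "$startdir" -type f -name 'manifest.xml' |
    while IFS= read -r manifest; do
        if grep -q -e "$pattern" "$manifest"; then
            n=$((n + 1))
            dir=$(dirname "$manifest")
            # Prefix a counter so two folders with the same name don't collide.
            ln -s "$dir" "$results/$n-$(basename "$dir")"
        fi
    done
}

delete_results() {
    results=$1
    for link in "$results"/*; do
        # rm on a symlink removes only the link, never the folder it points to.
        if [ -L "$link" ]; then
            rm -f "$link"
        fi
    done
    rmdir "$results" 2>/dev/null
}
```

For example, `search_manifests /mnt/usb/ wibble ./results` and later `delete_results ./results`. Removing the links themselves is what keeps the original documents safe.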

You enter the search phrase by changing the FIRST LINE of the description field of the search entry. At the moment this is "wibble", because that is what I was searching for when testing this. Use the label function to change the description, then click on the search entry.
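One way a script could pick up the search phrase is to read the first line of the description stored in the search entry's own manifest.xml. This sketch assumes the description sits inside a `<Description>...</Description>` element (the element name is suggested by the warning later in the thread about "Description"); the function name is hypothetical, and XML entities such as `&amp;` are not decoded.

```shell
# Hypothetical helper: print the first line of the text inside
# <Description>...</Description> in the given manifest file.
first_description_line() {
    sed -n '/<Description>/{
s/.*<Description>//
s/<\/Description>.*//
p
q
}' "$1"
}
```

For example, `STRING=$(first_description_line manifest.xml)` followed by `grep -q -e "$STRING" ...`; keeping `"$STRING"` quoted is what lets a phrase containing spaces reach grep as a single pattern.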

So, in summary, these are the steps:


1. Get shell access (http://www.mobileread.com/forums/showthread.php?t=17342) if you do not already have it.
2. Unzip search.zip somewhere, e.g. on your PC.
3. Copy the whole thing to somewhere on your iLiad (anywhere will do).
4. Navigate to the search folder using the contentlister.
5. Use the config entry to select where to search.
6. Change the description field to your search expression.
7. Click on the "search" entry in the contentlister.
8. When the search is complete you will see a "results" folder.
9. View your results in the results folder.
10. Delete the results using the "delete" entry in the contentlister.

I started doing this because I think I will soon have a way of tagging and taking snippets from PDFs and plan to store the tags in a tags file in the appropriate PDF container directory and snippets in a snippets file. I want to be able to search for snippets or tags.

The main limitations at the moment are:

1. There is no way to search within search results.
2. You have to navigate to the search folder; you can't just get there quickly from anywhere.
3. The way of getting user input is inelegant.


You can set the place to start searches to any location you like. Just add entries to the file config/search.rc.

I've tested this a little on my USB key with about 800 MB of files and it did not fall over. Feedback welcome.

Edit: just made a couple of updates to the zip file.
Edit: the limit on the number of results is not working. I'll have to look into this.
Edit: new version: quoted the search expression to allow spaces.
Edit (2008-05-01): new version that I think fixes the problem with spaces in file names and adds a path config tool. Also fixed the hit limit.
Edit (2008-05-01): Added snippet search tool
Edit (2008-08-11): Fixed CF search problem

yokos
04-16-2008, 09:13 AM
It seems like spaces are not allowed in directory names.
Search string: Krause
grep: /mnt/cf/pdf/Foundations: No such file or directory
grep: of: No such file or directory
grep: GTK+: No such file or directory
grep: Development/manifest.xml: No such file or directory
[...]
./run.sh: 87: [[: not found

After renaming the directory from "Foundations of GTK+ Development" to "Foundations_of_GTK+_Development", it works.
:iloveyou:

[EDIT] I had a space bug in my jukebox mpd scripts, too.

daudi
04-16-2008, 09:45 AM
Thanks! I've now quoted the grep so hopefully fixed the space bug.
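The errors yokos posted are what the shell's word splitting produces when a path variable is expanded without quotes: every space starts a new argument, so grep receives four separate "file names". This small demonstration (the helper name is hypothetical) shows the difference. The `[[: not found` line likewise suggests the iLiad's /bin/sh does not implement the bash/ksh `[[` test, so a portable script sticks to `[ ... ]`.

```shell
# Hypothetical helper: report how many arguments the shell actually passed.
count_args() {
    echo "$#"
}

path='/mnt/cf/pdf/Foundations of GTK+ Development/manifest.xml'

count_args $path     # unquoted: split at each space, prints 4
count_args "$path"   # quoted: one argument, prints 1

# [[ ... ]] is a bash/ksh extension; plain sh only guarantees [ ... ].
if [ -n "$path" ]; then
    echo "portable test works"
fi
```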

yokos
04-16-2008, 10:08 AM
Not yet. It doesn't work with spaces. :rolleyes:
[very OT] The battery of my iLiad is low and I don't have a power adapter with me. :eek:

daudi
04-28-2008, 06:53 AM
OK, I've looked into this a little more. To handle this nicely we need a newer version of busybox. The version on the iliad does not seem to support -print0 for find or -0 for xargs, but the current release does. This current release is version 1.10.1. Adam has released version 1.7.2 (http://www.mobileread.com/forums/showpost.php?p=113952&postcount=5) and I am not sure if that supports these options or not. I'll try to test it this evening. The alternative would be to use temporary files in the script instead. That would be a less elegant solution though.
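For reference, this is roughly what the NUL-separated version looks like on a system whose find supports `-print0` and whose xargs supports `-0`, as GNU findutils does. Because file names are separated by a NUL byte rather than spaces or newlines, paths with spaces pass through intact. The function name is hypothetical.

```shell
# Hypothetical sketch: list manifests whose text matches a pattern, robust to
# spaces (and even newlines) in paths. Needs find -print0 and xargs -0.
matching_manifests_nul() {
    # The extra /dev/null gives grep at least one file to read, so it never
    # falls back to reading standard input when find finds nothing.
    find "$1" -type f -name 'manifest.xml' -print0 |
        xargs -0 grep -l -e "$2" /dev/null
}
```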

yokos
04-29-2008, 10:03 AM
Oh, interesting.
It should be no big deal to update busybox. Terminal apps are dream targets for porting compared to GUI apps, which bring all sorts of complications.
I think Adam didn't enable all of busybox's features during configuration, to keep the binary small.

EDIT:
Using busybox:

BusyBox is extremely configurable. This allows you to include only the
components and options you need, thereby reducing binary size. Run 'make
config' or 'make menuconfig' to select the functionality that you wish to
enable. (See 'make help' for more commands.)
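Before depending on a rebuilt busybox, a script could probe whether the local find and xargs accept the NUL-separator options and fall back otherwise. A small sketch; it relies only on the commands exiting with a non-zero status when given an option they don't know, and the function names are hypothetical.

```shell
# Hypothetical probes: succeed only if the option is accepted.
supports_print0() {
    find /dev/null -print0 >/dev/null 2>&1
}

supports_xargs0() {
    printf 'x\0' | xargs -0 echo >/dev/null 2>&1
}

if supports_print0 && supports_xargs0; then
    echo "find -print0 and xargs -0 are available"
else
    echo "fall back to one-path-per-line processing"
fi
```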

daudi
04-29-2008, 10:21 AM
Ahaaa. Very helpful, thanks. I'm not going to be able to look at this again for a few days, but this will save me time mucking around trying to figure out things that might not need figuring out. Thanks.

daudi
05-01-2008, 10:15 AM
It turned out to be rather easy to make this work with the standard busybox, so I have decided to do that rather than require people to upgrade their version of busybox. I have tested it and think it now works with directories that have spaces in the path, and would be grateful if you could try it again, Yokos.
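The thread doesn't show exactly how the standard-busybox fix was done. One common approach that needs neither `-print0` nor `xargs -0` is to write one path per line to a temporary list (much like match.tmp) and read it back with `IFS= read -r`, which keeps spaces intact. A sketch under that assumption; the function name and temporary file name are hypothetical, and a path containing a newline would still break it.

```shell
# Hypothetical sketch: spaces-safe search without -print0 / xargs -0.
matching_manifests_lines() {
    startdir=$1
    pattern=$2
    list="${TMPDIR:-/tmp}/match.$$"      # temporary list, one path per line
    find "$startdir" -type f -name 'manifest.xml' > "$list"
    # Redirecting into the loop (instead of piping) keeps it in this shell.
    while IFS= read -r manifest; do
        if grep -q -e "$pattern" "$manifest"; then
            printf '%s\n' "$manifest"
        fi
    done < "$list"
    rm -f "$list"
}
```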

I have also changed the way that search paths are configured. Instead of having to hard-code the path in the script, I have now added a config contentlister entry that reads a file called search.rc. This file has one line per directory: the path, a semicolon and a description. The default contents of this file are:
/mnt/usb;USB
/mnt/cf;Compact flash
/mnt/free;All internal memories
You can easily add other directories. You just click on the config entry in the contentlister to cycle through the paths to search, and the description shown in the contentlister is updated with the description of the current search path.
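A sketch of how the cycling could work with the `path;description` format, assuming the current position is remembered as a line number in a separate state file (a hypothetical detail; the real script may store it differently). It sets STARTDIR and a DESCRIPTION for the contentlister to show.

```shell
# Hypothetical sketch: advance to the next search.rc entry, wrapping around.
next_search_path() {
    rc=$1        # e.g. config/search.rc
    state=$2     # file holding the current line number (hypothetical)
    total=$(grep -c ';' "$rc")
    [ "$total" -gt 0 ] || return 1
    cur=$(cat "$state" 2>/dev/null)
    cur=${cur:-0}                        # no state yet: start before line 1
    next=$(( $cur % $total + 1 ))
    echo "$next" > "$state"
    line=$(grep ';' "$rc" | sed -n "${next}p")
    STARTDIR=${line%%;*}                 # text before the first semicolon
    DESCRIPTION=${line#*;}               # text after the first semicolon
}
```

Calling it once per click, e.g. `next_search_path config/search.rc config/search.state`, walks through USB, Compact flash and All internal memories and then wraps around.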

I've also fixed the number of hits limit code. This is currently set to 40 results. You can change this near the top of the script.

I'm updating the post at the top of this thread with this version.

Dabon
05-02-2008, 12:37 PM
Dear Daudi,

Good day!
Thank you for working so diligently on such a great and useful tool.
I have tried to install this tool several times, but unsuccessfully. I am not getting the message "installation successful". I am able to switch the search location between internal memory, USB and CF, but launching the search is unsuccessful: the iliad does not react when I click on the search folder...
Any ideas or suggestions on what I could do to get it working?

Thanks a million in advance!

D.

daudi
05-02-2008, 12:50 PM
There's nothing to install so there's no need for an "installation successful" message :)
I suspect it probably is running but using the default search pattern. I think I probably need to re-write the first post or write another one to make things clearer.

A key thing from my first post is:
You enter the search phrase by changing the FIRST LINE of the description field of the search entry. At the moment this is "albert", because that is what I was searching for when testing this. Use the label function to change the description, then click on the search entry.


Have you used this to enter the search phrase? It is a kludge, I know, but seemed to me to be the simplest way of getting simple user input. Once you have entered your search phrase clicking on the search entry should run the search.

WARNING! At the moment the general search searches the entire manifest file. This means that if you are searching for "The Great Escape", for example, do not enter "esc" as the search phrase. "esc" will match "escape" okay, but it will also match "Description" which exists in (almost?) every manifest file.
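If the local grep supports them (GNU grep does; check your busybox build), `-w` restricts matches to whole words and `-i` ignores case, which avoids the "esc" inside "Description" trap. A sketch with a hypothetical helper name; note that a whole-word search for "description" itself would still match the tag.

```shell
# Hypothetical helper: whole-word, case-insensitive match in one manifest.
match_whole_word() {
    grep -q -i -w -e "$1" "$2"
}
```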

BTW, I should also explain that the snippet-search relates to this thing (http://www.mobileread.com/forums/showthread.php?t=22804).

If you have entered your search phrase correctly and it is still not working we'll have to delve a little deeper. Let me know how you get on.

engunneer
08-08-2008, 02:25 AM
I am having the same problem as Dabon.

The search folder is loaded in Newspapers/Programs/Search.
Newspapers is on the CF card (according to iLiad Settings/Archive Locations page)
The document I am searching for has a two-line description. The only thing that may be odd is that the iLiad did not generate the manifest.xml. I am using another program that I wrote to generate it. It is located in documents/publications/[folder name]/ (also on the CF card).
I am editing the first description line of the search program using the tagging tool. In today's case, the search term is banding. The config program is switching modes just fine, but the search tool runs and no change is shown in contentlister (the screen does refresh). Is there some error log I can look at?

Thanks,

daudi
08-08-2008, 09:07 AM
Hi engunneer,

There's no logging at the moment, but I'll look into it. For now, some questions that might help to track down the source of the problem:


When you click on the search entry (after setting the description to "banding"), does it seem to do anything at all? Do you get the two moving bars at the bottom of the screen that show that the iliad is thinking about something? (It did this for about 2 minutes while searching 1 GB of files just now.)

Do you get a "results" contentlister entry? If so, your results are in there.

What are the contents of your config/search.rc? What is your currently selected config?

You could try commenting out line 86 (rm match.lst match.tmp) and see whether those files (match.lst and match.tmp) are created.

engunneer
08-10-2008, 03:02 AM
Thanks for getting back to me.

1. The LED blinks and the White Loading bricks at the bottom do show for about 2 seconds. My CF card is 2GB and about half full. None of the bricks turn black.
2. No results at all (I do have the entry to delete results)
3. I am using the default one you have posted above. I have tried all of the settings, but Compact flash is the one that should work in this case.
4. the files are created, but both are empty.

If I can get this working, then I will try to get your snippet tool working (on windows), which looks very impressive. Then I will publish the project for the iLiad that I have been working on.

Update: Setting to "All internal memories" does at least run long enough to get one black box to fill.

At this point, I suspect that the manifest.xml of my target files is somehow wrong. I can try to upload an example doc tomorrow.

daudi
08-10-2008, 10:41 AM
match.tmp is a list of manifest files to check, so if it is empty the script is not even finding any manifest files. One thing that springs to mind, now that you mention Windows, is case sensitivity: manifest.xml != manifest.XML, and I think I have seen some Windows programs make the extension uppercase. The find command that I use only matches manifest.xml files (lower-case extension):
find "$STARTDIR" -type f -name 'manifest.xml' > match.tmp
That would explain why match.tmp is empty. There's not actually much that can go wrong in this command: either STARTDIR (the starting directory) is wrong, or there are no files named 'manifest.xml'. Can you verify the case of the extension?
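If some tools write the extension in upper case, a case-insensitive name match would pick those manifests up too. GNU find has `-iname` for this; the bracket pattern below does the same with plain `-name`, which should also work with a minimal find. The function name is hypothetical.

```shell
# Hypothetical helper: find manifest files regardless of the case of the name.
list_manifests_any_case() {
    find "$1" -type f -name '[Mm][Aa][Nn][Ii][Ff][Ee][Ss][Tt].[Xx][Mm][Ll]'
}
```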

engunneer
08-10-2008, 08:49 PM
I can verify the extension. Extensions in uppercase bug the heck out of me, and I also wrote the program that is generating them.

I also have some documents that were not generated by my program, and I have manually tagged them on the iLiad, and the search program still doesn't find them. I have not tried to experiment with ssh access, but theoretically could if you think it will help me diagnose this. Thanks for taking the time to try and help.

Is there something I can hardcode instead of $STARTDIR, so that it is known good? Should I try running it via mrxvt?

Thanks.

daudi
08-11-2008, 03:45 AM
In the line
find "$STARTDIR" -type f -name 'manifest.xml' > match.tmp
$STARTDIR is the starting directory. That comes from the config file, and if you are using the CF entry it should be /mnt/cf. The find command looks for files (-type f) called manifest.xml (-name 'manifest.xml') starting from that directory.
You could try find /mnt/cf -type f -name 'manifest.xml' to see the file names (full paths) in the terminal.
If I get some time later today I'll add some logging code to the script so we can see what is going on at each stage.

daudi
08-11-2008, 09:25 AM
I have added some logging code (new version at the top of this thread). This produces a search-log.txt file that should appear in your contentlister. Here's an example of the results while searching the internal memory:
This search was done on: Mon Aug 11 13:06:23 GMT-01:00 2008
STARTDIR (where the search will start from): "/mnt/free"
STRING (the string we are searching manifest files for): illiam
Number of manifest.xml files found: 33
Number of manifest.xml files that contain the search string: 0
The "Delete results" entry will now delete the log file as well as the results.
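A sketch of how a log like the one above can be produced: count the manifests found, count the ones containing the string, and write both to a text file. The wording mirrors the sample log, but the function, variable and temporary file names are illustrative, not the script's actual code.

```shell
# Hypothetical sketch of a search log in the spirit of search-log.txt.
write_search_log() {
    startdir=$1
    string=$2
    log=$3
    list="${TMPDIR:-/tmp}/manifests.$$"
    find "$startdir" -type f -name 'manifest.xml' > "$list"
    found=$(( $(wc -l < "$list") ))      # arithmetic strips any padding
    matched=0
    while IFS= read -r manifest; do
        if grep -q -e "$string" "$manifest"; then
            matched=$((matched + 1))
        fi
    done < "$list"
    rm -f "$list"
    {
        echo "This search was done on: $(date)"
        echo "STARTDIR (where the search will start from): \"$startdir\""
        echo "STRING (the string we are searching manifest files for): $string"
        echo "Number of manifest.xml files found: $found"
        echo "Number of manifest.xml files that contain the search string: $matched"
    } > "$log"
}
```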

Initially I found that it was working with USB and internal memory, but not CF. This baffled me for a bit, but it turns out I need to have a trailing slash at the end of the STARTDIR. I have added trailing slashes to the paths in the config file and have confirmed that it now works with CF. I cannot explain why it worked before with /mnt/usb and /mnt/free but not /mnt/cf.
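The CF behaviour is left unexplained here. One plausible cause, an assumption not confirmed in the thread, is that /mnt/cf is a symbolic link on the iLiad: by default find does not follow a symlink given as its starting point, but a trailing slash makes the path resolve to the directory the link points to. If /mnt/usb and /mnt/free are ordinary directories, that would also explain why they worked without the slash. This reproduction runs in a scratch directory; the names are hypothetical.

```shell
# Hypothetical reproduction: a library folder reached through a symlink.
demo_trailing_slash() {
    base=$1
    mkdir -p "$base/real/Book"
    : > "$base/real/Book/manifest.xml"
    ln -s "$base/real" "$base/link"   # stands in for a symlinked mount point
    # Without a slash, find examines the link itself (not a regular file)
    # and does not descend into it.
    without=$(find "$base/link" -type f -name 'manifest.xml' | wc -l)
    # With a trailing slash, the path resolves to the linked directory.
    with=$(find "$base/link/" -type f -name 'manifest.xml' | wc -l)
    echo "without slash: $((without)), with slash: $((with))"
}
```

With GNU find, `demo_trailing_slash "$(mktemp -d)"` prints `without slash: 0, with slash: 1`.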

Please try the new version at the top of the thread and let me know if it works. BTW, thanks for sticking with this and not giving up on it. I don't normally use CF so I had no idea it was not working.

Dabon
08-11-2008, 01:57 PM
WONDERFUL!! You are a true genius, Daudi!

I am sorry that I did not stick to the issue earlier. I was on a business trip, and tried a couple of times after your last email... unsuccessfully, so I got discouraged. I did not want to keep bugging you, as I do know that you are doing all of this as a passion, totally free, and you have already given to the iliad community SO MUCH!!!
The correction works wonderfully. Just what I needed, with my so many ebooks... and you can even access the books from the results folder!?! This is totally CRAZY!!!

THANKS A MILLION!!! If God did not create you, I would have created you!!!

May God bless you!!

Dabon.

daudi
08-11-2008, 02:44 PM
Dabon,

Thank you for your kind and enthusiastic words (I'm blushing). I'm glad it is now working for you. I fear you might have over-estimated my contribution just a tad, but I'll use your reply as an excuse to treat myself to a beer tonight.

engunneer
08-11-2008, 09:40 PM
I'm glad to see Dabon got it working. I'll be trying it when I get home tonight. If I'm lucky, I can soon post the windows iLiad tool I am writing, which relies on this tool, the java merger, and hopefully your snippet tool.

Update: It certainly does work now. I've been experimenting with your script (which is fun, since I have only breaking (as opposed to working) knowledge of Linux commands). I see the actual search is done using grep, so I am trying to use regular expressions with it so I can search for two non-consecutive words in the descriptions. I am trying to search for things like (foo)+|(bar)+, but to no avail. Even a simple (foo) finds nothing, even when foo alone would.

Alternately, if the results could be searched, then this would be fine.

Any input would be appreciated. As it is, I might try writing a separate search script that only searches the results of the other script. Painful, but it might be usable.

Thanks,

daudi
08-12-2008, 03:01 AM
Regular expressions do work. I am aware that there are some differences between regex implementations, and something like (foo)+|(bar)+ does not work for me with grep on my iliad or my PC. You need to escape the brackets (see note below) and the pipe, like this:

\(foo\)\|\(bar\)

For those who don't know regular expressions, the expression above searches for either "foo" or "bar" in the manifest. For two words that both have to exist, but not necessarily right next to each other, e.g. "foo wibble wibble bar", it would be:

foo.*bar


NOTE: I should probably say "\(brackets\)\|\(parentheses\)" depending on whether you are using British or American English :)
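The escaping is needed because grep defaults to basic regular expressions, where unescaped parentheses, `+` and `|` are ordinary characters; that is also why a plain `(foo)` only matches the literal text "(foo)". `\|` alternation inside a basic regex is a GNU extension, while `grep -E` (extended regexes, if the local grep supports it) accepts the unescaped form. Both are shown below with hypothetical helper names.

```shell
# Basic regex (grep's default): grouping and alternation must be escaped.
# \| inside a basic regex is a GNU extension, so not every grep accepts it.
bre_foo_or_bar() {
    printf '%s\n' "$1" | grep -q -e '\(foo\)\|\(bar\)'
}

# Extended regex (grep -E): the same operators are written without backslashes.
ere_foo_or_bar() {
    printf '%s\n' "$1" | grep -q -E -e '(foo)|(bar)'
}

# In a basic regex, unescaped ( and ) are literal characters.
literal_parens() {
    printf '%s\n' "$1" | grep -q -e '(foo)'
}
```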

engunneer
08-14-2008, 12:57 AM
Thanks for the late reply. The regex does work, but now I need to learn regex. (Specifically, how can I do AND instead of OR, and yet not know word order?)

Also, perhaps the script could take a comma separated list and do an and?

I will try tinkering with this on my own, so don't feel compelled to reply right away or spend a bunch of time on it. I don't mind the tinkering.

Thanks again for explaining.

daudi
08-14-2008, 03:42 AM
One idea I was thinking about was creating result sets that can be combined. So, instead of just creating a directory of results that is reused for each search, the results of each search could be stored in a separate directory, e.g. results-1, results-2, etc. Another script could then take further input to say how these result sets should be combined, e.g. (results-1 OR results-2) AND results-3. This could be written in Ovid syntax just using the numbers: (1 OR 2) AND 3. The other alternative is to write something that will parse user input that is a little more friendly than regexprs, so people could enter something like Google searches. To be honest I doubt I'd create something like this unless there were "many" people who want it.
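Combining numbered result sets like that mostly comes down to set operations on lists of paths. A sketch, assuming each search also saved its matching manifest paths one per line in a file such as results-1.lst (hypothetical names; the current script reuses a single results folder): AND keeps the lines present in both lists, OR merges them.

```shell
# AND: lines of the first list that exactly match (-x) a fixed string (-F)
# from the second list, de-duplicated.
set_and() {
    grep -F -x -f "$2" "$1" | sort -u
}

# OR: every line from either list, de-duplicated.
set_or() {
    sort -u "$1" "$2"
}
```

(1 OR 2) AND 3 then becomes `set_or results-1.lst results-2.lst > tmp.lst` followed by `set_and tmp.lst results-3.lst`.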

engunneer asked: "How can I do AND instead of OR, and yet not know word order?"

Try this as a search for "foo wibble wibble bar" or "bar wibble wibble foo" (not tested on the iliad): \(foo.*bar\)\|\(bar.*foo\)
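Since grep works line by line, the pattern above only matches when both words are on the same line of the manifest. An order-independent AND that also copes with terms on different lines is to require each term separately, which also gives the comma-separated input suggested earlier. A sketch with a hypothetical function name:

```shell
# Hypothetical sketch: succeed only if every comma-separated term occurs
# somewhere in the file, in any order and on any line.
manifest_has_all() {
    file=$2
    old_ifs=$IFS
    set -f                 # don't let terms like "a*" expand as file names
    IFS=','
    set -- $1              # split the terms into $1, $2, ...
    IFS=$old_ifs
    set +f
    for term in "$@"; do
        grep -q -e "$term" "$file" || return 1
    done
    return 0
}
```

For example, `manifest_has_all 'foo,bar' manifest.xml` succeeds for "foo wibble wibble bar", "bar wibble wibble foo", and also when foo is in the title and bar in the description.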

curbarthedog
08-14-2008, 08:55 AM
I have to say, for me as a complete novice in this kind of thing, it would be invaluable.

I guess it depends on how many others feel it would be useful!

If only iRex had thought about this kind of stuff!