

PDF search


tribble
01-18-2008, 01:52 PM
Hi!

I just found this.
http://ruby-gnome2.sourceforge.jp/hiki.cgi?cmd=view&p=Poppler&key=poppler

poppler::Page has a find_text() function.

Can we somehow use that to get a search function inside PDFs on our iLiad? It's some kind of poppler library, so it should somehow relate to the iLiad, but I have no clue what they are doing there.

Anyone with more insight could maybe shed some light here.

Thanks

-Thomas-
01-18-2008, 03:37 PM
This function is also included in the poppler lib used on the iLiad. It looks like the function searches for a string within a specific page of an opened PDF file. At least I found the following in the poppler sources (glib/test-poppler-lib.c):
list = poppler_page_find_text (page, "Bitwise");
printf ("\n");
printf ("\tFound text \"Bitwise\" at positions:\n");
So to search for a string globally we would have to traverse all directories, open all PDF files, go through all pages and let the function do the rest...

tribble
01-18-2008, 04:29 PM
Do you know how well the function works with hyphenated text, if at all? Does it find hyphenated text that spans multiple pages, and how does it handle different languages? Is it UTF-8?
What info gets returned? Will it then be simple to somehow mark the found word?

It would be great if we could get a search going on the iLiad. I am willing to take a look into this as well, but I know nothing about C++ programming and can only start in mid-February.

A simple text search in a single PDF would suffice for me at the moment. A global text search on the iLiad could easily be very expensive.

-Thomas-
01-18-2008, 06:56 PM
According to the docs it takes UTF-8 encoded input and returns a list of rectangles, one for each occurrence of the text on the page (in PDF points).
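
To mark a hit on screen, those rectangles would have to be converted from PDF points to pixels. A minimal sketch of that conversion follows. It assumes the rectangle has the same four fields as PopplerRectangle (x1, y1, x2, y2), that the coordinates are measured from the bottom-left corner of the page, and that the page height in points is known (poppler_page_get_size() should provide it). The Rect and PixelBox types and the function names are made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in with the same four fields as PopplerRectangle. */
typedef struct { double x1, y1, x2, y2; } Rect;

/* Pixel box with the origin at the top-left corner of the rendered page. */
typedef struct { int left, top, right, bottom; } PixelBox;

/* Round a non-negative value to the nearest whole pixel. */
static int round_px(double v)
{
    return (int)(v + 0.5);
}

/* Convert a rectangle in PDF points to pixels at the given resolution.
 * One PDF point is 1/72 inch, so pixels = points * dpi / 72. */
PixelBox rect_to_pixels(Rect r, double page_height_pt, double dpi)
{
    assert(dpi > 0.0);
    double scale = dpi / 72.0;

    /* Do not rely on the order of the two corners. */
    double lo_x = r.x1 < r.x2 ? r.x1 : r.x2;
    double hi_x = r.x1 < r.x2 ? r.x2 : r.x1;
    double lo_y = r.y1 < r.y2 ? r.y1 : r.y2;
    double hi_y = r.y1 < r.y2 ? r.y2 : r.y1;

    PixelBox b;
    b.left  = round_px(lo_x * scale);
    b.right = round_px(hi_x * scale);
    /* Assumption: y grows upwards in the PDF and downwards on screen,
     * so the vertical axis is flipped using the page height. */
    b.top    = round_px((page_height_pt - hi_y) * scale);
    b.bottom = round_px((page_height_pt - lo_y) * scale);
    return b;
}
```

If the rectangles turn out to be measured from the top-left corner already, the flip with page_height_pt has to be dropped.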

Hyphenation doesn't work at all; I've tried it against the current Debian version of libpoppler.
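
A possible workaround, sketched here outside of poppler: take the plain text of a page (for example from poppler_page_get_text()), join words that were split with a hyphen at a line end, and search the joined copy. The function names below are made up, and the approach is crude: it only answers whether the page contains the string, it returns no rectangles, and it also joins genuinely hyphenated words that happen to break at a line end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of text in which every hyphen that is
 * directly followed by a line break is removed together with the break,
 * so "hyphen-\nated" becomes "hyphenated". The caller frees the result. */
char *join_hyphenated(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        /* text[i + 1] is at worst the terminating '\0', so this is safe. */
        if (text[i] == '-' && text[i + 1] == '\n') {
            i++;        /* skip the line break as well as the hyphen */
            continue;
        }
        out[j++] = text[i];
    }
    out[j] = '\0';
    return out;
}

/* Return 1 if needle occurs in the de-hyphenated page text, else 0. */
int page_text_contains(const char *page_text, const char *needle)
{
    char *joined = join_hyphenated(page_text);
    if (joined == NULL)
        return 0;
    int found = strstr(joined, needle) != NULL;
    free(joined);
    return found;
}
```

Words split across two pages would still be missed, since each page is handled on its own.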

For those who are interested I've added a proof of concept for a single-file search. I couldn't compile it to run on the iLiad, but maybe someone can help out. It currently prints a list of all pages the string occurs on and exits 0 if matches were found.

tribble
01-19-2008, 01:41 AM
That looks rather easy. Now we will have to do a few things:
1) Integrate the search into ipdf.
2) Store the results in some global variable.
3) Add a search icon that starts the keyboard and runs the search on Enter.
4) Add a GUI for the results: display a list, go to the page on click, and somehow render a box or overlay on the search word. (The bookmark ipdf could give hints on this.)
5) When there is a result set, change the search icon so that one click shows the result set and a second click opens the keyboard for a new search.
6) Rewrite poppler to find hyphenated text :D

Anyone up to the challenge? ;)
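
For storing the results and jumping from hit to hit, something along these lines might do as a starting point. This is only a sketch of the bookkeeping: the Hit and ResultSet types and all function names are invented here, and nothing in it talks to poppler or ipdf.

```c
#include <assert.h>
#include <stdlib.h>

/* One match: the page it is on and its rectangle in PDF points. */
typedef struct { int page; double x1, y1, x2, y2; } Hit;

/* Growable list of hits plus a cursor for stepping through them.
 * Start with all fields zero, e.g. ResultSet rs = {0}; */
typedef struct {
    Hit *items;
    size_t count;
    size_t capacity;
    size_t current;     /* index of the hit returned by the next call */
} ResultSet;

/* Append a hit. Returns 0 on success and -1 if memory runs out. */
int result_set_add(ResultSet *rs, Hit hit)
{
    assert(rs != NULL);
    if (rs->count == rs->capacity) {
        size_t new_cap = rs->capacity ? rs->capacity * 2 : 8;
        Hit *grown = realloc(rs->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        rs->items = grown;
        rs->capacity = new_cap;
    }
    rs->items[rs->count++] = hit;
    return 0;
}

/* Return the next hit and advance the cursor, wrapping around at the
 * end. Returns NULL when there are no results. */
const Hit *result_set_next(ResultSet *rs)
{
    assert(rs != NULL);
    if (rs->count == 0)
        return NULL;
    const Hit *hit = &rs->items[rs->current];
    rs->current = (rs->current + 1) % rs->count;
    return hit;
}

/* Drop all results, e.g. before a new search is started. */
void result_set_clear(ResultSet *rs)
{
    assert(rs != NULL);
    free(rs->items);
    rs->items = NULL;
    rs->count = rs->capacity = rs->current = 0;
}
```

The viewer would then go to the page of the returned hit and draw the overlay from its rectangle.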

PhilT
01-29-2008, 07:48 AM
I would love to see searching within PDFs working and I know many others would too. It just so happens that I know C++ too although I'm a little rusty as I tend to program in Java and Ruby these days. My Linux knowledge is also pretty limited but I'm willing to give it a try if someone could let me know what I need to set up my environment.

Regards,
Phil

mvoosten
04-16-2008, 08:13 AM
Is there any progress on this? Searching in PDFs is a #1 item for me and, I assume, for a lot of people, so make us happy ;)

mvoosten
04-16-2008, 08:27 AM
Another possible option to borrow from:
http://sourceforge.net/projects/pdfsearch/

-Thomas-
04-16-2008, 06:31 PM
I've made a concept of a global PDF search, see attached screenshot...

Just a few things:

Results will be shown in content lister (in the future)
I don't know much about threaded programming (application freezes), so I definitely need some time
It's terribly slow, even in internal memory :(
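
On the freezing: one way to avoid threads altogether would be to do the work in small slices and return to the GUI main loop between them (if the application runs a GLib main loop, an idle callback would be the usual place for this). A rough sketch of the bookkeeping only; the SearchJob struct, the PageMatchFn callback and the demo matcher are all invented for illustration and stand in for the real poppler calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns non-zero if the page contains the text.
 * In a real viewer this would wrap the poppler page search. */
typedef int (*PageMatchFn)(int page_index, void *user_data);

typedef struct {
    int next_page;      /* first page that has not been searched yet */
    int page_count;     /* total number of pages in the document */
    int hits;           /* pages with at least one match so far */
    PageMatchFn match;
    void *user_data;
} SearchJob;

void search_job_init(SearchJob *job, int page_count,
                     PageMatchFn match, void *user_data)
{
    assert(job != NULL && match != NULL && page_count >= 0);
    job->next_page = 0;
    job->page_count = page_count;
    job->hits = 0;
    job->match = match;
    job->user_data = user_data;
}

/* Search at most max_pages pages, then return so the GUI can handle
 * events. Returns 1 while pages remain and 0 once the job is done. */
int search_job_step(SearchJob *job, int max_pages)
{
    assert(job != NULL && max_pages > 0);
    int end = job->next_page + max_pages;
    if (end > job->page_count)
        end = job->page_count;
    for (; job->next_page < end; job->next_page++) {
        if (job->match(job->next_page, job->user_data))
            job->hits++;
    }
    return job->next_page < job->page_count;
}

/* Dummy matcher for trying the code without poppler: pretends that
 * every page whose index is a multiple of 3 contains the text. */
int demo_match_every_third(int page_index, void *user_data)
{
    (void)user_data;
    return page_index % 3 == 0;
}
```

The return value follows the convention of GLib idle callbacks, where a true value means "call me again". This does not make the search any faster, it only keeps the application responsive while it runs.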


edit: I don't know much about ipdf hacking, so does anybody have an idea how to search in a single PDF file?