MobileRead Forums - View Single Post

daudi · 02-15-2008, 07:23 AM

I've been doing this for a while and it is definitely faster. I've also played with having a large area at the bottom of each page or to the side of each page because we are not restricted to A4 or letter size paper. This way I can have the article page at the top and a whole load of space at the bottom of each page (imagine a huge footer margin) and then just drag to the right place, scribble, drag back to where I was reading.

When I get around to creating the script for making indexed and hyperlinked PDF notebooks from arbitrary source PDFs/PNGs it will be possible to process a PDF journal article and add the hyperlinked index page(s). It will be possible to specify the size of the index area for each page. That way it will be possible to write notes for each page and then click on the entry to go to the actual page. Notice the extensive use of future tense.

Another idea I have for processing PDFs with hyperlinks is to create an index of key (or all) words and have them link back to the pages they are used on. This would partially get around the lack of a search function. It would work something like this:

extract the text from the PDF using pdftohtml -xml (which also gives page numbers)
record each unique occurrence of each word and its page
exclude any words in a user-defined exclude list (e.g. "and", "this", "that", etc.)
OR:
only include words in a user-defined include list (e.g. "retinopathy", "neuropathy", etc.)
format an alphabetical index with the page numbers where each word occurs, with each page number being a hyperlink to that page
attach the index to the original PDF
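
The word-collection and formatting steps above could be sketched roughly as follows in Python. This is only an illustration, not a finished script: it assumes the XML from pdftohtml -xml has <page number="..."> elements containing <text> elements (the usual layout, but check your own output), and the function names build_index and format_index are made up for this example. It produces a plain-text index only; turning the page numbers into real hyperlinks and attaching the index to the PDF is left out.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Deliberately simple idea of a "word": letters, plus inner apostrophes/hyphens.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def build_index(xml_text, exclude=None, include=None):
    """Map each word to the sorted list of pages it occurs on.

    xml_text is assumed to be the output of `pdftohtml -xml`.
    If include is given, only those words are kept; otherwise
    any word in exclude is dropped.
    """
    exclude = {w.lower() for w in (exclude or ())}
    include = {w.lower() for w in include} if include else None
    pages_by_word = defaultdict(set)

    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for text in page.iter("text"):
            # itertext() also picks up text inside nested <b>/<i> elements.
            content = "".join(text.itertext())
            for word in WORD_RE.findall(content):
                word = word.lower()
                if include is not None:
                    if word not in include:
                        continue
                elif word in exclude:
                    continue
                pages_by_word[word].add(page_no)

    # Alphabetical order, with each word's pages in ascending order.
    return {w: sorted(p) for w, p in sorted(pages_by_word.items())}


def format_index(index):
    """Return one plain-text line per word, e.g. 'retinopathy: 1, 2'."""
    return [f"{word}: {', '.join(map(str, pages))}"
            for word, pages in index.items()]
```

To try it, run pdftohtml -xml on the article, read the resulting .xml file into a string, and pass it to build_index together with either an exclude list or an include list.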

Then when you want to "search" a PDF on the iLiad for a word, you go to the index, look up the word and click on one of its page links. If that's not the occurrence you want, click on "back" to return to the index and click on the next one. Not ideal, but I can't program in C++ so I can't hack the ipdf code for proper searching.

Another thing to do when I get time. I don't think it will be too hard to implement though.
