LaunchHack - an OCR-based companion to LaunchPad

vdp · 12-08-2011, 08:47 AM

This kludge can be used along with LaunchPad to initiate some kind of action on the file currently selected in the Kindle's stock shell. It currently works on the Kindle DXG, but I believe it can be modified relatively easily to work on the other models (although I don't own the smaller devices and have no experience with them).

How does it work?
The program has three main steps/parts:

1) First the framebuffer is scanned to find the bold underline of the currently selected file. The title of that file is then cropped out and sent to the Tesseract OCR engine.

2) Tesseract then converts the image to a recognized string. For the time being I am only interested in English text, and that's what the software currently assumes. I use the English Tesseract model data and haven't tested how well it handles, for example, German umlauts or Cyrillic scripts (the latter will almost certainly require a different model).
Because the title image can in fact contain both the document's name and metadata (producer of the document etc.), the word bounding boxes (BBs) returned by Tesseract are used to strip the metadata. The criterion is the distance from the end of one word's BB to the beginning of the next word's BB. If the gap exceeds a certain threshold, the next word and everything after it are considered part of the metadata.

3) The OCR result is not always perfect. For example, errors like "Introduct1on" instead of "Introduction" sometimes occur, so some kind of approximate string matching is desirable. A standard metric for the similarity between two strings is the Levenshtein distance, but it runs in quadratic time, so it seemed too "expensive". I am also aware of Levenshtein automata, but as I understand it they assume a bounded number of errors that has to be known beforehand.
Then I found SimString (paper, code). It uses a weaker notion of similarity based on the number of common letter n-grams (short sequences of n consecutive letters), but it is fast. Actually I reimplemented the algorithm from the paper (as I understand it), because SimString's implementation uses a persistent database and I wanted to be able to build the index in memory on demand. Anyway, it seems to work OK and finds the best matching filename in under a second. Note that it is not always possible to find the true file, because the titles in the shell are truncated. E.g. if you have the files "Very very very very long filename1.pdf" and "Very very very very long filename2.pdf", the shell will truncate both to something like "Very very very very long file...", and the two can no longer be told apart.
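
A rough sketch of the n-gram idea from step 3, not the actual lhack code: it splits two strings into letter 3-grams and reports which fraction of the query's 3-grams also occur in a candidate. The names NGrams and OverlapFraction are made up for illustration, and the measures in the SimString paper (cosine, Dice, Jaccard) are defined somewhat differently from this plain fraction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: all overlapping letter n-grams of s, sorted.
// Duplicates are kept, so a repeated n-gram counts more than once.
std::vector<std::string> NGrams(const std::string& s, std::size_t n = 3) {
  assert(n > 0);
  std::vector<std::string> grams;
  for (std::size_t i = 0; i + n <= s.size(); ++i)
    grams.push_back(s.substr(i, n));
  std::sort(grams.begin(), grams.end());
  return grams;
}

// Fraction of the query's n-grams that also occur in the candidate.
// Returns 0 when the query is shorter than n characters.
double OverlapFraction(const std::string& query, const std::string& candidate,
                       std::size_t n = 3) {
  const std::vector<std::string> q = NGrams(query, n);
  const std::vector<std::string> c = NGrams(candidate, n);
  if (q.empty()) return 0.0;
  std::vector<std::string> common;
  // Both inputs are sorted, so this yields the shared n-grams.
  std::set_intersection(q.begin(), q.end(), c.begin(), c.end(),
                        std::back_inserter(common));
  return static_cast<double>(common.size()) / q.size();
}
```

With the OCR error from above, "Introduct1on" has 10 trigrams and 7 of them (Int, ntr, tro, rod, odu, duc, uct) also occur in "Introduction", so a single misread character still leaves most of the trigrams intact.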

And that is all that this software does. When started it reads the framebuffer, tries to find the selected title and prints the absolute filename of the file it thinks is the best match.
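
The gap criterion from step 2 could look roughly like the sketch below. This is only an illustration under assumptions: the Word struct and StripMetadata are hypothetical names, the boxes are assumed to arrive sorted left to right, and the threshold of 40 pixels used in the note underneath is borrowed from the kMaxBBGap constant posted later in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical word record: the recognized text plus the horizontal
// extent of its bounding box, in pixels.
struct Word {
  std::string text;
  int left;
  int right;
};

// Joins the words left to right and stops at the first gap wider than
// max_gap; everything after that gap is treated as metadata and dropped.
std::string StripMetadata(const std::vector<Word>& words, int max_gap) {
  assert(max_gap >= 0);
  std::string title;
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (i > 0 && words[i].left - words[i - 1].right > max_gap) break;
    if (i > 0) title += ' ';
    title += words[i].text;
  }
  return title;
}
```

For a made-up row with "Linear" at 61..120, "Algebra" at 130..200 and "Springer" at 420..490, a max_gap of 40 keeps "Linear Algebra" and drops the publisher, because the 220 pixel gap before "Springer" is far above the threshold.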

How can it be used?

For example, to start hawhill's promising PDF viewer from LaunchPad you can use a kpdfview.ini like this:

Code:

[Actions]
;; run kpdfviewer
P D = !cd /mnt/us/kpdfview; ./reader.lua "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &

The tool takes three arguments: the root directory to be searched for a matching file, a comma-separated list of filters that narrow the search to specific types of files, and finally a similarity coefficient. This coefficient is in fact the fraction of the letter 3-grams that have to match. Files with a lower overlap with the query are not considered. In my very limited experiments, values in the range 0.5-0.6 worked well. If the coefficient is too low the search will be slower, and if it is too high the true matching file can be rejected and not found.
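
To make the role of the coefficient a bit more concrete, here is a small sketch of how such a threshold could be applied. It is an assumption about the general idea, not the code in lhack: Candidate, MinCommonGrams and BestMatch are hypothetical names, and the number of shared 3-grams per file is assumed to have been counted already.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate: a file path and how many of the query's
// 3-grams it shares (counted elsewhere).
struct Candidate {
  std::string path;
  std::size_t common_grams;
};

// Smallest number of shared 3-grams a file needs to pass the threshold.
std::size_t MinCommonGrams(std::size_t query_grams, double coefficient) {
  assert(coefficient >= 0.0 && coefficient <= 1.0);
  return static_cast<std::size_t>(std::ceil(coefficient * query_grams));
}

// Returns the path of the best candidate that passes the threshold,
// or an empty string when every candidate is rejected.
std::string BestMatch(const std::vector<Candidate>& candidates,
                      std::size_t query_grams, double coefficient) {
  const std::size_t needed = MinCommonGrams(query_grams, coefficient);
  std::string best;
  std::size_t best_common = 0;
  for (const Candidate& c : candidates) {
    if (c.common_grams < needed) continue;  // too little overlap
    if (best.empty() || c.common_grams > best_common) {
      best = c.path;
      best_common = c.common_grams;
    }
  }
  return best;
}
```

With a query of 10 trigrams, a coefficient of 0.5 means a file needs at least 5 shared trigrams. Pushing the coefficient up to 1.0 would reject a file that shares only 7 of them, which is how the true match can get lost after an OCR error.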

Installation
To install LaunchHack, just extract the attached archive into the /mnt/us/launchpad directory. The source code is here (it still doesn't have a Makefile or even a README).

Finally, a note to the brave developers who might want to look at the code: if you think the code is crap, I agree with you. I admit my guilt in writing overly long functions, using sloppy variable names, using classes for code that should really be just a function, the lack of error checks (if something goes wrong it will just happily explode) and a multitude of other unforgivable sins. Maybe some of these will be fixed, but frankly I don't want to spend much more time on this.

Edit:
BTW there is a potential use of this tool which may not be immediately obvious. You can also use it to start readers for file formats not supported by the stock software. Say you want to be able to open epub files straight from the stock shell in fbKindle (I don't read much fiction and this is just what I have installed).

First modify the goqt.sh script to support passing of arguments to the reader app (add a "$2").

Code:

./"$1" -qws "$2"

Then create a script named lhindex.sh

Code:

#!/bin/sh

LHIDX="/mnt/us/documents/lhindex/$1"

# in case it doesn't exist yet ...
mkdir -p "$LHIDX"

# remove the old entries (-f: no error if the folder is still empty)
rm -f "$LHIDX"/*

# create a new 'index': one small placeholder .txt per matching document
find /mnt/us/documents/ -name "*.$1" | sed -e "s/\.$1\$/.txt/" | awk -F'/' -v I="$LHIDX" '{print I"/"$NF;}' | while IFS= read -r f; do
  echo 1 > "$f"
done

# Force Kindle to scan the docs folder
dbus-send --system /default com.lab126.powerd.resuming int32:1

Finally create an epub.ini launchpad script

Code:

[Actions]
;; "Index" epubs
E I = !/mnt/us/launchpad/lhindex.sh epub &

;; run fbKindle
F E = !/mnt/us/fbKindle/goqt.sh FBReader "`/mnt/us/launchpad/lhack /mnt/us/documents *.epub 0.6`"&

Now you can "index" your epubs by pressing shift-E-I. This creates a small placeholder text file (not zero-length, because the shell doesn't show empty files) in /mnt/us/documents/lhindex/epub for every epub you have on your Kindle. Then you can select a title in the shell, press shift-F-E and voila, fbKindle opens its epub counterpart.
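
The name mapping that the sed/awk pipeline in lhindex.sh is meant to perform can be spelled out as a small function. This is only a sketch for illustration: PlaceholderPath is a hypothetical name, the index root is taken from the script, and "Dune.epub" in the note below is a made-up file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: takes the path of a document and returns the path
// of its placeholder .txt file in the index folder for that extension.
std::string PlaceholderPath(const std::string& doc_path,
                            const std::string& ext) {
  assert(!ext.empty());
  const std::string suffix = "." + ext;
  // Basename: everything after the last '/'.
  const std::size_t slash = doc_path.rfind('/');
  std::string name =
      (slash == std::string::npos) ? doc_path : doc_path.substr(slash + 1);
  // Swap a trailing ".ext" for ".txt".
  if (name.size() >= suffix.size() &&
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
    name.replace(name.size() - suffix.size(), suffix.size(), ".txt");
  }
  return "/mnt/us/documents/lhindex/" + ext + "/" + name;
}
```

So /mnt/us/documents/books/Dune.epub would get the placeholder /mnt/us/documents/lhindex/epub/Dune.txt. The shell lists the placeholder under the book's name, and lhack then matches that displayed title against the real *.epub files.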

Edit 2: Added a binary for K3.

hawhill · 12-08-2011, 11:15 AM

Holy crap, you got me sitting here in a mix of admiration and disbelief. This is awesome in more than one way: The engineering thought of going this route, the guts to do so and at last the "hackery" level :-)

Well, I think you would agree if I add as a postscriptum: There just _has_ to be a simpler way :-)

Thanks for mentioning the PDF reader. There'll be updates soon, the sources already have a file chooser and I'll fix a few other things and do another release then.

PoP · 12-08-2011, 07:12 PM

Brilliant. Awesome.

h1uke · 12-08-2011, 08:12 PM

vdp, do you think that a small part of Tesseract can be used to quickly
analyze page structure and return a set of word/line bounding boxes?

This could seriously simplify emulation of a pointing device in GUI programs
ported to Kindle.

vdp · 12-09-2011, 04:46 AM

@hawhill

Quote:

Originally Posted by hawhill

Holy crap, you got me sitting here in a mix of admiration and disbelief. This is awesome in more than one way: The engineering thought of going this route, the guts to do so and at last the "hackery" level :-)

Thanks

Quote:

Originally Posted by hawhill

Well, I think you would agree if I add as a postscriptum: There just _has_ to be a simpler way :-)

Absolutely! I also like the lsof-based trick that was committed by dpavlin in kpdfviewer's repo, but unfortunately it seems to require an additional step and probably takes more memory, since two different PDF readers are loaded in memory.

Quote:

Originally Posted by hawhill

Thanks for mentioning the PDF reader. There'll be updates soon, the sources already have a file chooser and I'll fix a few other things and do another release then.

That's great!

@h1uke

Quote:

vdp, do you think that a small part of Tesseract can be used to quickly
analyze page structure and return a set of word/line bounding boxes?

This could seriously simplify emulation of a pointing device in GUI programs
ported to Kindle.

Maybe, I am not familiar with Tesseract's internals. My understanding from reading the Wikipedia articles and the information on Tesseract's website is that the document analysis features were added after Google started supporting the project. There are also other projects like OCRopus that seem to perform the analysis themselves and use Tesseract as a backend recognition engine.
So perhaps there are options... I would also be very interested if we could research and use/implement lightweight techniques from the computer vision and OCR communities to analyse the documents and use that information to improve navigation and presentation.
Take for example the two-column mode of Duokan. It is very useful in the majority of cases, but all it does is split the page into two equal parts. If the text is offset, however, or one of the columns is wider, the user is out of luck.

vdp · 12-10-2011, 04:48 AM

Added a new section about a potential usage pattern (as if the OP wasn't long enough).

inameiname · 12-23-2011, 11:44 PM

Unfortunately the following two Launchpad shortcuts don't work for me:

Code:

P D = !cd /mnt/us/kpdfview; ./reader.lua "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &

Code:

F E = !/mnt/us/fbKindle/goqt.sh FBReader "`/mnt/us/launchpad/lhack /mnt/us/documents *.epub 0.6`"&

I also tried adding 'reader.lua -d k3' instead of just 'reader.lua', as I heard it helps to open up kpdfview (which it does, but doesn't resolve the issue with lhack not working):

Code:

P D = !cd /mnt/us/kpdfview; ./reader.lua -d k3 "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &

So if anybody has any ideas, feel free to let me know. Overall, it says 'success', but then nothing happens.

vdp · 12-24-2011, 01:36 AM

Quote:

Originally Posted by inameiname

Unfortunately the following two Launchpad shortcuts don't work for me...

What device are you trying this on? As I said in the original post, it's only tested on the DXG (firmware 2.5.5) at the moment. On devices with a smaller form factor the screen layout is different, so it will fail to find the title strings. I think the only change needed to make it work on other devices is a redefinition of these constants. If someone is interested in using this hack on another device and sends me a modified definition, I will update the source code and compile a new binary. The measurements are in pixels and can be taken, for example, with GIMP.

inameiname · 12-24-2011, 01:58 AM

Quote:

Originally Posted by vdp

What device are you trying this on? As I said in the original post, it's only tested on the DXG (firmware 2.5.5) at the moment. On devices with a smaller form factor the screen layout is different, so it will fail to find the title strings. I think the only change needed to make it work on other devices is a redefinition of these constants. If someone is interested in using this hack on another device and sends me a modified definition, I will update the source code and compile a new binary. The measurements are in pixels and can be taken, for example, with GIMP.

I have a Kindle 3 Keyboard, so yeah, I am sure that is the problem. I'd be happy to send you a modified definition, but I wouldn't know where to start modifying it. Hehe, I guess I will have to ask anybody else reading this thread if they could look into it, particularly those Linux folks who also have a Kindle 3 Keyboard.

PoP · 12-24-2011, 08:40 AM

@vdp If it can help, I updated the source, the best I could for my device. I would certainly be interested in your parser supporting the Kindle Keyboard.

Spoiler:

Following are sample screen captures I used if you need to verify:

inameiname · 12-24-2011, 10:28 AM

Quote:

Originally Posted by PoP

@vdp If it can help, I updated the source, the best I could for my device. I would certainly be interested in your parser supporting the Kindle Keyboard.

Spoiler:

Following are sample screen captures I used if you need to verify:

Thanks for the update. Again, I don't know how much help I can personally be with it, but I will look into it when I get the chance. I picked a bad time to want to do this, being a holiday and all.

vdp · 12-25-2011, 06:32 AM

Quote:

Originally Posted by PoP

@vdp If it can help, I updated the source, the best I could for my device.

Thanks! I made some tweaks and compiled a K3 binary. I could only test on your screenshots, so I am not completely sure it works flawlessly. You can install the files from the attachment as described in the OP. If you have usbnetwork installed, you can SSH to the device and run something like

Code:

lhack /mnt/us/documents '*.pdf' 0.5

while selecting different titles to see if the result makes sense. I deliberately chose a relatively low similarity coefficient, because the English model is not a perfect match for some of your files (in French, as far as I can tell). For example, "Je l'aimais - Anna Gavalda" is recognized as "Ie Paimais - Anna Gavalda". As explained in the OP, higher coefficients lead to a slightly faster search, but sometimes produce false negatives.
Please let me know if it works for you.

Edit: The K3 binary is now attached to the OP.

PoP · 12-25-2011, 11:54 AM

Quote:

Originally Posted by vdp

... I deliberately chose a relatively low similarity coefficient, because the English model is not a perfect match for some of your files (in French, as far as I can tell).
...

Ack - Hopefully close enough. Should still be usable for me as most of my ebooks are in English anyway.

Quote:

Originally Posted by vdp

...
Please let me know if it works for you.

Oops, while placing the selection bar under my first pdf (actually same result on any selection) got the following problem:

Code:

[root@kindle launchpad]# ./lhack /mnt/us/documents '*.pdf' 0.5
Error opening data file /mnt/us/launchhack/share/tessdata/eng.traineddata
Segmentation fault
[root@kindle launchpad]#

vdp · 12-26-2011, 01:44 AM

Quote:

Originally Posted by PoP

Oops, while placing the selection bar under my first pdf (actually same result on any selection) got the following problem:

Code:

[root@kindle launchpad]# ./lhack /mnt/us/documents '*.pdf' 0.5
Error opening data file /mnt/us/launchhack/share/tessdata/eng.traineddata
Segmentation fault
[root@kindle launchpad]#

Oops, sorry about that. Looks like I linked against the wrong library version, one that searches for the Tesseract models in the "launchhack" rather than the "launchpad" directory. You can try moving the "share" dir into a new "/mnt/us/launchhack" folder or, better yet, just replace the executable with the one attached here. Hopefully it will work better this time...

Edit: The K3 binary is now attached to the OP.

PoP · 12-26-2011, 07:51 AM

Quote:

Originally Posted by vdp

... or better yet just replace the executable with the one attached here. Hopefully it will work better this time...

Your gem now works like a charm. Thanks for the Christmas gift.

PoP · 12-24-2011, 08:40 AM

Kindle Keyboard constants

Spoiler:

Code:

/*
 * Copyright 2011 Vassil Panayotov <vd.panayotov@gmail.com>
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef DEVICEDEFS_H
#define DEVICEDEFS_H

#include <climits>

namespace lhack {

// Kindle Keyboard parameters
struct KDXDimensions {
    // Whole screen dimensions
    static const int kScreenWidth = 600;
    static const int kScreenHeight = 800;

    // ... and bits per pixel
    static const int kBPP = 4;

    // Number of titles per page when browsing a collection
    static const int kEntryPerPgCol = 9;

    // Number of titles/page when NOT browsing a collection
    static const int kEntryPerPg = 10;

    // Vertical offset of the topmost pixel of the 1st title,
    // when the contents of a collection are shown
    static const int kOffsetYCol = 158;

    // Vertical offset of the topmost pixel of the 1st title,
    // when NOT browsing a collection
    static const int kOffsetY = 100;

    // The vertical offset of the collection's name underline
    static const int kOffsetUlineCol = 124;

    // Offset from the left margin of the first char in a title
    static const int kOffsetX = 61;

    // The gap between the underline of one title and the topmost pixels
    // of the next.
    static const int kEntryGap = 32;

    // Maximum length of a title in pixels
    static const int kEntryLen = 500;

    // Maximum height of the title font
    static const int kFontHeight = 16;

    // Vertical distance in pixels from the bottommost pixel in a title to
    // the bottommost pixel of the underlining squares
    static const int kUlineBaseOffset = 6;

    // The minimum distance from the font's baseline to the topmost pixels of the underline
    static const int kUlineMinOffset = 7;

    // The side length of the (bigger) underlining squares
    //static const int kUlineSquareSz = 5;

    // The color of the solid underline selection pointer
    static const unsigned kUlineColor = UINT_MAX;

    // Maximum gap between words in a title beyond which next words are considered to be metadata
    static const int kMaxBBGap = 40;
};

};

#endif // DEVICEDEFS_H
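
For anyone wondering how a struct of constants like the one above might be consumed, here is a small sketch. It is a guess for illustration only: whether lhack really passes the device struct as a template parameter is an assumption, K3Dimensions, RowBytes, FrameBytes and PixelAt are hypothetical names, and both the tightly packed rows and the nibble order are assumptions about the framebuffer layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-device constants, using the Kindle Keyboard values
// from the header above.
struct K3Dimensions {
  static const int kScreenWidth = 600;
  static const int kScreenHeight = 800;
  static const int kBPP = 4;
};

// Bytes in one framebuffer row, assuming rows are tightly packed.
template <typename Device>
std::size_t RowBytes() {
  return static_cast<std::size_t>(Device::kScreenWidth) * Device::kBPP / 8;
}

// Bytes in the whole visible framebuffer under the same assumption.
template <typename Device>
std::size_t FrameBytes() {
  return RowBytes<Device>() * Device::kScreenHeight;
}

// Gray level (0..15) of pixel x in a 4 bpp row, assuming the high nibble
// of each byte holds the left pixel of the pair.
inline unsigned PixelAt(const unsigned char* row, int x) {
  assert(x >= 0);
  const unsigned char byte = row[x / 2];
  return (x % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
}
```

With the values above a row takes 600 * 4 / 8 = 300 bytes and the whole screen 300 * 800 = 240000 bytes. Supporting another device would then mostly come down to supplying another struct with the same member names.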
