12-08-2011, 08:47 AM | #1 |
Enthusiast
Posts: 45
Karma: 10842
Join Date: Aug 2010
Device: Kindle DXG
|
LaunchHack - an OCR-based companion to LaunchPad
This kludge can be used along with LaunchPad to initiate some kind of action on the file that is currently selected in the Kindle's stock shell. Currently it works on the Kindle DXG, but I believe it can be modified relatively easily to work on the other models (although I don't own the smaller devices and have no experience with them).
How does it work? The program has three main steps:

1) First the framebuffer is scanned to find the bold underline of the currently selected file. The title of this file is cropped and sent to the Tesseract OCR engine.

2) Tesseract then converts the image to a recognized string. For the time being I am only interested in English text, and that's what the software currently assumes. I use the English Tesseract model data and haven't tested how well it handles, for example, German umlauts or Cyrillic scripts (the latter will almost certainly require a different model). Because the title image can in fact contain both the document's name and metadata (the producer of the document etc.), the word bounding boxes returned by Tesseract are used to strip the metadata. The criterion is the distance from the end of the bounding box of one word to the beginning of the bounding box of the next. If the gap exceeds a certain threshold, the next word is considered part of the metadata.

3) The OCR result is not always perfect. For example, errors like "Introduct1on" instead of "Introduction" sometimes occur, so some kind of approximate string matching is desirable. A standard metric for the similarity between two strings is Levenshtein distance, but computing it takes quadratic time per pair of strings, so it seems too "expensive". I am also aware of Levenshtein automata, but as I understand it they assume a bounded number of errors that has to be known beforehand. Then I found SimString (paper, code). It uses a weaker notion of similarity based on the number of common letter n-grams (short sequences of letters), but it is fast. Actually I reimplemented the algorithm from the paper (as I understand it), because SimString's implementation uses a persistent database and I wanted to be able to build it in memory on demand. Anyway, it seems to work OK and finds the best matching filename in under a second.

Note that it is not always possible to find the true file, because the titles in the shell are truncated. E.g. if you have the files "Very very very very long filename1.pdf" and "Very very very very long filename2.pdf", the shell will truncate both to "Very very very very long file...".

And that is all that this software does. When started, it reads the framebuffer, tries to find the selected title and prints the absolute filename of the file it thinks is the best match.

How can it be used? For example, to start hawhill's promising PDF viewer from LaunchPad you can use a kpdfview.ini like this:
Code:
[Actions]
;; run kpdfviewer
P D = !cd /mnt/us/kpdfview; ./reader.lua "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &

Installation

To install LaunchHack just extract the attached archive in the /mnt/us/launchpad directory. The source code is here (it still doesn't have a Makefile or even a README).

Finally, a note to the brave developers who might want to look at the code: if you think the code is crap, I agree with you. I admit my guilt in writing overly long functions, using sloppy variable names, using classes for code that should really be just a function, not checking for error conditions (if something goes wrong it will just happily explode) and a multitude of other unforgivable sins. Maybe some of these will be fixed, but frankly I don't want to spend much more time on this.

Edit: BTW there is a potential use of this tool which may not be immediately obvious. You can also use it to start readers for file formats not supported by the stock software. Say you want to be able to open epub files straight from the stock shell in fbKindle (I don't read much fiction and this is just what I have installed). First modify the goqt.sh script to support passing arguments to the reader app (add a "$2").
Code:
./"$1" -qws "$2"
Code:
#!/bin/sh
# lhindex.sh: for every file with extension $1, create a placeholder
# .txt entry with the same base name under documents/lhindex/$1
LHIDX=/mnt/us/documents/lhindex/$1
# in case it doesn't exist yet ...
mkdir -p "$LHIDX"
# remove the old entries
rm -f "$LHIDX"/*
# create a new 'index': swap the extension for .txt and keep only the base name
find /mnt/us/documents/ -name "*.$1" | sed -e "s/\.$1\$/.txt/" | awk -F'/' -v I="$LHIDX" '{print I"/"$NF;}' | while read -r f; do
    echo 1 > "$f"
done
# Force Kindle to scan the docs folder
dbus-send --system /default com.lab126.powerd.resuming int32:1
Code:
[Actions]
;; "Index" epubs
E I = !/mnt/us/launchpad/lhindex.sh epub &
;; run fbKindle
F E = !/mnt/us/fbKindle/goqt.sh FBReader "`/mnt/us/launchpad/lhack /mnt/us/documents *.epub 0.6`"&

Edit 2: Added a binary for K3.
Last edited by vdp; 12-27-2011 at 04:27 AM.
|
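For readers who want to see steps 2 and 3 of the first post in code, here is a minimal Python sketch. It is not taken from the LaunchHack sources: the function names, the `(text, x_left, x_right)` word tuples, the trigram size and both thresholds are assumptions made for illustration. It also compares the query against every filename in a plain loop, so it shows only the n-gram similarity idea, not SimString's indexed search.

```python
# Illustrative sketch only; names, tuple layout and thresholds are hypothetical.

def strip_metadata(words, max_gap):
    """Keep words until the horizontal gap to the next word exceeds max_gap.

    words: list of (text, x_left, x_right) tuples in reading order,
    standing in for the word bounding boxes an OCR engine returns.
    """
    if not words:
        return ""
    kept = [words[0][0]]
    for (_, _, prev_right), (text, left, _) in zip(words, words[1:]):
        if left - prev_right > max_gap:
            break  # a wide gap is taken as the start of the metadata
        kept.append(text)
    return " ".join(kept)


def ngrams(s, n=3):
    """Set of letter n-grams of s, lower-cased and padded at both ends."""
    padded = " " * (n - 1) + s.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(a, b):
    """Cosine similarity between two n-gram sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) * len(b)) ** 0.5


def best_match(query, filenames, threshold):
    """Return the filename most similar to query, or None below threshold."""
    q = ngrams(query)
    best, best_score = None, threshold
    for name in filenames:
        score = cosine(q, ngrams(name))
        if score >= best_score:
            best, best_score = name, score
    return best


if __name__ == "__main__":
    ocr_words = [("Introduct1on", 0, 120), ("to", 130, 150),
                 ("Algorithms", 160, 260), ("MIT", 400, 440)]
    title = strip_metadata(ocr_words, max_gap=40)
    files = ["Introduction to Algorithms.pdf", "Linear Algebra.pdf"]
    print(title, "->", best_match(title, files, threshold=0.5))
```

A single OCR error such as "1" for "i" only changes the few trigrams that overlap it, which is why the match survives where an exact comparison would fail.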
12-08-2011, 11:15 AM | #2 |
Wizard
Posts: 1,379
Karma: 2155307
Join Date: Nov 2010
Location: Goettingen, Germany
Device: Kindle Paperwhite, Kobo Mini
|
Holy crap, you got me sitting here in a mix of admiration and disbelief. This is awesome in more than one way: the engineering thought of going this route, the guts to do so and, lastly, the "hackery" level :-)
Well, I think you would agree if I add as a postscript: There just _has_ to be a simpler way :-) Thanks for mentioning the PDF reader. There'll be updates soon; the sources already have a file chooser, and I'll fix a few other things and do another release then. |
12-08-2011, 07:12 PM | #3 |
curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
|
Brilliant. Awesome.
|
12-08-2011, 08:12 PM | #4 |
Zealot
Posts: 121
Karma: 82565
Join Date: Aug 2010
Location: Maryland, USA
Device: dxg, k3w,k4nt,kpw
|
vdp, do you think that a small part of Tesseract can be used to quickly analyze page structure and return a set of word/line bounding boxes? This could seriously simplify emulation of a pointing device in GUI programs ported to Kindle. |
12-09-2011, 04:46 AM | #5 | ||||
Enthusiast
Posts: 45
Karma: 10842
Join Date: Aug 2010
Device: Kindle DXG
|
@hawhill
Quote:
Quote:
Quote:
@h1uke Quote:
So perhaps there are options... I would also be very interested if we could research and use or implement lightweight techniques from the computer vision and OCR communities to analyse documents, and use that information to improve navigation and presentation. Take for example the two-column mode of Duokan. It is very useful in the majority of cases, but all it does is split the page into two equal parts. If the text is offset, however, or one of the columns is wider, the user is out of luck. |
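As one example of such a lightweight technique, the column split could follow the actual gutter instead of the page midpoint. The Python sketch below uses a vertical projection profile: it counts ink pixels per column and takes the middle of the widest ink-free band near the centre. This is neither Duokan's method nor anything from LaunchHack; the function name, the 0/1 pixel representation and the search fractions are assumptions, and a real scan would need a noise tolerance rather than a strict zero test.

```python
# Illustrative sketch only; the page format and parameters are hypothetical.

def find_gutter(page, search_from=0.25, search_to=0.75):
    """Return the x position in the middle of the widest ink-free band.

    page: list of rows, each a list of 0 (background) / 1 (ink) pixels.
    Only columns between the given fractions of the width are searched.
    Returns None if no ink-free column exists in that range.
    """
    width = len(page[0])
    # Vertical projection profile: amount of ink in each pixel column.
    ink = [sum(row[x] for row in page) for x in range(width)]
    lo, hi = int(width * search_from), int(width * search_to)
    best_start, best_len = None, 0
    run_start = None
    for x in range(lo, hi + 1):
        # x == hi acts as a sentinel that closes a run still open at the end.
        blank = x < hi and ink[x] == 0
        if blank and run_start is None:
            run_start = x
        elif not blank and run_start is not None:
            if x - run_start > best_len:
                best_start, best_len = run_start, x - run_start
            run_start = None
    if best_start is None:
        return None
    return best_start + best_len // 2


if __name__ == "__main__":
    # A 20-pixel-wide page: narrow left column, wide right column.
    row = [0] * 20
    for x in list(range(1, 5)) + list(range(8, 19)):
        row[x] = 1
    page = [row[:] for _ in range(3)]
    print(find_gutter(page))
```

On this toy page an equal split at x = 10 would cut through the right column, while the profile places the cut inside the blank band between the two columns.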
||||
12-10-2011, 04:48 AM | #6 |
Enthusiast
Posts: 45
Karma: 10842
Join Date: Aug 2010
Device: Kindle DXG
|
Added a new section about a potential usage pattern (as if the OP wasn't long enough).
|
12-23-2011, 11:44 PM | #7 |
Groupie
Posts: 159
Karma: 20390
Join Date: Feb 2009
Device: none
|
Unfortunately the following two Launchpad shortcuts don't work for me:
Code:
P D = !cd /mnt/us/kpdfview; ./reader.lua "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &
Code:
F E = !/mnt/us/fbKindle/goqt.sh FBReader "`/mnt/us/launchpad/lhack /mnt/us/documents *.epub 0.6`"&
Code:
P D = !cd /mnt/us/kpdfview; ./reader.lua -d k3 "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &
|
12-24-2011, 01:36 AM | #8 | |
Enthusiast
Posts: 45
Karma: 10842
Join Date: Aug 2010
Device: Kindle DXG
|
Quote:
|
|
12-24-2011, 01:58 AM | #9 | |
Groupie
Posts: 159
Karma: 20390
Join Date: Feb 2009
Device: none
|
Quote:
|
|
12-24-2011, 08:40 AM | #10 |
curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
|
Kindle Keyboard constants
@vdp If it can help, I updated the source, the best I could for my device. I would certainly be interested in your parser supporting the Kindle Keyboard.
Spoiler:
Following are sample screen captures I used if you need to verify: |
12-24-2011, 10:28 AM | #11 |
Groupie
Posts: 159
Karma: 20390
Join Date: Feb 2009
Device: none
|
Thanks for the update. Again, I don't know how much help I can personally be with it, but I will look into it when I get the chance. I picked a bad time to want to do this, it being a holiday and all.
|
12-25-2011, 06:32 AM | #12 | |
Enthusiast
Posts: 45
Karma: 10842
Join Date: Aug 2010
Device: Kindle DXG
|
Quote:
Code:
lhack /mnt/us/documents '*.pdf' 0.5
Please let me know if it works for you.
Edit: The K3 binary is now attached to the OP.
Last edited by vdp; 12-27-2011 at 04:30 AM. |
|
12-25-2011, 11:54 AM | #13 | |
curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
|
Quote:
Oops, with the selection bar under my first PDF (actually the same result on any selection) I got the following problem:
Code:
[root@kindle launchpad]# ./lhack /mnt/us/documents '*.pdf' 0.5
Error opening data file /mnt/us/launchhack/share/tessdata/eng.traineddata
Segmentation fault
[root@kindle launchpad]#
|
|
12-26-2011, 01:44 AM | #14 | |
Enthusiast
Posts: 45
Karma: 10842
Join Date: Aug 2010
Device: Kindle DXG
|
Quote:
Edit: The K3 binary is now attached to the OP. Last edited by vdp; 12-27-2011 at 04:31 AM. |
|
12-26-2011, 07:51 AM | #15 |
curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
|
|
|