Old 12-08-2011, 09:47 AM   #1
vdp
Enthusiast
Posts: 39
Karma: 10842
Join Date: Aug 2010
Device: Kindle DXG
LaunchHack - an OCR-based companion to LaunchPad

This kludge can be used along with LaunchPad to trigger an action on the file currently selected in the Kindle's stock shell. It currently works on the Kindle DXG, but I believe it can be modified relatively easily to work on other models (although I don't own the smaller devices and have no experience with them).

How does it work?
The program has three main steps/parts:

1) First the framebuffer is scanned to find the bold underline of the currently selected file. The title of this file is then cropped and sent to the Tesseract OCR engine.

2) Tesseract then converts the image to a recognized string. For the time being I am only interested in English text, and that's what the software currently assumes. I use the English Tesseract model data and haven't tested how well it handles, for example, German umlauts or Cyrillic scripts (the latter will almost certainly require a different model).
Because the title image can contain both the document's name and metadata (producer of the document etc.), the word bounding boxes (BBs) returned by Tesseract are used to strip the metadata. The criterion is the distance from the end of one word's BB to the beginning of the next word's BB. If the gap exceeds a certain threshold, the next word is considered part of the metadata.

3) The OCR result is not always perfect. Errors like "Introduct1on" instead of "Introduction" sometimes occur, so some kind of approximate string matching is desirable. A standard metric for the similarity between two strings is the Levenshtein distance, but it runs in quadratic time, so it seems too "expensive". I am also aware of Levenshtein automata, but as I understand it they assume a bounded number of errors that must be known beforehand.
Then I found SimString (paper, code). It uses a weaker notion of similarity based on the number of common letter n-grams (short sequences of consecutive letters), but it is fast. I actually reimplemented the algorithm from the paper (as I understand it), because SimString's implementation uses a persistent database and I wanted to be able to build it in memory on demand. Anyway, it seems to work OK and finds the best matching filename in under a second. Note that it is not always possible to find the true file, because the titles in the shell are truncated. E.g. if you have the files "Very very very very long filename1.pdf" and "Very very very very long filename2.pdf", the shell will show both as "Very very very very long file...", so they cannot be told apart.
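To make the n-gram idea concrete, here is a minimal C++ sketch of a trigram overlap score. It is a simplified illustration only: it is not SimString's actual measure and not the code from lhack, and the names NGrams and NGramOverlap are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Collect the distinct letter n-grams of `s` (n = 3 by default).
// Hypothetical helper for illustration; not taken from the lhack sources.
std::set<std::string> NGrams(const std::string& s, std::size_t n = 3) {
  assert(n > 0);
  std::set<std::string> grams;
  if (s.size() < n) {
    if (!s.empty()) grams.insert(s);  // a short string counts as one gram
    return grams;
  }
  for (std::size_t i = 0; i + n <= s.size(); ++i) {
    grams.insert(s.substr(i, n));
  }
  return grams;
}

// Fraction of the query's distinct n-grams that also occur in the candidate.
double NGramOverlap(const std::string& query, const std::string& candidate) {
  const std::set<std::string> q = NGrams(query);
  const std::set<std::string> c = NGrams(candidate);
  if (q.empty()) return 0.0;
  std::size_t common = 0;
  for (const std::string& g : q) common += c.count(g);
  return static_cast<double>(common) / q.size();
}
```

For the OCR slip mentioned above, NGramOverlap("Introduct1on", "Introduction") gives 0.7, because 7 of the 10 query trigrams also occur in the correct title. A single misread character therefore still clears a threshold of 0.6.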

And that is all that this software does. When started it reads the framebuffer, tries to find the selected title and prints the absolute filename of the file it thinks is the best match.
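The gap criterion from step 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: WordBox and StripMetadata are hypothetical names (not the Tesseract API and not lhack's code), the words are assumed to arrive sorted left to right, and everything after the first oversized gap is assumed to be metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One recognized word with the horizontal extent of its bounding box, in pixels.
// Hypothetical type for illustration; not the Tesseract API.
struct WordBox {
  std::string text;
  int left;   // x coordinate of the left edge
  int right;  // x coordinate of the right edge
};

// Join the words from left to right and stop at the first gap wider than
// max_gap; the remaining words are treated as metadata and dropped.
std::string StripMetadata(const std::vector<WordBox>& words, int max_gap) {
  assert(max_gap >= 0);
  std::string title;
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (i > 0 && words[i].left - words[i - 1].right > max_gap) break;
    if (!title.empty()) title += ' ';
    title += words[i].text;
  }
  return title;
}
```

With made-up boxes for "Linux" (61..110), "Basics" (118..170) and a publisher name starting at x = 400, a threshold of 40 pixels keeps "Linux Basics" and drops the publisher.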

How can it be used?

For example, to start hawhill's promising PDF viewer from LaunchPad you can use a kpdfview.ini like this:

Code:
[Actions]
;; run kpdfviewer
P D = !cd /mnt/us/kpdfview; ./reader.lua "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &
The tool takes three arguments: the root directory to be searched for a matching file, a comma-separated list of filters that can be used to narrow the search to specific types of files, and finally a similarity coefficient. This coefficient is the fraction of the letter 3-grams that must match; files with a lower overlap with the query are not considered. In my very limited experiments, values in the range 0.5-0.6 worked well. If the coefficient is too low the search will be slower, and if it is too high the true matching file can be rejected (not found).
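As a rough illustration of how the second and third arguments could act together, here is a small C++ sketch. It assumes the filters are shell-style glob patterns (as the `*.pdf` example suggests) and uses the POSIX fnmatch() call for them; MatchesAnyFilter and IsCandidate are hypothetical names, and lhack itself may well do this differently.

```cpp
#include <cassert>
#include <fnmatch.h>  // POSIX glob-style pattern matching

#include <sstream>
#include <string>

// True if `name` matches at least one pattern in a comma-separated list
// such as "*.pdf,*.epub". Hypothetical helper, not from the lhack sources.
bool MatchesAnyFilter(const std::string& name, const std::string& filters) {
  std::istringstream in(filters);
  std::string pattern;
  while (std::getline(in, pattern, ',')) {
    if (!pattern.empty() && fnmatch(pattern.c_str(), name.c_str(), 0) == 0) {
      return true;
    }
  }
  return false;
}

// A file is kept only if it passes the filter list and its n-gram overlap
// with the OCR result reaches the similarity coefficient.
bool IsCandidate(const std::string& name, const std::string& filters,
                 double similarity, double min_similarity) {
  assert(min_similarity >= 0.0 && min_similarity <= 1.0);
  return MatchesAnyFilter(name, filters) && similarity >= min_similarity;
}
```

Under these assumptions, a file named book.pdf with an overlap of 0.7 is kept at a coefficient of 0.6, while the same file with an overlap of 0.5 is rejected.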

Installation
To install launchhack just extract the attached archive into the /mnt/us/launchpad directory. The source code is here (it still doesn't have a Makefile or even a README).

Finally, a note to the brave developers who might want to look at the code: if you think the code is crap, I agree with you. I admit my guilt in writing overly long functions, using sloppy variable names, using classes for code that should really be just a function, not checking for error conditions (if something goes wrong it will just happily explode) and a multitude of other unforgivable sins. Maybe some of these will be fixed, but frankly I don't want to spend much more time on this.

Edit:
BTW there is a potential use of this tool which may not be immediately obvious. You can also use it to start readers for file formats not supported by the stock software. Say you want to be able to open epub files straight from the stock shell in fbKindle (I don't read much fiction and this is just what I have installed).

First modify the goqt.sh script to support passing of arguments to the reader app (add a "$2").
Code:
./"$1" -qws "$2"
Then create a script named lhindex.sh
Code:
#!/bin/sh
# Usage: lhindex.sh <extension>, e.g. lhindex.sh epub
[ -n "$1" ] || exit 1

LHIDX="/mnt/us/documents/lhindex/$1"

# in case it doesn't exist yet ...
mkdir -p "$LHIDX"

# remove the old entries (-f: no error if there are none yet)
rm -f "$LHIDX"/*

# create a new 'index': one small .txt placeholder per matching file
find /mnt/us/documents/ -name "*.$1" | sed -e "s/\.$1\$/.txt/" | awk -F'/' -v I="$LHIDX" '{print I"/"$NF;}' | while IFS= read -r f; do
  echo 1 > "$f"
done

# Force Kindle to scan the docs folder
dbus-send --system /default com.lab126.powerd.resuming int32:1
Finally create an epub.ini launchpad script
Code:
[Actions]
;; "Index" epubs
E I = !/mnt/us/launchpad/lhindex.sh epub &

;; run fbKindle
F E = !/mnt/us/fbKindle/goqt.sh FBReader "`/mnt/us/launchpad/lhack /mnt/us/documents *.epub 0.6`"&
Now you can "index" your epubs by pressing shift-E-I. This creates a small placeholder text file in /mnt/us/documents/lhindex/epub for every epub you have on your Kindle (the files are deliberately not zero-length, because the shell doesn't show empty files). Then you can select a title in the shell, press shift-F-E and voila, fbKindle opens its epub counterpart.

Edit 2: Added a binary for K3.
Attached Files
File Type: gz launchhack.tar.gz (5.70 MB, 81 views)
File Type: gz launchhack-k3.tar.gz (5.74 MB, 50 views)

Last edited by vdp; 12-27-2011 at 05:27 AM.
Old 12-08-2011, 12:15 PM   #2
hawhill
Wizard
Posts: 1,218
Karma: 2124593
Join Date: Nov 2010
Location: Goettingen, Germany
Device: Kindle Paperwhite, Kobo Mini
Holy crap, you got me sitting here in a mix of admiration and disbelief. This is awesome in more than one way: The engineering thought of going this route, the guts to do so and at last the "hackery" level :-)

Well, I think you would agree if I add as a postscriptum: There just _has_ to be a simpler way :-)

Thanks for mentioning the PDF reader. There'll be updates soon; the sources already have a file chooser, and I'll fix a few other things and then do another release.
Old 12-08-2011, 08:12 PM   #3
PoP
Antonín ♯♭♪♮♫ ᵖʸᶠᵍᶜʳˡ
Posts: 519
Karma: 7391817
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3.₄, PRS-350, SGS3, Rπ, iPad Air
Brilliant. Awesome.
Old 12-08-2011, 09:12 PM   #4
h1uke
Zealot
Posts: 121
Karma: 82565
Join Date: Aug 2010
Location: Maryland, USA
Device: dxg, k3w,k4nt,kpw
vdp, do you think that a small part of Tesseract can be used to quickly
analyze page structure and return a set of word/line bounding boxes?

This could seriously simplify emulation of a pointing device in GUI programs
ported to Kindle.
Old 12-09-2011, 05:46 AM   #5
vdp
Enthusiast
@hawhill
Quote:
Originally Posted by hawhill View Post
Holy crap, you got me sitting here in a mix of admiration and disbelief. This is awesome in more than one way: The engineering thought of going this route, the guts to do so and at last the "hackery" level :-)
Thanks

Quote:
Originally Posted by hawhill View Post
Well, I think you would agree if I add as a postscriptum: There just _has_ to be a simpler way :-)
Absolutely! I also like the lsof-based trick that was committed by dpavlin to kpdfviewer's repo, but unfortunately it seems to require an additional step and probably takes more memory, since two different PDF readers are loaded in memory.

Quote:
Originally Posted by hawhill View Post
Thanks for mentioning the PDF reader. There'll be updates soon; the sources already have a file chooser, and I'll fix a few other things and then do another release.
That's great!

@h1uke
Quote:
vdp, do you think that a small part of Tesseract can be used to quickly
analyze page structure and return a set of word/line bounding boxes?

This could seriously simplify emulation of a pointing device in GUI programs
ported to Kindle.
Maybe, I am not familiar with Tesseract's internals. My understanding from reading the Wikipedia articles and the information on Tesseract's website is that document analysis features were added after Google started supporting the project. There are also other projects like Ocropus that seem to perform the analysis and use Tesseract as a backend recognition engine.
So perhaps there are options... I would also be very interested if we could research and use/implement lightweight techniques from the computer vision and OCR communities to analyse the documents and use the information to improve navigation and presentation.
Take for example the two-column mode of Duokan. It is very useful in the majority of cases, but all it does is split the page into two equal parts. If the text is offset, however, or one of the columns is wider, the user is out of luck.
Old 12-10-2011, 05:48 AM   #6
vdp
Enthusiast
Added a new section about a potential usage pattern (as if the OP wasn't long enough).
Old 12-24-2011, 12:44 AM   #7
inameiname
Groupie
Posts: 156
Karma: 20390
Join Date: Feb 2009
Device: none
Unfortunately the following two Launchpad shortcuts don't work for me:

Code:
P D = !cd /mnt/us/kpdfview; ./reader.lua "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &
Code:
F E = !/mnt/us/fbKindle/goqt.sh FBReader "`/mnt/us/launchpad/lhack /mnt/us/documents *.epub 0.6`"&
I also tried using 'reader.lua -d k3' instead of just 'reader.lua', as I heard it helps to open up kpdfview (which it does, but it doesn't resolve the issue with lhack not working):

Code:
P D = !cd /mnt/us/kpdfview; ./reader.lua -d k3 "`/mnt/us/launchpad/lhack /mnt/us/documents *.pdf 0.6`" &
So if anybody has any ideas, feel free to let me know. Overall, it says 'success', but then nothing happens.
Old 12-24-2011, 02:36 AM   #8
vdp
Enthusiast
Quote:
Originally Posted by inameiname View Post
Unfortunately the following two Launchpad shortcuts don't work for me...
What device are you trying this on? As I said in the original post, it's only been tested on the DXG (firmware 2.5.5) at the moment. On smaller form factor devices the screen layout is different, so it will fail to find the title strings. I think the only change needed to make it work on other devices is a redefinition of these constants. If someone is interested in using this hack on another device and sends me a modified definition, I will update the source code and compile a new binary. The measurements are in pixels and can be taken, for example, using GIMP.
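To show why pixel measurements like these matter, here is a minimal C++ sketch of locating a solid underline row in a packed 4-bits-per-pixel buffer. It is not lhack's code: PixelAt and FindSolidRow are hypothetical names, the nibble order (left pixel in the high nibble) is an assumption, and so is the idea that the underline is a run of one uniform pixel value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read one pixel (0..15) from a packed 4-bits-per-pixel buffer.
// Assumption: the left pixel of each pair sits in the high nibble.
unsigned PixelAt(const std::vector<std::uint8_t>& fb, int width, int x, int y) {
  assert(width % 2 == 0 && x >= 0 && x < width && y >= 0);
  const std::size_t index = (static_cast<std::size_t>(y) * width + x) / 2;
  assert(index < fb.size());
  const std::uint8_t byte = fb[index];
  return (x % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
}

// Return the first row at or below from_y in which every pixel in
// [x0, x0 + len) has the value `color`, or -1 if there is no such row.
int FindSolidRow(const std::vector<std::uint8_t>& fb, int width, int height,
                 int x0, int len, unsigned color, int from_y) {
  assert(len > 0 && x0 >= 0 && x0 + len <= width && from_y >= 0);
  for (int y = from_y; y < height; ++y) {
    bool solid = true;
    for (int x = x0; x < x0 + len && solid; ++x) {
      solid = (PixelAt(fb, width, x, y) == color);
    }
    if (solid) return y;
  }
  return -1;
}
```

On a device with a different layout only the inputs change: the screen width and height, the left offset at which titles start, and the rows at which to begin scanning. That is the kind of value a per-device definition would have to supply.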
Old 12-24-2011, 02:58 AM   #9
inameiname
Groupie
Quote:
Originally Posted by vdp View Post
What device are you trying this on? As I said in the original post, it's only been tested on the DXG (firmware 2.5.5) at the moment. On smaller form factor devices the screen layout is different, so it will fail to find the title strings. I think the only change needed to make it work on other devices is a redefinition of these constants. If someone is interested in using this hack on another device and sends me a modified definition, I will update the source code and compile a new binary. The measurements are in pixels and can be taken, for example, using GIMP.
I have a Kindle 3 Keyboard, so yeah, I am sure that is the problem. I'd be happy to send you a modified definition, but I wouldn't know where to start modifying it. Hehe, I guess I will have to ask anybody else reading this thread if they could look into it, particularly those Linux folks who also have a Kindle 3 Keyboard.
Old 12-24-2011, 09:40 AM   #10
PoP
Antonín ♯♭♪♮♫ ᵖʸᶠᵍᶜʳˡ
Kindle Keyboard constants

@vdp If it can help, I updated the source, the best I could for my device. I would certainly be interested in your parser supporting the Kindle Keyboard.
Spoiler:

/*
 * Copyright 2011 Vassil Panayotov <vd.panayotov@gmail.com>
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef DEVICEDEFS_H
#define DEVICEDEFS_H

#include <climits>

namespace lhack {

// Kindle Keyboard parameters
struct KDXDimensions {
    // Whole screen dimensions
    static const int kScreenWidth = 600;
    static const int kScreenHeight = 800;

    // ... and bits per pixel
    static const int kBPP = 4;

    // Number of titles per page when browsing a collection
    static const int kEntryPerPgCol = 9;

    // Number of titles/page when NOT browsing a collection
    static const int kEntryPerPg = 10;

    // Vertical offset of the topmost pixel of the 1st title,
    // when the contents of a collection are shown
    static const int kOffsetYCol = 158;

    // Vertical offset of the topmost pixel of the 1st title,
    // when NOT browsing a collection
    static const int kOffsetY = 100;

    // The vertical offset of the collection's name underline
    static const int kOffsetUlineCol = 124;

    // Offset from the left margin of the first char in a title
    static const int kOffsetX = 61;

    // The gap between the underline of one title and the topmost pixels
    // of the next.
    static const int kEntryGap = 32;

    // Maximum length of a title in pixels
    static const int kEntryLen = 500;

    // Maximum height of the title font
    static const int kFontHeight = 16;

    // Vertical distance in pixels from the bottommost pixel in a title to
    // the bottommost pixel of the underlining squares
    static const int kUlineBaseOffset = 6;

    // The minimum distance from the font's baseline to the topmost pixels
    // of the underline
    static const int kUlineMinOffset = 7;

    // The side length of the (bigger) underlining squares
    //static const int kUlineSquareSz = 5;

    // The color of the solid underline selection pointer
    static const unsigned kUlineColor = UINT_MAX;

    // Maximum gap between words in a title beyond which next words are
    // considered to be metadata
    static const int kMaxBBGap = 40;
};

}  // namespace lhack
#endif // DEVICEDEFS_H


Following are sample screen captures I used if you need to verify:
Attached Thumbnails
screen_shot-59123.gif (16.7 KB)
screen_shot-59126.gif (21.9 KB)
Old 12-24-2011, 11:28 AM   #11
inameiname
Groupie
Quote:
Originally Posted by PoP View Post
@vdp If it can help, I updated the source, the best I could for my device. I would certainly be interested in your parser supporting the Kindle Keyboard.
Thanks for the update. Again, I don't know how much help I can personally be with it, but I will look into it when I get the chance. I picked a bad time to want to do this, being a holiday and all.
Old 12-25-2011, 07:32 AM   #12
vdp
Enthusiast
Quote:
Originally Posted by PoP View Post
@vdp If it can help, I updated the source, the best I could for my device.
Thanks! I made some tweaks and compiled a K3 binary. I could only test on your screenshots, so I am not completely sure it works flawlessly. You can install the files from the attachment as described in the OP. If you have usbnetwork installed, you can SSH to the device and run something like
Code:
lhack /mnt/us/documents '*.pdf' 0.5
while selecting different titles to see if the result makes sense. I deliberately chose a relatively low similarity coefficient, because the English model is not a perfect match for some of your files (in French as far as I can tell). For example "Je l'aimais - Anna Gavalda" is recognized as "Ie Paimais - Anna Gavalda". As explained in the OP, higher coefficients lead to a slightly faster search, but sometimes produce false negatives.
Please let me know if it works for you.

Edit: The K3 binary is now attached to the OP.

Last edited by vdp; 12-27-2011 at 05:30 AM.
Old 12-25-2011, 12:54 PM   #13
PoP
Antonín ♯♭♪♮♫ ᵖʸᶠᵍᶜʳˡ
Quote:
Originally Posted by vdp View Post
... I deliberately chose a relatively low similarity coefficient, because the English model is not a perfect match for some of your files (in French as far as I can tell).
...
Ack - hopefully close enough. It should still be usable for me, as most of my ebooks are in English anyway.

Quote:
Originally Posted by vdp View Post
...
Please let me know if it works for you.
Oops, while placing the selection bar under my first pdf (actually same result on any selection) got the following problem:
Code:
[root@kindle launchpad]# ./lhack /mnt/us/documents '*.pdf' 0.5
Error opening data file /mnt/us/launchhack/share/tessdata/eng.traineddata
Segmentation fault
[root@kindle launchpad]#
Old 12-26-2011, 02:44 AM   #14
vdp
Enthusiast
Quote:
Originally Posted by PoP View Post
Oops, while placing the selection bar under my first pdf (actually same result on any selection) got the following problem:
Code:
[root@kindle launchpad]# ./lhack /mnt/us/documents '*.pdf' 0.5
Error opening data file /mnt/us/launchhack/share/tessdata/eng.traineddata
Segmentation fault
[root@kindle launchpad]#
Oops, sorry about that. Looks like I linked against the wrong library version, one that searches for the Tesseract models in the "launchhack" rather than the "launchpad" directory. You can try moving the "share" dir into a new "/mnt/us/launchhack" folder or, better yet, just replace the executable with the one attached here. Hopefully it will work better this time...

Edit: The K3 binary is now attached to the OP.

Last edited by vdp; 12-27-2011 at 05:31 AM.
Old 12-26-2011, 08:51 AM   #15
PoP
Antonín ♯♭♪♮♫ ᵖʸᶠᵍᶜʳˡ
Quote:
Originally Posted by vdp View Post
... or better yet just replace the executable with the one attached here. Hopefully it will work better this time...
Your gem now works like a charm. Thanks for the Christmas gift.