11-09-2013, 04:13 AM | #1
Join Date: Jan 2013
Location: West Gardiner, Maine
Device: Touch (5.3.7)
I compiled this and pulled the convert and tesseract-ocr binaries from Debian, then put together a few scripts to try it out. I did not bother with the GUI, as it's Tcl/Tk.
Result: it works terribly. It can't download all the files needed to convert a book properly.
Why the heck did I post this? I thought maybe someone else might be interested enough to mess with it. I'm done, but if someone wants, I can supply a larger file that includes the tessdata directory tesseract needs to work. It's 34 MB, so I haven't posted it yet.
Directions: in a web browser, find a book in Google Books that you can preview, and write down the code that follows the id= part of the address. In the KUAL button for getxbook, type "./getgbook.sh code" and it should download all the pages (mostly JPGs and PNGs) to a new directory inside the current one. Run "ls" to find the directory name. "mkpdf.sh directoryname" should then try to build a PDF from the images. mkocrtxt.sh converts the images to TIFF, then OCRs them into text files. I couldn't figure out getbnbook or getabook. There are lots of other smart people out there, so try "./getbnbook.sh -h"...
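If copying the code out of the address by hand gets tedious, something like the following might help. It is only a rough sketch and not part of the package: extract_book_id is a made-up helper name, the URL is a placeholder, and it assumes the address carries the code in an id= query parameter.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with getxbook): print the value of the
# id= parameter found in a Google Books address.
extract_book_id() {
    # Match "?id=" or "&id=", keep everything up to the next "&" or "#".
    printf '%s\n' "$1" | sed -n 's/.*[?&]id=\([^&#]*\).*/\1/p'
}

# Placeholder address with a made-up ID, for illustration only.
extract_book_id 'https://books.google.com/books?id=EXAMPLE_ID12&printsec=frontcover'
```

The printed code is what you would then pass along as "./getgbook.sh code".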
Have a nice day.