MobileRead Forums - View Single Post

brianinmaine · 11-09-2013, 05:13 AM

http://njw.me.uk/getxbook/

source: http://njw.me.uk/getxbook/getxbook-1.1.tar.bz2

I compiled this and ripped convert and tesseract-ocr from Debian. I put together a few scripts to try it out. I did not bother with the GUI, as it's Tcl/Tk.

Result: works terribly. It can't download all the files it needs to convert properly.

Why the heck did I post this: I thought maybe someone else might be interested enough to mess with it. I'm done, but if someone wants, I can supply a larger file with the tessdata directory to make tesseract work. It's 34 MB, so I didn't post it yet.

Directions: in a web browser, find a book on Google Books that you can preview. Write down the code after the "id=" part of the address. In the KUAL button for getxbook, type "./getgbook.sh code" and it should download all the pages (mostly JPGs and PNGs) to a new directory inside the current one. Use "ls" to check the directory name. "mkpdf.sh directoryname" should try to build a PDF from the images. mkocrtxt.sh converts the images to TIFF, then OCRs them to text files. I couldn't figure out getbnbook or getabook. Lots of other smart people out there, try "./getbnbook.sh -h"...
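If copying the code out of the address by hand gets tedious, something like this rough sketch might pull it out for you. It's not part of getxbook: book_id is a name I made up, the URL is a placeholder, and it assumes the address carries the code in an "id=" query parameter.

```shell
#!/bin/sh
# Hypothetical helper (not part of getxbook): print the value of the
# id= query parameter from a Google Books address.
# Prints nothing if the address has no id= parameter.
book_id() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]id=\([^&#]*\).*/\1/p'
}

# Placeholder address for illustration only; substitute the real one.
url='https://books.google.com/books?id=XXXXXXXXXXXX&printsec=frontcover'
book_id "$url"
```

With that defined, ./getgbook.sh "$(book_id "$url")" should be the same as typing the code by hand.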

Have a nice day.