11-05-2018, 09:59 PM | #1606 | |
Junior Member
Posts: 5
Karma: 42206
Join Date: Nov 2018
Device: kindle paperwhite 3
|
Quote:
|
|
11-06-2018, 08:28 AM | #1607 | |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
-ocrout outfile.txt You'll probably have to go through and clean it up a bit, but the OCR layer appears to be very good, so hopefully your editing will be minimal. I've attached the output from pages 20-25. |
|
11-06-2018, 01:26 PM | #1608 |
Junior Member
Posts: 4
Karma: 42206
Join Date: Nov 2018
Device: Kindle Paperwhite
|
Hi! First post here. Amazing piece of software, I have to say. I am, however, having trouble getting Tesseract to work. Despite adding all of the files you show as necessary to the tesseract folder, I get the message 'could not find tesseract data'. Any idea what might be the problem?
Thanks |
11-06-2018, 07:48 PM | #1609 | |
Junior Member
Posts: 5
Karma: 42206
Join Date: Nov 2018
Device: kindle paperwhite 3
|
Quote:
|
|
11-06-2018, 07:51 PM | #1610 | |
Junior Member
Posts: 5
Karma: 42206
Join Date: Nov 2018
Device: kindle paperwhite 3
|
Quote:
|
|
11-07-2018, 05:41 PM | #1611 |
Junior Member
Posts: 5
Karma: 37936
Join Date: Sep 2018
Device: kindle paperwhite 7th (5.10.1.1)
|
How to fix these small sentences
|
11-07-2018, 08:51 PM | #1612 |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
|
11-08-2018, 04:15 AM | #1613 |
Junior Member
Posts: 5
Karma: 37936
Join Date: Sep 2018
Device: kindle paperwhite 7th (5.10.1.1)
|
Ok my friend
|
11-08-2018, 11:10 PM | #1614 |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Any kind of intelligent formatting that k2pdfopt tries to do will likely not be very successful because of how diverse your source PDF is (some pages are 2 columns, some are not, and lots of pages have specially positioned text relative to a figure), so I'd recommend just a straight cropping of every page into 2 pages (2 columns) using the grid option, even when it will cut a figure in half:
k2pdfopt -grid 2x1 sourcefile.pdf |
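If several files need the same straight crop, the command above can be wrapped in a small shell function. This is only a sketch: `grid_cmds` is a made-up helper name, not something shipped with k2pdfopt, and it prints the command lines instead of running them so they can be checked first. The only k2pdfopt option it uses is the `-grid 2x1` from the post.

```shell
# Hypothetical helper (not part of k2pdfopt): print one
# 'k2pdfopt -grid 2x1' command line per PDF passed as an argument.
grid_cmds() {
    for pdf in "$@"; do
        case "$pdf" in
            *.pdf|*.PDF)
                # Quote the filename so names with spaces stay in one piece.
                printf 'k2pdfopt -grid 2x1 "%s"\n' "$pdf"
                ;;
            *)
                # Anything else is reported on stderr and skipped.
                printf 'skipping %s (not a PDF)\n' "$pdf" >&2
                ;;
        esac
    done
}
```

Calling `grid_cmds *.pdf` in a folder lists one command per file; copy the ones you want and run them one at a time.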
11-09-2018, 05:36 AM | #1615 | |
Junior Member
Posts: 4
Karma: 42206
Join Date: Nov 2018
Device: Kindle Paperwhite
|
Quote:
"NOTE! To use the Tesseract OCR engine built into k2pdfopt, you only have to install the Tesseract language training file for your language (see example below for English). You do not need to install the Tesseract engine! You can install multiple language files if you want to be able to OCR documents in different languages." Have I missed something? |
|
11-09-2018, 05:46 PM | #1616 | |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
Last edited by willus; 11-09-2018 at 05:57 PM. |
|
11-12-2018, 09:39 AM | #1617 | |
Junior Member
Posts: 4
Karma: 42206
Join Date: Nov 2018
Device: Kindle Paperwhite
|
Quote:
Yes, I read through all of it, and I followed the instructions exactly, to the best of my knowledge. Any idea what might be the problem? Let me know if screenshots of anything in particular would be useful. |
|
11-14-2018, 09:52 PM | #1618 |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Are you comfortable running things from the command line? (That gives me an idea to put a Tesseract diagnostic into the GUI...)
|
11-18-2018, 05:38 AM | #1619 |
Junior Member
Posts: 7
Karma: 42206
Join Date: Nov 2018
Device: Kindle 8
|
Tesseract 4.0.0 - environment variable cannot find tessdata (mac)
Hello everybody. Willus, thank you so much for taking the time to help with this.
I'm a Mac user, still on 10.9, so I had to install Tesseract via brew. The Tesseract version is 4.0.0, and the tessdata folder is:
/usr/local/Cellar/tesseract/4.0.0/share/tessdata/
I set the environment variable as:
export TESSDATA_PREFIX=/usr/local/Cellar/tesseract/4.0.0/share/
(I also tried it without the trailing slash: export TESSDATA_PREFIX=/usr/local/Cellar/tesseract/4.0.0/share)
But I keep getting an error saying the tessdata files cannot be found (I'm using the command line):
Initializing OCR for 4 threads xxxx Could not find Tesseract data (env var TESSDATA_PREFIX = /usr/local/Cellar/tesseract/4.0.0/share/). Using GOCR v0.50.
Note that the tessdata folder contains: configs, eng.traineddata, osd.traineddata, pdf.ttf, tessconfigs.
Maybe the data files changed between Tesseract 3 and 4? Or am I mistyping something in the env var?
As a test, I extracted a TIFF file from a PDF with Ghostscript and ran Tesseract on it:
tesseract -l eng mypdf.tif mypdf
It works. Can you help fix k2pdfopt so that it recognises the Tesseract installation? |
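For anyone hitting the same message, a quick check of where the language file actually sits relative to the prefix can narrow things down. In the post the variable points at the `share` folder while the data is in `share/tessdata`; the thread does not say which of the two k2pdfopt expects, so the sketch below simply looks in both. `check_tessdata` is a made-up name, not part of k2pdfopt or Tesseract, and it only tests that the file exists, not that its version suits the built-in engine.

```shell
# Hypothetical diagnostic (not part of k2pdfopt or Tesseract).
# Prints every candidate directory under the given prefix that holds
# <lang>.traineddata; returns non-zero if none of them does.
check_tessdata() {
    prefix=${1%/}        # drop one trailing slash, if present
    lang=${2:-eng}       # language code, defaults to English
    found=1
    for dir in "$prefix" "$prefix/tessdata"; do
        if [ -f "$dir/$lang.traineddata" ]; then
            printf '%s\n' "$dir"
            found=0
        fi
    done
    return "$found"
}
```

Run it as `check_tessdata "$TESSDATA_PREFIX" eng`. If it prints only the `tessdata` subfolder, it may be worth also trying the variable set to that folder directly; if it prints nothing, the language file is not where the variable points.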
11-18-2018, 09:40 AM | #1620 |
Junior Member
Posts: 7
Karma: 42206
Join Date: Nov 2018
Device: Kindle 8
|
Tesseract 4.0.0 - environment variable cannot find tessdata (mac) (2)
Hi Willus,
I tried installing the tessdata files for v3.05 from:
https://github.com/tesseract-ocr/tessdata
It works. It is processing now; I'll check the result when it finishes, but at least it is working. Could you tell me which files I need to keep to process English? Would you consider updating to Tesseract v4.0? I looked at the git repos for k2pdfopt, but:
- I could not compile it because I am missing the header file k2pdfopt.h
- I don't know enough C or Tesseract to modify your wrapper :/ |