11-18-2018, 03:30 PM   #1621
willus
Fuzzball, the purple cat
Posts: 1,015
Karma: 8416109
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by gg4u
Hi Willus,

I tried to install tessdata v.3.05 from:

https://github.com/tesseract-ocr/tessdata

It works. It is processing now; I'll check the result when it finishes, but at least it is working.

Could you tell me which files I need to keep to process the eng language?

Would you consider updating to Tesseract v.4.0?

I looked at the git repos for k2pdfopt, but:
- I could not compile because I am missing the header file k2pdfopt.h
- I don't know enough C or Tesseract to modify your wrapper :/

I am hoping to eventually compile with Tesseract 4.0.0. It was officially released only three weeks ago (Oct 29, 2018). I don't recommend trying to build k2pdfopt yourself unless you are pretty adventurous. It has a lot of dependencies.

For Tesseract 3.05, you need these files in your data folder:

eng.cube.params
eng.cube.nn
eng.cube.bigrams
eng.cube.lm
eng.tesseract_cube.nn
eng.cube.word-freq
eng.cube.size
eng.cube.fold
eng.traineddata
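
To double-check a data folder against the list above before deleting anything, a small script along these lines might help. This is only a sketch: the function name `missing_eng_files` and the folder path in the usage note are made up for illustration, and the file names are simply copied from the list above.

```python
from pathlib import Path

# File names copied from the list above (English data for Tesseract 3.05).
REQUIRED_ENG_FILES = [
    "eng.cube.params",
    "eng.cube.nn",
    "eng.cube.bigrams",
    "eng.cube.lm",
    "eng.tesseract_cube.nn",
    "eng.cube.word-freq",
    "eng.cube.size",
    "eng.cube.fold",
    "eng.traineddata",
]


def missing_eng_files(tessdata_dir):
    """Return the listed English files that are not present in tessdata_dir."""
    folder = Path(tessdata_dir)
    return [name for name in REQUIRED_ENG_FILES if not (folder / name).is_file()]
```

Calling `missing_eng_files("/path/to/tessdata")` (the path is a placeholder) returns an empty list when all nine files are in place, and the names of any that are missing otherwise.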
11-19-2018, 10:53 AM   #1622
gg4u
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2018
Device: Kindle 8
Oh, thank you Willus,

Keeping only the eng.* files will free up some disk space.

Could you suggest how to make the best use of k2pdfopt?

I'd like to reflow a PDF of scanned images into an EPUB containing figures and chapters.

k2pdfopt seems to detect where the images are. I processed the original PDF into an OCRed version, but the characters are blurred.

For comparison, I tried Ghostscript and Tesseract:
from PDF to TIFF, then from TIFF to TXT.

Here the results were quite good, but I lose all the figures and the chapter markup.
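
For reference, the PDF to TIFF to TXT route could be scripted roughly as below. This is a sketch under assumptions, not the exact commands used in this post: the `tiffgray` device, the 300 dpi resolution, the `page-%03d.tif` naming, and the function names are illustrative choices. It needs the `gs` and `tesseract` command-line tools on the PATH (on Windows the Ghostscript console binary usually has a different name, such as `gswin64c`).

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path, out_dir, dpi=300):
    """Build a Ghostscript command that renders each PDF page to a grayscale TIFF."""
    pattern = str(Path(out_dir) / "page-%03d.tif")  # assumed naming scheme
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiffgray",        # assumed device; other TIFF devices exist
        f"-r{dpi}",                 # rendering resolution in dpi
        f"-sOutputFile={pattern}",
        str(pdf_path),
    ]


def tesseract_command(tiff_path, lang="eng"):
    """Build a Tesseract command that writes <basename>.txt next to the TIFF."""
    out_base = str(Path(tiff_path).with_suffix(""))
    return ["tesseract", str(tiff_path), out_base, "-l", lang]


def pdf_to_text(pdf_path, out_dir):
    """Run both steps and return the resulting .txt files in page order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_command(pdf_path, out), check=True)
    for tiff in sorted(out.glob("page-*.tif")):
        subprocess.run(tesseract_command(tiff), check=True)
    return sorted(out.glob("page-*.txt"))
```

A route like this only yields plain text per page, which matches the observation above that figures and chapter markup are lost along the way.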

As the final result for the written text, I would like an EPUB or MOBI (sharp rendering of characters), not a PDF, but still with the figures and a TOC.

Is there perhaps another format besides TXT that Tesseract can export to and that keeps the images (RTF)?

I could eventually mark the TOC manually. Which markup is correct?

What steps should I take to convert a PDF into an EPUB containing images and markup?

I also shared this in the thread https://www.mobileread.com/forums/sh...d.php?t=312652

Can I also ask how you approached the problem of detecting figures in a PDF? I'm interested in the problem solving.
All times are GMT -4.