04-15-2013, 08:17 PM   #391
CheriePie
Connoisseur
Posts: 91
Karma: 6020
Join Date: Feb 2009
Location: Silicon Valley, CA
Device: Kindle Voyage, Samsung Galaxy S23+, Galaxy Tab S6
I get an error when trying to use the Tesseract OCR engine on the 64-bit Windows platform (v1.65). After selecting Tesseract as the OCR engine, I left all the other options in that menu at their defaults. The only other change I'm making is the device setting (d) for Kindle Paperwhite.

So this is the command line I've built:

Selected options:
"C:\Users\Cherie\Documents\My eBooks\Calibre Library\Jesse
Petersen\Club Monstrosity (124)\Club Monstrosity - Jesse Petersen.pdf"
-dev kpw -ocr t -ocrhmax 1.5 -ocrvis s



After hitting enter to begin the conversion, I get the following errors:

Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not find Tesseract data (env var TESSDATA_PREFIX = (not assigned)).
Using GOCR v0.49.

Reading 233 pages from C:\Users\Cherie\Documents\My eBooks\Calibre Library\Jesse
Petersen\Club Monstrosity (124)\Club Monstrosity - Jesse Petersen.pdf ...

Detecting document orientation ... No rotation necessary.

SOURCE PAGE 1 of 233 (7.5 x 9.4 in) ... 0 new pages saved.


And then it stops working completely at page 2, throwing up the standard "k2pdfopt.exe has stopped working" error dialog from Windows.

I don't get these errors using the GOCR engine, but I gather Tesseract is more accurate, so I'd like to use that one if possible.

04-15-2013, 08:56 PM   #392
gadd
Member
Posts: 10
Karma: 10
Join Date: Dec 2012
Device: Kindle 3
Hi,

I have just installed the new version. Each time I launch the program and start to convert a PDF file, it crashes. I followed your suggestion and downloaded v1.65 for 32-bit Windows 7, but it still doesn't work. Previous versions worked on my PC without any problems. What should I do now? Should I keep using the older version?

04-16-2013, 12:01 AM   #393
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by CheriePie View Post
I get an error when trying to use Tesseract OCR engine on the 64-bit windows platform (v1.65). After selecting Tesseract for the OCR choice...
I was able to replicate this problem on my system, so I'll fix it for the next release. It must be something related to not correctly transitioning from Tesseract to GOCR when the Tesseract language files aren't found. To get Tesseract to work, please see my OCR help page. Or you might try Wallauer's Windows GUI.
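
For anyone hitting the same "Could not find Tesseract data" message, here is a minimal Python sketch of a pre-flight check you could run before choosing `-ocr t`. It only looks at the `TESSDATA_PREFIX` environment variable named in the log above. The helper name `find_tessdata` is made up for this example, and the two folder layouts it tries are assumptions (they differ between Tesseract builds), so treat the OCR help page as the authority on where the files belong.

```python
import os
from pathlib import Path


def find_tessdata(lang="eng", env=None):
    """Return the path of <lang>.traineddata if it can be located, else None.

    Hypothetical helper: it only inspects TESSDATA_PREFIX, the variable
    reported as "(not assigned)" in the error log.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return None  # same situation as the "(not assigned)" message

    base = Path(prefix)
    # Assumption: the language file sits either directly in the prefix
    # folder or in a "tessdata" subfolder of it.
    candidates = (
        base / f"{lang}.traineddata",
        base / "tessdata" / f"{lang}.traineddata",
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


found = find_tessdata("eng")
if found is None:
    print("No eng.traineddata found via TESSDATA_PREFIX.")
else:
    print(f"Tesseract language file: {found}")
```

If the check prints the "not found" line, the conversion would probably hit the same fallback to GOCR shown in the log.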

04-16-2013, 12:10 AM   #394
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by gadd
I have just installed the new version. Each time I launch the program and start to convert a PDF file, it crashes...
After launching it (and specifying a file), type this at the prompt:

-v -debug

...and then press <Enter> twice to start the conversion. Immediately after the crash, take a screen shot by typing <Shift>-<Print Screen> (move the dialog box out of the way of the command terminal). Post the screen shot of the command terminal if you can manage it (or send me a private message if you prefer to take it off this thread). That will help.

04-16-2013, 03:59 AM   #395
CheriePie
Connoisseur
Posts: 91
Karma: 6020
Join Date: Feb 2009
Location: Silicon Valley, CA
Device: Kindle Voyage, Samsung Galaxy S23+, Galaxy Tab S6
Quote:
Originally Posted by willus
I was able to replicate this problem on my system, so I'll fix it for the next release...
D'oh! I checked like all the other help page files, searched this forum thread, but that's the one page I didn't check. I figured there must've been something else I needed to install but until you pointed me to your OCR page, I didn't find it myself. So thanks for straightening me out.

04-16-2013, 05:48 AM   #396
gadd
Member
Posts: 10
Karma: 10
Join Date: Dec 2012
Device: Kindle 3
Hi again,

I'm not sure I understood you correctly, but I tried to take a screenshot of the program when it crashes. You can find the image attached to this post. I hope it helps.
Attached image: Capture 1.JPG (55.5 KB)

04-16-2013, 09:43 AM   #397
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by CheriePie
D'oh! I checked like all the other help page files, searched this forum thread, but that's the one page I didn't check...
It's no trouble. The OCR page is easy to miss. Maybe I'll add it to the FAQ page somehow, but then again, if the FAQ page becomes too long, its utility is reduced. Anyway, I'm glad the solution was so easy. BTW, the

-ocrhmax 1.5 -ocrvis s

options in your command line are both the default values, so you don't need to specify them.
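
As a small illustration of that tip, the Python sketch below drops option/value pairs that merely restate a default. The only defaults it knows are the two named in this post (`-ocrhmax 1.5` and `-ocrvis s`); the function name and the dictionary are invented for the example and are not part of k2pdfopt.

```python
# Defaults exactly as stated in the post above; no other k2pdfopt
# defaults are assumed here.
STATED_DEFAULTS = {"-ocrhmax": "1.5", "-ocrvis": "s"}


def drop_default_options(args):
    """Return args without option/value pairs that match STATED_DEFAULTS."""
    kept = []
    i = 0
    while i < len(args):
        opt = args[i]
        value = args[i + 1] if i + 1 < len(args) else None
        if opt in STATED_DEFAULTS and value == STATED_DEFAULTS[opt]:
            i += 2  # skip the option and its redundant value
            continue
        kept.append(opt)
        i += 1
    return kept


original = ["-dev", "kpw", "-ocr", "t", "-ocrhmax", "1.5", "-ocrvis", "s"]
print(drop_default_options(original))
```

Applied to the command line from post #391, this leaves `-dev kpw -ocr t`. A non-default value such as `-ocrhmax 2.0` would be kept.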

04-16-2013, 09:48 AM   #398
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by gadd
I'm not sure I understood you correctly, but I tried to take a screenshot of the program when it crashes...
That's perfect--very helpful. It looks like the crash might be specific to your document (crashes during processing of page 1). Have you tried more than one document with the new version, or just this one? Can you post the first page of this document (or the whole document, if it's not too large and/or copy-protected)? You can use jpdftweak or pdfsam to extract the first page. (If you do that, be sure that the crash still happens on the single page document.)

You might also try skipping page one--use this command option, for example:

-p 2-


That will convert pages 2 and up.
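
To make the open-ended range concrete, here is a short Python sketch of what a spec like `2-` selects in a document with a known page count. It is an illustration only, not k2pdfopt's own parser: the function name is hypothetical, and it handles just a single page or a single range.

```python
def expand_page_range(spec, total_pages):
    """Expand "N", "N-M" or "N-" into a list of 1-based page numbers.

    Illustration only: k2pdfopt's real -p option may accept more forms
    than this sketch understands.
    """
    start_text, dash, end_text = spec.partition("-")
    start = int(start_text)
    if not dash:
        end = start            # a single page, e.g. "5"
    elif end_text:
        end = int(end_text)    # a closed range, e.g. "3-7"
    else:
        end = total_pages      # an open range, e.g. "2-"
    end = min(end, total_pages)
    return list(range(start, end + 1))


pages = expand_page_range("2-", 233)
print(pages[0], pages[-1], len(pages))
```

For the 233-page book from post #391, `2-` would cover pages 2 through 233, which is 232 pages.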

Last edited by willus; 04-16-2013 at 10:17 AM.

04-18-2013, 03:57 AM   #399
CheriePie
Connoisseur
Posts: 91
Karma: 6020
Join Date: Feb 2009
Location: Silicon Valley, CA
Device: Kindle Voyage, Samsung Galaxy S23+, Galaxy Tab S6
So I'm playing around with the best conversion options for this particular book I received from the publisher in PDF format.

The options that produce the best results for reading on my Paperwhite so far are:

-mode fw -dev kpw -ls- -mb 0.25 -mt 0.25

However the cover image is now being split across 2 pages so I tried to add -f2p -1 to the command line as well, but that causes all the other pages to lose their "fit to width" formatting. And then some pages, often the ones that are the beginning of a new chapter, are even more compressed. And as far as I can tell, there aren't even any images on any of those other pages (unless the chapter marker is an image instead of text) so I don't know why the addition of the f2p option would cause all the other pages in the PDF to change too.


Also, I read a few pages back that Vanilla was trying to get k2pdfopt to recognize chapter breaks and keep the existing page breaks there. Were you able to accomplish that? I'd love to have that. But if not, I'll try to run the resulting PDF through Calibre to do that.


Finally, I discovered that k2pdfopt doesn't like filenames with parentheses in the name. I had to remove the parentheses so that the program wouldn't error out.


Anyhoo, let me know if you want me to PM you the PDF I'm trying to work with. It's copyrighted so I can't post it here. Though maybe you can just figure out what's going on based on my description above... so I'll wait to hear back from you before sending it along.

Thanks!

04-18-2013, 11:17 AM   #400
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by CheriePie
So I'm playing around with the best conversion options for this particular book I received from the publisher in PDF format...
Thanks for reporting these issues. I should be able to fix the parentheses-in-the-file names issue relatively simply.
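
Until a fix is released, one possible workaround is to convert a copy of the file whose name has no parentheses. The Python sketch below does that copy; `copy_without_parentheses` is a hypothetical helper, and it assumes that only the `(` and `)` characters cause trouble. Pick a destination folder whose own path is free of parentheses as well.

```python
import shutil
from pathlib import Path


def copy_without_parentheses(src, dest_dir):
    """Copy src into dest_dir under a name with "(" and ")" removed.

    Assumption: parentheses are the only problematic characters.
    Returns the path of the new copy; the original file is untouched.
    """
    src = Path(src)
    stem = src.stem.replace("(", "").replace(")", "")
    stem = " ".join(stem.split())  # collapse any doubled spaces left behind
    dest = Path(dest_dir) / f"{stem}{src.suffix}"
    shutil.copyfile(src, dest)
    return dest
```

For example, `copy_without_parentheses("Book (124).pdf", "C:/temp")` would create `C:/temp/Book 124.pdf`, which can then be passed to k2pdfopt.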

Right now k2pdfopt doesn't have any smarts to look for chapter tags in a PDF file, if there are such things. I hope to add something like that. What I did was to try and at least put a reasonable gap between places where there is an obvious font size change so that new chapters stand out a little better, but I didn't add a way to put a page break in--I'd need to be certain it was a new chapter and I'm not sure how to do that (yet).

As for the issue with the split figure, k2pdfopt is behaving as designed. The -mode fw command essentially tells it to treat every page region like a single figure (i.e. don't look inside the region except to find the break point where it can split the region across pages), so if you then add -f2p -1 (telling it not to break figures), then you see what you get. I'll have to think about how I want to handle this. It's certainly reasonable (and expected) behavior to do what you wanted. For now, if you're really set on having the cover page not split, you can convert that page with a different command and then merge the outputs together using pdfsam or jpdftweak. Hopefully the next release will offer some smarter behavior.
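
The two-pass workaround can be sketched as a pair of option lists, one for the cover and one for the rest of the book. This Python snippet only builds the lists; it does not run k2pdfopt or merge anything. The body options are the ones from post #399, while the cover options passed in are a placeholder to experiment with. Using `-p 1` for a single page is an assumption modeled on the `-p 2-` form shown earlier in the thread.

```python
# Options reported in post #399 as giving the best body-text result.
BODY_OPTIONS = ["-mode", "fw", "-dev", "kpw", "-ls-", "-mb", "0.25", "-mt", "0.25"]


def split_runs(body_options, cover_options, cover_page=1):
    """Return (cover_args, body_args) for two separate conversions.

    cover_options is a placeholder: which settings keep the cover on one
    page has to be found by trial. The "-p <n>" single-page form is an
    assumption based on the "-p 2-" example in this thread.
    """
    cover_args = list(cover_options) + ["-p", str(cover_page)]
    body_args = list(body_options) + ["-p", f"{cover_page + 1}-"]
    return cover_args, body_args


cover_args, body_args = split_runs(BODY_OPTIONS, ["-dev", "kpw"])
print("cover:", " ".join(cover_args))
print("body: ", " ".join(body_args))
```

Each run needs its own output file so the second does not overwrite the first; the two results can then be joined with pdfsam or jpdftweak as suggested above.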

04-19-2013, 03:01 PM   #401
gadd
Member
Posts: 10
Karma: 10
Join Date: Dec 2012
Device: Kindle 3
Quote:
Originally Posted by willus
That's perfect--very helpful. It looks like the crash might be specific to your document (crashes during processing of page 1)...
Hi,

Sorry for the late reply. I found that the crash was caused by the PDF file being protected. Even after I removed its protection, the problem continued. Nevertheless, the new version works well with other PDF files that had their protection removed. Thanks once again for your assistance.

04-19-2013, 07:14 PM   #402
danilo93
Member
Posts: 11
Karma: 18200
Join Date: Apr 2013
Device: PRESTIGIO PER3464B
Hi! I have a PRESTIGIO eBook Reader PER3464B, and I have trouble reading scanned books because the letters are too small. I was hoping your software would help, but I've run into a problem. Whenever I try to convert a scanned e-book, I get this:



Some of the letters are very big and some are OK. I've noticed that the lines with big letters are the ones that go wrong: they seem to be cut in the wrong places. The lines with smaller letters are perfect. Is there any way to make all the letters equally small, with lines that aren't cut in the wrong place? Please help!

04-19-2013, 11:04 PM   #403
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by danilo93
Hi! I have PRESTIGIO eBook Reader PER3464B and I have problems with reading scanned books, because the letters are too small...
Can you please post the source (original) PDF file (or at least a few relevant pages of it)? Are you using any special options to convert, or just the defaults?

04-20-2013, 06:31 AM   #404
danilo93
Member
Posts: 11
Karma: 18200
Join Date: Apr 2013
Device: PRESTIGIO PER3464B
Quote:
Originally Posted by willus View Post
Can you please post the source (original) PDF file (or at least a few relevant pages of it)? Are you using any special options to convert, or just the defaults?
Oh, sorry! I totally forgot to mention that. Yes, I tried the default options, but the result was not good: all the lines were cut in the wrong places. Then I tried these settings: "E-reader display pixels per inch" set to 50 and "Input/Source file pixels per inch" set to -2. That produced the PDF shown in the picture above. As I said, the lines with smaller letters are perfect, but the ones with big letters are the problem, and I don't know why. Here is the original PDF file:

Code:
http://www.mediafire.com/view/?60ay8s7q16ikcg1

04-20-2013, 09:46 AM   #405
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by danilo93
Oh, sorry! I totally forgot to tell you that part...
Thanks for posting the source document. First off, the page size is 16.6 x 25.4 inches, which I'm guessing should actually be centimeters, so you need to use the document scaling factor to get it to the right size. Secondly, some of the pages are skewed, so you need to turn on the auto-straightening feature. I think with these two adjustments, the result will be satisfactory:

-ds 0.4 (set document scaling factor to 0.4--or select "ds" in the menu and then enter 0.4 for the value).

-as (Turn on auto-straightening--or select "a" in the menu).
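
The 0.4 figure follows from the unit mix-up: if the page dimensions are really centimeters but stored as inches, the correction factor is 1/2.54. A quick Python check of the arithmetic, using the 16.6 x 25.4 page size quoted above (the variable names are just for this example):

```python
CM_PER_INCH = 2.54

# Page size as reported in the PDF, assumed to be centimeters
# mislabeled as inches.
reported = (16.6, 25.4)

scale = 1.0 / CM_PER_INCH
actual_inches = tuple(round(v / CM_PER_INCH, 2) for v in reported)

print(f"scale factor: {scale:.3f}")
print(f"page size in inches: {actual_inches}")
```

The exact factor is about 0.394, so `-ds 0.4` is a convenient rounding, and the corrected page comes out at roughly 6.5 x 10 inches.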