MobileRead Forums > E-Book Formats > PDF

02-13-2020, 09:13 AM   #1741
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by famfam View Post
Is tesseract integrated in the k2pdfopt-gui and if so version 3 or version 4? Do the traindata files have to be version 3 or version 4? In which folder must the traindata files be in Windows 10: 'Programs' (for 64 bit) or 'Programs (x86)' (for 32 bit)?
In my k2pdfopt-gui I get the error message,

Initializing OCR for 2 threads x x
Could not find Tesseract data (env var TESSDATA_PREFIX = (not assigned)).
Using GOCR v0.50.

What am I doing wrong?
Is my entry in the input window, 'Env. var: TESSDATA_PREFIX = c:\program files\tesseract-ocr\tessdata', not correct?
http://willus.com/k2pdfopt/help/ocr.shtml

Though looking at it now, I see this help page needs some updating. The latest version uses Tesseract 4.0 and can use either Tesseract 3 or 4 training files (I recommend the v4 training files). See also the command-line help page (search for “Tesseract”).

Also, in the GUI “help” menu, select “Tesseract Training File Info”.
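
For anyone hitting the same "Could not find Tesseract data" message, a quick check outside of k2pdfopt can show whether the environment variable is actually set and whether it points at training files. The sketch below is a hypothetical helper, not part of k2pdfopt. It assumes only that the training files end in .traineddata, and it looks both in the TESSDATA_PREFIX folder itself and in a tessdata subfolder, because which of the two layouts is expected is an assumption here; the help page is the authority on that.

```python
import os
from pathlib import Path


def find_traineddata(prefix):
    """Return the sorted .traineddata file names found under `prefix`.

    Checks `prefix` itself and a `tessdata` subfolder (both layouts are
    assumptions for this sketch). Returns an empty list if `prefix` is
    unset, empty, or contains no training files.
    """
    prefix = (prefix or "").strip()
    if not prefix:
        return []
    root = Path(prefix)
    found = set()
    for folder in (root, root / "tessdata"):
        if folder.is_dir():
            found.update(p.name for p in folder.glob("*.traineddata"))
    return sorted(found)


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix is None:
        print("TESSDATA_PREFIX is not assigned")
    else:
        files = find_traineddata(prefix)
        print(f"TESSDATA_PREFIX = {prefix!r}")
        print("training files:", ", ".join(files) if files else "none found")
```

If this reports "none found", the variable is set but does not lead to any training files, which would be consistent with k2pdfopt falling back to GOCR.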

Last edited by willus; 02-13-2020 at 09:20 AM.
02-15-2020, 07:31 AM   #1742
slide13
Junior Member
Posts: 3
Karma: 42646
Join Date: Feb 2020
Device: Kindle Paperwhite 2
Hello there!
I am trying to find the optimal k2pdfopt options for my Kindle Paperwhite 2, but unfortunately I can't.
The PDF is attached. The language is Greek, and after some searching I have used the following options: -mode fp -dev kp2.
Can you please help me?
Thank you very much for your time!
Attached Files
File Type: pdf 04. JO NESBO - ΝΕΜΕΣΙΣ.pdf (15.01 MB, 201 views)
02-15-2020, 05:13 PM   #1743
willus
Quote:
Originally Posted by slide13 View Post
Hello there!
I am trying to find the optimal k2pdfopt options for my Kindle Paperwhite 2, but unfortunately I can't.
The PDF is attached. The language is Greek, and after some searching I have used the following options: -mode fp -dev kp2.
Can you please help me?
Thank you very much for your time!
What do you not like about the conversion? Is the text too small?
Option 1:
k2pdfopt -mode fw -ls- -dev kp2 myfile.pdf
If the text is too small for you, then:
Option 2:
k2pdfopt -dev kp2 myfile.pdf
02-15-2020, 06:47 PM   #1744
slide13
Quote:
Originally Posted by willus View Post
What do you not like about the conversion? Is the text too small?
Option 1:
k2pdfopt -mode fw -ls- -dev kp2 myfile.pdf
If the text is too small for you, then:
Option 2:
k2pdfopt -dev kp2 myfile.pdf
Thank you so much for your immediate response and support!
Yes, the text was too small, and the 2nd option helped me. The letters look somewhat "dirty" now, the chapters don't start on a new page, and the header/footer seems to end up in the middle of the page (screenshots attached).
But it's OK, it is more than readable. Thank you very much, dear willus!
Your program rocks and you're so helpful!
Congratulations!
Attached Thumbnails
screenshot_2020_02_16T01_40_45+0200.png (69.4 KB, 202 views, ID 177165)
screenshot_2020_02_16T01_41_14+0200.png (61.0 KB, 181 views, ID 177166)
screenshot_2020_02_16T01_41_47+0200.png (49.5 KB, 192 views, ID 177167)
02-16-2020, 06:19 PM   #1745
willus
Quote:
Originally Posted by slide13 View Post
Thank you so much for your immediate response and support!
Yes, the text was too small, and the 2nd option helped me. The letters look somewhat "dirty" now, the chapters don't start on a new page, and the header/footer seems to end up in the middle of the page (screenshots attached).
But it's OK, it is more than readable. Thank you very much, dear willus!
Your program rocks and you're so helpful!
Congratulations!
If the page numbers and headings bother you, you can crop them out pretty reliably like so for your particular document:

k2pdfopt -dev kp2 -m 0.29in,0.73in,0.32in,0.33in nesbo.pdf

The "dirty" look is not really the fault of k2pdfopt. Your source file, if you magnify it enough, has a very pixelated look to it.
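
To get a feel for what the four margin values in that command leave behind, here is a small worked example in Python. It is only a sketch under two assumptions: that the four -m values are given in the order left, top, right, bottom (confirm this in the command-line help), and that the page is 6 in x 9 in, which is a made-up size and not measured from the attached file. The function names are hypothetical.

```python
def parse_margins(spec):
    """Parse '0.29in,0.73in,0.32in,0.33in' into four floats, in inches."""
    values = []
    for part in spec.split(","):
        part = part.strip()
        if part.endswith("in"):
            part = part[:-2]  # drop the unit suffix
        values.append(float(part))
    if len(values) != 4:
        raise ValueError("expected four comma-separated margins")
    return tuple(values)


def remaining_area(page_width, page_height, spec):
    """Return (width, height) in inches left after cropping the margins.

    Assumes the order left, top, right, bottom (an assumption, see above).
    """
    left, top, right, bottom = parse_margins(spec)
    width = page_width - left - right
    height = page_height - top - bottom
    if width <= 0 or height <= 0:
        raise ValueError("margins are larger than the page")
    return width, height


if __name__ == "__main__":
    # Hypothetical 6 in x 9 in source page.
    w, h = remaining_area(6.0, 9.0, "0.29in,0.73in,0.32in,0.33in")
    print(f"kept area: {w:.2f} in x {h:.2f} in")
```

The larger top value is what removes a running header, while the other three only trim white space and the page number.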
02-17-2020, 06:01 PM   #1746
slide13
It is perfect!
Thank you so much. It's unbelievable that you respond to each one of us individually.
I really, really appreciate it. Have a nice day!
03-01-2020, 06:18 AM   #1747
meem
A Reader who can think..!
Posts: 257
Karma: 108298
Join Date: Jul 2010
Location: Earth Planet
Device: Kindle 3 WiFi - Kindle DX (B004)
Very hard to use.
Previewing the result is very slow, even on a good PC.
I cannot get a correct result for a simple two-column book.
03-01-2020, 02:56 PM   #1748
willus
Quote:
Originally Posted by meem View Post
Very hard to use.
Previewing the result is very slow, even on a good PC.
I cannot get a correct result for a simple two-column book.
Did you use the help pages/videos at all? Did you try 2-column mode? Sometimes the compression type of the PDF can slow down previewing. JPX (JPEG 2000) images are CPU-intensive to decode, and OCR can also be CPU-intensive.

Can you post or PM me a sample of your source PDF?
03-07-2020, 11:45 AM   #1749
pandaeye
Member
Posts: 12
Karma: 42646
Join Date: Feb 2020
Device: none
Hello, I am very new to this app, but it helped me do almost everything I expected. Please don't mind if I am asking a stupid question.

As far as I understand, text reflow cannot be combined with native PDF output, am I right? I am curious why the two cannot be done together. I am converting a book that is mostly regularly formatted text, so converting it to bitmaps seems silly and produces a very big file. Is there any way to reflow the text while keeping the file size small?
03-07-2020, 11:54 AM   #1750
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
You need to read PDF#Reflow. You should also know that there are two types of PDF documents: one contains text, while the other contains images of text. A PDF that contains images of text must be run through OCR to extract the text from the images.

Dale
03-07-2020, 01:48 PM   #1751
willus
Quote:
Originally Posted by pandaeye View Post
Hello, I am very new to this app, but it helped me do almost everything I expected. Please don't mind if I am asking a stupid question.

As far as I understand, text reflow cannot be combined with native PDF output, am I right? I am curious why the two cannot be done together. I am converting a book that is mostly regularly formatted text, so converting it to bitmaps seems silly and produces a very big file. Is there any way to reflow the text while keeping the file size small?
I agree 100%. It does seem silly. But k2pdfopt is not the right app to reflow text from a "native" / non-scanned PDF. At its core, k2pdfopt is an image processing program, and it simply rearranges "crop boxes" from the source PDF into new positions. If those crop boxes are large--one or two per page--it can spit out a converted PDF by simply adding a few cropping instructions into the source PDF file--effectively telling it to display its contents in a modified way. But if those crop boxes are numerous and small--i.e. one for each word or row of text, which is how k2pdfopt does reflow--then this method doesn't work well. The resultant PDF won't display well in most readers. It's better to bitmap it.

If you have access to MS Word, I recommend you try loading your PDF into Word. See my PDF conversion tips.

Last edited by willus; 03-07-2020 at 01:50 PM.
03-08-2020, 01:00 AM   #1752
pandaeye
Quote:
Originally Posted by willus View Post
I agree 100%. It does seem silly. But k2pdfopt is not the right app to reflow text from a "native" / non-scanned PDF. At its core, k2pdfopt is an image processing program, and it simply rearranges "crop boxes" from the source PDF into new positions. If those crop boxes are large--one or two per page--it can spit out a converted PDF by simply adding a few cropping instructions into the source PDF file--effectively telling it to display its contents in a modified way. But if those crop boxes are numerous and small--i.e. one for each word or row of text, which is how k2pdfopt does reflow--then this method doesn't work well. The resultant PDF won't display well in most readers. It's better to bitmap it.

If you have access to MS Word, I recommend you try loading your PDF into Word. See my PDF conversion tips.
Thank you very much. I think I know more about your work now. I am very satisfied with it.
03-09-2020, 05:11 AM   #1753
Flumine
Junior Member
Posts: 1
Karma: 42646
Join Date: Mar 2020
Device: Kindle4, Xiaomi Redmi 7
I wish I had seen your application earlier. It is really one of a kind (at least I have not yet found anything better).
Actually, I had a similar idea about two years ago: to reflow wide scanned documents at the glyph level so they fit a mobile screen. I spent some time with a friend writing an Android application. The result works fine but has some limitations: it cannot handle complex formulas or multi-column layouts.
Here is a good page sample:
https://slack-files.com/T9YDZ38JY-FUG1J0AKA-40bce696bf
Here is a sample of an original page that failed to reflow properly, with the recognized glyphs marked: https://glyphs.flum.app/image?id=448&mode=glyphs
To recognize glyphs we use the OpenCV library, and it mostly works fine, but it is hard to get a formula recognized as a single image. Your application handles them much better, so I wonder what algorithm you are using for that?
03-10-2020, 12:49 AM   #1754
solanoctes
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2020
Device: Kindle paperwhite 10th gen
Settings for the new paperwhite

What settings (screen size, font, margins, etc.) should I use for the 10th-generation Paperwhite? I tried the Paperwhite 4 preset, but the font was too big (for example, the bottom of the title page spilled over onto the next page).
03-10-2020, 01:39 AM   #1755
willus
Quote:
Originally Posted by Flumine View Post
I wish I had seen your application earlier. It is really one of a kind (at least I have not yet found anything better).
Actually, I had a similar idea about two years ago: to reflow wide scanned documents at the glyph level so they fit a mobile screen. I spent some time with a friend writing an Android application. The result works fine but has some limitations: it cannot handle complex formulas or multi-column layouts.
Here is a good page sample:
https://slack-files.com/T9YDZ38JY-FUG1J0AKA-40bce696bf
Here is a sample of an original page that failed to reflow properly, with the recognized glyphs marked: https://glyphs.flum.app/image?id=448&mode=glyphs
To recognize glyphs we use the OpenCV library, and it mostly works fine, but it is hard to get a formula recognized as a single image. Your application handles them much better, so I wonder what algorithm you are using for that?
If you look at the comments at the top of the main source file, k2pdfopt.c, it outlines the high level process used by k2pdfopt and points out some of the key C functions. The algorithms for detection are just my own inventions, with a lot of trial and error for what works well and what doesn't. The basic concept is to first look for columnar regions / large blocks of the page by scanning for horizontal and vertical blank (white) areas between the regions, and then to break those columns/regions into rows of text, and then the rows of text into words. The process has given me a deep appreciation for how easily the human brain can visually parse a page (and instantly know "that is text" and "that is an image", etc.) compared to how hard it is to write a reliable algorithm to do the same thing.
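
As a rough illustration of that basic concept (and explicitly not the code in k2pdfopt.c, which has to cope with noise, skew, figures, and many special cases), here is a minimal Python sketch. It treats the page as a grid of 0 (white) and 1 (ink) pixels, sums the ink along each column or row, and splits wherever the sum stays at zero for long enough. The names ink_runs, segment, column_gap, and word_gap are made up for this example, and the tiny test page is invented.

```python
def ink_runs(profile, min_gap=1):
    """Return (start, end) pairs of non-blank runs; `end` is exclusive.

    `profile[i]` is the amount of ink in row or column i. Runs separated
    by fewer than `min_gap` blank entries are merged into one run.
    """
    runs = []
    start = None
    blank = 0
    for i, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = i
            blank = 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:
                runs.append((start, i - blank + 1))
                start = None
                blank = 0
    if start is not None:
        runs.append((start, len(profile) - blank))
    return runs


def column_profile(page, top, bottom, left, right):
    """Ink per pixel column inside the given box."""
    return [sum(page[y][x] for y in range(top, bottom)) for x in range(left, right)]


def row_profile(page, top, bottom, left, right):
    """Ink per pixel row inside the given box."""
    return [sum(page[y][left:right]) for y in range(top, bottom)]


def segment(page, column_gap=4, word_gap=2):
    """Split a page into columns, then rows of text, then words.

    Returns a list of columns; each column is a list of rows; each row is
    a list of word boxes (left, top, right, bottom), right/bottom exclusive.
    """
    height, width = len(page), len(page[0])
    columns = []
    for c0, c1 in ink_runs(column_profile(page, 0, height, 0, width), column_gap):
        rows = []
        for r0, r1 in ink_runs(row_profile(page, 0, height, c0, c1)):
            words = [
                (c0 + w0, r0, c0 + w1, r1)
                for w0, w1 in ink_runs(column_profile(page, r0, r1, c0, c1), word_gap)
            ]
            rows.append(words)
        columns.append(rows)
    return columns


# Invented two-column test page: '#' is ink, '.' is white.
LINES = [
    "#.#..#.#....##.#..#",
    "...................",
    "##.#.#......#..#.##",
]
PAGE = [[1 if ch == "#" else 0 for ch in line] for line in LINES]

if __name__ == "__main__":
    for number, rows in enumerate(segment(PAGE), start=1):
        for words in rows:
            print(f"column {number}: {words}")
```

Even in this toy version the hard part shows up right away: everything depends on the gap thresholds. A gap that separates words on one page is a gap between letters on another, which is where the trial and error comes in.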