03-10-2020, 01:39 AM   #1756
willus
Fuzzball, the purple cat
Posts: 1,118
Karma: 8504331
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Flumine
I wish I had seen your application earlier; it is really one of a kind (at least, I have not yet found anything better).
Actually, I had a similar idea about two years ago: to re-flow wide scanned documents at the glyph level to fit a mobile screen. I spent some time with my friend writing an Android application. The result works fine but has some limitations: it cannot recognize complex formulas or multi-column layouts.
Here is a good page sample:
https://slack-files.com/T9YDZ38JY-FUG1J0AKA-40bce696bf
Here is a sample of an original page that failed to re-flow properly, with the glyphs recognized: https://glyphs.flum.app/image?id=448&mode=glyphs.
To recognize glyphs we use the OpenCV library, and it mostly works fine, but it is hard to get formulas recognized as a single image. Your application handles them much better, so I wonder what algorithm you are using for that?
If you look at the comments at the top of the main source file, k2pdfopt.c, they outline the high-level process used by k2pdfopt and point out some of the key C functions. The algorithms for detection are just my own inventions, with a lot of trial and error for what works well and what doesn't. The basic concept is to first look for columnar regions / large blocks of the page by scanning for horizontal and vertical blank (white) areas between the regions, and then to break those columns/regions into rows of text, and then the rows of text into words. The process has given me a deep appreciation for how easily the human brain can visually parse a page (and instantly know "that is text" and "that is an image", etc.) compared to how hard it is to write a reliable algorithm to do the same thing.
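
For readers who want to experiment with the general idea, here is a minimal Python sketch of blank-gap scanning (often called a projection profile). It is not k2pdfopt's actual code, which is written in C and handles far more cases. The function names, the 0/1 bitmap format, and the gap thresholds are all assumptions made for this illustration.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = dark (ink) pixel, 0 = white pixel


def parse_page(lines: List[str]) -> Bitmap:
    """Turn rows of '#' (ink) and '.' (white) into a 0/1 bitmap."""
    return [[1 if ch == "#" else 0 for ch in line] for line in lines]


def find_runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) ranges of non-blank entries in a profile.

    Runs separated by fewer than `min_gap` blank entries are merged, so
    narrow gaps (e.g. between letters) do not split a region.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def column_profile(page: Bitmap, top: int, bottom: int, left: int, right: int) -> List[int]:
    """Ink count in each pixel column of the given rectangle."""
    return [sum(page[y][x] for y in range(top, bottom)) for x in range(left, right)]


def row_profile(page: Bitmap, top: int, bottom: int, left: int, right: int) -> List[int]:
    """Ink count in each pixel row of the given rectangle."""
    return [sum(page[y][left:right]) for y in range(top, bottom)]


def segment(page: Bitmap, column_gap: int = 3, word_gap: int = 2) -> List[Tuple[int, int, int, int]]:
    """Split a page into columns, then rows, then words.

    Returns word boxes as (left, top, right, bottom), column by column.
    """
    height, width = len(page), len(page[0])
    boxes = []
    for c0, c1 in find_runs(column_profile(page, 0, height, 0, width), column_gap):
        for r0, r1 in find_runs(row_profile(page, 0, height, c0, c1)):
            for w0, w1 in find_runs(column_profile(page, r0, r1, c0, c1), word_gap):
                boxes.append((c0 + w0, r0, c0 + w1, r1))
    return boxes


if __name__ == "__main__":
    demo = parse_page([
        "##..#.....##",
        "............",
        "#.#.......#.",
    ])
    for box in segment(demo):
        print(box)
```

The sketch only shows the ordering described above (columns first, then rows, then words). Telling text from images, coping with skewed scans, and choosing thresholds automatically are the hard parts, and none of that is attempted here.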
03-13-2020, 10:58 PM   #1757
kmoll
Member
Posts: 13
Karma: 42646
Join Date: Mar 2020
Device: BQ Cervantes 4
Reflow with selectable text?

Hi, I cannot seem to find an option that produces reflowed output for 6-inch e-readers while keeping the text selectable, so that I can highlight it on my e-reader.
Is this not possible at all with k2pdfopt? I would be surprised, since KOReader, which uses your code, does exactly that. The documents I read are single-column A4, so using "native pdf" and "2 columns" to cut the page in half vertically is not a solution, because each line of text runs across the full width of the page.
By the way, thank you so much for creating and maintaining this marvelous tool!
Regards
Attached Files
File Type: pdf exemple-traductio-corrige-1-CNED.pdf (162.2 KB, 50 views)

Last edited by pdurrant; 03-17-2020 at 08:32 AM.
03-14-2020, 09:39 AM   #1758
willus
It may depend on your reader. I ran your document with no special options and got the attached. The text is selectable using my PC PDF viewer, SumatraPDF. See the screen shot.
Attached Thumbnails
screenshot.png (100.9 KB)
Attached Files
File Type: pdf exemple_k2opt.pdf (4.68 MB, 34 views)

Last edited by pdurrant; 03-17-2020 at 08:33 AM.
03-14-2020, 11:56 AM   #1759
vasilas7
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Keyboard
Hello Willus!
Congratulations on your work! I have been trying to find the optimal k2pdfopt options for my Kindle Paperwhite 4, but unfortunately I can't. Whenever I convert the PDF, the result is impossible to read.
The PDF is attached; the language is Greek. I know the PDF isn't the best, but I need it for my work.
Can you please help me?
Thanks in advance!
Attached Files
File Type: pdf 433252301-Όπλα-Μικρόβια-Κα.pdf (12.18 MB, 39 views)
03-14-2020, 07:51 PM   #1760
willus
Actually, that PDF is very good for processing. It's perfectly straight, consistent from page to page, and very clean. First, I recommend downloading the Greek Tesseract OCR data set and installing it per these instructions. Then you can run one of the following commands, depending on how large you want the text.

1. Separate each page into two but don't do any text re-flow.
k2pdfopt -grid 2x1 -n- -ocr t -lang grc source.pdf

2. Same but with text re-flow
k2pdfopt -grid 2x1 -fc- -n- -f2p 0 -wrap -ocr t -lang grc source.pdf

3. Even larger text (50% larger with -mag 1.5)
k2pdfopt -grid 2x1 -fc- -n- -f2p 0 -wrap -ocr t -lang grc -mag 1.5 source.pdf

I've attached the results of these three methods for just page 5 of your PDF. You'll notice the text is selectable and searchable, unlike your original.
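
If you want to try all three variants without retyping them, a small Python helper like the one below can assemble the command lines. This is only a convenience sketch: the function name build_k2pdfopt_commands is made up for this example, and the flags are copied exactly from the three commands above with nothing added. It only builds and prints the commands; it does not run k2pdfopt.

```python
import shlex
from typing import List


def build_k2pdfopt_commands(source: str, lang: str = "grc") -> List[List[str]]:
    """Return the three command lines from the post as argument lists.

    The flags are taken verbatim from the post; only the source file name
    and the OCR language are parameters.
    """
    ocr = ["-ocr", "t", "-lang", lang]

    # 1. Split each page in two, no text re-flow.
    split_only = ["k2pdfopt", "-grid", "2x1", "-n-", *ocr, source]

    # 2. Same, but with text re-flow.
    reflow = ["k2pdfopt", "-grid", "2x1", "-fc-", "-n-", "-f2p", "0", "-wrap", *ocr, source]

    # 3. Re-flow with 50% larger text.
    reflow_large = reflow[:-1] + ["-mag", "1.5", source]

    return [split_only, reflow, reflow_large]


if __name__ == "__main__":
    for command in build_k2pdfopt_commands("source.pdf"):
        # shlex.join quotes arguments safely for pasting into a shell.
        print(shlex.join(command))
```

Keeping each command as a list of arguments avoids quoting problems with file names that contain spaces or non-ASCII characters, such as the Greek file name in this thread.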
Attached Files
File Type: pdf book1.pdf (321.1 KB, 43 views)
File Type: pdf book2.pdf (417.6 KB, 34 views)
File Type: pdf book3.pdf (579.6 KB, 38 views)

Last edited by willus; 03-14-2020 at 07:53 PM.
03-15-2020, 01:40 AM   #1761
vasilas7
Thank you so much, Willus. If I want to crop the borders, what do I have to do? In this PDF file the borders are not in exactly the same position on every page.
03-15-2020, 08:55 AM   #1762
willus
Do you mean cropping the document so that you don't get the page headers and page numbers, as I show in the first attachment? Most pages are consistent, so the best approach is to choose a crop that covers those pages and just live with the ones it misses. You can give specific pages different crop boxes, but that gets pretty involved. The GUI allows you to pick three different page ranges for applying different crop boxes. See the attached screen shot. Here, instead of using the -grid 2x1 option, I have selected two crop boxes to apply to all pages.
Attached Thumbnails
screenshot.png (275.7 KB)
scrshot2.png (273.6 KB)
03-16-2020, 07:18 PM   #1763
kmoll
Quote:
Originally Posted by willus
It may depend on your reader. I ran your document with no special options and got the attached. The text is selectable using my PC PDF viewer, SumatraPDF. See the screen shot.
Yes, indeed, I hadn't noticed that the text is selectable. Sorry for wasting your time.

BTW, I have another problem with a few files like the one attached, where the words are not split correctly. Do you happen to know how to fix this? Many thanks.
Attached Files
File Type: pdf Territoires- _k2opt.pdf (5.27 MB, 24 views)
File Type: pdf Territoires.pdf (1.02 MB, 23 views)
03-16-2020, 09:07 PM   #1764
willus
When I run k2pdfopt v2.51a on your source document, I get the attached--it seems to break the words okay. What version are you running, or what command-line options are you using?
Attached Files
File Type: pdf territoires_k2opt.pdf (3.68 MB, 26 views)
03-17-2020, 07:33 AM   #1765
kmoll
Please find the options I used in the attached image. I am using the same version as you.
Attached Thumbnails
wrong word cutting.png (213.4 KB)
03-17-2020, 09:26 PM   #1766
willus
Check the "Smart line breaks" option. That is on by default. Not sure how you got it turned off.
03-18-2020, 06:37 AM   #1767
kmoll
That worked, indeed! I must have disabled it while playing with the options, before I had fully read the detailed explanation of the command-line options.
A minor issue: in my workflow, I read PDFs on my e-reader and highlight them, and then continue annotating them on my PC. The inconvenience with k2pdfopt is that, once the file is converted, the font looks too big on the PC screen. It would be great if we could somehow restore the original layout of the document to continue working with the file in Windows (without losing the highlights already made). I got spoiled by the on-the-fly conversion in KOReader (which I don't use because of bugs in its highlighting function).
Again, thank you so much for this wonderful tool and for your patient guidance.
03-18-2020, 09:20 PM   #1768
willus
Sorry--not much I can do to help you with the highlighting issue. There's not really a way to do what you suggest above with k2pdfopt.
03-20-2020, 12:49 PM   #1769
kmoll
OK, never mind, and thank you for your answer. I realized I can reduce the font by setting a fixed font size or by lowering the resolution.
However, I noticed that in some places, when I try to select more than one word, the whole document gets selected, as in the attached image. It also happens in the version of this file you posted previously, for example when trying to select several words starting from the word "aboutit" on line 10. This is really problematic because I can't highlight words.

Another question: is it possible to reduce the space between words? This time, to avoid asking something that is already answered, I looked carefully through the command-line options documentation but didn't see anything related. As you can see in the attached image, the space between words is sometimes very wide. I tried to fix this both by setting a fixed font size and by playing with the resolution, to no avail.
Attached Thumbnails
problem-highlighting.png (298.6 KB)
03-20-2020, 08:21 PM   #1770
willus
I sent you a private message. The highlighting thing is a known bug. I'm working on a bug-fix release, but it's slow. To get the words closer together, you need to turn off full justification:

k2pdfopt -j 0- ... (that's a zero, not an 'O').
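
To see why full justification can leave such wide gaps on a narrow screen, here is a rough arithmetic sketch in Python. It is not how k2pdfopt lays out a line internally; the function name gap_widths and all the pixel values are made up for illustration. With full justification, whatever space is left on the line is divided among the gaps between words, so a line holding only a few words gets very large gaps.

```python
from typing import List


def gap_widths(word_widths: List[int], line_width: int,
               natural_gap: int, justify: bool) -> List[float]:
    """Width of each gap between consecutive words on one output line.

    justify=False: every gap keeps its natural width (ragged right edge).
    justify=True:  leftover space is spread evenly so the last word
                   touches the right edge of the line.
    """
    n_gaps = len(word_widths) - 1
    if n_gaps <= 0:
        return []
    if not justify:
        return [float(natural_gap)] * n_gaps
    slack = line_width - sum(word_widths)
    # Never squeeze gaps below their natural width in this sketch.
    return [max(float(natural_gap), slack / n_gaps)] * n_gaps


if __name__ == "__main__":
    words = [120, 90, 150]   # hypothetical word widths in pixels
    print(gap_widths(words, line_width=560, natural_gap=20, justify=True))
    print(gap_widths(words, line_width=560, natural_gap=20, justify=False))
```

In this made-up example, three words on a 560-pixel line end up with 100-pixel gaps when justified, compared with 20-pixel gaps when the line is left ragged.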