01-25-2018, 09:04 AM   #1516
MarjaE
Addict
Posts: 348
Karma: 1548692
Join Date: Jun 2015
Device: Iriver Story HD and Amazon Kindle DX
Screen shots of what?

I don't know what parts of the conversion attempts you want shots of. If I knew what you thought they might show, I might be able to figure that out.

If I try to run OCR from the uncustomized k2pdfopt, it can't find the environment variable TESSDATA_PREFIX and can't use Tesseract.

If I run the customized k2pdfopt_copy or k2pdfopt_dx, it can. k2pdfopt_copy uses -mode copy to avoid unnecessary compression. k2pdfopt_dx uses -mode copy -dev dx.

I have tried -ocr -ocrlang rus in k2pdfopt_copy. I may get a message stating:

Initializing OCR for 2 threads ..
Tesseract Open Source OCR Engine v3.05.00 [CUBE+] (lang=rus)
Reading 443 pages from

... but the converted file ends up with no OCR.

I am currently comparing my K2 results with my Elucidate results, but in the long run, I can't combine K2 with Elucidate.

Last edited by MarjaE; 01-25-2018 at 09:15 AM.
01-27-2018, 11:47 AM   #1517
willus
Fuzzball, the purple cat
Posts: 981
Karma: 7562649
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by MarjaE
Screen shots of what?
Something like the one I've attached. From one graphic, I can tell what options you are running, which version you are using, whether OCR started correctly, what size source pages you are converting, how many OCR words were found, and how much CPU was used. It is very useful.

Quote:
Originally Posted by MarjaE
I have tried -ocr -ocrlang rus in k2pdfopt_copy. I may get a message stating:

Initializing OCR for 2 threads ..
Tesseract Open Source OCR Engine v3.05.00 [CUBE+] (lang=rus)
Reading 443 pages from

... but get no ocr at the end...
Please tell me what it says at the end of the conversion (see circled area in my screenshot). Also, would you please convert a relatively small number of pages from your example (maybe 20) and post the converted result that has "no ocr"?
Attached thumbnail: screenshot.png (31.6 KB)
02-06-2018, 09:54 AM   #1518
josorio
Junior Member
Posts: 3
Karma: 37928
Join Date: Feb 2018
Device: Kindle Paperwhite & Moto X 2nd gen
Break each 2-column page into exactly 4 pages

Hi, is there a way to force k2pdfopt to break each of my 2-column pages into exactly four small pages?
I will read mainly on a Kindle Paperwhite, but sometimes I'll read the same output on my Moto X 2014 (1080x1920, 424 DPI). I've got a neat conversion, with no margins and using native PDF output, but now I want more text on each page, at the expense of smaller letters. Also, I want the footnotes always to be at the foot of a page.
Thanks for any clue.
02-06-2018, 04:54 PM   #1519
josorio
Junior Member
Quote:
Originally Posted by josorio
Hi, is there a way to force k2pdfopt to break each of my 2-column pages in exactly four small pages?
I will read mainly on a Kindle Paperwhite, but sometimes I'll read the same output in my Moto X 2014 (1080x1920, 424DPI). I've got a neat conversion, with no margins and using native PDF output, but now I want more text in each page, at the expense of smaller letters. Also, I want the footnotes to be always at the foot of a page.
Thanks for any clue.
I found a way:
- Conversion mode 2-column. Set Native PDF output.
- Set one crop area around each column.
- Set the Device as Kindle Voyage.
- Set -bp m in additional options.
- Set Width to 758 px, the same as the Kindle Paperwhite.
- Set DPI to 300 (I guess this is not important).
- Tweak the Height while looking at the bottom of the even pages. Each page should be completely filled without pushing any line onto the next page. In my case, 1220 was a good value.
Any easier option?
02-06-2018, 09:48 PM   #1520
willus
Fuzzball, the purple cat
Quote:
Originally Posted by josorio
I found a way:
- Conversion mode 2-column. Set Native PDF output.
- Set one crop area around each column.
- Set the Device as Kindle Voyage.
- Set -bp m in additional options.
- Set Width to 758 px, the same as Kindle Paperwhite
- Set DPI to 300 (I guess this is not important)
- Tweak the Height looking at the bottom of the even pages. It should occupy the whole page without throwing any line to the next page. In my case 1220 was a good value.
Any easier option?
Maybe just use the -grid option:

k2pdfopt -grid 2x2x5 myfile.pdf

This will parse myfile.pdf into a 2 x 2 grid, with an output page for each square in the grid. The "5" in the argument above specifies 5% overlap for the grid squares.
02-08-2018, 04:50 PM   #1521
josorio
Junior Member
Break each 2-column page into exactly 4 pages

Excellent, Willus! -grid 2x2x0.2 was nice for me. I had to abandon my original crop areas and replace them with ignore areas. Thanks a lot.
02-11-2018, 06:20 AM   #1522
axet
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2018
Device: android phone
A few source code issues:

willus/linux.h

Code:
-#include <sys/termios.h>
+#include <termios.h>
termios is unused and the include can be dropped; sys/termios.h is not found when compiling for Android.

k2pdfopt/k2master.c

Code:
#if HAVE_LEPTONICA_LIB
        wlept_bmp_dewarp(dwbmp,src,srcgrey,white,k2settings->dewarp,
                         k2settings->debug?"k2opt_dewarp_model.pdf":NULL);
#endif
Missing HAVE_LEPTONICA_LIB #if/#endif guard: without it, this does not compile when Leptonica is not compiled in. I'm not sure this is the correct fix, though; it seems this call should be replaced with an equivalent one.

Code:
-    if (k2settings->ocr_max_columns==2 || k2settings->max_columns>1)
+    if (k2settings->max_columns==2 || k2settings->max_columns>1)
The 'ocr_max_columns' member is missing; replace it with 'max_columns'.
03-01-2018, 08:55 PM   #1523
yxmr
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2018
Device: none
K2pdfopt is magical software, very good.
I found a problem with http://willus.com/k2pdfopt/examples/size/bugs.pdf

I used these parameters:

-m 0 -bpc 8 -c -p 2

The brightness of the color image is reduced and its colors become reddish.
The details of the image are not clear, especially in the darker parts.

How can I fix this?
Attached thumbnails: Original.jpg (191.6 KB), After treatment.jpg (140.1 KB)
03-04-2018, 07:53 PM   #1524
willus
Fuzzball, the purple cat
Quote:
Originally Posted by yxmr
K2pdfopt is a magic software, very good.
I found a problem with http://willus.com/k2pdfopt/examples/size/bugs.pdf

-m 0 -bpc 8 -c -p 2
Use the above parameters

Color image brightness reduction and the color of the image is reddened.
The details of the image are not clear, especially in the darker parts.

How to solve?
Turn off gamma correction and contrast adjustment:

-g 1 -cmax 1
03-04-2018, 10:16 PM   #1525
konino
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2017
Device: kobo aura 2011 with koreader
Hello, sir or madam. You have made a great tool.

Is it possible to convert PDF/DjVu and output DjVu or images?
03-05-2018, 08:25 AM   #1526
willus
Fuzzball, the purple cat
Quote:
Originally Posted by konino
Hello sir or madam.You have done some great tool out there .

is it possible to convert pdf/djvu and give output djvu or images?
Sorry--k2pdfopt only generates output files in pdf or bitmap formats. You will need to use some other tool to convert that to djvu.
03-12-2018, 08:19 PM   #1527
MarjaE
Addict
I've been experimenting with different OCR tools: the built-in OCR in k2pdfopt, Elucidate, and ocrmypdf.

All of these use Tesseract, but the k2pdfopt version often misses text that the others recognize.

Unfortunately, OCRing in either Elucidate or ocrmypdf and then converting with either k2pdfopt or Ghostscript often leads to an unreadable mess.

Is there any way to OCR and convert in k2pdfopt while getting the OCR quality of the other Tesseract-based tools? After setting up the tessdata folder, is it just a matter of downloading from tessdata_best instead of tessdata?
03-12-2018, 10:10 PM   #1528
willus
Fuzzball, the purple cat
Quote:
Originally Posted by MarjaE
I've been experimenting with different ocr tools: the built-in ocr in k2pdfopt, Elucidate, and ocrmypdf.

All these implement Tesseract. But the k2pdfopt version often misses text which the other versions convert.

Unfortunately, ocring in either Elucidate, or ocrmypdf; and then converting in either k2pdfopt, or Ghostscript; often leads to an unreadable mess.

Is there any way to ocr and convert in k2pdfopt, while getting the ocr quality of the other ones which implement Tesseract? After setting up the tessadata folder, is it just a matter of downloading from tessdata-best, instead of just tessdata?
The issue is that k2pdfopt uses its own algorithms to find words in the document, and then it passes only single words to Tesseract for OCR. The other two programs, I'm guessing, use Tesseract's own algorithms to find the words in the document. Presently k2pdfopt does not have a way to use Tesseract's word-finding algorithms, so I'd think your best bet would be to use the other programs to do the OCR first and then process the OCR'd result with k2pdfopt (which you said gives you an unreadable mess). It would help if you could post a file that you OCR'd with Elucidate or ocrmypdf so I could try k2pdfopt on it myself. I presume that, as before, you are working with Russian (Cyrillic) documents?

Edit: Please run Elucidate or ocrmypdf on the attached document and post the resulting PDF.
Attached files: cyrillic2.pdf (3.08 MB), cyrillic2_corrected_page2.pdf (3.08 MB)

Last edited by willus; 03-17-2018 at 12:26 PM. Reason: Added corrected attachment
03-13-2018, 10:03 AM   #1529
MarjaE
Addict
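Sorry, I got mixed up. I had decent results with E+K2 and O+K2 (OCR in Elucidate or ocrmypdf, then convert with k2pdfopt), and mixed, sometimes terrible, results with E+GS and O+GS (OCR, then convert with Ghostscript).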

Examples include:

E+K2 -mode copy -dev dx

and

E+GS -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%stderr -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$outputfile" "$f"

where "$f" is the input filename.
Attached files: cyrillic2 - Converted_k2opt.pdf (469.6 KB), cyrillic2 - Converted-converted.pdf (2.46 MB)
03-13-2018, 08:33 PM   #1530
willus
Fuzzball, the purple cat
Quote:
Originally Posted by MarjaE
Sorry, got mixed up. Had decent results with E+K2 and O+K2, mixed and sometimes terrible results with E+GS and O+GS.

Examples include:

E+K2 -mode copy -dev dx

and

E+GS -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%sstderr -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$outputfile" "$f"

where "$f" is the input filename.
Can you please just process my attachment with Elucidate only and post that?
All times are GMT -4.

MobileRead.com is a privately owned, operated and funded community.