Old 03-13-2018, 09:40 PM   #1531
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Sorry.
Attached Files
File Type: pdf cyrillic2 - Converted.pdf (3.54 MB, 279 views)
Old 03-17-2018, 03:49 PM   #1532
xilopaint
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2018
Device: none
Hello willus!

I have been trying to follow the instructions to build k2pdfopt in macOS, but I'm getting these errors:

Quote:
$ gcc -Ofast -Wall -m64 -o k2pdfopt.o -c k2pdfopt.cg
clang: error: no such file or directory: 'k2pdfopt.cg'
clang: error: no input files
My intention is to build a lightweight version of k2pdfopt for my Alfred workflow. I need only four command line options: "-as", "-mode copy", "-dpi" and "-o".

Last edited by xilopaint; 03-17-2018 at 03:52 PM.
Old 03-18-2018, 02:25 PM   #1533
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by xilopaint View Post
Hello willus!

I have been trying to follow the instructions to build k2pdfopt in macOS, but I'm getting these errors:



My intention is to build a lightweight version of k2pdfopt for my Alfred workflow. I need only four command line options: "-as", "-mode copy", "-dpi" and "-o".
What instructions are you trying to follow? The instructions I include with the source code say this:
Code:
Build Steps on OS/X (64-bit, gcc 6.2.0, compiled on OSX 10.12 Sierra)
----------------------------------------------------------------------
1. gcc -Ofast -Wall -m64 -o k2pdfopt.o -c k2pdfopt.c

2. g++ -Ofast -m64 -o k2pdfopt k2pdfopt.o -static-libgcc -static-libstdc++ -lk2pdfopt -lwillus -lgocr -ltesseract -lleptonica -ldjvu -lmupdf -lfreetype -ljbig2 -ljpeglib -lopenjpeg -lpng -lzlib -lpthread
The trick is going to be getting all of the libraries to compile. Honestly, I don't think this is worth your effort--you would have to substantially modify the code to really strip out everything but those options. Why not just use the OSX binary? Is the extra size really a problem?
Old 03-18-2018, 08:47 PM   #1534
xilopaint
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2018
Device: none
Quote:
Originally Posted by willus View Post
What instructions are you trying to follow? The instructions I include with the source code say this:
Code:
Build Steps on OS/X (64-bit, gcc 6.2.0, compiled on OSX 10.12 Sierra)
----------------------------------------------------------------------
1. gcc -Ofast -Wall -m64 -o k2pdfopt.o -c k2pdfopt.c

2. g++ -Ofast -m64 -o k2pdfopt k2pdfopt.o -static-libgcc -static-libstdc++ -lk2pdfopt -lwillus -lgocr -ltesseract -lleptonica -ldjvu -lmupdf -lfreetype -ljbig2 -ljpeglib -lopenjpeg -lpng -lzlib -lpthread
The trick is going to be getting all of the libraries to compile. Honestly, I don't think this is worth your effort--you would have to substantially modify the code to really strip out everything but those options. Why not just use the OSX binary? Is the extra size really a problem?
Oh, sorry! I didn't realise I had included a letter "g" at the end of the command. Now I'm getting a different error:

Quote:
$ gcc -Ofast -Wall -m64 -o k2pdfopt.o -c k2pdfopt.c
k2pdfopt.c:76:10: fatal error: 'k2pdfopt.h' file not found
#include <k2pdfopt.h>
         ^~~~~~~~~~~~
1 error generated.
k2pdfopt.h is not in the same directory as k2pdfopt.c.
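
The "file not found" error means the compiler's include search path does not contain the directory that holds k2pdfopt.h. A minimal sketch of one way around it, assuming the header sits somewhere under the unpacked source tree: locate it and hand its directory to gcc with -I. The helper name find_header_dir and the SRC_ROOT variable are hypothetical and not part of the official build steps, and the libraries listed in step 2 still have to be built separately.

```shell
#!/bin/sh
# Hypothetical helper: print the directory that contains a given header
# somewhere below a source root. Prints nothing if the header is absent.
find_header_dir() {
    root=$1
    header=$2
    found=$(find "$root" -type f -name "$header" | head -n 1)
    if [ -n "$found" ]; then
        dirname "$found"
    fi
}

# Example use (SRC_ROOT is an assumption; point it at the unpacked source):
# SRC_ROOT=.
# inc=$(find_header_dir "$SRC_ROOT" k2pdfopt.h)
# gcc -Ofast -Wall -m64 -I"$inc" -o k2pdfopt.o -c k2pdfopt.c
```

The -I flag adds a directory to the search path used for both #include "..." and #include <...>, so the angle-bracket include in k2pdfopt.c should resolve once the right directory is passed.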

Last edited by xilopaint; 03-18-2018 at 09:03 PM.
Old 03-27-2018, 05:59 PM   #1535
pgodz
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2018
Device: Kindle Paperwhite
Hey! I'm trying to preserve the original layout but bitmap the image and the text. Is it possible to disable scaling?

Thanks!
Old 03-28-2018, 08:44 AM   #1536
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by pgodz View Post
Hey! I'm trying to preserve the original layout but bitmap the image and the text. Is it possible to disable scaling?

Thanks!
This is exactly what "-mode copy" is for. Set the output page dpi with -odpi. E.g.
k2pdfopt -mode copy -odpi 300 myfile.pdf
...will bitmap each page at 300 dpi and store it in the output file.
Old 04-19-2018, 08:07 PM   #1537
Ramo
Enthusiast
Posts: 25
Karma: 37930
Join Date: Mar 2018
Device: Kobo TouchC
How to discard the hidden text?

My Kobo Touch C is having problems reading a very big PDF file - 1466 pages - that also has hidden text. I think that just removing the hidden text would solve my problem. I was able to do this once, but only with a subset of the file that I set up to test it, and now I cannot repeat the results.

Could someone show me the options to maintain everything - size, dpi, color etc. - but just get rid of the hidden text?
Old 04-19-2018, 11:08 PM   #1538
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Ramo View Post
My Kobo Touch C is having problems reading a very big PDF file - 1466 pages - that also has hidden text. I think that just removing the hidden text would solve my problem. I was able to do this once, but only with a subset of the file that I set up to test it, and now I cannot repeat the results.

Could someone show me the options to maintain everything - size, dpi, color etc. - but just get rid of the hidden text?
There's not a simple option to do this--to just strip hidden text--with k2pdfopt. What you have to do is use -mode copy, but you'll want to set your DPI and color depth to try and match what you have now. You can see information about how the PDF you have is encoded with the -i option, e.g.

k2pdfopt -i myfile.pdf

You can then select options to match, e.g.

k2pdfopt -mode copy -odpi 200 -c -g 1 -sh- -cmax 1 -ocr- myfile.pdf

This copies the dimensions, sets bitmap DPI to 200, turns on color output, sets gamma to 1 (no change), turns off sharpening, turns off contrast adjust, and turns off OCR (no hidden layer). You can try just a few pages of conversion by adding: -p 1-10 (convert the first 10 pages only).
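
The two steps above (inspect, then convert with matching options) can be sketched as a small dry-run helper that only prints the command line, so the options can be checked before a long conversion. The function name strip_text_cmd is hypothetical; every k2pdfopt flag in it is taken from the example above, and the DPI value is something you would read off the -i output yourself.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) a k2pdfopt command line
# that re-bitmaps a PDF without an OCR layer. Only flags quoted in this
# thread are used. File names containing spaces would need quoting before
# the printed line is pasted into a terminal.
strip_text_cmd() {
    file=$1
    dpi=$2
    pages=${3:-}    # optional page range, e.g. "1-10"
    cmd="k2pdfopt -mode copy -odpi $dpi -c -g 1 -sh- -cmax 1 -ocr-"
    if [ -n "$pages" ]; then
        cmd="$cmd -p $pages"
    fi
    printf '%s %s\n' "$cmd" "$file"
}

# Step 1 (run by hand): k2pdfopt -i myfile.pdf
# Step 2: print a trial command for the first 10 pages at 200 dpi.
strip_text_cmd myfile.pdf 200 1-10
```

Running a short page range first keeps the trial quick; once the output looks right, drop the third argument to print the full-document command.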
Old 04-20-2018, 10:19 AM   #1539
Ramo
Enthusiast
Posts: 25
Karma: 37930
Join Date: Mar 2018
Device: Kobo TouchC
Quote:
Originally Posted by willus View Post
k2pdfopt -mode copy -odpi 200 -c -g 1 -sh- -cmax 1 -ocr- myfile.pdf

This copies the dimensions, sets bitmap DPI to 200, turns on color output, sets gamma to 1 (no change), turns off sharpening, turns off contrast adjust, and turns off OCR (no hidden layer). You can try just a few pages of conversion by adding: -p 1-10 (convert the first 10 pages only).
Thank You!

It worked!

I don't know if you are still developing the software, although I can see that you are very active in the forum, but maybe this is a feature worth implementing: deleting the hidden text layer.

In my case the Kobo just couldn't cope with the hidden text. Whenever I moved the image to reposition it, I would get the image of the hidden text instead of the actual image layer. It would freeze there, and the only way I was able to circumvent it was to put the device to sleep; when woken up, it would show me the image as if nothing had happened...

Anyway, great piece of software! It's just that the learning curve is a bit steep.

Last edited by Ramo; 04-21-2018 at 01:23 PM.
Old 04-20-2018, 11:17 AM   #1540
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Removing the text layer has its downsides. Search for text will no longer work. All you have is a set of images.
Old 04-20-2018, 03:02 PM   #1541
Ramo
Enthusiast
Posts: 25
Karma: 37930
Join Date: Mar 2018
Device: Kobo TouchC
Quote:
Originally Posted by DaleDe View Post
Removing the text layer has its downsides. Search for text will no longer work. All you have is a set of images.
I know, but right now it is unreadable for me; it takes 2 minutes to turn each page. And the hidden text layer isn't good at all: it was made through OCR of images set in Garamond, a lot of it italics and in French. A recipe for disaster.

I might as well get rid of the whole thing. I'm just surprised how hard it has been to find a tool to do such a simple job. At least I think it is simple.

Last edited by Ramo; 04-21-2018 at 01:26 PM.
Old 04-21-2018, 11:21 AM   #1542
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Ramo View Post
I know, but right now it is unreadable for me; it takes 2 minutes to turn each page. And the hidden text layer isn't good at all: it was made through OCR of images of Garamond, a lot of it italics and in French. A recipe for disaster. I might as well get rid of the whole thing. I'm just surprised how hard it has been to find a tool to do the job.
Is there any chance you can post a few pages from your PDF file? I'd like to experiment with it.
Old 04-21-2018, 01:41 PM   #1543
Ramo
Enthusiast
Posts: 25
Karma: 37930
Join Date: Mar 2018
Device: Kobo TouchC
Quote:
Originally Posted by willus View Post
Is there any chance you can post a few pages from your PDF file? I'd like to experiment with it.
Thank you, willus.
Just sent it via PM.
Old 04-21-2018, 02:38 PM   #1544
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Ramo View Post
Thank you, willus.
Just sent it via PM.
As I suspected, the images are stored in JPEG 2000 format (you can see this when you use the k2pdfopt -i option), which taxes most PDF readers significantly more than JPEG or PNG. Moreover, they are 600 dpi--very high res. That is probably why your reader does not like displaying the file--not because of the hidden text. The default k2pdfopt output is PNG ("Flate"), which is much faster to display, but, as you noted, balloons the file size considerably depending on your chosen resolution and color depth. You might try leaving OCR selected (-ocr m) rather than disabling it. I'll bet it will still work fine and you'll then be able to search the document.

There is not a trivial way to simply remove hidden text from a PDF and leave everything else exactly the way it is. I could maybe make it easier to use the method I showed you with a single command-line option to try to intelligently choose the parameters, but in terms of leaving all of the bitmaps in exactly their original format (highly compressed JPEG 2000), I don't have a way to do that.
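
To put the 600 dpi figure in perspective, here is a rough back-of-the-envelope sketch of how many pixels a reader has to decode per page at 600 dpi versus the 200 dpi used in the earlier example. The 8.5 x 11 inch page size is an assumption for illustration only (the actual page dimensions were not posted), and the helper name page_pixels is hypothetical.

```shell
#!/bin/sh
# Pixel count of one rendered page. Width and height are given in tenths of
# an inch so that plain integer shell arithmetic is enough.
page_pixels() {
    w_tenths=$1
    h_tenths=$2
    dpi=$3
    w_px=$(( w_tenths * dpi / 10 ))
    h_px=$(( h_tenths * dpi / 10 ))
    echo $(( w_px * h_px ))
}

# Assumed page size: 8.5 x 11 inches.
hi=$(page_pixels 85 110 600)   # 5100 x 6600 pixels
lo=$(page_pixels 85 110 200)   # 1700 x 2200 pixels
echo "600 dpi: $hi pixels, 200 dpi: $lo pixels, ratio: $(( hi / lo ))"
# prints: 600 dpi: 33660000 pixels, 200 dpi: 3740000 pixels, ratio: 9
```

Pixel count scales with the square of the resolution, so under this assumption a 600 dpi page carries nine times as many pixels as a 200 dpi one, before the extra cost of JPEG 2000 decoding is even considered.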
Old 04-22-2018, 08:23 AM   #1545
Ramo
Enthusiast
Posts: 25
Karma: 37930
Join Date: Mar 2018
Device: Kobo TouchC
Quote:
Originally Posted by willus View Post
As I suspected, the images are stored in JPEG 2000 format (you can see this when you use the k2pdfopt -i option), which taxes most PDF readers significantly more than JPEG or PNG. Moreover, they are 600 dpi--very high res. That is probably why your reader does not like displaying the file--not because of the hidden text. The default k2pdfopt output is PNG ("Flate"), which is much faster to display, but, as you noted, balloons the file size considerably depending on your chosen resolution and color depth. You might try leaving OCR selected (-ocr m) rather than disabling it. I'll bet it will still work fine and you'll then be able to search the document.

There is not a trivial way to simply remove hidden text from a PDF and leave everything else exactly the way it is. I could maybe make it easier to use the method I showed you with a single command-line option to try to intelligently choose the parameters, but in terms of leaving all of the bitmaps in exactly their original format (highly compressed JPEG 2000), I don't have a way to do that.
Thank you! It is not the perfect one-button solution for all my problems, but now I understand what is happening!

I learned about JPEG 2000 just 2 minutes ago when downloading a set of scanned images from archive.org and failing to make scantailor work on them. Talk about synchronicity!

Way better support than I've ever had from any company! You're awesome!

Just out of curiosity, do you have a guess as to whether KOReader would do a better job with this kind of PDF than the standard Nickel software on my Kobo Touch C? And how did you find out the resolution of the images in the PDF? Is there an option to do that in k2pdfopt? I couldn't find it. And are the JPX and JBIG2 shown in brackets in the -i output the file formats of the images, then?