MobileRead Forums > E-Book Formats > PDF
03-18-2013, 05:18 PM #361
lumocolor
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2013
Device: Kindle
Quote:
Originally Posted by willus
Your PDF example looks pretty clean / straightforward. What are the issues with the conversion? Do you want the text re-flowed? What options have you tried?

Update: Since your book goes right to the edges, you'll need to specify -m 0 so that k2pdfopt doesn't crop away some of the text. If you want re-flow, that should work reasonably well. If you want OCR along with re-flow, add -ocr t (you'll want to set up tesseract). If you can read the book without re-flowing it, you may prefer using -mode fw (see my native PDF help page).
-m 0 does it! I spent hours trying to figure out the right settings and the solution was so simple. Thank you for helping.
03-18-2013, 08:42 PM #362
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by lumocolor
-m 0 does it! I spent hours trying to figure out the right settings and the solution was so simple. Thank you for helping.
Very sorry about that. As I said just a few posts ago, the next release will default to -m 0. BTW, see the 10th question on the FAQ page.
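A rough way to picture why -m 0 matters for a page whose text runs to the edges: this minimal Python sketch assumes the -m value is a band (in inches) ignored around each edge of the source page, so any text whose bounding box reaches into that band gets cut off. The page size, text box, the nonzero margin value, and the function name are all hypothetical, chosen only for illustration.

```python
def clipped_by_margin(text_box, page_w, page_h, margin):
    """Return True if text_box (x0, y0, x1, y1, in inches) reaches into the ignored margin band."""
    x0, y0, x1, y1 = text_box
    return (x0 < margin or y0 < margin
            or x1 > page_w - margin or y1 > page_h - margin)


# Hypothetical 6 x 9 in page whose text runs nearly to the edges.
page_w, page_h = 6.0, 9.0
text_box = (0.05, 0.10, 5.95, 8.90)

print(clipped_by_margin(text_box, page_w, page_h, 0.25))  # -> True  (hypothetical nonzero margin)
print(clipped_by_margin(text_box, page_w, page_h, 0.0))   # -> False (the -m 0 case)
```

With these made-up numbers, any nonzero margin wider than 0.05 in would eat into the text, which is the situation -m 0 avoids.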

Last edited by willus; 03-18-2013 at 08:44 PM.
03-18-2013, 10:11 PM #363
monkey1d
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2013
Device: kindle 4
Quote:
Originally Posted by willus
Very sorry about that. As I said just a few posts ago, the next release will default to -m 0. BTW, see the 10th question on the FAQ page.
Can you help me solve the problem in post #360? Thanks.
03-19-2013, 01:22 AM #364
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by monkey1d
Hey, first let me thank you for this amazing program. I use -ws 0.01 -dpi 300 -gtc 0 -gtc 0 -gtw 0 to convert test.pdf. The result is test_k2opt.pdf. But I feel the word spacing and line spacing are too large on my Kindle 4, which wastes space. What command should I use to get smaller word spacing and line spacing? What do you think is the ideal command for test.pdf on a Kindle 4? Thanks.
Sorry I missed this one. It sounds like you think the magnification is too large, so try a smaller output dpi (see the output dpi help page). The -vls option controls the line spacing (see the command-line options page).
03-23-2013, 10:34 AM #365
Kornholio
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2013
Device: Sony PRS-T1
I've used k2pdfopt before with good results, but now I'm having trouble with terrible line breaking (see attachment).
Wrapping is on, and obviously I'm using the -fc- parameter.

Is there any way to improve the wrapping? I tried some different -ws values, but I'm not getting anything remotely usable.

This is the example (source) page:
http://www.pdf-archive.com/2013/03/23/page17/page17.pdf

Any help would be greatly appreciated.
Attached thumbnail: k2pdfopt.jpg
03-25-2013, 02:22 PM #366
dgvirtual
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
Quote:
Originally Posted by willus
Yes, there's a way. You use the -grid option to create a temporary PDF file which has one page per page, and then do a second pass on that file. See post #324.
In this particular case I had to do three steps: cut the page in two using -grid, remove the scan-artifact margins (because on every other page the scan-artifact margin was on the opposite side, and the text was not in exactly the same position as on the previous page), and finally adapt the book to my e-reader screen.

When doing a multi-step conversion, the quality of the PDF gets lost... The text in the final version looks much worse than in the original.

Is there a way to avoid image-quality loss? Maybe this could be done by forcing the use of a lossless image format?
03-25-2013, 07:06 PM #367
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by dgvirtual
In this particular case I had to do three steps: cut the page in two using -grid, remove the scan-artifact margins (because on every other page the scan-artifact margin was on the opposite side, and the text was not in exactly the same position as on the previous page), and finally adapt the book to my e-reader screen.

When doing a multi-step conversion, the quality of the PDF gets lost... The text in the final version looks much worse than in the original.

Is there a way to avoid image-quality loss? Maybe this could be done by forcing the use of a lossless image format?
If you use native mode (-n) in the first couple of steps, there should be no loss of fidelity at all--you can use -n as long as you aren't doing re-flow. But even if you don't use native mode, you just need to keep the output resolution high enough, and you should be fine. Use the -dr option to increase the display resolution of the output PDF file if you are not using native mode. E.g. -dr 2 will double the default output resolution. The image format within the PDF file (when not using native output mode) defaults to lossless (.png), so you shouldn't have to change that. If you can't get it, then post a few more pages of your sample and the commands you are using, and I can check and advise.
03-27-2013, 11:47 AM #368
markom
Banned
Posts: 488
Karma: 1080260
Join Date: Sep 2012
Device: sony prs t1 kindle dx ipad
Quote:
Originally Posted by dgvirtual
In this particular case I had to do three steps: cut the page in two using -grid, remove the scan-artifact margins (because on every other page the scan-artifact margin was on the opposite side, and the text was not in exactly the same position as on the previous page), and finally adapt the book to my e-reader screen.

When doing a multi-step conversion, the quality of the PDF gets lost... The text in the final version looks much worse than in the original.

Is there a way to avoid image-quality loss? Maybe this could be done by forcing the use of a lossless image format?
I would usually use Briss or PDF Scissors first, only to roughly crop the PDF margins and cut the pages in two, and then use k2pdfopt in landscape mode.

If needed, I would then do a quick OCR pass in Acrobat or ABBYY FineReader.

https://www.mobileread.com/forums/sho...=32066&page=14

Last edited by markom; 03-27-2013 at 12:28 PM.
03-28-2013, 08:33 AM #369
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Kornholio
I've used k2pdfopt before with good results, but now I'm having trouble with terrible line breaking (see attachment).
Wrapping is on, and obviously I'm using the -fc- parameter.

Is there any way to improve the wrapping? I tried some different -ws values, but I'm not getting anything remotely usable.

This is the example (source) page:
http://www.pdf-archive.com/2013/03/23/page17/page17.pdf

Any help would be greatly appreciated.
Sorry--I missed this post. The problem is that your document size (4.5 x 7 inches), combined with k2pdfopt's default output resolution (167 dpi), means that no wrapping is required. So you have two options if you want wrapped text: (1) increase the output dpi to something like 200 (this makes everything larger), or (2) use -wrap+, which will un-wrap the narrow column on the right so that all the text fits the width of your reader screen. You should also use -m 0 to avoid any clipping, since your viewable region runs right to the edge of the page. Finally, for cases like this I like to use -sm so that I can verify how k2pdfopt is interpreting the page layout. The final commands, then:

k2pdfopt -m 0 -sm -fc- -odpi 200 page17.pdf

or

k2pdfopt -m 0 -sm -fc- -wrap+ page17.pdf

(you can also combine -odpi 200 and -wrap+).
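The "no wrapping required" reasoning can be approximated with a little arithmetic. This Python sketch assumes a simplified model in which one inch of source text becomes odpi pixels in the output, and wrapping is only needed when a text column rendered that way is wider than the usable screen. The column width and screen width are hypothetical placeholders, not measurements of page17.pdf or of any particular reader, and the function names are made up.

```python
def rendered_width_px(column_width_in: float, odpi: int) -> float:
    """Width in output pixels of a source column, assuming 1 source inch -> odpi pixels."""
    return column_width_in * odpi


def needs_wrap(column_width_in: float, odpi: int, screen_width_px: int) -> bool:
    """Under this simple model, text only has to wrap if the column overflows the screen."""
    return rendered_width_px(column_width_in, odpi) > screen_width_px


COLUMN_IN = 3.0   # hypothetical text-column width, inches
SCREEN_PX = 560   # hypothetical usable screen width, pixels

for odpi in (167, 200):
    print(odpi, rendered_width_px(COLUMN_IN, odpi), needs_wrap(COLUMN_IN, odpi, SCREEN_PX))
```

With these made-up numbers the column fits at 167 dpi (501 px) but overflows at 200 dpi (600 px), which is why raising -odpi can be what triggers re-wrapping.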
03-28-2013, 02:40 PM #370
dgvirtual
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
Quote:
Originally Posted by willus
If you use native mode (-n) in the first couple of steps, there should be no loss of fidelity at all--you can use -n as long as you aren't doing re-flow. But even if you don't use native mode, you just need to keep the output resolution high enough, and you should be fine. Use the -dr option to increase the display resolution of the output PDF file if you are not using native mode. E.g. -dr 2 will double the default output resolution. The image format within the PDF file (when not using native output mode) defaults to lossless (.png), so you shouldn't have to change that. If you can't get it, then post a few more pages of your sample and the commands you are using, and I can check and advise.

Thanks for the explanation. Native mode does not work here, since the file is scanned. It might well be JPEG-based... Now, if the original PDF is composed of JPEG images, would k2pdfopt convert it to a PNG-based PDF in any of the three steps below?

But then I tried to repeat the steps by which I got the PDF with quality loss, and I got stuck: I do not know how I succeeded the first time...

Here are the three commands I used:

Code:
k2pdfopt -ui- -mode copy -n -grid 2x1x0 -w 1t -h 1t page01.pdf -o page1.pdf

k2pdfopt -ui- -mode copy -ml 0.5 -mr 0.2 page1.pdf -o page2.pdf

k2pdfopt -ui- -as -w 758 -h 942 -odpi 213 -om 0.04 page2.pdf -o page3.pdf
The first one runs OK, but the second one produces errors and saves an empty output file. The output looks like this:

Code:
Reading 4 pages from page1.pdf ...
warning: unknown keyword: 'e-14'
warning: unknown keyword: 'e-14'
SOURCE PAGE 1 of 4 (5.9 x 8.3 in) ... 0 new pages saved.
And I never get the chance to run the third command.

Could you tell me what is going wrong here? I am attaching the original image... page01.pdf
03-28-2013, 11:26 PM #371
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by dgvirtual
Thanks for the explanation. Native mode does not work here, since the file is scanned. It might well be JPEG-based... Now, if the original PDF is composed of JPEG images, would k2pdfopt convert it to a PNG-based PDF in any of the three steps below?

But then I tried to repeat the steps by which I got the PDF with quality loss, and I got stuck: I do not know how I succeeded the first time...
...
Could you tell me what is going wrong here? I am attaching the original image... Attachment 103604
It took a little tinkering, but the commands I used are:

k2pdfopt -ui- -mode copy -cmax -1 -g 1 -bpc 8 -grid 2x1x0 -n- -w 1t -h 1t -dr 2 page01.pdf -o temp.pdf

k2pdfopt -ui- -as -ml 0.4 -mr 0.4 -w 758 -h 942 -odpi 213 -om 0.04 temp.pdf -o out.pdf

The 'e-14' issue is a bug in my code. Apparently scientific notation is not allowed for move commands in PDF files, and since I use %g as a formatter and your document has a /Rotate 90 directive in the page objects, there end up being some very small scientifically formatted values in the move commands (e.g. 1.00234e-14), which aren't allowed in PDF. So I'll fix that, but for now I've provided a workaround. Note that if k2pdfopt didn't have this bug, native mode would have worked fine in the first command--it doesn't matter whether the source document is scanned or not. Native mode conversion is the best way to preserve the fidelity of the original file.
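To see how a 90-degree rotation can produce values like 1.00234e-14, here is a small Python sketch. Python's %g, like C's %g, switches to exponential notation for very small magnitudes, and the cosine of a 90-degree angle comes out as a tiny nonzero number because pi/2 is not exactly representable in floating point. The pdf_num helper is a hypothetical fixed-point formatter showing one possible way to avoid exponents; it is not k2pdfopt's actual fix, and the cm operator is used only as an example of a content-stream operator with computed operands.

```python
import math


def pdf_num(x: float, places: int = 4) -> str:
    """Format x in fixed-point notation for a PDF content stream (never exponential)."""
    s = f"{x:.{places}f}"               # fixed-point: a tiny value becomes '0.0000'
    if "." in s:
        s = s.rstrip("0").rstrip(".")   # '612.0000' -> '612', '0.5000' -> '0.5'
    return "0" if s == "-0" else s      # avoid emitting a negative zero


def rotation_cm(degrees: float, fmt) -> str:
    """Build a 'cm' (concatenate matrix) operator for a rotation, formatting operands with fmt."""
    r = math.radians(degrees)
    c, s = math.cos(r), math.sin(r)     # cos(90 deg) is a tiny nonzero float, not 0
    return " ".join(fmt(v) for v in (c, s, -s, c, 0.0, 0.0)) + " cm"


print(rotation_cm(90, lambda v: "%g" % v))  # contains an exponent from the tiny cosine
print(rotation_cm(90, pdf_num))             # 0 1 -1 0 0 0 cm
print("%g" % 1.00234e-14)                   # 1.00234e-14
```

Fixed-point output trades away digits below the chosen precision, which is harmless for page geometry measured in points.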

Anyway, because of the bug, I had to use bitmap mode in the first conversion command. And yes, k2pdfopt, in bitmap mode, will effectively convert the scanned JPEG images to PNG (4-bit grayscale by default).
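For a feel of what 4-bit grayscale means compared with the 8 bits per pixel that -bpc 8 keeps, here is a simplified Python model in which each 8-bit gray value is snapped to the nearest of 2**bits evenly spaced levels. This is an illustrative assumption, not k2pdfopt's actual conversion code (which, judging from the post, can also introduce dithering), and requantize is a hypothetical name.

```python
def requantize(v: int, bits: int) -> int:
    """Snap an 8-bit gray value (0-255) to the nearest of 2**bits evenly spaced levels."""
    steps = (1 << bits) - 1            # 15 steps between black and white at 4 bits
    q = round(v * steps / 255)         # index of the nearest level
    return round(q * 255 / steps)      # map that level back onto the 0-255 scale


for bits in (4, 8):
    out = [requantize(v, bits) for v in range(256)]
    levels = len(set(out))
    worst = max(abs(v - o) for v, o in zip(range(256), out))
    print(f"{bits}-bit: {levels} gray levels, worst-case error {worst}")
```

In this model 4 bits leaves 16 gray levels with errors of up to 8 (out of 255), while 8 bits reproduces every input value exactly.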

Here is what the less obvious options do:

-mode copy sets the output to be a copy of the input, with the output in bitmap mode.

-cmax -1 -g 1 -bpc 8 preserve the source contrast, gamma, and bits per pixel (none of this would have been necessary if I could have done a native-mode conversion), so that you don't get dithering artifacts.

-grid 2x1x0 breaks each page into 2 output pages (2 x 1 grid, no overlap)

-n- turns off native mode (-grid turns it on, so you have to turn it back off--again, not necessary if native mode had worked)

-w 1t -h 1t sets the output page size to mirror the gridded pieces of the source

-dr 2 doubles the output resolution so that we don't lose fidelity (not necessary if native mode had worked)
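The -grid 2x1x0 plus -w 1t -h 1t combination can be pictured as cutting each source page into equal cells and making each cell its own output page. The Python sketch below computes those cells. It assumes the third -grid number is a percentage overlap added on each interior side of a cell and that cells are ordered left to right, top to bottom; it uses a hypothetical 11.8 x 8.3 in spread, which would split into the 5.9 x 8.3 in pieces reported in the log earlier in this thread. grid_cells is a made-up helper, not part of k2pdfopt.

```python
def grid_cells(width, height, cols, rows, overlap_pct=0.0):
    """Return (x0, y0, x1, y1) cell boxes, left to right then top to bottom (y measured from top)."""
    cw, ch = width / cols, height / rows
    ox, oy = cw * overlap_pct / 100, ch * overlap_pct / 100   # assumed overlap model
    boxes = []
    for r in range(rows):
        for c in range(cols):
            x0 = max(0.0, c * cw - ox)
            x1 = min(width, (c + 1) * cw + ox)
            y0 = max(0.0, r * ch - oy)
            y1 = min(height, (r + 1) * ch + oy)
            boxes.append((x0, y0, x1, y1))
    return boxes


# Hypothetical two-page scan spread, split 2 x 1 with no overlap.
for box in grid_cells(11.8, 8.3, 2, 1):
    print(box)
```

With -w 1t -h 1t, each output page would then simply take the size of its cell.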

The other options are pretty straightforward and can be reviewed in my command-line options page.

Last edited by willus; 03-29-2013 at 08:53 AM. Reason: Found reason for 'e-14' error.
03-29-2013, 04:33 PM #372
dgvirtual
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
Quote:
Originally Posted by willus
It took a little tinkering, but the commands I used are ....
Hey, it worked; the results are much better than before. Thank you very much for looking into the issue and figuring out the best way to adapt the text for an e-reader. The program AND your help are amazing!
03-29-2013, 08:26 PM #373
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by dgvirtual
Hey, it worked; the results are much better than before. Thank you very much for looking into the issue and figuring out the best way to adapt the text for an e-reader. The program AND your help are amazing!
Thank you. Happy to help. I have fixed the scientific notation issue for the next release.
04-02-2013, 01:43 PM #374
jldg
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2013
Location: france
Device: koboglo
I tried to convert the PDF output of k2pdfopt to EPUB format with calibre.
The goal is to use the font-size adjustment while reading on my Kobo.
(Obviously, I used the native output option.)
It's nearly OK, BUT the EPUB pages are repeated twice!
The PDF is fine when I read it with Adobe Reader or corelpdf.
BUT the pages are repeated when I open the PDF with PdfMasher (before conversion to EPUB)!
Do you understand why? And how can I avoid this repetition?

Links to the PDF and EPUB files are here:

k2opt.pdf

k2opt.epub

Thank you for your help.
04-02-2013, 10:41 PM #375
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by jldg
I tried to convert the PDF output of k2pdfopt to EPUB format with calibre.
The goal is to use the font-size adjustment while reading on my Kobo.
(Obviously, I used the native output option.)
It's nearly OK, BUT the EPUB pages are repeated twice!
The PDF is fine when I read it with Adobe Reader or corelpdf.
BUT the pages are repeated when I open the PDF with PdfMasher (before conversion to EPUB)!
Do you understand why? And how can I avoid this repetition?

Links to the PDF and EPUB files are here:

k2opt.pdf

k2opt.epub

Thank you for your help.
When k2pdfopt constructs the converted document in native PDF mode, it uses crop boxes to place part of each source page on each destination page. So page 1 of the source file, for example, may be spread across 2-3 pages of the destination file, with crop boxes showing the different regions of source page 1 that belong on each destination page. It would appear that the conversion software you are using ignores the crop boxes and converts the entire contents of each source page multiple times (once for each destination page on which any part of it appears). There's no easy workaround for this that I know of. I can think of two options: 1. Don't use native mode, and instead use OCR to get the text (obviously it's not very satisfactory to convert native text to a bitmap and then OCR it back to text!); or 2. Use the source PDF file directly rather than converting it with k2pdfopt first. If all you want is the text from the file, that's probably the way to go.
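A toy model of the duplication described above: each destination page points at a source page plus a crop box, and a crop-aware reader only shows the text inside that box, while a converter that ignores crop boxes pulls in the whole source page every time. The text positions, crop extents, DestPage, and extract are all hypothetical; real PDF text extraction is far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DestPage:
    src_page: int   # which source page this destination page shows
    top: float      # crop-box extent on the source page (hypothetical units, top to bottom)
    bottom: float


def extract(dest_pages, source_text, honor_crop=True):
    """Collect the text shown on each destination page, optionally ignoring crop boxes."""
    out = []
    for d in dest_pages:
        for page, y, text in source_text:
            if page != d.src_page:
                continue
            if honor_crop and not (d.top <= y < d.bottom):
                continue  # outside this page's crop box, so not visible here
            out.append(text)
    return out


# Hypothetical: source page 1 holds three lines and is split across two destination pages.
source_text = [(1, 100, "line A"), (1, 400, "line B"), (1, 700, "line C")]
dest_pages = [DestPage(1, 0, 396), DestPage(1, 396, 792)]

print(extract(dest_pages, source_text))                    # each line once
print(extract(dest_pages, source_text, honor_crop=False))  # every line twice
```

Splitting one source page across two destination pages is enough to make the crop-ignoring path emit every line twice, matching the "pages repeated 2 times" symptom.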
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.