Old 02-12-2015, 07:46 PM   #976
dhdurgee
Guru
Posts: 905
Karma: 3000000
Join Date: Jun 2010
Device: K3W, PW4
I have for several years been using the Briss tool to manually massage a magazine available to me in PDF format for easy reading on my Kindle 3 WiFi. Most of the pages are in two-column format, but a few are single-column and some have tables or images spanning both columns. Footnotes are also present in some of the articles, which I usually break out on their own. There are headers and footers added by the source of the PDF that I strip out. The PDF file is page images as opposed to anything more useful.

Is it possible for k2pdfopt to do this for me? If you would like to see a specific example, I will be happy to provide a sample magazine, before and after. In case it makes a difference, I am using Linux Mint 17.1 Rebecca x64 here and have downloaded and installed the latest Linux release of k2pdfopt.

Dave

Last edited by dhdurgee; 02-12-2015 at 07:47 PM. Reason: add detail
Old 02-12-2015, 10:12 PM   #977
willus
Fuzzball, the purple cat
Posts: 1,302
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by dhdurgee View Post
I have for several years been using the Briss tool to manually massage a magazine available to me in PDF format for easy reading on my Kindle 3 WiFi. ...
Is it possible for k2pdfopt to do this for me? ...

Dave
Yes, it's possible k2pdfopt might automate this process for you--it depends largely on the specifics of the magazine layout, how clean and consistent the pages are, etc., as to how satisfactory your results will be. You can post or PM me an example--that would be the most helpful.
Old 02-13-2015, 09:27 AM   #978
dhdurgee
Quote:
Originally Posted by willus View Post
Yes, it's possible k2pdfopt might automate this process for you--it depends largely on the specifics of the magazine layout, how clean and consistent the pages are, etc., as to how satisfactory your results will be. You can post or PM me an example--that would be the most helpful.
I have placed a sample PDF, before and after, in a DropBox folder and offered to share it to your email address. If you have any further questions or need further examples please let me know.

Dave
Old 02-13-2015, 10:46 AM   #979
willus
Quote:
Originally Posted by dhdurgee View Post
I have placed a sample PDF, before and after, in a DropBox folder and offered to share it to your email address. If you have any further questions or need further examples please let me know.

Dave
k2pdfopt works well on your original source document (before processing with Briss) with just some cropping directives to crop out the scanning artifacts at the edges of the pages. You can adjust them as you like to also try and crop out footers or headers:

k2pdfopt -m .11in,.04in,.14in,.6in source.pdf

Or, in the GUI, which you can run in Wine, set the crop margins (see attached).
Attachment: gui.png
Old 02-13-2015, 12:19 PM   #980
dhdurgee
Quote:
Originally Posted by willus View Post
k2pdfopt works well on your original source document (before processing with Briss) with just some cropping directives to crop out the scanning artifacts at the edges of the pages. You can adjust them as you like to also try and crop out footers or headers:

k2pdfopt -m .11in,.04in,.14in,.6in source.pdf

Or, in the GUI, which you can run in Wine, set the crop margins (see attached).
I did give it a try with the above options and agree that it produces a good, if not totally optimal, version. I am a bit surprised at the growth in size, from less than 8 MB to over 25 MB. Is there a way to address that?

Regarding further cropping, it appears from inspecting the original that although there is a consistent footer, except where I have inserted blanks to keep the even/odd pages correct, the header area that I crop out manually only appears at the beginning of articles. Thus the PDF pages are either first page of an article, subsequent pages of an article or inserted blanks.

Is the tool up to such a detailed classification of pages? If so, perhaps this can be further cleaned up.

I also notice that the table of contents got a bit mangled, but other than that a casual check seems to show a good job done on articles themselves.

Thank you for your assistance with this.

Dave
Old 02-13-2015, 01:37 PM   #981
willus
Quote:
Originally Posted by dhdurgee View Post
I did give it a try with the above options and agree that it produces a good, if not totally optimal, version. I am a bit surprised at the growth in size, from less than 8 MB to over 25 MB. Is there a way to address that?

Regarding further cropping, it appears from inspecting the original that although there is a consistent footer, except where I have inserted blanks to keep the even/odd pages correct, the header area that I crop out manually only appears at the beginning of articles. Thus the PDF pages are either first page of an article, subsequent pages of an article or inserted blanks.

Is the tool up to such a detailed classification of pages? If so, perhaps this can be further cleaned up.

I also notice that the table of contents got a bit mangled, but other than that a casual check seems to show a good job done on articles themselves.

Thank you for your assistance with this.

Dave
Regarding the size of the converted file (in bytes), you can try using 2-column mode (-mode 2col) which uses native PDF output and therefore preserves the size much better, but it may result in slower rendering (and sometimes out-of-memory errors) on your reader, and it also will not re-flow text from wide/single columns. You might try -ppgs to mitigate slower rendering. On the plus side, this does prevent the TOC from being mangled. Or you can reduce the number of bits per pixel from 4 to a smaller number, e.g. 2 (-bpc 2). You can even combine the two methods, using -mode 2col -n- -bpc 2, which will use 2-column mode (preventing any text re-flow) but still do bitmapped rendering rather than native (the -n- turns off the native output feature of 2-column mode).
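The size-reduction options in the paragraph above can be laid side by side. This is only a dry-run sketch: it prints the three command lines instead of running them, `source.pdf` is a placeholder name, and the margins are the ones from the earlier post. Every flag comes from this thread; how each variant behaves on a given file has not been checked here.

```shell
# Dry-run sketch: print the three size-reduction variants described above.
# "source.pdf" is a placeholder; the margins are from the earlier post.
MARGINS=".11in,.04in,.14in,.6in"

size_variants() {
    src="$1"
    # 1) Native 2-column output (no text re-flow); -ppgs may help slow rendering
    echo "k2pdfopt -m $MARGINS -mode 2col -ppgs $src"
    # 2) Default mode, bitmapped at 2 bits per pixel instead of 4
    echo "k2pdfopt -m $MARGINS -bpc 2 $src"
    # 3) 2-column mode with native output turned off (-n-), at 2 bits per pixel
    echo "k2pdfopt -m $MARGINS -mode 2col -n- -bpc 2 $src"
}

size_variants source.pdf
```

To actually convert, copy one of the printed lines into a terminal and compare the resulting file sizes and readability on the device.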

At this time you can use crop boxes (-cbox) to crop individual sets of pages differently, e.g.

-cbox5,10,20-29 .11in,.04in,5.35in,8.99in


This would crop pages 5, 10, and 20-29 starting .11 inches from the left and .04 inches from the top, to a width x height of 5.35 in x 8.99 in. But for me it wouldn't be worth going to that kind of trouble just for casual reading.

If you want to get really fancy, you can use the -p option to only process certain source pages (different ways) and then re-assemble all of the converted parts with something like PDFtk or jpdftweak. But again, for me, it wouldn't be worth it for casual reading.
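A rough sketch of that split-and-reassemble idea, written as a dry run that only prints the commands. The file names and the crop value are placeholders. Two things are assumptions not confirmed in this thread: that `-p 2-` selects page 2 onward (by analogy with the open-ended page ranges used for `-cbox`), and that `-o` sets the output file name. The `pdftk ... cat output ...` form is PDFtk's standard merge syntax.

```shell
# Dry-run sketch: convert page 1 and the remaining pages separately, then
# merge the two results. File names and the crop value are placeholders.
# Assumptions: "-p 2-" means page 2 onward, and -o names the output file.
split_convert() {
    src="$1"
    # Page 1 only, with a crop box starting .52in below the top of the page
    echo "k2pdfopt -p 1 -cbox1 0in,.52in -o part1.pdf $src"
    # All remaining pages, processed at full size
    echo "k2pdfopt -p 2- -o rest.pdf $src"
    # Re-assemble the converted parts in order
    echo "pdftk part1.pdf rest.pdf cat output merged.pdf"
}

split_convert source.pdf
```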
Old 02-13-2015, 03:41 PM   #982
dhdurgee
Does the -cbox option work in addition to the -m option? I am already using PDFtk to assemble a single PDF from the individual articles, so I would not have much trouble determining which pages have the extra header information that needs to be removed. Looking at the output, I assume I should be somewhat conservative on this, as you appear to be detecting white space margins for automatic cropping.

Dave
Old 02-13-2015, 05:19 PM   #983
willus
Quote:
Originally Posted by dhdurgee View Post
Does the -cbox option work in addition to the -m option? I am already using PDFtk to assemble a single PDF from the individual articles, so I would not have much trouble determining which pages have the extra header information that needs to be removed. Looking at the output, I assume I should be somewhat conservative on this, as you appear to be detecting white space margins for automatic cropping.

Dave
Yes, -cbox works in addition to -m. They actually work a little differently. The -m option acts only on the entire source page (not the cropped area) and causes the source page to be treated as if all those margin areas become white (that's what actually happens to the internal bitmap). So it doesn't affect the size of the processed source page. The -cbox option causes just that cropped area to be processed, and the source page becomes a smaller size. It's a subtle difference but can affect how things are processed. To disable auto-cropping of white space, you can use -t-.
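To make the difference concrete, here is a small sketch that converts a set of `-m` margins into the `-cbox` rectangle covering the same region. It assumes the `-m` values are ordered left, top, right, bottom, which this thread does not state explicitly; the `-cbox` order (left, top, width, height) is the one described earlier. The function name is made up for illustration.

```shell
# Sketch: convert -m style margins into the matching -cbox rectangle.
# Assumption: -m order is left,top,right,bottom (not stated in the thread).
# Arguments, all in inches: page width, page height, then the four margins.
margins_to_cbox() {
    LC_ALL=C awk -v w="$1" -v h="$2" -v l="$3" -v t="$4" -v r="$5" -v b="$6" \
        'BEGIN { printf "%.2fin,%.2fin,%.2fin,%.2fin\n", l, t, w - l - r, h - t - b }'
}

# A 5.6 x 9.0 in page with the margins used in this thread
margins_to_cbox 5.6 9.0 .11 .04 .14 .6
# prints 0.11in,0.04in,5.35in,8.36in
```

With `-m`, the page keeps its 5.6 x 9.0 in size and the margin strips are whitened; with the equivalent `-cbox`, the processed page itself shrinks to 5.35 x 8.36 in.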
Old 02-13-2015, 07:50 PM   #984
dhdurgee
I just took a look at the full command line documentation and am now wondering whether it might make sense to run your tool first on the individual article PDF files and then use PDFtk to merge the processed files into a single PDF file. With that approach, the top portion I need to crop would be on the first page of each article. So I assume I could use -cbox1 (specific figures) -m (specific figures) ./*.pdf as arguments to crop page one only and then apply the margins to the appropriate sections.

Dave
Old 02-13-2015, 08:00 PM   #985
willus
Quote:
Originally Posted by dhdurgee View Post
I just took a look at the full command line documentation and am now wondering whether it might make sense to run your tool first on the individual article PDF files and then use PDFtk to merge the processed files into a single PDF file. With that approach, the top portion I need to crop would be on the first page of each article. So I assume I could use -cbox1 (specific figures) -m (specific figures) ./*.pdf as arguments to crop page one only and then apply the margins to the appropriate sections.

Dave
You are correct except that you need to add -p 1 if you want to process only page 1 of each PDF file. Otherwise -cbox1 <...> will apply that crop box to page 1, but the other pages will be done as usual, full size.
Old 02-13-2015, 09:49 PM   #986
dhdurgee
Quote:
Originally Posted by willus View Post
You are correct except that you need to add -p 1 if you want to process only page 1 of each PDF file. Otherwise -cbox1 <...> will apply that crop box to page 1, but the other pages will be done as usual, full size.
Actually I want to process each PDF completely, but I need to special case the first page as it has that header I don't want/need to keep. Given this it sounds worth trying the next time a magazine becomes available for download.

I also tried adding -bpc 1, which gets the size down to a much more comparable figure. Given that these are B&W or at most greyscale image scans, can I expect any particular problems with this approach? On occasion there are photo images or artwork in the magazine. Is there any special provision to treat those differently, perhaps rendering them at a higher bpc while keeping the bpc at 1 for text areas?

Thanks again for your input on this.

Dave
Old 02-14-2015, 01:29 AM   #987
willus
Quote:
Originally Posted by dhdurgee View Post
...I also tried adding -bpc 1, which gets the size down to a much more comparable figure. Given these are B&W or at most greyscale image scans can I expect any particular problems with this approach? On occasion there are photo images or artwork in the magazine. Is there any special provision to treat those differently, perhaps doing them at a higher bpc and keeping the bpc as 1 for text areas?
There's no special provision for having different -bpc levels within one conversion at this time. I suggest trying different values for -bpc and seeing how you think the converted file looks.
Old 02-14-2015, 04:30 PM   #988
dhdurgee
I just gave the following a try, but the results were unexpected:

k2pdfopt -bpc 1 -cbox1 0in,.52in -m .11in,.04in,.14in,.6in Analog_2014-12-01.pdf

Reading 4 pages from Analog_2014-12-01.pdf ...

Detecting document orientation ... No rotation necessary.

SOURCE PAGE 1 of 4 (5.7 x 9.6 in) ... 3 new pages saved.

SOURCE PAGE 2 of 4 (5.6 x 9.0 in) ... 0 new pages saved.

SOURCE PAGE 3 of 4 (5.7 x 9.0 in) ... 0 new pages saved.

SOURCE PAGE 4 of 4 (5.6 x 9.0 in) ... 0 new pages saved.

4 pages written to Analog_2014-12-01_k2opt.pdf (0.1 MB).

Note that only the first page was processed. I expected all four pages to be processed, with only the first page having the top .52in cropped off before further processing. What did I miss?

Dave
Old 02-14-2015, 08:18 PM   #989
willus
Quote:
Originally Posted by dhdurgee View Post
I just gave the following a try, but the results were unexpected:

k2pdfopt -bpc 1 -cbox1 0in,.52in -m .11in,.04in,.14in,.6in Analog_2014-12-01.pdf

Reading 4 pages from Analog_2014-12-01.pdf ...

Detecting document orientation ... No rotation necessary.

SOURCE PAGE 1 of 4 (5.7 x 9.6 in) ... 3 new pages saved.

SOURCE PAGE 2 of 4 (5.6 x 9.0 in) ... 0 new pages saved.

SOURCE PAGE 3 of 4 (5.7 x 9.0 in) ... 0 new pages saved.

SOURCE PAGE 4 of 4 (5.6 x 9.0 in) ... 0 new pages saved.

4 pages written to Analog_2014-12-01_k2opt.pdf (0.1 MB).

Note that only the first page was processed. I expected all four pages to be processed, with only the first page having the top .52in cropped off before further processing. What did I miss?

Dave
I may have misremembered--you may need to explicitly state that all the other pages use a default crop box. Try adding this to the command line:

-cbox2- 0,0
Old 02-15-2015, 07:53 AM   #990
dhdurgee
Quote:
Originally Posted by willus View Post
I may have misremembered--you may need to explicitly state that all the other pages use a default crop box. Try adding this to the command line:

-cbox2- 0,0
That did the trick in most cases, but I am finding that a one page PDF is being processed twice! Looking at these cases it seems that one page PDFs are first processed using the -cbox1 setting and then again with the -cbox2- setting.

Any idea why this is happening? I guess I can work around it with a two pass process, but it is strange. Is this a bug that needs fixing?
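One possible workaround, sketched as a dry run that only prints the command: add `-cbox2- 0,0` only when the article has more than one page. This leans on the earlier observation in this thread that with `-cbox1` alone only page 1 gets processed, which is all a one-page file needs. It has not been tested against k2pdfopt, and the function and file names are hypothetical. The page count is passed in as an argument; it could come from something like `pdftk file.pdf dump_data | grep NumberOfPages`.

```shell
# Dry-run sketch of a possible workaround (untested against k2pdfopt):
# only add "-cbox2- 0,0" when the article has more than one page.
article_cmd() {
    src="$1"
    pages="$2"   # page count, e.g. from: pdftk "$src" dump_data | grep NumberOfPages
    opts="-bpc 1 -cbox1 0in,.52in"
    if [ "$pages" -gt 1 ]; then
        # Remaining pages use the default (full-page) crop box
        opts="$opts -cbox2- 0,0"
    fi
    echo "k2pdfopt $opts -m .11in,.04in,.14in,.6in $src"
}

article_cmd one_page.pdf 1
article_cmd four_pages.pdf 4
```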

Dave

Last edited by dhdurgee; 02-15-2015 at 07:54 AM. Reason: fix typo