04-09-2015, 08:42 AM #1036
willus
Quote:
Originally Posted by RTL
...
(3) For each page, I used the cbox option to divide the page into two pages, namely a top page and a bottom page.
(4) For the top page, I used cbox again to divide it into left and right portions and discarded the unnecessary part (the boxed original text).
(5) Before I merge the top and bottom parts back together, I need to resize the top part to the width of the bottom part. Otherwise, because a portion of the top part was cut out, the two parts have different widths and reflowing does not work well.
(6) Merged the top and bottom parts into a single PDF page.
(7) Using pdfsam again, merged all pages into one PDF file.
(8) Used k2pdfopt to reflow.

This is just a trial on around 10-20 pages, as the manual workload is too great.

I have attached the overlaid odd and even pages.
You can do steps 3-6 in a single step by using multiple -cbox options in one k2pdfopt command. Multiple -cbox options are allowed, so you can use one -cbox for the upper-left (or upper-right) region followed by another -cbox for the bottom region, and k2pdfopt will process the two regions consecutively. It would also be easy enough to add something like an -ibox option for an "ignore box", a region that k2pdfopt would skip. I will consider that for the next release.

Also, your overlays look quite consistent: I can see a clear boundary around the region to ignore, and a fairly clear boundary between the top and bottom text regions. Are you sure you can't automate this with something like the following?

-cbox1o 0s,0s,.5s,.5s -cbox1o 0,.5s -cbox2e 0.5s,0s,.5s,.5s -cbox2e 0,.5s

The values might not be exactly right, but they should be close. I'm also not sure I matched the odd and even pages to the correct side of the "ignore box". (If you leave off the width and height values from -cbox, the box extends to the edge of the page by default.)
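To make the geometry of those -cbox values concrete, here is a small Python sketch (not k2pdfopt code) that turns crop boxes into pixel rectangles on a rendered page. It assumes, as the "s" suffix suggests, that each value is a fraction of the source page's width or height, and that an omitted width or height runs to the page edge as described above. The BOXES table and the odd/even assignment are hypothetical stand-ins for the suggested options; swap the "odd" and "even" entries if the ignore box turns out to be on the other side.

```python
from typing import Dict, List, Optional, Tuple

# A crop box as (left, top, width, height), each a fraction of the page size.
# width/height may be None, meaning "extend to the page edge".
Box = Tuple[float, float, Optional[float], Optional[float]]
# A pixel rectangle as (left, top, right, bottom).
Rect = Tuple[int, int, int, int]

# Hypothetical stand-in for the suggested -cbox options: for each page parity,
# the regions to keep, in the order they would be processed.
BOXES: Dict[str, List[Box]] = {
    "odd": [(0.0, 0.0, 0.5, 0.5), (0.0, 0.5, None, None)],   # upper-left, then bottom half
    "even": [(0.5, 0.0, 0.5, 0.5), (0.0, 0.5, None, None)],  # upper-right, then bottom half
}


def box_to_pixels(box: Box, page_w: int, page_h: int) -> Rect:
    """Convert a fractional crop box into a pixel rectangle on a page."""
    left, top, width, height = box
    x0 = round(left * page_w)
    y0 = round(top * page_h)
    # An omitted width/height runs to the right/bottom edge of the page.
    x1 = page_w if width is None else round((left + width) * page_w)
    y1 = page_h if height is None else round((top + height) * page_h)
    # Clamp so a slightly oversized box never extends past the page.
    return (x0, y0, min(x1, page_w), min(y1, page_h))


def regions_for_page(page_no: int, page_w: int, page_h: int) -> List[Rect]:
    """Return the pixel regions kept for a 1-based page number."""
    parity = "odd" if page_no % 2 == 1 else "even"
    return [box_to_pixels(b, page_w, page_h) for b in BOXES[parity]]


if __name__ == "__main__":
    # Example: an 8.5 x 11 inch page rendered at 100 dpi is 850 x 1100 pixels.
    for page in (1, 2):
        print(page, regions_for_page(page, 850, 1100))
```

Run as a script, it prints `1 [(0, 0, 425, 550), (0, 550, 850, 1100)]` and `2 [(425, 0, 850, 550), (0, 550, 850, 1100)]`: each page keeps one upper quarter and then the full-width bottom half, which is why no separate resize step is needed.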

If you want to convert your PDF to bitmaps, several programs can do this. You could certainly write your own using the MuPDF library, or, as you said, it would be a trivial feature to add to k2pdfopt (it can already reassemble bitmaps into a PDF: just feed it a folder of .png or .jpg files named sequentially in the order you want them processed). The "convert" program from ImageMagick will also do this:

convert file.pdf file.png

... will create file-0.png, file-1.png, and so on (name the output file-%03d.png if you want zero-padded numbers such as file-000.png).
It is a powerful bitmap conversion command with many options.
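One catch with "named sequentially": alphabetically, file-10.png sorts before file-2.png. Assuming k2pdfopt picks up the folder's files in plain alphabetical order (an assumption, not something stated above), unpadded page numbers would scramble the page order. The following Python sketch zero-pads the trailing page number of every .png/.jpg name in a folder; pad_page_numbers and split_page_name are hypothetical helpers, and the "name-NUMBER.ext" pattern is assumed to match how the images were written.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Matches names like "file-7.png" -> ("file", 7, ".png").
_PAGE_NAME = re.compile(r"^(.*)-(\d+)(\.(?:png|jpe?g))$", re.IGNORECASE)


def split_page_name(name: str) -> Optional[Tuple[str, int, str]]:
    """Split a page image name into (stem, page number, extension), or None."""
    m = _PAGE_NAME.match(name)
    if m is None:
        return None
    stem, num, ext = m.groups()
    return stem, int(num), ext


def pad_page_numbers(folder: str, min_width: int = 3) -> List[str]:
    """Rename e.g. file-7.png to file-007.png so that plain alphabetical
    order matches page order. Returns the page image names, sorted."""
    root = Path(folder)
    # Take a snapshot of the directory before renaming anything.
    pages = []
    for path in sorted(root.iterdir()):
        parts = split_page_name(path.name)
        if parts is not None:
            pages.append((path, parts))
    if not pages:
        return []
    # Pad to the widest number present, so page 1000 still sorts after 999.
    width = max(min_width, max(len(str(num)) for _, (_, num, _) in pages))
    for path, (stem, num, ext) in pages:
        target = root / f"{stem}-{num:0{width}d}{ext}"
        if target == path:
            continue  # already padded
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        path.rename(target)
    return sorted(p.name for p in root.iterdir() if split_page_name(p.name))
```

For example, calling pad_page_numbers("pages") on a folder holding file-0.png through file-11.png leaves file-000.png through file-011.png, whose alphabetical and numeric orders agree. Files that do not match the pattern are left alone, and the helper refuses to overwrite an existing file rather than silently losing a page.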

Last edited by willus; 04-09-2015 at 08:45 AM.