Originally Posted by ashkulz
First, there are two very good projects which already implement this: pdfcrop
(the latter has a very good fork at pdfcrop2). All of them have the same disadvantage: they detect the bounding box using ghostscript (which is very good and accurate) but then they don't update the PDF in-place: they re-create the PDF using pdftex or other software.
Thanks for the links, the pdfcrop2 project looks quite interesting. When I get a chance, I'll give detecting a bounding box via gs a go.
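For reference, a minimal sketch of bounding-box detection with Ghostscript's `bbox` device, which prints one `%%BoundingBox` and one `%%HiResBoundingBox` line per page on stderr. It assumes a `gs` executable is on the PATH; the function names `parse_bboxes` and `detect_bboxes` are made up for illustration and are not taken from pdfcrop.

```python
import re
import subprocess

# One match per page: "%%HiResBoundingBox: x0 y0 x1 y1" (units are PDF points).
BBOX_RE = re.compile(
    r"^%%HiResBoundingBox:\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)", re.MULTILINE
)


def parse_bboxes(gs_output):
    """Return one (x0, y0, x1, y1) tuple of floats per page found in the text."""
    return [tuple(float(v) for v in m.groups()) for m in BBOX_RE.finditer(gs_output)]


def detect_bboxes(pdf_path):
    """Run Ghostscript on pdf_path and return the per-page bounding boxes.

    Assumption: `gs` is installed and on the PATH.
    """
    result = subprocess.run(
        ["gs", "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=bbox", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    # The bbox device writes its report to stderr, not stdout.
    return parse_bboxes(result.stderr)
```

Calling `detect_bboxes("book.pdf")` (a hypothetical file name) should give one tuple per page, which could then be used as the crop rectangle.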
The main thing I want to do is create a programme that will take a page and cut it into two pages. Those "half-pages" can then be rescaled using a PDF printer (probably as landscape)... Think how well that would display on the Iliad! Half an A4 page is about the same size as the Iliad's screen, so it should work nicely.
The trick, however, is to cut the page in such a way that you do not cut through a sentence.
I tried converting PDF pages to images and then using Python Image Library to analyze the color composition of areas at and near the middle. If the area was mostly white, then it was fine to cut there...
It almost worked, but the results were quite inconsistent. Some pages were cut cleanly near the middle, while others were cut somewhere else entirely, for example a third of the way down.
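One possible variation on the image approach, as a rough sketch rather than a tested fix: instead of sampling areas around the middle, mark every pixel row that contains no ink, group those rows into gaps, and cut in the centre of the gap closest to the middle of the page. The function name `find_cut_row` and both threshold parameters are assumptions for illustration. The page is passed in as a plain list of rows of grayscale values (0 = black, 255 = white), so the code runs without any imaging library.

```python
def find_cut_row(rows, white_threshold=245, max_ink_fraction=0.0):
    """Return the row index at which to cut, or None if no blank row exists.

    rows: list of pixel rows, each a list of 0-255 grayscale values.
    A pixel darker than white_threshold counts as ink; a row is blank when
    its share of ink pixels does not exceed max_ink_fraction.
    """
    height = len(rows)
    blank = [
        sum(1 for p in row if p < white_threshold) <= max_ink_fraction * len(row)
        for row in rows
    ]

    # Group consecutive blank rows into (start, end) runs, end exclusive.
    runs = []
    start = None
    for y, is_blank in enumerate(blank):
        if is_blank and start is None:
            start = y
        elif not is_blank and start is not None:
            runs.append((start, y))
            start = None
    if start is not None:
        runs.append((start, height))
    if not runs:
        return None

    # Pick the gap whose centre is nearest the middle of the page.
    middle = height / 2
    best = min(runs, key=lambda r: abs((r[0] + r[1]) / 2 - middle))
    return (best[0] + best[1]) // 2
```

With the Python Imaging Library, the rows could be built from `Image.open(path).convert("L")` by slicing the flat list from `getdata()` into chunks of the image width. Raising `max_ink_fraction` slightly may help with scanned pages that have specks in the gaps.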
The idea that I have now is to export each page of the PDF as an SVG file. Since SVG is an XML-based format, one can then simply copy elements with y coordinates above or below a certain value to separate SVG files. Then print those files as PDFs, and merge all of them back into one PDF.
Unfortunately I haven't had the time to really sit and code the above. I've never worked with parsing XML in Python, so I first have to learn how to do that...
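A minimal sketch of that split using the standard library's `xml.etree.ElementTree`. It makes some big simplifying assumptions: only direct children of the root `<svg>` element are looked at, and only those with a plain numeric `y` attribute. SVG exported from a real PDF will often use nested groups, transforms and paths instead, so this only shows the basic mechanics. The function name `split_svg` is made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialise SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def split_svg(svg_text, cut_y):
    """Return (top_svg, bottom_svg) as strings.

    Assumption: the elements to sort are direct children of the root and
    carry a unitless numeric `y` attribute. Everything else (defs, styles,
    elements without a usable y) is kept in both halves.
    """
    root = ET.fromstring(svg_text)
    top = copy.deepcopy(root)
    bottom = copy.deepcopy(root)

    for half, keep_above in ((top, True), (bottom, False)):
        for child in list(half):  # copy the list, since we remove while iterating
            y = child.get("y")
            if y is None:
                continue
            try:
                is_above = float(y) < cut_y
            except ValueError:
                continue  # e.g. "10px" or "50%": leave it alone
            if is_above != keep_above:
                half.remove(child)

    return (
        ET.tostring(top, encoding="unicode"),
        ET.tostring(bottom, encoding="unicode"),
    )
```

This does not shift the bottom half up or adjust the `height`/`viewBox` of either half, so that would still need to be handled before printing the halves as PDFs.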
Any suggestions would be welcome.
BTW, I'm not a programmer... I only code for fun.