MobileRead Forums - View Single Post

ali · 09-12-2006, 12:28 PM

Quote:

Originally Posted by arivero

For instance, it doesn't work with sed, because it wipes away the null characters or something like that. It is a pity, because

cat prueba.pdf | sed 's/MediaBox \[.*\]/MediaBox \[0 0 300 500\]/g' > prueba2b.pdf

sounds elegant. I wonder if there is some utility to transform binary pdf files to text and back.

Weird. The sed on my system has no problems with \0. perl might help:

Code:

perl -e 'while(<STDIN>){s/XXX/YYY/g;print;};'

is equivalent to sed s/XXX/YYY/g

Are you sure that sed is your problem? Are you aware that the regexp you posted is erroneous? It matches from the first "MediaBox [" to the last "]" on a line, which might be the whole file. (sed always chooses the longest possible match for a regexp.) This is better:

Code:

sed 's/MediaBox *\[[^]]*\]/MediaBox [0 0 300 500]/g'
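The difference between the two patterns is easy to see on a single line that contains more than one bracketed array. The sample line below is invented for illustration (the /CropBox entry is an assumption, not taken from prueba.pdf):

```shell
# Hypothetical line with two arrays on it.
line='/MediaBox [0 0 612 792] /CropBox [0 0 612 792]'

# Original pattern: .* is greedy and runs to the LAST "]" on the line,
# so the /CropBox entry is swallowed by the match.
greedy=$(printf '%s\n' "$line" | sed 's/MediaBox \[.*\]/MediaBox [0 0 300 500]/g')

# Corrected pattern: [^]]* cannot cross a "]", so the match stops at the FIRST one.
fixed=$(printf '%s\n' "$line" | sed 's/MediaBox *\[[^]]*\]/MediaBox [0 0 300 500]/g')

printf '%s\n' "$greedy"   # /MediaBox [0 0 300 500]
printf '%s\n' "$fixed"    # /MediaBox [0 0 300 500] /CropBox [0 0 612 792]
```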

Finally, all crashes of acroread after tampering with MediaBoxes could be traced back to a broken xref table, which can be reconstructed using pdftk (see earlier post).
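A likely reason the xref table breaks: it records the byte offset of each object, and a replacement of a different length moves everything that follows it. The sketch below just measures that shift on an invented MediaBox entry. The entry and the file names in the final comment are placeholders.

```shell
# Hypothetical entry whose replacement is shorter than the original.
before='/MediaBox [0 0 595.28 841.89]'
after=$(printf '%s' "$before" | sed 's/MediaBox *\[[^]]*\]/MediaBox [0 0 300 500]/g')

# Count bytes before and after; tr strips the padding some wc versions add.
len_before=$(printf '%s' "$before" | wc -c | tr -d ' ')
len_after=$(printf '%s' "$after" | wc -c | tr -d ' ')

# Every object after the edit moves by this many bytes,
# so the offsets stored in the xref table no longer match.
shift_bytes=$((len_before - len_after))
echo "objects after the edit move by $shift_bytes bytes"

# Rebuilding the table afterwards (not run here, file names are placeholders):
#   pdftk prueba2b.pdf output prueba2c.pdf
```

If the new box happens to have exactly the same byte length as the old one, nothing moves and the table stays valid.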