03-17-2007, 08:37 PM | #1 |
Member
Posts: 14
Karma: 10
Join Date: Nov 2006
Device: prs-500
|
Poor boy's way of editing PDF files (mostly Linux, Cygwin)
Hi all,
I want to start a thread about editing PDFs with command-line tools (mostly Linux, Cygwin). It might be boring for most folks, but I find it interesting because I don't want to pay for a commercial PDF editor and there are no good free ones for Linux (at least for the things that I want to do). WARNING: the post is long and boring.

The main reason for fooling around with PDFs is to make them more readable on my Sony PRS-500. There are quite a few books available online, but they are not formatted for a small screen: the fonts are thin, the text is gray, and there are huge margins with page numbers, titles, etc. Since the PDF and PS formats are well documented and the PostScript output is essentially a text file, it is possible to edit it with ordinary text utilities.

An example of what can be achieved with standard tools is attached at the end (this page is from an Apress book; I don't want to promote them, but Apress sells PDFs for 50% of the paper copy price. Real PDFs, no subscription stupidity. Quote: "Printing, document assembly, content copying or extraction, and content extraction for accessibility are permissible, for personal use only.").

Back to the topic. Below is the way to do this conversion. I am not very good with bash scripting or with editing PDFs, so any input would be interesting.

The first thing to do is to convert the PDF to PS. The program I used is pdftops from the xpdf package (pdf2ps is a different program and not good). It has a couple of interesting options that might be useful:
Code:
-f <int> : first page to print
-l <int> : last page to print
Code:
#!/bin/bash
# this script grabs all fonts from a postscript file and dumps them on stdout
# the first argument is the file name to work with

# get all font names and ids, replace " " with ":", "for" does not like spaces
fonts=$(grep -E "/F[0-9]{1,5}_0 /" "$1" | sed "s/ /:/g")

# get the number of times particular font id occurs in the file
for i in $fonts
do
    fontid=$(echo "$i" | cut -f1 -d:)    # the thing that we are going to look for
    fontname=$(echo "$i" | cut -f2 -d: | sed 's/\///')
    fontfreq=$(grep -c -E "$fontid" "$1")    # number of lines that mention this id
    echo "$fontid" "$fontname" "$fontfreq"
done
Code:
/F154_0 HelveticaNeue-MediumCond 2
/F155_0 HelveticaNeue-Condensed 3
/F156_0 ZapfDingbats 2
/F157_0 FAADGE+TimesNewRoman 3
/F100_0 HelveticaNeue-BoldCond 2
/F103_0 Utopia-Regular 7
/F108_0 TheSansMonoConSemiLight 6
/F111_0 HelveticaNeue-BoldCond 55
/F112_0 ZapfDingbats 48
/F109_0 HelveticaNeue-MediumCond 67
/F110_0 Utopia-Regular 481
/F113_0 TheSansMonoConSemiLight 430
/F122_0 Utopia-Italic 64
/F117_0 HelveticaNeue-HeavyCond 9
/F119_0 Utopia-Semibold 34
/F123_0 FAADGE+TimesNewRoman_0 29
/F129_0 HelveticaNeue-Condensed 102
/F130_0 HelveticaNeue-CondensedObl 9
/F135_0 TheSansMonoConSemiLight-Italic 4
/F132_0 Utopia-Bold 6
/F136_0 Symbol 2
Code:
cat fontfreq.txt | grep "Utopia-Regular" | ./fontreplace.sh original.ps /F111_0
Code:
#!/bin/bash
# the first argument is the file to work on
# the second argument is the id of the font to use in place of incoming list of fonts
while read -r fontstring
do
    fontid=$(echo $fontstring | cut -f1 -d" ")    # first column is supposed to be font id
    echo "$fontid"
    # use | as the sed delimiter because font ids start with /
    sed "s|$fontid 1|$2 1|" "$1" > tmp.ps && mv tmp.ps "$1"
done
Code:
cat fontfreq.txt | grep "TheSansMonoConSemiLight" | ./fontreplace.sh original.ps /F132_0
cat fontfreq.txt | grep "HelveticaNeue-Condensed" | ./fontreplace.sh original.ps /F111_0
Code:
[/CropBox [110 100 500 670] /PAGES pdfmark
Code:
%%DocumentMedia: plain 612 792 0 () ()
%%BoundingBox: 0 0 612 792
Code:
./insertcrop.sh original.ps 110 100 500 670 > original_wcrop.ps

#!/bin/bash
# inserts crop box command into the file passed as the first argument
# values for crop are 2..5th cl arguments
prologln=$(grep -m 1 -n '^%%BeginProlog$' "$1" | cut -f1 -d:)
head -n "$prologln" "$1"    # everything up to and including %%BeginProlog
echo "[/CropBox [$2 $3 $4 $5] /PAGES pdfmark"
tail -n +$((prologln + 1)) "$1"    # the rest of the file
Code:
ps2pdf original_wcrop.ps edited.pdf
Any comments on how to edit pdf/ps files will be appreciated. |
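A note on picking the CropBox numbers: the four values are the lower-left and upper-right corners of the visible area in points, measured from the bottom-left of the page. With the 612 x 792 page from the %%BoundingBox line, [110 100 500 670] trims 110 pt on the left, 100 pt at the bottom, 112 pt on the right and 122 pt at the top. Below is a minimal sketch of that arithmetic; the function name cropbox_from_margins is made up for illustration and is not part of the scripts above.

```shell
#!/bin/bash
# Hypothetical helper (not one of the original scripts): build the pdfmark
# line from the page size and the margins to trim, all in PostScript points.
# usage: cropbox_from_margins WIDTH HEIGHT LEFT BOTTOM RIGHT TOP
cropbox_from_margins() {
    local width=$1 height=$2 left=$3 bottom=$4 right=$5 top=$6
    # CropBox is [llx lly urx ury]: lower-left corner, then upper-right corner
    echo "[/CropBox [$left $bottom $((width - right)) $((height - top))] /PAGES pdfmark"
}

# Letter page (612 x 792), trimming 110 left, 100 bottom, 112 right, 122 top
cropbox_from_margins 612 792 110 100 112 122
```

For these inputs it prints the same pdfmark line used in the post, so it may be easier to think in margins and let the shell do the subtraction.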
03-25-2007, 11:37 PM | #2 |
Blueberry!
Posts: 888
Karma: 133343
Join Date: Mar 2007
Device: Sony PRS-500 (RIP); PRS-600 (Good Riddance); PRS-505; PRS-650; PRS-350
|
Thanks for these utilities! There seems to be an overt emphasis on Windows for conversion utilities, with Linux and Mac sitting dejected on the sidelines.
I hope to give these a run on Mac OS X, and if they work you can add that to the subject. Bash comes standard, and Fink (a Mac OS X port of Debian's apt) has xpdf and ghostscript, so those are easy. I don't have my PRS500 yet, so it'll be a few days before I can test. -Pie |
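Before trying the scripts on a new platform, a quick check that the required programs are on the PATH can save some head-scratching. This is only a sketch: missing_tools is a made-up helper name, and the tool list assumes pdftops comes from xpdf and ps2pdf from ghostscript, as described in the posts.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every listed tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The scripts in this thread need bash, pdftops (xpdf) and ps2pdf (ghostscript)
missing_tools bash pdftops ps2pdf
```

If nothing is printed, everything was found; otherwise the printed names are the tools still to be installed.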
03-30-2007, 11:14 PM | #3 |
Blueberry!
Posts: 888
Karma: 133343
Join Date: Mar 2007
Device: Sony PRS-500 (RIP); PRS-600 (Good Riddance); PRS-505; PRS-650; PRS-350
|
As promised, I demoed these, and they work perfectly under Mac OS X.
However, I only tried them with the Reader Manual (how idiotic is it that this thing is nigh impossible to read?). To say the least, this was far too complicated, as there are something like 20 or 30 fonts, and it is insanely difficult to find text strings in a PostScript file. All in all, this is a very complicated process and needs more automation. Don't know if that's possible. But there it is. -Pie |
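One small piece of automation that might help with the 20 or 30 fonts: collapse the font listing into one line per font name, with the total use count and all ids that share the name, so the handful of fonts that carry most of the text stand out. This is a rough sketch, not something from the original scripts; summarize_fonts is a made-up name, and it assumes the "id name count" listing format shown in post #1.

```shell
#!/bin/bash
# Hypothetical helper, not from the original posts: collapse a font listing
# (one "id name count" entry per line on stdin) into one line per font name.
summarize_fonts() {
    awk '{ total[$2] += $3; ids[$2] = ids[$2] " " $1 }
         END { for (name in total) print total[name], name ids[name] }' |
        sort -rn    # most-used font names first
}

# Three entries taken from the listing in post #1
summarize_fonts <<'EOF'
/F103_0 Utopia-Regular 7
/F112_0 ZapfDingbats 48
/F110_0 Utopia-Regular 481
EOF
```

For the three sample entries this prints "488 Utopia-Regular /F103_0 /F110_0" followed by "48 ZapfDingbats /F112_0". The ids on each line are the ones that would be fed to fontreplace.sh together.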
04-02-2007, 03:26 PM | #4 | |
Member
Posts: 14
Karma: 10
Join Date: Nov 2006
Device: prs-500
|
Quote:
|
|
|