MobileRead Forums - View Single Post

ludo · 08-14-2009, 05:24 AM

Quote:

Originally Posted by JamesFlames

I have been tinkering with this problem and have a solution which seems quite effective.

If I use convert on large PDF files, it consumes so much memory that the process is killed before it manages to convert a single page. I read up a bit on alternative methods, and this is what I came up with.

First, we use pdftoppm to render the pages of the PDF, or a range of them, as PPM images. To convert all pages:

Code:

pdftoppm my_pdf_file.pdf prefix_name

where my_pdf_file.pdf is the file to convert, and prefix_name is the base name used for the converted images. To convert only a range of pages:

Code:

pdftoppm -f first_page -l last_page my_pdf_file.pdf prefix_name

pdftoppm has further options, for example for grayscale output. The conversion is pretty fast and produces good-looking images.
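As a rough sketch of those extra options: -gray and -r are standard pdftoppm flags, while the wrapper name pdf_to_gray_ppm and the 150 DPI default are just my own placeholders. Note that with -gray the output files end in .pgm rather than .ppm, so any later loop would need to match *.pgm instead.

```shell
# Hypothetical wrapper: render a PDF to grayscale images at a chosen resolution.
# Usage: pdf_to_gray_ppm my_pdf_file.pdf prefix_name [dpi]
pdf_to_gray_ppm() {
    pdf=$1
    prefix=$2
    dpi=${3:-150}   # assumed default; pick whatever suits your reader's screen
    # -gray writes grayscale .pgm files; -r sets the rendering resolution in DPI.
    pdftoppm -gray -r "$dpi" "$pdf" "$prefix"
}
```

A lower resolution gives smaller images and faster conversion, at the cost of sharpness.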

Now you have to convert the PPM images to JPEG. At this stage you can also crop the resulting images to eliminate the white borders around the text. The conversion is done with convert, but again, processing one file at a time works much better on my PC than doing everything in one go:

Code:

for f in *.ppm; do convert "$f" -quality 85 "${f%.ppm}.jpg"; done

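If you need to crop edges, first look at the original dimensions of one of the pages (the exact file name depends on your prefix and on how the pages were numbered)

Code:

identify prefix_name-001.ppm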

or open one of the files in an image editor. Say the image is 1220x1610 and we want to crop 30 pixels from each side; the conversion command then becomes:

Code:

for f in *.ppm; do convert "$f" -crop 1160x1550+30+30\! -quality 85 "${f%.ppm}.jpg"; done

This crops the original image to 1160x1550 (that is, 1220 - 2*30 by 1610 - 2*30), starting at horizontal and vertical offsets of 30 pixels. The trailing ! also resets the canvas dimensions of the converted image to the cropped size.
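If you would rather not do the subtraction by hand, the geometry can be computed from the page size and the margin. This is only a small sketch; crop_geometry is a name I made up, and it assumes the same margin on all four sides.

```shell
# crop_geometry WIDTH HEIGHT MARGIN
# Prints a geometry string that trims MARGIN pixels from every side.
crop_geometry() {
    width=$1
    height=$2
    margin=$3
    # New size is the old size minus the margin on both sides; the offset is the margin.
    echo "$((width - 2 * margin))x$((height - 2 * margin))+${margin}+${margin}"
}

crop_geometry 1220 1610 30   # prints 1160x1550+30+30
```

In the convert loop this could be used as -crop "$(crop_geometry 1220 1610 30)"\! in place of the hard-coded numbers.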

Then it's just a matter of zipping all the jpg files.
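For completeness, a minimal sketch of the zipping step, assuming the Info-ZIP zip tool is installed. The helper name zip_pages and the default archive name pages.zip are placeholders of mine, not anything your reader requires.

```shell
# Hypothetical helper: bundle every JPEG in the current directory into one archive.
# Usage: zip_pages [archive_name]
zip_pages() {
    archive=${1:-pages.zip}   # placeholder default name
    # The shell expands *.jpg in sorted order, so zero-padded page numbers stay in sequence.
    zip "$archive" ./*.jpg
}
```

Run it from the directory that holds the converted pages, for example zip_pages my_book.zip.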