PDFRead on Mac OS X -- PDFRasterFarian for OS X!


sammykrupa
04-05-2007, 03:53 PM
ashkulz's amazing PDFRasterFarian-style program named PDFRead (http://www.mobileread.com/forums/showthread.php?t=10184&page=1&pp=15) is working on Mac OS X 10.4! See his post (http://www.mobileread.com/forums/showthread.php?t=10184&page=1&pp=15) for details on PDFRead.

I have only tested PDFRead on Mac OS X, and only when generating files for the Sony Reader.

Here is how to get it working on Mac OS X 10.4:

Before you do anything, you should have Apple's developer tools installed ( http://developer.apple.com/tools/ ).

First, download Python 2.4.3 from http://www.python.org/download/releases/2.4.3/

The Mac OS X installer application puts a big "MacPython 2.4" folder in your applications folder. Go into that folder and double-click the "Update Shell Profile.command" file.

Next, install pdftk ( http://www.pdfhacks.com/pdftk/#packages )

Now install fink from http://finkproject.org/

After fink is installed, type this into the command line (the Terminal application):

sudo apt-get install xpdf imagemagick ghostscript ghostscript-fonts

Now, download the latest PIL source from http://www.pythonware.com/products/pil/

cd into the directory of the source code you just downloaded, then run this command:

sudo python setup.py install
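To confirm that the PIL build really installed, you can try importing it from Python. Below is a minimal sketch in Python 3 syntax; find_pil_image() is a hypothetical helper, not part of PIL or PDFRead. Classic PIL 1.1.x made the Image module importable at top level (import Image), while current Pillow only provides PIL.Image, so the sketch tries both:

```python
def find_pil_image():
    """Return PIL's Image module, or None if PIL is not installed.

    Hypothetical helper: tries the package-style import first, then the
    old top-level "Image" module that classic PIL 1.1.x also exposed.
    """
    try:
        from PIL import Image
    except ImportError:
        try:
            import Image  # old-style PIL top-level module
        except ImportError:
            return None
    return Image


if __name__ == "__main__":
    Image = find_pil_image()
    if Image is None:
        print("PIL not found; re-run 'sudo python setup.py install' in the PIL source directory")
    else:
        # Build a tiny grayscale image to confirm the compiled extension works.
        im = Image.new("L", (4, 4), 255)
        print("PIL works, test image size:", im.size)
```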

Download the PDFRead v4 source code:
https://sourceforge.net/project/showfiles.php?group_id=87679&package_id=91453

Extract it somewhere. The extracted directory contains a command-line Python program, which you can run via:
python src/pdfread.py <options> pdf-file

Use the same options as detailed in ashkulz's post. However, note that you will have to copy the output directory name shown and navigate to it yourself.

If you're not running Python 2.5, download the ElementTree library from here:

http://effbot.org/downloads/elementtree-1.2.6-20050316.tar.gz

Then extract it and copy the 'elementtree' folder to the same location as pdfread.py.
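Python 2.5 ships ElementTree as xml.etree.ElementTree; on older versions, the standalone package copied next to pdfread.py is importable as elementtree.ElementTree. A common fallback-import pattern for code that has to run on both is sketched below. Whether pdfread.py uses exactly this pattern is an assumption; the tiny XML document is just illustrative input.

```python
# Prefer the standard-library ElementTree (Python 2.5+), and fall back to the
# standalone "elementtree" package copied next to pdfread.py on older Pythons.
try:
    import xml.etree.ElementTree as ET
except ImportError:
    from elementtree import ElementTree as ET

# Illustrative input only: parse a small document and read an attribute.
root = ET.fromstring("<book><page n='1'/><page n='2'/></book>")
pages = [page.get("n") for page in root.findall("page")]
print(pages)
```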

The last step is to download the file named "pdfread_py.txt" attached to this message, rename it to pdfread.py, and use it to replace the original pdfread.py file.

Now you're done!

See ashkulz's original post for details about PDFRead!

nagegowda
05-15-2007, 04:44 AM
Sir, I am using Mac OS. I have installed fink and xcopy, so now the next step is to execute the command $ apt-get install ghostscript, but I am getting the error "command not found". What may be the problem?

sammykrupa
05-15-2007, 07:20 AM
Ashkulz,

I myself am not able to install the new version. The PNG image processor is Intel-Mac-only. That is as far as I was able to get in the installation.

ashkulz
05-15-2007, 07:54 AM
nagegowda, I think your installation of fink is not set up properly. I can't really help you with that, as I don't have OS X... sammykrupa may be able to help you there.

Hmm, sammykrupa, do you mean the pngnq OS X Tiger binary (http://www.cybertherial.com/pngnq/pngnq.html) is Intel-only? If you can compile it yourself and post it somewhere, that'd be great (you'll need zlib-dev and libpng-dev installed). Otherwise, please wait a day or two: I'm planning to release 1.8 soon, and I will be providing alternatives to pngnq.
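A frequent cause of "command not found" right after installing fink is that fink's environment was never loaded into the shell, so its bin directory is missing from PATH. The sketch below checks for that. It assumes fink's traditional default prefix of /sw (adjust FINK_PREFIX if yours differs), and diagnose_fink() is a hypothetical helper written for Python 3, not part of fink.

```python
import os
import shutil

# Assumption: fink was installed with its traditional default prefix, /sw.
FINK_PREFIX = "/sw"


def diagnose_fink(path=None):
    """Check whether fink's bin directory is on PATH and where apt-get resolves."""
    if path is None:
        path = os.environ.get("PATH", "")
    fink_bin = os.path.join(FINK_PREFIX, "bin")
    return {
        "fink_bin_on_path": fink_bin in path.split(os.pathsep),
        # shutil.which() returns None when no executable apt-get is found.
        "apt_get": shutil.which("apt-get", path=path),
    }


if __name__ == "__main__":
    report = diagnose_fink()
    if not report["fink_bin_on_path"]:
        print("fink's bin directory is not on PATH; try: source /sw/bin/init.sh")
    print("apt-get:", report["apt_get"] or "not found")
```

If /sw/bin is missing, adding `source /sw/bin/init.sh` to your shell profile and opening a new Terminal window is the usual fix.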

sammykrupa
05-15-2007, 08:00 AM
I think I will wait for 1.8.

Thanks!

Sam Krupa

ricanchuloinfla
11-20-2008, 07:58 PM
Hi there:

First of all, many thanks to all of you for contributing to this forum. I've been lurking here for quite some time trying to learn as much as I can. I just acquired a Kindle and was looking for a better method of transferring my PDFs to it. Following this thread, I was able to install all of the packages. However, I keep getting an error, and only the first page gets processed.

The error I get is the following:

Temporary directory: /var/folders/mA/mAM-fCWAFOaq2F68laHB0E+++TI/-Tmp-/pdfread-JwOKDk

Page 1/39: EXTRACT RASTERIZE CROP DILATE SAVE
Traceback (most recent call last):
  File "/Users/john/Desktop/src/pdfread.py", line 207, in <module>
    main()
  File "/Users/john/Desktop/src/pdfread.py", line 84, in main
    options.unpaper_args, options.no_crop, options.no_dilate)
  File "/Users/john/Desktop/src/pdfread.py", line 63, in convert
    output.add_page(page, mode_tranform(image))
  File "/Users/john/Desktop/src/common.py", line 153, in add_page
    self.downsample(image, filename)
  File "/Users/john/Desktop/src/common.py", line 163, in downsample
    call('pngnq', '-fs', '1', '-n', str(self.colors), 'page.png')
  File "/Users/john/Desktop/src/common.py", line 201, in call
    stderr = subprocess.STDOUT)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/subprocess.py", line 593, in __init__
    errread, errwrite)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/subprocess.py", line 1079, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
MacBook:src john$


I did notice that although I installed Python 2.6, only 2.5 is reflected here. Other than that, I can't figure out why this is not working on OS X. Can anyone more knowledgeable offer any insights? I've spent all day trying to figure this out, and I want to fling my MacBook out the window (but can't, because it's new!) :o

I notice that some of the directions are unclear (some say to install pngnq, but the latest PDFRead source code does not mention it).
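The traceback above ends in "OSError: [Errno 2] No such file or directory" raised by subprocess while launching 'pngnq'; on OS X and other Unix systems this typically means the pngnq executable could not be found on the PATH. Below is a minimal Python 3 sketch of a wrapper that reports the missing program by name instead. The name call() mirrors the helper in common.py seen in the traceback, but this body is hypothetical and not PDFRead's code; in Python 3 a missing executable raises FileNotFoundError, a subclass of OSError.

```python
import subprocess
import sys


def call(*args):
    """Run an external program with its output silenced; return its exit code.

    Hypothetical stand-in for the call() helper in PDFRead's common.py.
    A missing executable is reported by name instead of as a bare
    "[Errno 2] No such file or directory".
    """
    try:
        return subprocess.call(
            list(args),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.STDOUT,  # stderr follows stdout into DEVNULL
        )
    except FileNotFoundError:
        raise RuntimeError(
            f"could not run {args[0]!r}: it is not installed or not on your PATH"
        ) from None


if __name__ == "__main__":
    # Succeeds: the running Python interpreter certainly exists.
    print(call(sys.executable, "-c", "pass"))
    # Prints a readable message if pngnq is not installed.
    try:
        call("pngnq", "-h")
    except RuntimeError as err:
        print(err)
```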



nrapallo
11-20-2008, 08:18 PM
This is an old thread using an older version of PDFRead. :smack:

For clearer and hopefully better install instructions, see the thread PDFRead 1.8.2 released! (http://www.mobileread.com/forums/showthread.php?t=21906) and in particular the thread PDFRead 1.8.2 working on Mac OS X. (http://www.mobileread.com/forums/showthread.php?p=176233#post176233) :thumbsup:

Please report back if this works for you. Enquiring minds want to know... :)

p.s. and welcome to Mobileread.com!

ricanchuloinfla
11-20-2008, 08:29 PM
Nick:

Thanks for the prompt reply. I've actually been bouncing between this thread and the other two you posted and I still have the error.

I have the Apple Developer Tool installed.
I downloaded the latest Python (2.6)
I installed pdftk
I downloaded pngnq (how does this get installed? Just curious)
Installed fink and downloaded all packages as instructed
downloaded PIL and installed it
downloaded the latest PDFRead source from here and followed the directions

I'm a bit dumbfounded as to what I could be doing wrong... Should I post this in the newer thread instead of here?


marcusgennaroz
12-10-2008, 05:05 AM
Hi! I am trying to install PDFRead on my Mac running Leopard, and I have the same issue described above (two messages up!). Has anyone solved this issue on the Mac? Any success stories? :-)

Thanks!
M.

nrapallo
12-10-2008, 09:03 AM
:blink: Did you see the message just above your post (http://www.mobileread.com/forums/showthread.php?p=293243#post293243)?

If you are installing PDFRead 1.8, then you need to follow the Mac OS X instructions in the thread PDFRead 1.8.2 working on Mac OS X (http://www.mobileread.com/forums/showthread.php?p=176233#post176233).

This is an old thread with an older version of PDFRead. Please post in the newer threads in the links provided.

larryy
10-31-2009, 07:36 PM
I got PDFRead (http://www.mobileread.com/forums/showthread.php?t=21906) (1.8.2) to work reasonably well on Mac OS X 10.5.x, but it took a lot of effort getting all the pieces together, and required some minor hacking of PDFRead itself. Best place to start is the current thread (http://www.mobileread.com/forums/showthread.php?p=64075).

And after hours spent on this, it works exactly as advertised, but the text is just so small for your typical two-column journal article that I'm not sure how much use I'll get out of it. It looks like there might be hope for this from PaperCrop, but that's back to Windows-only, sigh. Anyway, here are some PaperCrop links:
mobileread forum thread (http://www.mobileread.com/forums/showthread.php?t=31677)
home page (http://jupiter.kaist.ac.kr/~taesoo/projects/paperCrop/index_eng.html)
source code (http://code.google.com/p/papercrop/)

I sure would love to see the big PDFRead rewrite incorporate the PDF parsing and re-layout algorithms of PaperCrop, so two-column documents could be gracefully turned into single-column documents (making sure whole-page-width figures don't get chopped), and taken all the way to .prc files. That would be my dream tool. I'm fine with using it from the command-line, though Python GUIs can be made cross-platform, which would be nice, I suppose.

Anyway, a few notes on things I had to do to get PDFRead to be fully functional on Mac OS X 10.5.x are below.

- larryy

---------------------------------------------------------------------------------

I followed the CPAN instructions on this mobiperl page (https://dev.mobileread.com/trac/mobiperl) to install Palm::PDB, XML::Parser::Lite::Tree, GD, Image::BMP, Image::Size, HTML::TreeBuilder, Getopt::Mixed, Date::Parse, and Date::Format.
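One way to confirm those CPAN installs worked is to ask perl to load each module: `perl -MModule -e 1` exits non-zero when the module cannot be loaded. The sketch below wraps that check in Python 3 for the module list above; missing_perl_modules() is a hypothetical helper, not part of MobiPerl or PDFRead.

```python
import subprocess

# The module list from the post above.
MOBIPERL_MODULES = [
    "Palm::PDB", "XML::Parser::Lite::Tree", "GD", "Image::BMP", "Image::Size",
    "HTML::TreeBuilder", "Getopt::Mixed", "Date::Parse", "Date::Format",
]


def missing_perl_modules(modules, perl="perl"):
    """Return the modules that `perl -M<Module> -e 1` fails to load."""
    missing = []
    for module in modules:
        try:
            result = subprocess.run(
                [perl, "-M" + module, "-e", "1"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # No perl interpreter at all, so nothing can be loaded.
            return list(modules)
        if result.returncode != 0:
            missing.append(module)
    return missing


if __name__ == "__main__":
    print("missing:", missing_perl_modules(MOBIPERL_MODULES) or "none")
```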

I also had to install pngnq, which I did using FinkCommander (http://finkcommander.sourceforge.net/), since I already had fink (http://www.finkproject.org/) and FinkCommander installed (FinkCommander requires fink).

That also allowed me to do the following:
sudo apt-get install xpdf imagemagick ghostscript ghostscript-fonts

At a minimum, PDFRead's common.py check_commands() function needs to replace
call(command)
with
call(command, '-h')
because call('gs') hangs, probably waiting for input.
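The point of the '-h' change is that the check only needs to know whether each program can be launched, while a bare gs starts interactively and waits for input. A minimal Python 3 sketch of such a probe follows. It assumes nothing about PDFRead's real check_commands() beyond what is described here; command_works(), its empty stdin, and its timeout are hypothetical additions that guard against a hang even if a tool ignores '-h'.

```python
import subprocess
import sys


def command_works(command, probe_arg="-h", timeout=10):
    """Return True if `command` can be launched at all.

    Passing a harmless argument such as '-h' and giving the child an empty
    stdin keeps programs like gs from waiting for interactive input. The
    exit status is ignored on purpose: some tools exit non-zero after
    printing their help text.
    """
    try:
        subprocess.run(
            [command, probe_arg],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError: not installed or not executable; TimeoutExpired: still hung.
        return False
    return True


if __name__ == "__main__":
    for tool in ["gs", "pdftk", "pngnq"]:
        print(tool, "ok" if command_works(tool) else "MISSING")
```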

I also had trouble with PDFRead building the html and png files, but not generating the .prc file, until I coaxed the code into telling me what pieces it was missing (in that same check_commands() function). Note: It's okay for rbmake and djvused to be missing, if you're trying to go from PDF to PRC.
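Getting the check to say what is missing, and to accept that some tools only matter for some outputs, can be sketched as a lookup from output format to required programs. The grouping below is hypothetical and for illustration only: the real lists live in PDFRead's check_commands() and may differ, and pairing rbmake with Rocket eBook (.rb) output and djvused with DjVu output is an assumption consistent with the note above that neither is needed for PDF to PRC.

```python
import shutil

# Hypothetical grouping, for illustration only: the real lists live in
# PDFRead's check_commands() and may differ.
COMMON_TOOLS = ["gs", "pngnq"]
FORMAT_TOOLS = {
    "prc": [],            # PDF -> PRC: rbmake and djvused are not needed
    "rb": ["rbmake"],     # assumed: Rocket eBook output
    "djvu": ["djvused"],  # assumed: DjVu output
}


def missing_tools(output_format, which=shutil.which):
    """List the programs needed for `output_format` that cannot be found."""
    needed = COMMON_TOOLS + FORMAT_TOOLS.get(output_format, [])
    return [tool for tool in needed if which(tool) is None]


if __name__ == "__main__":
    for fmt in sorted(FORMAT_TOOLS):
        print(fmt, "missing:", missing_tools(fmt) or "nothing")
```

Passing `which` as a parameter lets you test the logic with a fake lookup instead of depending on what is installed on the machine.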

Sorry if that's not a complete specification. I pounded on this for hours, and these are my undoubtedly spotty recollections.

nrapallo
11-02-2009, 06:27 PM
Thanks for sticking with it through those rough spots. I updated the PDFRead forum post #1 (http://www.mobileread.com/forums/showthread.php?t=21906) to include a link to your instructions above. :)

Yep, I agree that PaperCrop is a useful program, especially when used with the pi algorithm.

Me too, I would love to see the PDFRead rewrite incorporate PaperCrop's parsing and re-layout algorithms, but my Python programming skills are not strong enough to do this alone... :whistle:

The MobiPerl modules you installed via CPAN are also useful for my modified "NRhtml2mobi.pl", adapted from the MobiPerl suite.

Anyone having any other issues getting this to work? Ask now or forever... :rolleyes:

larryy
11-07-2009, 10:18 PM
Following up on my comment that the text is just so small for a typical two-column journal article: a bit of experimenting with --hres, --vres, and -m portrait-2col produced much more readable text. It splits page-spanning figures in half, but this is closer to being actually usable. I wish there were a -m mode that split the page in half horizontally, like portrait-2col, but broke the pages down into as many pages as necessary to display the columns in landscape mode. That would make the text large enough to be comfortably usable, I think. (Unfortunately, this is not what -m landscape-2col does, or at least I never found parameters that would make it work this way.)

I haven't had time to dig a great deal deeper... Is there perhaps enough fine-grained control to do what I'm asking already?
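The wished-for mode can be sketched as crop-box arithmetic: split the page into columns, then cut each column into chunks whose aspect ratio matches a landscape screen, so every chunk is displayed at full screen width. column_crop_boxes() is hypothetical and not part of PDFRead (in particular, it is not what -m landscape-2col does); it ignores margins and will cut through lines of text and figures at chunk boundaries. The boxes follow the (left, upper, right, lower) convention used by PIL's Image.crop.

```python
def column_crop_boxes(page_w, page_h, screen_w, screen_h, columns=2):
    """Split a page into columns, then cut each column into screen-shaped chunks.

    Hypothetical helper. A chunk col_w wide, scaled up to screen_w, fills
    screen_h when its height is col_w * screen_h / screen_w.
    Returns (left, upper, right, lower) boxes, column by column, top to bottom.
    """
    col_w = page_w // columns
    chunk_h = max(1, col_w * screen_h // screen_w)
    boxes = []
    for c in range(columns):
        left = c * col_w
        # The last column absorbs any leftover pixels from integer division.
        right = page_w if c == columns - 1 else left + col_w
        top = 0
        while top < page_h:
            bottom = min(top + chunk_h, page_h)
            boxes.append((left, top, right, bottom))
            top = bottom
    return boxes


if __name__ == "__main__":
    for box in column_crop_boxes(600, 800, 800, 600):
        print(box)
```

For a 600x800 page and an 800x600 landscape screen, each 300-pixel column is cut into chunks 225 pixels tall, giving four chunks per column (the last one shorter) and eight boxes in all.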

And, nrapallo, I doubt I can possibly help with changes until the summer, but it's conceivable I could look into importing some of PaperCrop's logic into pdfread then. I'm way not promising, but I'll definitely keep it in mind. The more I can use my Kindle for the reading of journal articles, the happier I'll be!