pdftoiliad

IanHelgesen · 04-29-2008, 05:21 PM

Attached is a simple script I knocked together this morning in a fit of creative procrastination. It takes one or more PDFs, attempts to extract metadata (title, author, number of pages), creates a thumbnail of the first page, and creates a directory containing the PDF and a manifest.xml file, ready for the Iliad.
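For anyone curious how the metadata step can work with the poppler tools listed below: `pdfinfo` prints lines such as `Title:  ...`, `Author:  ...` and `Pages:  ...`. This is a minimal Python 3 sketch, not the script's actual code (later versions switched to pyPdf). The function names `parse_pdfinfo` and `pdf_metadata`, and the file-name fallback for a missing title, are my own assumptions.

```python
import os
import subprocess

def parse_pdfinfo(text):
    """Turn pdfinfo's 'Key:   value' lines into a dict."""
    info = {}
    for line in text.splitlines():
        # split on the first colon only, so values such as dates keep theirs
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def pdf_metadata(path):
    """Run pdfinfo on one file and pick out title, author and page count."""
    # An argument list (no shell) passes the file name through untouched.
    result = subprocess.run(["pdfinfo", path], capture_output=True,
                            text=True, check=True)
    info = parse_pdfinfo(result.stdout)
    # Hypothetical fallback: use the file name when the PDF has no title.
    title = info.get("Title") or os.path.splitext(os.path.basename(path))[0]
    return {"title": title,
            "author": info.get("Author", ""),
            "pages": int(info.get("Pages", "0"))}
```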

Caveats:
This is the result of a morning's hacking, so it _will_ have bugs. The only one I'm aware of at the moment is that it fails if the file name contains spaces (and probably other special characters too): I can't get the imaging libraries (used to create the thumbnail) to load these files. Any suggestions on fixing this would be greatly appreciated, as would reports of other problems.
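A frequent cause of space-related failures in scripts that shell out to tools like pdfinfo and pdftoppm (an assumption here, since the script itself isn't shown) is building the command as one string for the shell, which splits `my book.pdf` into two arguments. A minimal sketch of the usual fix: pass an argument list to `subprocess`, or quote every argument with `shlex.quote` when a shell string can't be avoided. The `pdftoppm` flags used are `-f`/`-l` (first/last page) and `-r` (resolution in DPI); the helper names and the 72 DPI default are hypothetical.

```python
import shlex
import subprocess

def first_page_cmd(pdf_path, prefix, dpi=72):
    """Argument list that renders only page 1 of pdf_path with pdftoppm."""
    # -f/-l: first and last page to render; -r: resolution in DPI
    return ["pdftoppm", "-f", "1", "-l", "1", "-r", str(dpi), pdf_path, prefix]

def shell_string(argv):
    """Quote every argument, for the rare case a shell string is unavoidable."""
    return " ".join(shlex.quote(arg) for arg in argv)

def render_first_page(pdf_path, prefix):
    # Each list element reaches pdftoppm as exactly one argument, spaces and all.
    subprocess.run(first_page_cmd(pdf_path, prefix), check=True)
```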

Requirements:
This was made for Unix-ish systems. I'm using it on Ubuntu (8.04), but it should work fine on other versions of Linux, and probably on Mac OS X. While it could theoretically be made to work on Windows, I'm not sure exactly what would need to be done. Also, it has no graphical interface, so it would not be terribly useful without considerable work.

You will need:
Python, the Python Imaging Library (PIL), poppler-utils (pdfinfo, pdftoppm)

Example usage:
pdftoiliad myfile.pdf
This will create the output directory myfile in the current directory.

pdftoiliad -o /mnt/Iliad/Books myfile.pdf
This will place the output in your Iliad books directory (assuming /mnt/Iliad is the Iliad's mount point)

pdftoiliad -m myfile.pdf
This will remove the original file.
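Roughly, the two options above boil down to the following. This is a Python 3 sketch under my own assumptions, not the script's code: the directory is named after the file minus its extension, and `-m` is modelled as moving the PDF into it rather than copying it. Writing manifest.xml and cover.png is left out, and `package_pdf` is a hypothetical name.

```python
import os
import shutil

def package_pdf(pdf_path, outdir=".", move=False):
    """Create <outdir>/<name>/ and put the PDF inside it.

    move=True moves the original (like -m); otherwise it is copied."""
    base = os.path.basename(pdf_path)
    target = os.path.join(outdir, os.path.splitext(base)[0])
    os.makedirs(target, exist_ok=True)  # also creates outdir if needed
    dest = os.path.join(target, base)
    if move:
        shutil.move(pdf_path, dest)
    else:
        shutil.copy2(pdf_path, dest)  # copy2 keeps timestamps
    return target
```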

I have this set up with Nautilus Actions so that I can simply right-click on a file and have it sent to a directory that is synced to my Iliad. To do this (in GNOME), install Nautilus Actions and create a "Send to Iliad" action. Set the path to "/usr/local/bin/pdftoiliad.py -m -o /home/Library/Iliad/Books" (replacing /home/Library/Iliad/Books with wherever you want to keep your books), and the Parameters to %m. In the Conditions tab, enter "*.pdf" in the Filenames field and "application/pdf" for Mimetypes.

Cheers,
Ian

kusmi · 04-30-2008, 04:35 AM

Great stuff!

I will try this out and include it in my "iLiad-workflow" on my linux server!

IanHelgesen · 07-02-2008, 02:14 AM

So, the other day I was digging into my copy of the Baen library, and was reminded why I wrote this in the first place. I decided to try to extend this to other formats, so that I can have a nice, pretty Baen library to go with my Tor ebooks. While I haven't actually added any of these other formats yet, preparing to add them led to numerous bug fixes, several new features, and major refactoring for better code quality and error handling. This has been a big enough improvement that I've decided to upload the new and improved version now.

Major changes:
Format support is modularised. Only PDF is currently included, but I will hopefully be releasing ePub and possibly others in the not too distant future.

Problems with spaces/special characters in the file name are no more.

New options:
(-f, --fullscreen): Open in fullscreen mode by default. You will need a copy of ipdf with the fullscreen patch installed for this to have any effect.

(-a, --archive): Place a copy of the original file in the specified location.
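The options mentioned so far could be declared with `optparse` (which the script imports) along these lines. The `-f/--fullscreen` and `-a/--archive` names come from this post; the `dest` names, help texts, and the absence of long forms for `-o` and `-m` are assumptions.

```python
import optparse

def build_parser():
    parser = optparse.OptionParser(usage="%prog [options] file.pdf [file.pdf ...]")
    # dest names below are hypothetical
    parser.add_option("-o", dest="outdir", default=".",
                      help="directory to create the output in")
    parser.add_option("-m", dest="move", action="store_true", default=False,
                      help="remove the original file after packaging")
    parser.add_option("-f", "--fullscreen", dest="fullscreen",
                      action="store_true", default=False,
                      help="open in fullscreen mode (needs the patched ipdf)")
    parser.add_option("-a", "--archive", dest="archive", default=None,
                      help="also copy the original file to this location")
    return parser
```

Whatever is left after the options (`args`) is the list of files, which is also what Nautilus passes in via %m.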

All known bugs are fixed and error handling is greatly improved. While I cannot guarantee that no bugs remain, the script should at least fail gracefully. Most of the command-line hackery has been removed thanks to the pyPdf module, which should make things more robust.

Default options can be set via settings file.
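The thread doesn't show the settings file format, so this sketch simply assumes `key = value` lines with `#` comments. It also illustrates why 'True' and 'False' would be case-sensitive, as the installation notes say: only those exact strings become booleans, and anything else (e.g. `true`) stays a string. The key names are hypothetical. The resulting dict could be handed to `optparse`'s `set_defaults(**settings)` so that command-line flags still override it.

```python
def parse_settings(text):
    """Parse 'key = value' lines; '#' starts a comment.

    Only the exact strings 'True' and 'False' become booleans."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value == "True":
            value = True
        elif value == "False":
            value = False
        settings[key] = value
    return settings
```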

Code is much less ugly. Feedback from more experienced Python programmers is still welcome.

Installation:
1. Install Python with the standard libraries, the Python Imaging Library (PIL), and pyPdf (from http://pybrary.net/pyPdf/)

2. Place 'sendtoiliad.py' in your path, or wherever else you want it.

3. Place 'pdf.py' in one of /usr/share/sendtoiliad, /usr/local/share/sendtoiliad, ~/.sendtoiliad/modules

4. Optionally place 'settings' in ~/.sendtoiliad, and edit to your liking. Note that 'True' and 'False' are case-sensitive.
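One way to implement the module lookup from step 3 (a sketch, not the script's code): look for `<extension>.py` in each directory in turn and load the first match with `importlib`. The search order, the function name, and the convention that `pdf.py` handles `.pdf` files are assumptions.

```python
import importlib.util
import os

# Directories from step 3; searching them in this order is an assumption.
MODULE_DIRS = [
    "/usr/share/sendtoiliad",
    "/usr/local/share/sendtoiliad",
    os.path.expanduser("~/.sendtoiliad/modules"),
]

def find_format_module(ext, dirs=MODULE_DIRS):
    """Load e.g. pdf.py or epub.py for a file extension; None if not found."""
    name = ext.lower().lstrip(".")
    for d in dirs:
        path = os.path.join(d, name + ".py")
        if os.path.isfile(path):
            spec = importlib.util.spec_from_file_location(name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)  # runs the module's code
            return module
    return None
```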

IanHelgesen · 07-05-2008, 12:52 AM

I've finished initial ePub support. If you already have the software installed, all you need to do is download epub.py.txt, remove the .txt extension, and drop it in the modules directory. I've also attached the complete package. This has worked beautifully for the Baen library, converted from .lit using ConvertLit and OEBtoePub (see attached screenshot). It will also pick up author and title from ePubs downloaded from Feedbooks.com, although these will lack cover thumbnails.

Notes on epub support:

Page count: Currently, page counts are not calculated. Pages are a bit of a fuzzy subject with reflowable formats, so this is more difficult to do. I do plan to attempt to add this in a later version (using FBReader's definition of a page).

Cover art: As far as I have been able to find (please correct me if I've missed something), epub does not specify a standard for cover art. This makes reliable thumbnailing a bit difficult. Currently this script first checks for lines created when using ConvertLit and OEBtoePub to create an ePub file from a .lit. If these are not found, the first image referenced in the .opf manifest will be used. If no images are in this list, no icon will be made.
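The fallback rule (first image in the .opf manifest) can be sketched in Python 3 with only the standard library. It locates the .opf through `META-INF/container.xml`, as the OCF container format prescribes, and resolves the image's `href` relative to the .opf's folder. The ConvertLit/OEBtoePub check that runs first isn't reproduced, and the function names are mine.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"

def opf_path(epub):
    """Locate the .opf package file via META-INF/container.xml."""
    root = ET.fromstring(epub.read("META-INF/container.xml"))
    return root.find(f".//{CONTAINER_NS}rootfile").get("full-path")

def first_manifest_image(epub_file):
    """Return the zip path of the first image in the OPF manifest, or None."""
    with zipfile.ZipFile(epub_file) as epub:
        path = opf_path(epub)
        opf = ET.fromstring(epub.read(path))
        for item in opf.iter(f"{OPF_NS}item"):
            if item.get("media-type", "").startswith("image/"):
                # manifest hrefs are relative to the folder holding the .opf
                return posixpath.join(posixpath.dirname(path), item.get("href"))
    return None
```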

Metadata: ePub metadata is generally filled out correctly, so this does not attempt the heuristics I used with PDFs.
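Reading that metadata is straightforward: title and author live in the Dublin Core `dc:title` and `dc:creator` elements inside the .opf's `<metadata>` section. A minimal sketch (function name hypothetical) that takes the .opf contents as a string or bytes:

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def opf_title_author(opf_xml):
    """Return (title, author) from an OPF package document; '' if missing."""
    root = ET.fromstring(opf_xml)

    def first_text(tag):
        el = root.find(f".//{DC_NS}{tag}")
        return el.text.strip() if el is not None and el.text else ""

    return first_text("title"), first_text("creator")
```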

kusmi · 07-14-2008, 05:28 AM

When using your great script, I noticed that the cover.png is not antialiased, so I tried to modify your script and just changed:

cover.thumbnail(coversize) to
cover.thumbnail(coversize, Image.ANTIALIAS)

but then it does not create any image at all :-) I guess this is a bug in the python imaging library - or on my python installation... Do you also see this problem, when you change it to use antialiasing?

thanks!

kusmi · 07-14-2008, 05:55 AM

OK, I solved it: I had to import the Image module, so the changes needed to produce antialiased cover pictures are:

change line:
import optparse, sys, os, shutil, xml.dom.minidom, ImageOps, datetime, configobj

to:
import optparse, sys, os, shutil, xml.dom.minidom, ImageOps, datetime, configobj, Image

and:
cover.thumbnail(coversize)

to:
cover.thumbnail(coversize, Image.ANTIALIAS)

kusmi · 07-14-2008, 11:54 AM

I also improved the contrast a bit (if the title page of the PDF only contains text, the cover icon is hard to read), so the cover code now looks like this:

Code:

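import os
import Image, ImageOps  # classic PIL; with Pillow use: from PIL import Image, ImageOps

#-------- Create thumbnail --------
def make_cover(cover, out):
    """Takes a PIL image, turns it into an Iliad thumbnail and saves it
    as cover.png in the directory `out`. Returns True on success."""
    if cover:
        try:
            coversize = (69, 93)  # bounding box (width, height); aspect ratio is kept
            out = os.path.join(out, "cover.png")
            cover = ImageOps.grayscale(cover)
            # Image.ANTIALIAS needs the Image import above (it was removed
            # in Pillow 10; use Image.LANCZOS there)
            cover.thumbnail(coversize, Image.ANTIALIAS)
            # stretch the contrast so text-only title pages stay readable
            cover = ImageOps.autocontrast(cover, 0)
            cover.save(out)
            return True
        except Exception:
            return False
    else:
        return False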

Again, many thanks for your great tool - it is really useful!!!

IanHelgesen · 07-14-2008, 06:01 PM

Quote:

Originally Posted by kusmi

I also added a bit better contrast (if the title page of the pdf only contains text, the cover icon is hard to read) ...

Nice work! I'm uploading a new version with your changes. It's nice to see that other people are finding this useful as well.
