sendtoiliad.py:
This script will extract metadata from ebooks and write in a form that will be displayed in the contentlister of the Irex Iliad. It would also provide a thumbnail of the cover, replacing the dull, generic file icon.

Formats supported:
Formats support is modular, so new formats can be added by dropping a module in the correct directory. Current modules are:
PDF
Epub (In developement)
Lit (through conversion to Epub) (In developement)

Platform support:
This script was developed on Ubuntu 8.10. I do not have other systems available to test, but it should work on any modern Unix-ish system, including Mac OSX. Windows is not supported, although advanced users may be able to get it to run by changing the mpdule search path. 

Base requirements:
Python with standard libraries (optparse, sys, os, shutil, xml.dom.minidom, datetime)
Python Imaging Libraries (PIL)

PDF Support:
Python standard libries libraries (re, subprocess, cStringIO)
PyPDF (available from http://pybrary.net/pyPdf/)
pdftoppm (for cover thumbnails, part of the xpdf/poppler utils package)

Epub support:
Incomplete

Lit support:
incomplete

Installation:
1. Place sendtoiliad.py in your path. 
2. Place modules (pdf.py) in one of: /usr/share/sendtoiliad, /usr/local/share/sendtoiliad, ~/.sendtoiliad.modules. 
3. Optionally place 'settings' in ~/.sendtoiliad, and edit to your liking. Note that 'True' and 'False' are case-sensitive.

Usage:
Sendtoiliad.py is a commandline tool. Run sendtoiliad.py --help to see help on all options.  Basic usage is as follows:

sendtoiliad.py myebook.pdf
Will create a folder "myebook" in the current directory, with the file, manifest.xml, and cover image, leaving the original file in place.

sendtoiliad.py myebook.pdf -o ~/Iliad/outbox/MMC/books
Will create the folder in your ~/Iliad/outbox/MMC/books, leaving the original file in place.

sendtoiliad.py myebook.pdf -o ~/Iliad/outbox/MMC/books -a ~/Library/New --move
Will create the fold in the outbox as above, place a copy of the original file in ~/Library/New, and remove the original file.

Additional options are available. Please see the help text for a complete list. Also note defaults for all options can be specified in ~/.sendtoiliad/settings. See the included settings file for examples.

Modules
New formats can be created by adding additional modules. A modules is a python modules named after the tag used by the supported format (e.g.: pdf.py is for .pdf files). This module must define a fuction, named init(), which accepts a generic file_info object as an argument and returns a subclass of it.

This object may define any or all of the following methods and properties:

Methods:
  fullscreen(self, manifest): Accepts a minidom object containing the manifest.xml. This should append whatever is needed to instruct the view to open in fullscreen mode, and return the manifest object.
  crop(self, file): Accepts the path to the output file, and does whatever is needed to crop margins to the level specified in the settings file.

Properites:
  file: Text string giving the path of the input file. This is set by the parent class and generally should not be altered.
  title: Text string giving the title of the book (displayed on first line in contentlister).
  author: Text string giving the author of the book (displayed on the second line).
  date: Not currently used.
  pages: Integer giving the number of pages (also displayed on second lne).

All of these methods and properties are optional. If you do not define them, a reasonable default will be inherited from the parent class. None is also an acceptable value for all properties except file.

Questions:
Why is the contentlister text oddly formatted or showing things other than author and title?

Sadly document metadata is often missing or incomplete, especially with PDFs. The script tries to be somewhat intelligent about extracting this information. For pdfs, it will first look at the title and author fields in the file. If these are blank or contain obvious garbage (all too common), it will extract text from the first two pages. The first non-empty line will be assumed to be the author, the second the title. If all of this fails, the filename will be used as the title, and author will be left empty. Like all such heuristics, this is often imperfect, but will hopefully be useful most of the time.
