07-15-2012, 09:57 AM   #25
chaley
The following seems to work, but I make no guarantees. For each section of the page it produces a list of numbers and a list of titles. The cruft in the middle is necessary to filter out ancillary text such as "aka". As far as I can tell from a brief look, the numbers and titles correspond until the numbers run out. The titles left over after the numbers run out seem to be anthologies or other "non-numbered" books.

This script runs with calibre-debug -e

Code:
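from __future__ import print_function

from contextlib import closing

from lxml import html
from calibre import browser

url = 'http://www.fantasticfiction.co.uk/p/james-patterson/'
br = browser()
with closing(br.open(url, timeout=10)) as f:
    doc = html.fromstring(f.read())

# Process each "sectionleft" div separately.
for data in doc.xpath('//div[@class="sectionleft"]'):
    # The bare text nodes hold the numbers plus ancillary text such as "aka".
    numbers = []
    for x in data.xpath('./text()'):
        try:
            # Keep only the text nodes that parse as numbers.
            numbers.append(int(float(x)))
        except (ValueError, OverflowError):
            pass
    # Titles are the link texts of anchors pointing at .htm pages.
    books = data.xpath('a[contains(@href,".htm")]/text()')
    print(len(numbers), len(books), numbers, books)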
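
If the numbers and titles really do line up until the numbers run out, the two lists from one section could be combined along these lines. This is only a sketch under that assumption, and it is not part of the script above: the names parse_numbers and pair_numbers_with_titles and the sample values are made up for illustration. It needs neither calibre nor lxml.

```python
from __future__ import print_function


def parse_numbers(texts):
    """Keep only the strings that parse as numbers (hypothetical helper)."""
    numbers = []
    for x in texts:
        try:
            numbers.append(int(float(x)))
        except (ValueError, OverflowError):
            # Ancillary text such as "aka" or bare whitespace is skipped.
            pass
    return numbers


def pair_numbers_with_titles(numbers, books):
    """Pair each title with its number, or None once the numbers run out.

    Assumes the i-th number belongs to the i-th title, which is only an
    observation from a brief look at the page, not a guarantee.
    """
    paired = []
    for i, title in enumerate(books):
        number = numbers[i] if i < len(numbers) else None
        paired.append((number, title))
    return paired


# Made-up sample data standing in for one section of the page.
sample_texts = ['1', ' aka ', '\n', '2.0']
sample_books = ['First Title', 'Second Title', 'Some Anthology']

for number, title in pair_numbers_with_titles(parse_numbers(sample_texts),
                                               sample_books):
    print(number, title)
```

Titles paired with None would be the anthologies or other "non-numbered" books. If a section ever has a number missing in the middle, this pairing will be off by one from that point on, so check the two lengths printed by the script before trusting it.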