Two small changes in the code that significantly improved the outcome:
- Added a procedure to skip duplicate articles
Code:
...
filterDuplicates = True
url_list = []
...
def print_version(self, url):
    # Skip any article whose URL has already been seen
    if self.filterDuplicates:
        if url in self.url_list:
            return
        self.url_list.append(url)
    return url + '?full=true&print=true'
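For anyone who wants to try the duplicate filter outside of a recipe, here is a minimal standalone sketch of the same idea. The `DuplicateFilter` class and the example URLs are made up for illustration and are not part of the recipe; it also swaps the list for a set, which is my own assumption about what scales better on long feeds, not something the recipe does.

```python
class DuplicateFilter:
    """Hypothetical helper (not part of the recipe): remembers URLs and rejects repeats."""

    def __init__(self, suffix='?full=true&print=true'):
        # A set gives constant-time membership checks; the recipe itself uses a list
        self.seen = set()
        self.suffix = suffix

    def print_version(self, url):
        # Return None for a URL we have already seen, like the bare `return` in the recipe
        if url in self.seen:
            return None
        self.seen.add(url)
        return url + self.suffix


if __name__ == '__main__':
    f = DuplicateFilter()
    # Example URLs are placeholders; the third one repeats the first
    urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/a']
    for u in urls:
        print(u, '->', f.print_version(u))
```

The first time a URL comes through it gets the print suffix appended; the second time the method returns None.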
- Added an option to convert images to grayscale if needed
Code:
...
Convert_Grayscale = True
...
# Image is assumed to come from calibre.utils.magick (import not shown here)
def postprocess_html(self, soup, first):
    if self.Convert_Grayscale:
        # Process all the images that have a src attribute
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            if img < 0:
                raise RuntimeError('Out of memory')
            img.type = "GrayscaleType"
            img.save(iurl)
    return soup
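In case it helps to see what a grayscale conversion does to the pixel data, here is a rough pure-Python sketch. It uses the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114); that is an assumption for illustration only, and I am not claiming it is the exact formula the image library behind `GrayscaleType` applies. The function names and the tiny test image are made up.

```python
def to_gray(pixel):
    """Map one (r, g, b) pixel to a single 0-255 gray value using BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def grayscale(pixels):
    """Convert a 2D list of (r, g, b) tuples into a 2D list of gray values."""
    return [[to_gray(p) for p in row] for row in pixels]


if __name__ == '__main__':
    # Hypothetical 2x2 image: red, green / blue, white
    image = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (255, 255, 255)]]
    print(grayscale(image))
```

Green ends up brightest and blue darkest, which is why a grayscale page still keeps reasonable contrast on an e-ink screen.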
To all the science fans who also love Calibre