01-19-2012, 05:24 PM   #1
kiavash
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
Updated recipe for New Scientist

I made two small changes to the recipe code that noticeably improve the output:
  1. Added a procedure to omit duplicates of articles
    Code:
    ...
        filterDuplicates = True
        url_list = []
    ...
        def print_version(self, url):
            if self.filterDuplicates:
                if url in self.url_list:
                    return
            self.url_list.append(url)
            return url + '?full=true&print=true'
  2. Added an option to convert images to grayscale if needed
    Code:
    ...
        Convert_Grayscale = True
    ...
        def postprocess_html(self, soup, first):
            if self.Convert_Grayscale:
                #process all the images (Image is assumed to be calibre.utils.magick.Image)
                for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
                    iurl = tag['src']
                    img = Image()
                    img.open(iurl)
                    if img < 0:
                        raise RuntimeError('Out of memory')
                    img.type = "GrayscaleType"
                    img.save(iurl)
            return soup
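
For anyone who wants to try the duplicate check from change 1 outside calibre, here is a minimal standalone sketch. The DuplicateFilter class, the seen_urls set and the example.com URLs are made up for illustration; this is not the recipe class itself. It also differs from the posted code in two small ways that I am assuming are acceptable: it uses a per-instance set instead of a class-level list, and it only records URLs when filtering is switched on.

```python
class DuplicateFilter:
    """Hypothetical stand-in for the recipe class (not calibre's BasicNewsRecipe)."""

    def __init__(self, filter_duplicates=True):
        self.filter_duplicates = filter_duplicates
        # A set gives constant-time membership tests and is not shared
        # between instances, unlike a list defined at class level.
        self.seen_urls = set()

    def print_version(self, url):
        if self.filter_duplicates:
            if url in self.seen_urls:
                # Already handed out once: signal "skip" with None.
                return None
            self.seen_urls.add(url)
        # Same suffix as in the posted recipe.
        return url + '?full=true&print=true'


recipe = DuplicateFilter()
urls = [
    'http://example.com/article/a',
    'http://example.com/article/b',
    'http://example.com/article/a',  # duplicate of the first entry
]
results = [recipe.print_version(u) for u in urls]
kept = [r for r in results if r is not None]
print(kept)
```

The third URL comes back as None, so only two print-version URLs end up in kept. Whether returning None makes calibre drop the article depends on how the calling code treats it, so take that part as an assumption based on the behaviour described in this post.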

To all the science fans who also love Calibre
Attached Files
File Type: zip new_scientist.zip (2.4 KB, 321 views)

Last edited by kiavash; 01-19-2012 at 05:27 PM. Reason: (sp)