Old 11-03-2011, 06:34 PM   #16
achims
Download EBooks in any format from Website

This builds on "Recipe to download an EPUB from feed" by Starsom17. You can use it to download all the ebooks offered by a news website, in every format you like (epub, pdf, mobi, ...).

To see how it works, first take a look at Starsom17's post. His trick is needed to cheat the recipe process so that it has an epub to work on.

In addition, this recipe looks for links to the other ebook formats, downloads them into a common temporary directory and then makes a system call, "calibredb add -1 dir", so that all formats are added to the calibre database as one single logical book.

If there are several logical books to download, you'll need to create a directory and make a system call for each one (or drop the -1 option if there is only one format per book).

Note: I have tested this on Linux and it works fine. On other operating systems the system call may need tweaking.

Code:
import re, zipfile, os
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryDirectory
from calibre.ptempfile import PersistentTemporaryFile

GET_MOBI=False
GET_PDF=True

class DownloadAllFormats(BasicNewsRecipe):

    def build_index(self):
        browser = self.get_browser()

        # find the links (Adjust to your needs!)
        epublink = browser.find_link(text_regex=re.compile('.*Download ePub.*'))
        mobilink = browser.find_link(text_regex=re.compile('.*Download Mobi.*'))
        pdflink = browser.find_link(text_regex=re.compile('.*Download PDF.*'))

        # Cheat calibre's recipe method, as in post from Starsom17
        self.report_progress(0,_('downloading epub'))
        response = browser.follow_link(epublink)
        dir = PersistentTemporaryDirectory()
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=dir)
        epub_file.write(response.read())
        epub_file.close()
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        self.report_progress(0.1,_('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))


        #
        # Now, download the remaining files
        #
        if (GET_MOBI):
           self.report_progress(0.3,_('downloading mobi'))
           mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=dir)
           browser.back()
           response = browser.follow_link(mobilink)
           mobi_file.write(response.read())
           mobi_file.close()

        if (GET_PDF):
           self.report_progress(0.4,_('downloading pdf'))
           pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=dir)
           browser.back()
           response = browser.follow_link(pdflink)
           pdf_file.write(response.read())
           pdf_file.close()

        # Get all formats into Calibre's database as one single book entry
        self.report_progress(0.6,_('Adding files to Calibre db'))
        cmd = 'calibredb add -1 "' + dir + '"'
        os.system(cmd)

        return index
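For the several-logical-books case mentioned above, the per-book bookkeeping can be sketched like this (a minimal sketch, not part of the original recipe: the `build_add_commands` helper and its `(title, [(suffix, data), ...])` input shape are my own assumptions; it only prepares one `calibredb add -1 dir` command per book):

```python
import os
import tempfile

def build_add_commands(books):
    """books: list of (title, [(suffix, data), ...]) pairs.
    Writes each book's formats into its own temporary directory and
    returns one 'calibredb add -1 <dir>' command per logical book."""
    cmds = []
    for title, formats in books:
        d = tempfile.mkdtemp()  # one common directory per logical book
        for suffix, data in formats:
            with open(os.path.join(d, title + suffix), 'wb') as f:
                f.write(data)
        # all files in d get added as one single book entry
        cmds.append(['calibredb', 'add', '-1', d])
    return cmds
```

Each returned command can then be run with subprocess.call (or os.system with proper quoting), so every format in a directory ends up in the same calibre book entry.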
Old 11-09-2011, 11:01 AM   #17
Starson17
Multiple Page Sites

This is not my code, but there have been many requests for code to handle sites where each article is split into multiple pages, with a button at the bottom of each page to go to the next. Here is typical code, from Darko Miletic's built-in recipe for Adventure Gamers, that is used in this situation:

You may want to look at the source for an article at Adventure Gamers with Firebug or an equivalent tool. The append_page code identifies each "next page" button, follows the link it points to ("nexturl"), finds the article text on that next page, inserts that text into the first page beneath the article text already found there, and repeats the process recursively until it reaches the last page (identified by the absence of a "next page" button).

The append_page code is then used in preprocess_html.
Code:
    INDEX                 = u'http://www.adventuregamers.com'
    def append_page(self, soup, appendtag, position):
        pager = soup.find('div',attrs={'class':'toolbar_fat_next'})
        if pager:
           nexturl = self.INDEX + pager.a['href']
           soup2 = self.index_to_soup(nexturl)
           texttag = soup2.find('div', attrs={'class':'bodytext'})
           newpos = len(texttag.contents)
           self.append_page(soup2,texttag,newpos)
           texttag.extract()
           appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div',attrs={'class':'toolbar_fat'})
        if pager:
           pager.extract()
        return self.adeify_images(soup)
Old 11-21-2011, 09:56 PM   #18
nickredding
Masthead logos and cover page images for Kindle Fire

The Kindle Fire treats masthead logos differently than its e-ink cousins, and they end up not looking as good as on e-ink readers. The Fire automatically scales the logos and color-inverts them (so black becomes white, red becomes turquoise, etc.). The logo is displayed on an almost-black background (it's actually a slight gradient).

The Fire also displays the publication front page on the Newsstand bookshelf, which encouraged me to go looking for a source of these front page images instead of settling for the default calibre image.

The following code fragments can be inserted into your custom recipe to invoke a custom masthead logo and a front page image (if one is available).
Code:
    def get_cover_url(self):
        # If your newspaper is represented at http://www.newseum.org/todaysfrontpages/default.asp
        # mouse over its front page image and look for fpVname=<tag> in the URL, and replace NY_NYT
        # with the <tag>. For example, for the New York Times the URL looks like
        #    http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=NY_NYT&ref_pge=gal&b_pge=1
        # so the <tag> is NY_NYT
        from datetime import date
        # Note: this tag is for the New York Times; you must replace NY_NYT with your <tag>
        tag = 'NY_NYT'
        cover = 'http://webmedia.newseum.org/newseum-multimedia/dfp/jpg'+str(date.today().day)+'/lg/'+tag+'.jpg'
        br = BasicNewsRecipe.get_browser(self)
        try:
            br.open(cover)
        except:
            self.log("\nCover unavailable")
            cover = None
        return cover

    # Provide the path to your custom masthead logo here. This path is for my New York Times logo.
    # Your logo should have a height/width ratio as close to 1/10 as possible.
    # Kindle Fire color-inverts the logo and scales it automatically to fit in a box approximately
    # 25 x 250 pixels. For best results your logo should have a background of R/G/B 211/211/211
    # since this will appear transparent. If you are really picky you can make your background a
    # linear gradient of 211/211/211 at the top to 214/214/214 at the bottom.
    masthead_url="C:\\Users\\Nick\\nytlogo.jpg"
    def prepare_masthead_image(self, path_to_image, out_path):
        from calibre import fit_image
        from calibre.utils.magick import Image, create_canvas
        img = Image()
        img.open(path_to_image)
        width, height = img.size
        img2 = create_canvas(width, height)
        img2.compose(img)
        img2.save(out_path)

Note that when you develop a masthead logo, plan for it to be color-inverted (so if you want the original color, provide the color-inverted version as the logo). The background should be R/G/B 211/211/211 and (after being inverted) it will blend with the Fire background to appear transparent. If you are really picky you can make the background pretty well perfect by using a linear gradient (top to bottom) of 211/211/211 to 214/214/214. The size of the logo isn't all that important since the Fire will scale it, but logos at least 250 pixels wide will look better than smaller ones since upscaling doesn't work as well as reduction.
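The inversion arithmetic described above is easy to check (a minimal sketch; the `invert` helper is mine, not part of the recipe): each channel becomes 255 minus its value, so the 211/211/211 grey inverts to 44/44/44, close to the Fire's almost-black background.

```python
def invert(rgb):
    # Kindle Fire-style color inversion: each channel becomes 255 - value
    return tuple(255 - c for c in rgb)

print(invert((211, 211, 211)))  # -> (44, 44, 44)
print(invert((0, 0, 0)))        # black -> white: (255, 255, 255)
```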

I have attached 4 Fire-friendly logos in a ZIP file (NY Times, Wall Street Journal, Globe and Mail, National Post).
Attached Files
File Type: zip logos.zip (91.0 KB, 132 views)

Last edited by Starson17; 11-22-2011 at 09:42 AM.
Old 01-10-2012, 02:21 AM   #19
kiavash
Quote:
Originally Posted by kiklop74
Let us assume that you have a feed with links that all point to redirected pages. By default Calibre does not handle this case so the safest way of doing this could be summarized like this:

Code:
    def print_version(self, url):
        return self.browser.open_novisit(url).geturl()
Of course, a similar thing can be done with urllib2, but using the internal browser automatically adds support for sites that require login.
Actually, you can get the print page at the same time. Just modify the code to something like this:

Code:
    def print_version(self, url):
        return self.browser.open_novisit(url).geturl().replace('/article.asp?HH_ID=', '/Print.asp?Id=')
Of course, modify the replace part for your page.
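A quick illustration of the URL rewrite (the example host and ID are made up for illustration):

```python
url = 'http://www.example.com/article.asp?HH_ID=1234'

# Rewrite the article URL into its print-version counterpart
print_url = url.replace('/article.asp?HH_ID=', '/Print.asp?Id=')
print(print_url)  # -> http://www.example.com/Print.asp?Id=1234
```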
Old 01-19-2012, 03:35 PM   #20
kiavash
Some sites need login information to be submitted twice. Below is an example that worked with MWJournal. It submits the credentials first, saves the resulting page to the system temp location, then opens it again and submits. In this case the second page didn't have a form to fill, so it just submits. Some other sites may need more fields filled in; then follow the normal procedure to fill and submit.

Code:
    def get_browser(self):
            ...
            raw = br.submit().read()        # submit the form and read the 2nd login page

            # save it to an htm temp file
            with TemporaryFile(suffix='.htm') as fname:
                with open(fname, 'wb') as f:
                    f.write(raw)
                br.open_local_file(fname)

            br.select_form(nr=0)    # finds submit on the 2nd form
            didwelogin = br.submit().read()        # submit it and read the returned html
            ...
            return br
Old 02-10-2012, 02:02 PM   #21
kiavash
Embed images into an ebook

Some sites don't include the figures/images in articles; instead the reader needs to click an href link to see the image/figure. This isn't possible on many ebook readers. To embed the images into the output ebook, the tag type needs to be changed from <a> to <img>, and the "href" attribute needs to be changed to "src". The following code does the job by looking for all links to jpg files, then changing them to <img> tags. The code should go into preprocess_html.

Code:
    def preprocess_html(self, soup):

        # Include all the figures inside the final ebook
        # Find all the jpg links
        for figure in soup.findAll('a', attrs = {'href' : lambda x: x and 'jpg' in x}):

            # make sure that the link points to the absolute web address
            if figure['href'].startswith('/'):
                figure['href'] = self.site + figure['href']

            figure.name = 'img' # converts the link to an img
            figure['src'] = figure['href'] # with the same address as href
            figure['style'] = 'display:block' # adds \n before and after the image
            del figure['href']
            del figure['target']
        return soup
Old 06-13-2012, 07:55 PM   #22
kiklop74
How to search for a specific part of a tag attribute:

Code:
dict(attrs={'someattribute':re.compile('(^| )somestring($| )', re.DOTALL)})
For example, to remove all tags that have class Sample (along with other classes) this will do the work:

Code:
remove_tags = [
    dict(attrs={'class':re.compile('(^| )Sample($| )', re.DOTALL)})
]
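A minimal sketch of what the pattern matches (the sample class strings are my own): it hits 'Sample' only when it appears as a whole space-separated class name, not as a substring of another class.

```python
import re

# Same pattern as in the remove_tags example above
pat = re.compile('(^| )Sample($| )', re.DOTALL)

print(bool(pat.search('Sample')))             # True: whole value
print(bool(pat.search('story Sample wide')))  # True: one class among several
print(bool(pat.search('SampleBox')))          # False: only a substring
```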
Old 06-14-2012, 12:38 AM   #23
kovidgoyal
creator of calibre
@kiklop74: An easier way would be:

Code:
remove_tags = [
dict(attrs={'class':lambda x: x and 'Sample' in x.split()}),
]
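A quick check of how the lambda behaves (the sample inputs are mine): it matches whole class names via split() and, unlike the regex, also guards against tags with no class attribute at all (None).

```python
# Same predicate as in the remove_tags example above
matches = lambda x: x and 'Sample' in x.split()

print(bool(matches('story Sample wide')))  # True: whole class name
print(bool(matches('SampleBox')))          # False: substring only
print(bool(matches(None)))                 # False: no class attribute
```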
Old 12-09-2012, 02:23 PM   #24
kiklop74
Sometimes sites are badly implemented or overloaded, so that the first fetch of an article fails but the second or third passes OK. To add that functionality to a calibre recipe you can use this approach:

Code:
# In the import section add this
from calibre.ptempfile import PersistentTemporaryFile

# later in the recipe class add this
class MyRecipeClass(BasicNewsRecipe):
    # ...
    temp_files              = []
    articles_are_obfuscated = True

    # and then somewhere in the class add this method
    def get_obfuscated_article(self, url):
        count = 0
        attempts = 4
        html = None
        while count < attempts:
            try:
                response = self.browser.open(url)
                html = response.read()
                break
            except:
                print "Retrying download..."
            count += 1

        tfile = PersistentTemporaryFile('_fa.html')
        tfile.write(html)
        tfile.close()
        self.temp_files.append(tfile)

        return tfile.name
Replace the value of the variable attempts to change the number of download attempts. This works just fine; this approach was used in the Financial Times UK recipe.
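The same retry idea can be factored into a small standalone helper (a sketch; `retry` and its interface are my own, not from the Financial Times recipe):

```python
def retry(fn, attempts=4):
    # Call fn() up to `attempts` times, returning its first successful
    # result; re-raise the last error if every attempt fails.
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
    raise last_error
```

In a recipe it would be used as, e.g., `html = retry(lambda: self.browser.open(url).read())`.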
Old 02-12-2013, 08:16 AM   #25
kiklop74
If you would like to add series support for some of your recipes this is what needs to be done:

Code:
    def get_cover_url(self):
        soup = self.index_to_soup('someurl')
        #determine somehow the series number of the publication
        # and store it in seriesnr variable
        self.conversion_options.update({'series':'My series name'})
        self.conversion_options.update({'series_index':seriesnr})
        # code for cover url if any
        return None
It is useful for magazines or newspapers where you can easily track the publication number.

All this applies mostly to EPUB; as far as I know, the other formats do not offer a way to store this metadata.
Old 06-25-2013, 07:14 AM   #26
koliberek
Can I collect these clips, translate them into Polish, put them in an ebook and publish it on the Polish forum? The goal is to make them available to users who don't speak English. I would like to have permission to publish it.

TIA
Old 06-25-2013, 10:52 AM   #27
kiklop74
I doubt it would be a problem. Kovid is the owner of this forum so it is his call in the end.
Old 06-25-2013, 03:23 PM   #28
kovidgoyal
creator of calibre
Feel free to do so, I have no objections.
Old 06-27-2013, 03:23 AM   #29
koliberek
Junior Member
koliberek began at the beginning.
 
Posts: 7
Karma: 10
Join Date: May 2013
Device: K3 (Keyboard)
Thanks a lot
Old 12-16-2013, 05:04 PM   #30
sup
Quote:
Originally Posted by Pahan
Here is a recipe template that keeps track of already downloaded feed items and only downloads items that it hasn't seen before or whose description, content, or URL have changed. It does so by overriding the parse_feeds method.
Some caveats:
  • I recommend setting max_articles_per_feed and oldest_article to very high values. The first time, the recipe will download every item in every feed, but after that it will "remember" not to do it again and will grab all new articles, no matter how much time has elapsed since the last run or how many entries have been added. In particular, if you set max_articles_per_feed to a small value and the feed lists all articles in a particular order, you might never see new articles.
  • The list of items downloaded for each feed will be stored in "Calibre configuration directory/recipes/recipe_storage/Recipe title/Feed title". This is probably suboptimal, and there ought to be a persistent storage API for recipes, but it's the best I could come up with.
  • The list of items downloaded is written to disk before the items are actually downloaded. Thus, if an item fails to download for some reason, the recipe won't know, and will not try to download it again. This could probably be fixed by writing the new item lists to temporary files and overriding some method later in the sequence to "commit" by overwriting the downloaded item lists with the new lists. (Thus, if the recipe fails before that, it will never get to that point, so the old lists will remain intact and will redownload next time the recipe is run.)
  • If there are no new items to download and remove_empty_feeds is set to True, the recipe will return an empty list of feeds, which will cause Calibre to raise an error. As far as I can tell, there is nothing that the recipe can do about that without a lot more coding.
  • I've tried to make this code portable, but I've only tested it on Linux systems, so let me know if it doesn't work on the other platforms. I am particularly unsure about newline handling.
Code:
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os, os.path, urllib
from hashlib import md5

class OnlyLatestRecipe(BasicNewsRecipe):
    title          = u'Unknown News Source'
    oldest_article = 10000
    max_articles_per_feed = 10000
    feeds          = [ ]

    def parse_feeds(self):
        recipe_dir = os.path.join(config_dir,'recipes')
        hash_dir = os.path.join(recipe_dir,'recipe_storage')
        feed_dir = os.path.join(hash_dir,self.title.encode('utf-8').replace('/',':'))
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir,mode=CONFIG_DIR_MODE)

        feeds = BasicNewsRecipe.parse_feeds(self)

        for feed in feeds:
            feed_hash = urllib.quote(feed.title.encode('utf-8'),safe='')
            feed_fn = os.path.join(feed_dir,feed_hash)

            past_items = set()
            if os.path.exists(feed_fn):
               with open(feed_fn) as f:
                   for h in f:
                       past_items.add(h.strip())
                       
            cur_items = set()
            for article in feed.articles[:]:
                item_hash = md5()
                if article.content: item_hash.update(article.content.encode('utf-8'))
                if article.summary: item_hash.update(article.summary.encode('utf-8'))
                item_hash = item_hash.hexdigest()
                if article.url:
                    item_hash = article.url + ':' + item_hash
                cur_items.add(item_hash)
                if item_hash in past_items:
                    feed.articles.remove(article)
            with open(feed_fn,'w') as f:
                for h in cur_items:
                    f.write(h+'\n')

        remove = [f for f in feeds if len(f) == 0 and
                self.remove_empty_feeds]
        for f in remove:
            feeds.remove(f)

        return feeds
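The de-duplication key the loop above computes can be restated as a standalone helper (the name `item_key` is mine; the logic mirrors the recipe: an md5 of content plus summary, prefixed by the URL when there is one):

```python
from hashlib import md5

def item_key(url, content=None, summary=None):
    # Hash the item's content and summary, then prefix with the URL;
    # the key changes whenever content, summary, or URL change.
    h = md5()
    if content:
        h.update(content.encode('utf-8'))
    if summary:
        h.update(summary.encode('utf-8'))
    key = h.hexdigest()
    if url:
        key = url + ':' + key
    return key
```

Two items with the same URL and unchanged text produce the same key, so they are recognized as already downloaded on later runs.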
This is a simpler version of the above method that does not keep track of changes and assumes that what was once put online never changes (which is generally not true, but holds for some feeds). Also, it uses the parse_index method instead of parse_feeds, as it assumes you are scraping a website. All the same caveats except the first one apply. This recipe only keeps the last twenty articles for any given section; if you need more, change the limit.
Code:
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os

    def parse_index(self):
        # Read already downloaded articles
        recipe_dir = os.path.join(config_dir,'recipes')
        old_articles = os.path.join(recipe_dir,self.title.encode('utf-8').replace('/',':'))
        past_items = []
        if os.path.exists(old_articles):
            with open(old_articles) as f:
                for h in f:
                    l = h.strip().split(" ")
                    past_items.append((l[0]," ".join(l[1:])))
        old_urls = [x[0] for x in past_items]
        count_items = {}
        current_items = []
        # Keep a list of only the 20 latest articles for each section
        past_items.reverse()
        for item in past_items:
            if item[1] in count_items.keys():
                if count_items[item[1]] < 20:
                    count_items[item[1]] += 1
                    current_items.append(item)
            else:
                count_items[item[1]] = 1
                current_items.append(item)
        current_items.reverse()
        # do stuff to get 'list_of_articles' containing dictionaries of the form {'title':title,'url':url}
        # and to get the variable 'feed_name'; see the following link for details:
        # http://manual.calibre-ebook.com/news_recipe.html#calibre.web.feeds.news.BasicNewsRecipe.parse_index
        ans = []
        for article in list_of_articles:
            if article['url'] not in old_urls:
                current_items.append((article['url'],feed_name))
        ans.append((feed_name,list_of_articles))
        # Write already downloaded articles
        with open(old_articles,'w') as f:
            f.write('\n'.join('{} {}'.format(*x) for x in current_items))
        return ans
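The "last twenty per section" bookkeeping can be isolated into a small helper for testing (a sketch; `keep_latest` is my own name for it, mirroring the reverse/count/reverse loop above):

```python
def keep_latest(items, limit=20):
    # items: (url, section) pairs, oldest first; keep only the newest
    # `limit` entries per section, preserving the original order.
    counts = {}
    kept = []
    for url, section in reversed(items):
        if counts.get(section, 0) < limit:
            counts[section] = counts.get(section, 0) + 1
            kept.append((url, section))
    kept.reverse()
    return kept
```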

Last edited by sup; 01-14-2014 at 01:50 PM.