
MobileRead Forums > E-Book Software > Calibre > Recipes

Closed Thread
02-18-2009, 05:36 AM   #241
Hypernova
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
Physicstoday.org

Now for physicstoday.org. Pretty much the same deal: some articles need a login. This is essentially the entire magazine, so I think it'll be quite useful. Could you help me with the login again?
Code:
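import re

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234950056(BasicNewsRecipe):
    title          = u'Physicstoday'
    oldest_article = 30              # skip articles older than 30 days
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False    # fetch the full pages, not the feed summaries
    remove_javascript     = True
    # Keep each page from the first <h1> up to and including the footer div
    remove_tags_before = dict(name='h1')
    remove_tags_after  = [dict(name='div', attrs={'id':'footer'})]

    feeds          = [(u'All', u'http://www.physicstoday.org/feed.xml')]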
02-18-2009, 12:53 PM   #242
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New recipe for Serbian newspaper Press:
Attached Files
File Type: zip pressonline.zip (2.4 KB, 362 views)
02-18-2009, 12:59 PM   #243
kovidgoyal
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Smile

Quote:
Originally Posted by Hypernova
Now for physicstoday.org. Pretty much the same deal: some articles need a login. This is essentially the entire magazine, so I think it'll be quite useful. Could you help me with the login again?
Since I believe in teaching a man to fish, here's the login code from the Physics World recipe:

Code:
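    # Add these lines inside your recipe class
    needs_subscription = True    # calibre will ask for a username and password

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('http://physicsworld.com/cws/sign-in')
            # nr=1 selects the second <form> on the page (mechanize counts from 0)
            br.select_form(nr=1)
            # The keys must match the name= attributes of the form's input fields
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br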
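A note for anyone adapting this to another site: mechanize numbers forms from 0, so `nr=1` picks the second `<form>` on the sign-in page, and the keys in `br['...']` have to match the input names in that form. To find both for a new site, here is a minimal stdlib-only sketch that lists the forms on a saved copy of a login page. The helper names `FormLister`/`list_forms` and the sample HTML are made up for illustration; this is not part of calibre.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Record every <form> on a page and the names of its input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form, in page order
        self._current = None  # the form being parsed right now, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            if attrs.get("name"):  # unnamed inputs (e.g. submit buttons) are skipped
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return (index, action, field names) for each form; index is what nr= expects."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return [(i, f["action"], f["fields"]) for i, f in enumerate(parser.forms)]


# Hypothetical sign-in page: a search box first, then the login form.
SAMPLE = """
<form action="/search"><input name="q"></form>
<form action="/cws/sign-in" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Sign in">
</form>
"""

if __name__ == "__main__":
    for index, action, fields in list_forms(SAMPLE):
        print(index, action, fields)
```

With this sample the login form shows up at index 1 with the fields `username` and `password`, which is what `select_form(nr=1)` and the two `br[...]` assignments rely on.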
02-18-2009, 06:46 PM   #244
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Ars Technica Now Fetching Entire Article! Super!

Quote:
Originally Posted by kiklop74
Updated Ars Technica recipe with multipage news support
kiklop74,

Your latest revised Ars Technica recipe seems to be working fine. Thanks a million.

I guess this segment of your code is what fetches articles continued across multiple pages:

Code:
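    def append_page(self, soup, appendtag, position):
        # Look for the pager that links to the next page of the article
        pager = soup.find('div', attrs={'id':'pager'})
        if pager:
            for atag in pager.findAll('a', href=True):
                label = self.tag_to_string(atag)
                if label.startswith('Next'):
                    # Fetch the next page and grab its article text
                    soup2 = self.index_to_soup(atag['href'])
                    texttag = soup2.find('div', attrs={'class':'news-item-text'})
                    if texttag is None:
                        break    # layout changed; keep what we already have
                    for it in texttag.findAll(style=True):
                        del it['style']
                    # Recurse so that pages 3, 4, ... get appended as well
                    newpos = len(texttag.contents)
                    self.append_page(soup2, texttag, newpos)
                    texttag.extract()
                    pager.extract()
                    # Insert the fetched text into the first page's article
                    appendtag.insert(position, texttag)
                    break        # only follow the first Next link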
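The same follow-the-Next-link idea can be tried outside calibre. Below is a minimal stdlib-only sketch (not kiklop74's actual code) that follows "Next" links in a loop instead of recursing and remembers the URLs it has visited, so a pager whose Next link points back to an already-fetched page cannot loop forever. The `fetch` callable and the in-memory `PAGES` dict are hypothetical stand-ins for real HTTP requests; only the `pager` id and the `news-item-text` class come from the recipe.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Pull the article text and the 'Next' link out of one page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.next_url = None
        self._text_depth = 0   # >0 while inside <div class="news-item-text">
        self._pager_depth = 0  # >0 while inside <div id="pager">
        self._href = None      # href of the pager <a> currently open
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._text_depth:
                self._text_depth += 1
            elif attrs.get("class") == "news-item-text":
                self._text_depth = 1
            if self._pager_depth:
                self._pager_depth += 1
            elif attrs.get("id") == "pager":
                self._pager_depth = 1
        elif tag == "a" and self._pager_depth and "href" in attrs:
            self._href, self._link_text = attrs["href"], []

    def handle_endtag(self, tag):
        if tag == "div":
            if self._text_depth:
                self._text_depth -= 1
            if self._pager_depth:
                self._pager_depth -= 1
        elif tag == "a" and self._href is not None:
            label = "".join(self._link_text).strip()
            if label.startswith("Next") and self.next_url is None:
                self.next_url = self._href
            self._href = None

    def handle_data(self, data):
        if self._text_depth:
            self.text_parts.append(data)
        if self._href is not None:
            self._link_text.append(data)


def fetch_all_pages(fetch, first_url, max_pages=20):
    """Follow 'Next' links from first_url and return the joined article text."""
    seen, chunks, url = set(), [], first_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = PageParser()
        parser.feed(fetch(url))
        parser.close()
        chunks.append("".join(parser.text_parts).strip())
        url = parser.next_url
    return "\n\n".join(chunks)


# Hypothetical three-page article served from memory instead of the web.
PAGES = {
    "/a/1": '<div class="news-item-text"><p>Part one.</p></div>'
            '<div id="pager"><a href="/a/2">Next &raquo;</a></div>',
    "/a/2": '<div class="news-item-text"><p>Part two.</p></div>'
            '<div id="pager"><a href="/a/1">Prev</a> <a href="/a/3">Next</a></div>',
    "/a/3": '<div class="news-item-text"><p>Part three.</p></div>'
            '<div id="pager"><a href="/a/2">Prev</a></div>',
}

if __name__ == "__main__":
    print(fetch_all_pages(PAGES.__getitem__, "/a/1"))
```

With the three sample pages it returns the three paragraphs separated by blank lines. The `max_pages` cap is an arbitrary safety limit, not something the Ars Technica recipe uses.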
Again, thanks.

Xanthan Gum

Last edited by XanthanGum; 02-18-2009 at 06:49 PM. Reason: To correct code entry
02-18-2009, 08:07 PM   #245
Hypernova
Hyperreader
Physics Today magazine recipe

Quote:
Originally Posted by kovidgoyal
Since I believe in teaching a man to fish, here's the login code from the Physics World recipe
I agree with you completely. However, I did try to understand the login code from the user manual, but failed. Fortunately, the Physics Today login is close enough to the NYTimes one that I could reuse the same code. So here it is.

EDIT: I tried the recipe and it works. I then added some information (author, a new class name, etc.), but for some reason that made the login fail. I'm investigating this.

EDIT 2: It seems I was just trying too many times in a row. It works fine as long as you don't fetch twice within five minutes or so, I think.

Last edited by Hypernova; 02-19-2009 at 02:06 AM. Reason: Better version of the recipe in the next post
02-19-2009, 02:04 AM   #246
Hypernova
Hyperreader
I see Physicstoday is in 0.4.138. Please note, though, that I've updated the recipe; it works much better than the old one included with calibre.

I've noticed a problem, though: the EPUB output shows "Protected Page" on my PRS-505 for every page except the Table of Contents. I'm investigating this; my guess is that it's the reader's fault.
Attached Files
File Type: zip Physicstoday.zip (749 Bytes, 396 views)

Last edited by Hypernova; 02-19-2009 at 02:07 AM.
02-19-2009, 09:20 AM   #247
kiklop74
Guru
Apparently the people at Harper's Magazine have decided to completely remove the text version of their printed-edition articles, leaving only PDF and image versions. The change applies as of the March 2009 edition, which means the recipe for the printed edition will stop working.

I will see if there is any chance of working with the PDF format, but since I know how tough a format it is, I do not expect much. However, the recipe could be modified to at least allow downloading older issues.

Is there interest in such a thing?
02-19-2009, 03:37 PM   #248
Hypernova
Hyperreader
Is there a way to make calibre go to the print edition when the print link on the article just points to "http://ptonline.aip.org/servlet/PrintPTJ"? My Physicstoday recipe has a problem with EPUB because, on the reader, it tries to render the highslide box (a box that pops up when you click on a picture so you can see a bigger version and some explanation) in the main body of the article. The printer-friendly version does not have this, but I have no idea how to point calibre to it.
02-19-2009, 04:15 PM   #249
kovidgoyal
creator of calibre
You could just remove the highslide box using remove_tags.

In general, the print version needs a unique URL per article to work.
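To illustrate what "a unique URL per article" means: a calibre recipe can define a `print_version(url)` method that maps each article URL to its printer-friendly URL. That only works when the print URL can be computed from the article URL, as in the Economist recipe later in this thread, which rewrites `displaystory` to `PrinterFriendly`. A bare `http://ptonline.aip.org/servlet/PrintPTJ` carries no article identifier, so there is nothing to compute from. A standalone sketch of such a mapping (the example URL and its `story_id` value are made up):

```python
from urllib.parse import urlsplit, urlunsplit


def print_version(url):
    """Map an article URL to its printer-friendly URL.

    Same rewrite as the Economist recipe ('displaystory' becomes
    'PrinterFriendly'), but applied only to the path, so the query
    string that identifies the article is passed through unchanged.
    """
    parts = urlsplit(url)
    path = parts.path.replace("displaystory", "PrinterFriendly")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical article URL; the story_id value is invented.
    print(print_version("http://www.economist.com/displaystory.cfm?story_id=123"))
```

The key point is that the article id survives the rewrite; a print link that drops it cannot be reached this way.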
02-19-2009, 04:40 PM   #250
Hypernova
Hyperreader
I did what you suggested. But the highslide box includes explanations of the pictures, which are quite crucial. Is there a way to keep them at the end of the article or something? I see that the HTML actually has the highslide box contents at the end, but I'm not sure how to keep them. It goes like this:
Code:
<div class="highslide-html-content" id="highslide-html">
	<div class="highslide-header">
		<ul>
			<li class="highslide-move"><a href="#" onclick="return false">Move</a></li>
			<li class="highslide-close"><a href="#" onclick="return hs.close(this)">Close</a></li>
		</ul>	    
	</div>
	<div class="highslide-body">
 
<body>
<div id="figure">
  <div align="center"><table width="100%" border="0" cellspacing="5" cellpadding="1">
  <tr>
    <td><img src="/journals/doc/PHTOAD-ft/vol_62/iss_2/images/40_1fig1a.jpg" alt="Figure" width="630" height="420" />&nbsp;</td>
  </tr>
  <tr>
    <td><img src="/journals/doc/PHTOAD-ft/vol_62/iss_2/images/40_1fig1b.jpg" alt="Figure" width="511" height="408" />&nbsp;</td>
  </tr>
</table>
  </div>
  
<p><strong>Figure 1.</strong> Snapshots of high-school physics.<strong> (a)</strong>***Some long explanation here***</a>.)</p>
 
</div>
</body>
</div>
    <div class="highslide-footer">
        <div>
            <span class="highslide-resize" title="Resize">
                <span></span>            </span>        </div>
    </div>
</div>
For now, I've attached a new version with a cover added and all highslide content removed.
Attached Files
File Type: zip Physicstoday.zip (857 Bytes, 361 views)

Last edited by Hypernova; 02-19-2009 at 04:43 PM. Reason: Slight mistake in the recipe
02-19-2009, 04:48 PM   #251
kovidgoyal
creator of calibre
It should be doable by using the postprocess_html method, which allows you to perform arbitrary manipulations on the downloaded html just before it is saved.

So what you will need to do is, for each such image, figure out the corresponding text and add it in a <p> after the image.

The postprocess_html method is passed two parameters: a BeautifulSoup instance and a boolean indicating whether the HTML is the first page of the article. You can use the soup parameter to perform the manipulations. See the documentation of the BeautifulSoup package to understand how to use it.
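A rough sketch of the "figure out the corresponding text" step, based on the highslide markup Hypernova quoted. In a recipe this would live in `postprocess_html(self, soup, first_fetch)` and use BeautifulSoup; to keep the example standalone it uses Python's `html.parser` instead, and it only collects the caption paragraphs from each `highslide-body` block and turns them into plain `<p>` elements that could then be placed after the images or at the end of the article. The class name comes from the quoted HTML; the helper names and the trimmed sample are assumptions.

```python
from html import escape
from html.parser import HTMLParser


class CaptionCollector(HTMLParser):
    """Collect the text of each <p> inside a <div class="highslide-body">."""

    def __init__(self):
        super().__init__()
        self.captions = []
        self._body_depth = 0  # >0 while inside a highslide-body div
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._body_depth:
                self._body_depth += 1
            elif "highslide-body" in (dict(attrs).get("class") or "").split():
                self._body_depth = 1
        elif tag == "p" and self._body_depth:
            self._in_p, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "div" and self._body_depth:
            self._body_depth -= 1
        elif tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.captions.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def captions_as_paragraphs(html):
    """Return the collected captions as HTML paragraphs."""
    parser = CaptionCollector()
    parser.feed(html)
    parser.close()
    return "\n".join("<p>%s</p>" % escape(c) for c in parser.captions)


# Trimmed, hypothetical version of the quoted highslide block.
SAMPLE = """
<div class="highslide-html-content" id="highslide-html">
  <div class="highslide-header"><ul><li>Move</li></ul></div>
  <div class="highslide-body">
    <div id="figure">
      <img src="fig1a.jpg" alt="Figure" />
      <p><strong>Figure 1.</strong> Snapshots of high-school physics.<strong> (a)</strong> Example caption.</p>
    </div>
  </div>
</div>
<p>Ordinary article paragraph.</p>
"""

if __name__ == "__main__":
    print(captions_as_paragraphs(SAMPLE))
```

Paragraphs outside `highslide-body` (like the last one in the sample) are ignored, so only the figure explanations are picked up.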
02-20-2009, 03:37 AM   #252
Emm3t
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2009
Location: Spain
Device: Sony PRS-505
Economist Feed

Is anyone else having problems with The Economist feed? I'm using the latest version of Calibre (0.4.138), but it just appears to get the titles and not the body.

I suspect user error, but I can't see what's wrong.

If additional information is required, please let me know what you need.

Thanks

Emmet
02-20-2009, 12:40 PM   #253
kitzj0
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
A couple of days ago I posted about a problem I was having with a feed for my local paper, and kiklop74 was kind enough to provide assistance. The code is:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234144423(BasicNewsRecipe):
    title          = u'Cincinnati Enquirer'
    oldest_article = 7
    language       = _('English')
    __author__     = 'Joseph Kitzmiller'
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    encoding = 'cp1252'
    extra_css = ' p {font-size: medium; font-weight: normal;} '
    
    keep_only_tags = [dict(name='div', attrs={'class':'padding'})]
    
    remove_tags = [
                     dict(name=['object','link','table','embed'])
                    ,dict(name='div',attrs={'id':'pluckcomments'})
                    ,dict(name='div',attrs={'class':'articleflex-container'})
                  ]
   
    feeds          = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]

    def preprocess_html(self, soup):
        # Drop inline style and font face attributes so the reader's own fonts apply
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
This worked when I manually put the generated EPUB file into the Sony Library software to transfer it to my reader. I downloaded the new version of calibre, and now I can use calibre to transfer the file and I am getting text for the article. However, the generated EPUB file now has this text overlaying the article:

Starting first parse
.Parsing macro pluck_InitializeArticles
..Build 3: 953 ms (Article)
...Build 3: 46 ms (Article)
..Build 9: 187 ms (Content)
.Completed macro pluck_InitializeArticles
.Build 0: 16 ms (Misc)
.Build 3: 2984 ms (Article)
.Parsing macro seo
..Build 0: 0 ms (Misc)
.Completed macro seo
.Parsing macro sitecatalyst
..Build 0: 0 ms (Misc)
.Completed macro sitecatalyst
..Build 3: 62 ms (Article)
.Parsing macro footer_local
--> Starting first parse
.Build 0: 16 ms (Misc)
.Build 3: 31 ms (Article)
.Build 9: 0 ms (Content)
Retrieve categories: 0ms
Read templates: 0ms
Read objects: 0ms
Scripts: 0ms

The message goes on for several more lines. This happens regardless of whether I use the Sony Library software or calibre to transfer the feed to the device. Is this a bug?

Last edited by kitzj0; 02-20-2009 at 12:45 PM.
02-20-2009, 12:45 PM   #254
kovidgoyal
creator of calibre
The format of the Economist website changed. A fix will be in the next release; in the meantime, here's the updated recipe:

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
economist.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

import mechanize, string
from urllib2 import quote

class Economist(BasicNewsRecipe):
    
    title = 'The Economist'
    language = _('English')
    __author__ = "Kovid Goyal"
    description = 'Global news and current affairs from a European perspective'
    oldest_article = 7.0
    needs_subscription = False # Strange but true
    INDEX = 'http://www.economist.com/printedition'
    remove_tags = [dict(name=['script', 'noscript', 'title'])]
    remove_tags_before = dict(name=lambda tag: tag.name=='title' and tag.parent.name=='body')
    
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            req = mechanize.Request('http://www.economist.com/members/members.cfm?act=exec_login', headers={'Referer':'http://www.economist.com'})
            data = 'logging_in=Y&returnURL=http%253A%2F%2Fwww.economist.com%2Findex.cfm&email_address=username&pword=password&x=7&y=11'
            data = data.replace('username', quote(self.username)).replace('password', quote(self.password))
            req.add_data(data)
            br.open(req).read()
        return br
    
    def parse_index(self):
        soup = BeautifulSoup(self.browser.open(self.INDEX).read(),
                             convertEntities=BeautifulSoup.HTML_ENTITIES)
        index_started = False
        feeds = {}
        ans = []
        key = None
        for tag in soup.findAll(['h1', 'h2']):
            text = ''.join(tag.findAll(text=True))
            if tag.name == 'h1':
                if 'Classified ads' in text:
                    break
                if 'The world this week' in text:
                    index_started = True
                if not index_started:
                    continue
                text = string.capwords(text)
                if text not in feeds.keys():
                    feeds[text] = []
                if text not in ans:
                    ans.append(text)
                key = text
                continue
            if key is None:
                continue
            a = tag.find('a', href=True)
            if a is not None:
                url=a['href'].replace('displaystory', 'PrinterFriendly') 
                if url.startswith('/'):
                    url = 'http://www.economist.com' + url
                article = dict(title=text, 
                    url = url,
                    description='', content='', date='')
                feeds[key].append(article)
                
        ans = [(key, feeds[key]) for key in ans if feeds.has_key(key)]
        return ans
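One thing to watch if you adapt `get_browser` above: the POST body is built by string replacement, so a username that itself contains the text `password` would be mangled by the second `replace`. A sketch that builds the same fields with `urlencode` instead, written for Python 3 so it runs standalone (calibre 0.4 recipes run on Python 2, where the equivalent is `urllib.urlencode`). The field names and values are copied from the recipe and may not match what the site expects today; `build_login_body` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode


def build_login_body(email, password):
    """Build the login POST body used by the Economist recipe's get_browser.

    urlencode escapes every value on its own, so credentials containing
    '&', '=', '+' or the literal word 'password' cannot corrupt the body.
    """
    fields = [
        ("logging_in", "Y"),
        # Pre-escaped colon; urlencode escapes the value again, which
        # reproduces the 'http%253A%2F%2F...' text of the original string.
        ("returnURL", "http%3A//www.economist.com/index.cfm"),
        ("email_address", email),
        ("pword", password),
        ("x", "7"),
        ("y", "11"),
    ]
    return urlencode(fields)


if __name__ == "__main__":
    body = build_login_body("password.fan@example.com", "s3cret&more")
    print(body)
    # Decoding the body gives back exactly the credentials that went in
    print(parse_qs(body)["email_address"], parse_qs(body)["pword"])
```

In the recipe the resulting string would be passed to `req.add_data(...)` in place of the replaced template.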
02-20-2009, 01:07 PM   #255
kovidgoyal
creator of calibre
Will be fixed in the next release

Quote:
Originally Posted by kitzj0
The message goes on for several more lines. This happens regardless of whether I use the Sony Library software or calibre to transfer the feed to the device. Is this a bug?




All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.