Old 10-30-2010, 05:47 PM   #1
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
postprocess_html

i have a recipe that i am working on.
it has a few tags in the middle of the article text like this:
<p>&nbsp;&nbsp;</p>
and some like this:
<p>&nbsp;</p>
there is no way to remove them with remove_tags.
i thought of something like this:
Code:
    def postprocess_html(self, soup):
        print 'the soup is', soup
        for tag in soup.findAll(name=['p']):
            print tag
            text= tag.contents()
            print text
            if text = '&nbsp;':
                tag.extract ()


once i add this function, the recipe does not give me any articles. am i using it right?

on another recipe i am working on, i want to take the description from the rss feed and use it to replace a tag in the article itself.

can i get the description as one of the variables that postprocess_html receives?
what is the name of the description variable in calibre?
something along the lines of
Code:
    def postprocess_html(self, soup, description):
        print 'the soup is', soup
        for tag in soup.findAll(name=['td']):
            print tag
            text= tag.id
            print text
            if text ='titleContainer1':
                tag.replaceWith (description)

or something like that?
Old 10-31-2010, 12:38 PM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by marbs
thought of something like this:
Code:
    def postprocess_html(self, soup):
        print 'the soup is', soup
        for tag in soup.findAll(name=['p']):
            print tag
            text= tag.contents()
            print text
            if text = '&nbsp;':
                tag.extract ()


once i add this function, the recipe does not give me any articles. am i using it right?
You are using "=", the assignment operator. You want "==", the comparison operator. You could also use preprocess_regexps:
Code:
    # requires "import re" at the top of the recipe file
    preprocess_regexps = [
        (re.compile(r'<p>&nbsp;</p>', re.DOTALL|re.IGNORECASE), lambda match: '')
        ]
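A hedged sketch of both fixes, not taken from the reply itself. The pattern is a variation that also covers the doubled `<p>&nbsp;&nbsp;</p>` case from the question. The helper names `apply_regexps`, `is_blank` and `strip_blank_paragraphs` are hypothetical, and the soup is assumed to expose the BeautifulSoup 3 style `findAll` and `extract` calls used in the question. Note that `tag.contents` is a list attribute, so calling it as `tag.contents()` would also fail.

```python
import re

# One or more &nbsp; entities, with optional whitespace, inside an otherwise empty <p>.
EMPTY_P = re.compile(r'<p>\s*(?:&nbsp;\s*)+</p>', re.IGNORECASE)

# Same (pattern, function) pair layout as the preprocess_regexps attribute.
preprocess_regexps = [
    (EMPTY_P, lambda match: ''),
]


def apply_regexps(raw_html, rules):
    """Hypothetical helper: apply each (pattern, function) pair to raw HTML."""
    for pattern, func in rules:
        raw_html = pattern.sub(func, raw_html)
    return raw_html


def is_blank(text):
    """True when text holds nothing but whitespace or non-breaking spaces."""
    return text.replace('&nbsp;', '').replace(u'\xa0', '').strip() == ''


def strip_blank_paragraphs(soup):
    """Corrected form of the loop in the question.

    Assumes soup offers findAll() and that each tag offers
    findAll(text=True) and extract(), as in BeautifulSoup 3.
    """
    for tag in soup.findAll('p'):
        text = ''.join(tag.findAll(text=True))
        if is_blank(text):  # a comparison, not an assignment
            tag.extract()
    return soup
```

In a recipe, `strip_blank_paragraphs(soup)` would presumably be called from `postprocess_html` and its return value handed back; that wiring is an assumption and is untested here.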
Quote:
can i get the description as one of the variables that postprocess_html receives?
what is the name of the description variable in calibre?
The name is "text_summary", available in parse_feeds
Code:
    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                print 'article.text_summary is: ', article.text_summary
        return feeds
I've never used it outside of parse_feeds, so that part is up to you.
Old 10-31-2010, 02:56 PM   #3
marbs
the preprocess_regexps was a stroke of genius. thanks very much!

as for the pre/postprocess_html, i just can't get the article soup and the article.text_summary to meet in one function. any ideas?
Old 10-31-2010, 04:05 PM   #4
Starson17
Quote:
Put the article.text_summary into a list with append in parse_feeds, increment a counter in postprocess_html to extract it?
BTW, I did test this. I put an article_title_list = [] and an index n = 0 outside the functions, then used self.article_title_list.append(article.text_summary) inside parse_feeds, self.article_title_list[self.n] inside postprocess_html to access it, and self.n = self.n + 1 inside postprocess_html to increment the index. I didn't check that the order of the created list correctly matched the order in which postprocess_html accessed the articles, so you may have to deal with that.

Last edited by Starson17; 11-01-2010 at 02:11 PM.
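The list-and-counter idea described above can be sketched roughly as follows. This is a stand-alone illustration, not a working recipe. `Article`, `Feed`, `fetch_feeds` and `SUMMARY_PLACEHOLDER` are hypothetical stand-ins so the code runs outside calibre, and the soup is a plain string here rather than a BeautifulSoup tree. In a real recipe the class would subclass BasicNewsRecipe and `parse_feeds` would start from `BasicNewsRecipe.parse_feeds(self)`.

```python
class Article(object):
    """Stand-in for a calibre article; only text_summary is used here."""
    def __init__(self, text_summary):
        self.text_summary = text_summary


class Feed(object):
    """Stand-in for a calibre feed holding a list of articles."""
    def __init__(self, articles):
        self.articles = articles


class SummaryRecipeSketch(object):
    # Declared "outside the functions", as described in the post.
    article_title_list = []
    n = 0

    def fetch_feeds(self):
        # Placeholder for BasicNewsRecipe.parse_feeds(self) in a real recipe.
        return [Feed([Article('first summary'), Article('second summary')])]

    def parse_feeds(self):
        feeds = self.fetch_feeds()
        self.article_title_list = []  # start from a clean per-instance list
        self.n = 0
        for feed in feeds:
            for article in feed.articles:
                self.article_title_list.append(article.text_summary)
        return feeds

    def postprocess_html(self, soup, first_fetch=True):
        # Take the next stored summary, then advance the counter.
        summary = self.article_title_list[self.n]
        self.n = self.n + 1
        # A real recipe would edit the soup tree instead of a string.
        return soup.replace('SUMMARY_PLACEHOLDER', summary)
```

The sketch shares the caveat from the post: it only pairs summaries with articles correctly if postprocess_html sees the articles in the same order as parse_feeds listed them, which is not verified here.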
Old 10-31-2010, 04:09 PM   #5
marbs
got the idea. i think i can figure that out. let me work on it a bit...

any new thoughts on my maya recipe?
Old 10-31-2010, 04:15 PM   #6
Starson17
Quote:
Originally Posted by marbs
any new thoughts on my maya recipe?
Yes. What is that site?
Old 10-31-2010, 04:23 PM   #7
marbs
what site? i mean this thread
Old 10-31-2010, 04:54 PM   #8
Starson17
Quote:
Originally Posted by marbs
what site? i mean this thread
I know. The recipe is for a site. What is that site?
Old 10-31-2010, 05:02 PM   #9
marbs
the site is "maya.tase.co.il".

i feel i should explain more precisely what it is. it is the site for the local stock exchange's reports. the first page that opens shows all the reports from all the companies for that day.

you may want all the reports from yesterday.

you might also want all the reports from one company going back to 1/1/2000.

i thought this would be possible, but they are really not making my life easy.
Old 11-01-2010, 07:54 AM   #10
marbs
how do i use extra_css to make the article itself go rtl? i am missing the XXXX in extra_css='XXXX{direction: rtl;}'
Old 11-01-2010, 09:04 AM   #11
Starson17
Quote:
Originally Posted by marbs
how do i use extra_css to make the article itself go rtl? i am missing the XXXX in extra_css='XXXX{direction: rtl;}'
I assume you want right-to-left, and I have no idea how to do that. It's not something I do a lot.
Old 11-01-2010, 09:08 AM   #12
marbs
i have done it before. it's my curse.

in any case, all i need to know is how to change the CSS on the article body itself. in other words, the name to use for the article body...
Old 11-01-2010, 10:48 AM   #13
Starson17
Quote:
Originally Posted by marbs
i have done it before. it's my curse.

in any case, all i need to know is how to change the CSS on the article body itself. in other words, the name to use for the article body...
Code:
    extra_css             = ' body{whatever} '
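Filling in the placeholder with the property from the question would give something like the line below. This is a minimal sketch and an assumption, not something confirmed in the thread; whether a given reader device honours the direction property has not been checked here.

```python
# Recipe attribute sketch: apply right-to-left direction to the article body.
extra_css = 'body { direction: rtl; }'
```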
Old 11-01-2010, 11:26 AM   #14
marbs
i'll give it a try.

you haven't shared your idea about maya yet.