Old 03-13-2011, 12:04 PM   #1
Dereks
Connoisseur
Dereks began at the beginning.
 
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Ukrainian Legal News site - polishing the feed

Hi, can somebody help me with this one and tell me what's wrong with this code:

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1300026627(BasicNewsRecipe):
    title          = u'Liga Zakon'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_tags_before = dict(name='div ', attrs={'class':'news_title_7 content'})
    remove_tags_after = dict(id='main_content')
    no_stylesheets = True

    # Feed title is u'Ліга Закон' ("Liga Zakon") written as Unicode escapes
    feeds          = [(u'\u041b\u0456\u0433\u0430 \u0417\u0430\u043a\u043e\u043d', u'http://news.ligazakon.ua/news_rss/tape_articles.xml')]
Somehow remove_tags_before doesn't do its job: the feed remains polluted with unwanted stuff.

Am I doing something wrong? Or is there a better way to polish the feed?

Defining this parameter with id='main_content' works perfectly, but main_content doesn't include the article's title, which is a bit inconvenient.
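One detail in the recipe above is worth checking: name='div ' has a trailing space. As far as I can tell, calibre passes this dict to BeautifulSoup's find(), which compares tag names as literal strings, so 'div ' would never match a <div> and nothing would be removed. The stdlib-only sketch below mimics that literal comparison with html.parser; TagCollector, find_first and the sample HTML are hypothetical stand-ins, not calibre or BeautifulSoup code.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag as (name, attrs) in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def find_first(html, name=None, attrs=None):
    """Return the first (name, attrs) whose name and attribute values match
    literally, loosely mimicking a BeautifulSoup find() call."""
    collector = TagCollector()
    collector.feed(html)
    for tag_name, tag_attrs in collector.tags:
        if name is not None and tag_name != name:
            continue
        if attrs and any(tag_attrs.get(k) != v for k, v in attrs.items()):
            continue
        return tag_name, tag_attrs
    return None


# Hypothetical page fragment shaped like the one described in this thread
SAMPLE = (
    '<div class="menu">navigation</div>'
    '<div class="news_title_7 content"><h1>Title</h1></div>'
    '<div id="main_content"><p>Article text</p></div>'
)

print(find_first(SAMPLE, name='div ', attrs={'class': 'news_title_7 content'}))
# None
print(find_first(SAMPLE, name='div', attrs={'class': 'news_title_7 content'}))
# ('div', {'class': 'news_title_7 content'})
```

With the trailing space removed, the same spec finds the title block, so 'div' (no space) is the first thing to try in the recipe.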

Thanks for help!

Question on a side note: since this recipe has only one feed, how can it be modified so that it jumps directly to the contents, without creating a list of feeds with only one link?

Last edited by Dereks; 03-13-2011 at 12:06 PM.
Old 03-13-2011, 01:20 PM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Dereks View Post
Somehow remove_tags_before doesn't do its job: the feed remains polluted with unwanted stuff.
Usually that just means that what you think is first (or last) isn't where you think it is. Print out the soup and look at it with:
Code:
    def preprocess_html(self, soup):
        # Dump the parsed page to the log so you can see the tag order calibre works with
        print('The soup is: ', soup)
        return soup
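If the full soup is too long to read comfortably, a compact outline of the tags with their id and class shows more quickly which element really comes first. The stdlib-only sketch below runs outside calibre on HTML you have saved (for example, the soup copied from the log); Outliner and outline_tags are hypothetical helpers, and the sketch assumes reasonably well-formed HTML, since unclosed tags will throw the indentation off.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not increase the depth
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class Outliner(HTMLParser):
    """Build an indented outline of start tags labelled with id and class."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        label = tag
        if attrs.get('id'):
            label += '#' + attrs['id']
        if attrs.get('class'):
            label += '.' + '.'.join(attrs['class'].split())
        self.lines.append('  ' * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def outline_tags(html):
    parser = Outliner()
    parser.feed(html)
    parser.close()
    return '\n'.join(parser.lines)


print(outline_tags('<div class="news_title_7 content"><h1>Title</h1></div>'
                   '<div id="main_content"><p>Text</p></div>'))
# div.news_title_7.content
#   h1
# div#main_content
#   p
```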
Old 03-13-2011, 03:48 PM   #3
Dereks
Sorry, I'm relatively inexperienced with this. What do you mean by "print out the soup"? I included the preprocess_html function in the recipe and then searched the feed for the word "soup" in the calibre reader, but found nothing.

I gave you the code to print the soup. Look in the output to see the code in the order that Calibre sees it.

Last edited by Starson17; 03-14-2011 at 07:59 AM.
Old 03-14-2011, 12:53 PM   #4
Dereks
OK. I've now read a bit of the Beautiful Soup documentation and understand what it does. What I still can't get is how to "print" it practically. If I include this function in the recipe, it will be executed as part of the conversion. The final file will have the structure created by calibre, not the preprocessed structure of the web page I want to convert. So how can I print out the soup?
Old 03-14-2011, 02:30 PM   #5
Starson17
Quote:
Originally Posted by Dereks View Post
What I still can't get is how to "print" it practically.
Add the code to the recipe.
Quote:
If I include this function in the recipe, it will be executed as part of the conversion.
It will be executed along with the recipe, and its output will appear in the job details. Better still, use ebook-convert as described here, but use this form:
Code:
ebook-convert myrecipe.recipe output_dir --test -vv > job_details.txt
and look in job_details.txt.
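Rather than scrolling through job_details.txt by hand, you can check the order of the elements you care about with a few lines of Python. This is a stdlib-only sketch: marker_order, check_log and the default markers (taken from the recipe in the first post) are hypothetical, and it assumes the log is UTF-8 text containing the "The soup is:" line printed by preprocess_html.

```python
from pathlib import Path


def marker_order(text, markers, start=0):
    """Return the markers that occur in text at or after start,
    ordered by their first occurrence."""
    positions = {m: text.find(m, start) for m in markers}
    found = [m for m in markers if positions[m] != -1]
    return sorted(found, key=positions.get)


def check_log(path='job_details.txt',
              markers=('news_title_7', 'main_content')):
    # errors='replace' keeps going if the redirected log has odd bytes
    text = Path(path).read_text(encoding='utf-8', errors='replace')
    # Skip anything logged before the first soup dump (-1 means not found)
    start = max(text.find('The soup is:'), 0)
    return marker_order(text, markers, start)
```

If check_log() returns ['main_content', 'news_title_7'], the title block comes after main_content in the page calibre sees, which is not the order the recipe assumes.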
Old 03-14-2011, 04:11 PM   #6
Dereks
OK, I did that and it worked. But I still couldn't find where the problem was: even in the printed soup, everything appeared to be in the same order.
In the end, I simply used "keep_only_tags" and it worked much better. I only have to polish some table formatting with extra_css and that's about it.
Anyway, thanks for the help and the links: I feel a great deal more comfortable with recipe editing now.
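The final recipe wasn't posted, so the following is only a sketch of what a keep_only_tags version might look like, assuming the title div and main_content from the first post are the parts worth keeping. The class name LigaZakon and the extra_css rules are hypothetical, and the try/except around the import exists only so the file can be loaded outside calibre; a real recipe needs just the plain import.

```python
# Fallback only so this sketch can be imported outside calibre;
# calibre itself always provides BasicNewsRecipe.
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    BasicNewsRecipe = object


class LigaZakon(BasicNewsRecipe):
    title = u'Liga Zakon'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True

    # Only tags matching these specs (with everything inside them) are kept.
    # Assumed specs: the title block and article body named in the first post.
    keep_only_tags = [
        dict(name='div', attrs={'class': 'news_title_7 content'}),
        dict(id='main_content'),
    ]

    # Hypothetical table styling; adjust to the site's actual markup.
    extra_css = 'table { border-collapse: collapse; } td { padding: 2px; }'

    # Feed title u'Ліга Закон' written as Unicode escapes, as in the original.
    feeds = [(u'\u041b\u0456\u0433\u0430 \u0417\u0430\u043a\u043e\u043d',
              u'http://news.ligazakon.ua/news_rss/tape_articles.xml')]
```

As far as I know, keep_only_tags keeps each matching element wherever it sits in the page, so it doesn't depend on the title appearing right before main_content the way a remove_tags_before/remove_tags_after pair does.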



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
News Feed Error | nook77 | Recipes | 0 | 03-07-2011 08:26 PM
Recipe for Ukrainian Economic / Legal news sites. | Dereks | Recipes | 4 | 11-28-2010 06:31 PM
New recipe request - BBC News Ukrainian | storkozos | Introduce Yourself | 7 | 10-25-2010 11:36 AM
News Feed Covers | DenverReader | Calibre | 4 | 02-06-2010 12:00 AM
News feed error | thibaulthalpern | Calibre | 4 | 03-22-2009 02:21 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.