#1
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Ukrainian Legal News site - polishing the feed
Hi, can somebody help me with this one and tell me what's wrong with this code:
Code:
    class AdvancedUserRecipe1300026627(BasicNewsRecipe):
        title = u'Liga Zakon'
        oldest_article = 7
        max_articles_per_feed = 100
        remove_tags_before = dict(name='div ', attrs={'class':'news_title_7 content'})
        remove_tags_after = dict(id='main_content')
        no_stylesheets = True
        feeds = [(u'\u041b\u0456\u0433\u0430 \u0417\u0430\u043a\u043e\u043d', u'http://news.ligazakon.ua/news_rss/tape_articles.xml')]
Am I doing something wrong? Or is there a better way to polish the feed? Defining this parameter with id='main_content' works perfectly, but main_content doesn't include the article's title, which is a bit inconvenient. Thanks for the help!
A question on a side note: since this recipe has only one feed, how can it be modified so that it jumps to the contents directly, without creating a list of feeds that has only one link?
Last edited by Dereks; 03-13-2011 at 12:06 PM.
#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Code:
    def preprocess_html(self, soup):
        print 'The soup is: ',soup
        return soup
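A note for anyone trying this on a current calibre build: the print statement above is Python 2 syntax, and recent calibre releases run recipes under Python 3, where print has to be called as a function. Below is a minimal sketch of the same debugging hook. The class name SoupDebugRecipe is hypothetical, and the fallback stub is only there so the snippet can be imported outside calibre; it is not the real BasicNewsRecipe.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the file imports outside calibre; not the real class.
    class BasicNewsRecipe(object):
        pass


class SoupDebugRecipe(BasicNewsRecipe):  # hypothetical name
    title = 'Soup debug'

    def preprocess_html(self, soup):
        # Dump the parsed page to the job log, then pass it on unchanged.
        print('The soup is: ', soup)
        return soup
```

The method returns the soup untouched, so adding it should not change the converted output; it only adds text to the log.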
#3
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Sorry, I'm relatively inexperienced with this. What do you mean by "print out the soup"? I've included the preprocess_html function in the recipe and then tried to search for the word "soup" in the feed through the calibre reader. Found nothing.
I gave you the code to print the soup. Look in the output to see the code in the order that Calibre sees it.
Last edited by Starson17; 03-14-2011 at 07:59 AM.
#4
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
OK. I have now read a bit of the Beautiful Soup documentation and understand what it does. What I still can't work out is how to "print" it in practice. If I include this function in the recipe, it will be performed as part of the conversion. The final file will have the file structure created by calibre, not the preprocessed structure of the web page I want to convert. So how can I print out the soup?
#5
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Add the code to the recipe, but use this form:
Code:
    ebook-convert myrecipe.recipe output_dir --test -vv > job_details.txt
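Once job_details.txt exists, the printed soup can be pulled out of it by searching for the 'The soup is:' marker that the preprocess_html code prints. Here is a rough sketch of that search. The helper name soup_sections is made up for illustration, and it assumes the marker text in the log is exactly what the recipe prints.

```python
def soup_sections(log_text, marker="The soup is:"):
    """Return the chunks of log text that follow each marker occurrence.

    Each chunk runs up to the next marker (or the end of the log), so it
    may also contain unrelated log lines written after the soup.
    """
    chunks = log_text.split(marker)[1:]
    return [chunk.strip() for chunk in chunks]


# Usage (assumes job_details.txt was produced by the ebook-convert command):
# with open("job_details.txt", encoding="utf-8", errors="replace") as f:
#     for section in soup_sections(f.read()):
#         print(section[:500])
```

Opening job_details.txt in a text editor and searching for the marker by hand does the same job; the helper is only a convenience when the log is long.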
#6
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
OK, I did that and it worked. But I still didn't find out where the problem was: even in the Beautiful Soup output, everything appeared to be in the same order.
In the end, I simply used keep_only_tags and it worked much better. I only have to polish some table formatting with extra_css and that's about it. Anyway, thanks for the help and the links: I feel a great deal more comfortable with recipe editing now.
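The final recipe was not posted, so the following is only a guess at what the keep_only_tags approach could look like. Both selectors are assumptions taken from the first post (the 'news_title_7 content' div for the title and main_content for the body) and have not been checked against the site. The first post also wrote name='div ' with a trailing space, which may be why that match never fired; the sketch drops the space. The class name, the extra_css rule, and the fallback stub are placeholders.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the file imports outside calibre; not the real class.
    class BasicNewsRecipe(object):
        pass


class LigaZakonSketch(BasicNewsRecipe):  # hypothetical class name
    title = u'Liga Zakon'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True

    # Keep only the title block and the article body; the rest is dropped.
    # Both selectors are assumptions based on the first post.
    keep_only_tags = [
        dict(name='div', attrs={'class': 'news_title_7 content'}),
        dict(id='main_content'),
    ]

    # Placeholder rule for the table polish mentioned above; adjust to taste.
    extra_css = 'table { font-size: small; }'

    feeds = [(u'\u041b\u0456\u0433\u0430 \u0417\u0430\u043a\u043e\u043d',
              u'http://news.ligazakon.ua/news_rss/tape_articles.xml')]
```

Unlike remove_tags_before and remove_tags_after, which cut around a single anchor element, keep_only_tags lists every block to retain, so the title and the body can be selected independently.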
Thread | Thread Starter | Forum | Replies | Last Post |
News Feed Error | nook77 | Recipes | 0 | 03-07-2011 08:26 PM |
Recipe for Ukrainian Economic / Legal news sites. | Dereks | Recipes | 4 | 11-28-2010 06:31 PM |
New recipe request - BBC News Ukrainian | storkozos | Introduce Yourself | 7 | 10-25-2010 11:36 AM |
News Feed Covers | DenverReader | Calibre | 4 | 02-06-2010 12:00 AM |
News feed error | thibaulthalpern | Calibre | 4 | 03-22-2009 02:21 AM |