11-18-2010, 03:09 AM | #1
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
Can't seem to keep my articles inside the page!
I whipped up this recipe for globes.co.il.
I have everything working, but the articles are wider than the PDF output page. I have no idea what is going on. Any ideas? The code is in the spoiler.
An example of the print page is here. Thank you!
11-18-2010, 11:54 AM | #2
creator of calibre
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You need to remove non-reflowable markup, such as tables, pre tags, or any other tags with an explicitly specified width.
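A small standalone sketch of that advice follows. In a real recipe this cleanup would normally be done on the BeautifulSoup tree (for example in `preprocess_html`), and calibre's `remove_attributes` setting already strips named attributes; the version below uses only Python's standard-library `html.parser` so it runs without calibre. The names `ReflowCleaner` and `make_reflowable` are hypothetical, and renaming table and `pre` tags to `<div>` is just one possible way to make them reflow, not necessarily what calibre does internally.

```python
from html import escape
from html.parser import HTMLParser

# Tags whose layout does not reflow on a small page; each is renamed to <div>.
NON_REFLOWABLE = {'table', 'thead', 'tbody', 'tfoot', 'tr', 'td', 'th', 'pre'}
# Attributes that can pin an element to a fixed width; they are dropped.
FIXED_WIDTH_ATTRS = {'width', 'style'}


class ReflowCleaner(HTMLParser):
    """Re-serialises HTML with layout tags renamed and width attributes removed.

    Comments and doctype declarations are not re-emitted.
    """

    def __init__(self):
        # Keep entity/char references as written so they come out unchanged.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _emit_start(self, tag, attrs, self_closing):
        name = 'div' if tag in NON_REFLOWABLE else tag
        kept = ''.join(
            f' {key}' if value is None else f' {key}="{escape(value, quote=True)}"'
            for key, value in attrs
            if key not in FIXED_WIDTH_ATTRS
        )
        self.parts.append(f'<{name}{kept}{" /" if self_closing else ""}>')

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        name = 'div' if tag in NON_REFLOWABLE else tag
        self.parts.append(f'</{name}>')

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f'&{name};')

    def handle_charref(self, name):
        self.parts.append(f'&#{name};')


def make_reflowable(html):
    """Return html with non-reflowable markup replaced by plain divs."""
    cleaner = ReflowCleaner()
    cleaner.feed(html)
    cleaner.close()
    return ''.join(cleaner.parts)
```

For example, `make_reflowable('<table width="600"><tr><td>Hi</td></tr></table>')` returns `'<div><div><div>Hi</div></div></div>'`, which the reader can wrap to any page width.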
11-18-2010, 12:56 PM | #3
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
Got it, thanks Kovid.
Ready to be built in:
Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1283848012(BasicNewsRecipe):
    description = 'This is a recipe of Globes.co.il.'
    cover_url = 'http://www.the7eye.org.il/SiteCollectionImages/BAKTANA/arye_avnery_010709_377.jpg'
    title = u'Globes'
    language = 'he'
    __author__ = 'marbs'
    # Right-to-left text; images and body capped at the page width
    extra_css = ('img {max-width:100%;} '
                 'body {direction: rtl; max-width:100%;} '
                 'title {direction: rtl;} '
                 '.article_description {direction: rtl;} '
                 'a.article {direction: rtl; max-width:100%;} '
                 '.calibre_feed_description {direction: rtl;}')
    simultaneous_downloads = 5
    remove_javascript = True
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 1
    max_articles_per_feed = 100
    # Strip explicit widths and inline styles so the articles can reflow
    remove_attributes = ['width', 'style']

    feeds = [
        (u'שוק ההון', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=585'),  # Capital Market
        (u'נדל"ן', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=607'),  # Real Estate
        (u'וול סטריט ושווקי העולם', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=1225'),  # Wall Street and World Markets
        (u'ניתוח טכני', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=1294'),  # Technical Analysis
        (u'היי טק', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=594'),  # High Tech
        (u'נתח שוק וצרכנות', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=821'),  # Market Share and Consumerism
        (u'דין וחשבון', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=829'),  # Din VeHeshbon (literally "Report")
        (u'רכב', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=3220'),  # Cars
        (u'דעות', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=845'),  # Opinions
        (u'קניון המניות - טור שבועי', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=3175'),  # Stock Mall - weekly column
        (u'סביבה', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=3221'),  # Environment
    ]

    def print_version(self, url):
        # Take the document id after the first '=' and pass it to the print view
        split1 = url.split("=")
        print_url = 'http://www.globes.co.il/serve/globes/printwindow.asp?did=' + split1[1]
        return print_url

    def preprocess_html(self, soup):
        # Drop the table row with bgcolor="black" and the row just before it
        separator = soup.find('tr', attrs={'bgcolor': 'black'})
        if separator is not None:
            previous_row = separator.findPrevious('tr')
            if previous_row is not None:
                previous_row.extract()
            separator.extract()
        return soup

    def fixChars(self, string):
        # Helper kept from the original; nothing in this recipe calls it.
        # Replace lsquo (\x91)
        fixed = re.sub('\x91', '\u2018', string)
        return fixed

Last edited by kovidgoyal; 11-18-2010 at 01:08 PM.
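The `print_version` above treats everything after the first `=` as the document id, which would break if an article link ever carried a second query parameter. The sketch below assumes (the thread does not confirm this) that Globes article links carry the id in a `did` query parameter, and parses the query string with `urllib.parse` instead. It also shows a table-based take on `fixChars` that maps all four Windows-1252 curly-quote code points `\x91` to `\x94` in one pass. Both are plain functions so they run without calibre; in the recipe they would be methods taking `self`, and the name `fix_chars` and the example URL are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Print-view endpoint used by the recipe.
PRINT_URL = 'http://www.globes.co.il/serve/globes/printwindow.asp?did='

# Windows-1252 curly quotes that can show up as the control characters
# \x91-\x94 when text is decoded with the wrong codec.
CP1252_QUOTES = str.maketrans({
    '\x91': '\u2018',  # left single quote
    '\x92': '\u2019',  # right single quote
    '\x93': '\u201c',  # left double quote
    '\x94': '\u201d',  # right double quote
})


def print_version(url):
    """Map an article link to its print view, assuming a 'did' query parameter."""
    ids = parse_qs(urlparse(url).query).get('did')
    if not ids:
        return url  # no document id found: keep the original link
    return PRINT_URL + ids[0]


def fix_chars(text):
    """Replace stray cp1252 quote characters with their Unicode equivalents."""
    return text.translate(CP1252_QUOTES)
```

With a hypothetical link such as `http://www.globes.co.il/news/article.aspx?did=1000600000&fid=585`, this returns the print URL ending in `did=1000600000`, whereas splitting on `=` would yield `1000600000&fid`.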
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Where to submit articles ? | spoudaios | Writers' Corner | 6 | 05-26-2010 08:43 PM |
PRS-600 Articles like this | scottjl | Sony Reader | 31 | 12-30-2009 05:41 AM |
Wikipedia articles | Sordelka | Calibre | 1 | 04-20-2009 09:02 AM |
Submit my articles | Shannon | Lounge | 3 | 01-08-2009 12:56 PM |
A crop of articles from the UK | Argel | News | 3 | 09-08-2008 10:13 AM |