![]() |
#2041 | |
Connoisseur
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 56
Karma: 1140
Join Date: Apr 2010
Device: Kindle / Palm Pre / iPhone
|
Wired fixed itself
Quote:
|
|
![]() |
![]() |
#2042 |
Connoisseur
![]() Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Is there a way to NOT rescale an immage added (or rescale with better resolution/quality)?
|
![]() |
Advert | |
|
![]() |
#2043 |
creator of calibre
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 45,400
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Choose an output profile that has a screen size large enough to accomodate the image,like the iPad output profile.
|
![]() |
![]() |
#2044 |
Connoisseur
![]() Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Code:
from calibre.web.feeds.news import BasicNewsRecipe class LaStampaParseIndex(BasicNewsRecipe): title = u'Debug Parse Index' cover_url = 'http://www.lastampa.it/edicola/PDF/1.pdf' remove_javascript = True no_stylesheets = True def nz_parse_section(self, url): def get_article_url(self, article): link = article.get('links') print link if link: return link[0]['href'] soup = self.index_to_soup(url) head = soup.findAll('div',attrs= {'class': 'entry'}) descr = soup.findAll('div',attrs= {'class': 'feedEntryConteny'}) dt = soup.findAll('div',attrs= {'class': 'lastUpdated'}) print head print descr print dt current_articles = [] # a = head.find('a', href = True) # title = self.tag_to_string(a) # url = a.get('href', False) # description = self.tag_to_string(descr) # date = self.tag_to_string(dt) # self.log('title ', title) # self.log('url ', url) # self.log('description ', description) # self.log('date ', date) # current_articles.append({'title': title, 'url': url, 'description':description, 'date':date}) current_articles.append({'title': '', 'url':'', 'description':'', 'date':''}) return current_articles keep_only_tags = [dict(attrs={'class':['boxocchiello2','titoloRub','titologir','catenaccio','sezione','articologirata']}), dict(name='div', attrs={'id':'corpoarticolo'}) ] remove_tags = [dict(name='div', attrs={'id':'menutop'}), dict(name='div', attrs={'id':'fwnetblocco'}), dict(name='table', attrs={'id':'strumenti'}), dict(name='table', attrs={'id':'imgesterna'}), dict(name='a', attrs={'class':'linkblu'}), dict(name='a', attrs={'class':'link'}), dict(name='span', attrs={'class':['boxocchiello','boxocchiello2','sezione']}) ] def parse_index(self): feeds = [] for title, url in [(u'Politica', u'http://www.lastampa.it/redazione/cmssezioni/politica/rss_politica.xml'), (u'Torino', u'http://rss.feedsportal.com/c/32418/f/466938/index.rss') ]: print url articles = self.nz_parse_section(url) if articles: feeds.append((title, articles)) return feeds Probably it's the same problem that calibre find when parse itself the feed and don't put the correct values into title. I don't understand why... I am don't understand to use the normal method to parse the feeds (using get_article('links')) and override only the title. |
![]() |
![]() |
#2045 |
Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
![]() |
Advert | |
|
![]() |
#2046 | |
Connoisseur
![]() Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Quote:
Am i so newbie?!?!??!? ![]() thanks a lot. now that i am looking the correct source of the feed, i try to search 'title', 'description' and pubDate. ![]() Last edited by gambarini; 06-04-2010 at 09:43 AM. |
|
![]() |
![]() |
#2047 |
Connoisseur
![]() Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Code:
<item> <title><![CDATA[Alfano ai giudici: "Sciopero politico"]]></title> <description><![CDATA[ROMA<BR>Alla vigilia della riunione del Comitato direttivo centrale dell'Anm, dove verranno fissati i tempi e le modalità dello sciopero indetto dal sindacato delle toghe contro la manovra economica del Governo, le tensioni non si placano. <BR><BR>La reazione el governo è affidata al Guardasigilli Alfano. «Lo sciopero dei magistrati è uno sciopero politico, il governo chiede ai magistrati un sacri ...(continua)]]></description> <author><![CDATA[]]></author> <category><![CDATA[POLITICA]]></category> <pubDate><![CDATA[Fri, 4 Jun 2010 14:5:28 +0200]]></pubDate> <link>http://www.lastampa.it/redazione/cmsSezioni/politica/201006articoli/55639girata.asp</link> <enclosure url='http://www.lastampa.it/redazione/cmssezioni/politica/201006images/alfano01G.jpg' type='image/jpeg' /> <image> <url>http://www.lastampa.it/redazione/cmssezioni/politica/201006images/alfano01G.jpg</url> <title></title> <link></link> <width></width> <height></height> </image> </item> |
![]() |
![]() |
#2048 | |
Member
![]() Posts: 16
Karma: 10
Join Date: May 2010
Location: Southern California
Device: JetBook-Lite
|
Quote:
The tag names are the same as the 1st page, not sure why they're not being removed after the 1st page. tweaktown recipe code: Spoiler:
2nd question, I've started the pcper.com recipe and managed to get the multi-page to work on it. the problem on this is after the last page of the article they add a link that takes you back to the home page under the same tag that the pages were scraped from. The links for the pages all start with "article.php?" after the last page the link changes to "content_home.php?". So is there a way to make the soup only scrape the links that start with "article.php?"? Thanks |
|
![]() |
![]() |
#2049 | ||
Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Aren't recipes fun!
Quote:
I believe the keep_only throws away the tags, during the initial page pull, but doesn't apply to the extra pages you are getting with the soup2 = self.index_to_soup(nexturl) step. I've certainly seen this before. There are lots of solutions, in fact, your recipe already uses one - extract()- to remove a tag. Just find the tags and extract them. I usually do this at the postprocess_html stage with something like this: Code:
for tag in soup.findAll('form', dict(attrs={'name':["comments_form"]})): tag.extract() for tag in soup.findAll('font', dict(attrs={'id':["cr-other-headlines"]})): tag.extract() Quote:
Code:
pager = soup.find('a',attrs={'class':'next'}) if pager: |
||
![]() |
![]() |
#2050 |
Zealot
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
|
Thanks Krittika for the Psychology Today's recipe. I found that your recipe can't fetch entire article that spans more than one page. This article, http://www.psychologytoday.com/artic...ectations-trap, for example, spans 5 pages and your recipe could fetch only the first page. Can you help fix it?
I would love to see the recipe fetching the cover too just like what the recipe for Time magazine does. ![]() |
![]() |
![]() |
#2051 |
Member
![]() Posts: 11
Karma: 10
Join Date: Jun 2010
Device: PRS-505, TouchPad, iPad2
|
Hey, wondering if you guys are still taking requests?
I'm getting my dad a kobo reader for fathers day. I'm also hoping to cancel his paper subscription to save him a few extra bucks each year. He's still going to want to read some news though. I was hoping someone here could massage these RSS feeds into an aesthetically pleasing manner for me, so the transition is easier on him. Top Stories - http://rss.cbc.ca/lineup/topstories.xml World - http://rss.cbc.ca/lineup/world.xml National - http://rss.cbc.ca/lineup/canada.xml Manitoba - http://rss.cbc.ca/lineup/canada-manitoba.xml Politics - http://rss.cbc.ca/lineup/politics.xml Tech & Science - http://rss.cbc.ca/lineup/technology.xml Books - http://rss.cbc.ca/lineup/arts-books.xml Movies - http://rss.cbc.ca/lineup/arts-film.xml Winnipeg 7 day Forecast - http://text.www.weatheroffice.gc.ca/...ty/mb-38_e.xml Everything except weather shows up fine, but has a bunch of unnecessary text(Like there is no need for a page index, or the header to click back to the index page... Than their's the calibre footer. I'm guessing that isnt removable though, since kovidgoyal deserves credit for putting out a free application) I also notice a lot of redundancy. Stories that show up in Top Stories re-appear in national. If there was a way to have calibre ignore duplicate stories if its already been added to the file in a previous section, that would be pretty nifty. Also, is there a way to easily change the default cover image automatically when it creates the file? Id like to have a bit centered CBC logo, since thats where all the news is sourced. Thanks. I know this is asking a lot, and its made even worse since i'm a new here and have not contributed anything myself. |
![]() |
![]() |
#2052 |
Junior Member
![]() Posts: 3
Karma: 10
Join Date: Jun 2010
Device: Kindle
|
Anyone else having the problem that Gizmodo's feeds don't display the full content? I get the annoying (more) link. This would be an excellent resource, but I don't know if there's any way to work around it. It's likely a protection so anyone wanting the full articles HAS to go to their site?
|
![]() |
![]() |
#2053 | |
Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
![]() |
![]() |
#2054 |
Guru
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
This is fixed and it will be included in the next release of calibre
|
![]() |
![]() |
#2055 | |
Zealot
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
|
Quote:
Winnipeg weather is not from the same website so I guess it's not allowed to mix sources. If you can point to me a large enough picture for the cover page, maybe I can help. |
|
![]() |
![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Custom column read ? | pchrist7 | Calibre | 2 | 10-04-2010 02:52 AM |
Archive for custom screensavers | sleeplessdave | Amazon Kindle | 1 | 07-07-2010 12:33 PM |
How to back up preferences and custom recipes? | greenapple | Calibre | 3 | 03-29-2010 05:08 AM |
Donations for Custom Recipes | ddavtian | Calibre | 5 | 01-23-2010 04:54 PM |
Help understanding custom recipes | andersent | Calibre | 0 | 12-17-2009 02:37 PM |