Old 12-17-2009, 05:12 PM   #1
frankbaozhu
Junior Member
frankbaozhu began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Dec 2009
Device: Sony Touch Edition
Questions about downloaded WSJ and other papers and magazines.

I recently started to download newspapers and magazines using Calibre. My questions are:

1. A single newspaper or magazine contains too much overlapping content. For example, a story in the WSJ can appear many times in a single day's paper, and as a result a paper can run to 1500 pages! Has anyone else noticed this problem, and is there a way to avoid downloading the repeated stories?

2. When I was reading the WSJ on my Sony PRS-600 reader, it crashed from time to time. Has anyone had the same problem?

Thanks a lot.
Old 12-17-2009, 06:03 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The reason articles are duplicated is that there are many different ways to get at them. You could be interested in only top stories or only stories from a particular region or whatever. A single article can often be classified in multiple categories.
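To illustrate the point, here is a minimal sketch in plain Python of how one story filed under two categories ends up in two sections, and how duplicates could be filtered by URL. The section names, the `example.com` URLs, and the `dedupe_by_url` helper are all made up for illustration; this is not calibre's own code.

```python
def dedupe_by_url(feeds):
    """Return feeds with each article URL kept only the first time it appears."""
    seen = set()
    result = []
    for title, urls in feeds:
        kept = []
        for url in urls:
            # Drop the query string so tracking parameters don't hide duplicates.
            key = url.split('?')[0]
            if key not in seen:
                seen.add(key)
                kept.append(url)
        result.append((title, kept))
    return result


# Hypothetical sections: the same story is filed under two categories.
sections = [
    ('Top Stories', ['http://example.com/a?mod=top', 'http://example.com/b']),
    ('Asia', ['http://example.com/a?mod=asia', 'http://example.com/c']),
]
unique_sections = dedupe_by_url(sections)
```

In this sketch the first section to list a story keeps it, so the order of the sections decides where a shared article ends up.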
Old 12-17-2009, 06:18 PM   #3
frankbaozhu
Thanks a lot Kovidgoyal! But could you please tell me how I can specify the items I want to get from a specific region? Many thanks!
Old 12-17-2009, 06:30 PM   #4
kovidgoyal
You can edit the recipe and simply comment out those feeds you are not interested in.
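As a rough sketch of what that looks like, assuming the recipe defines its sections in a `feeds` list of (title, URL) pairs: put a `#` in front of each entry you want skipped. The choice of which sections to keep here is only an example, and in a real recipe the list sits inside the recipe class.

```python
feeds = [
    ("Today's Newspaper - Page One", 'http://online.wsj.com/xml/rss/3_7205.xml'),
    ("Today's Newspaper - Marketplace", 'http://online.wsj.com/xml/rss/3_7206.xml'),
    # Commented out, so these sections would no longer be listed for download:
    # ("Today's Newspaper - Money & Investing", 'http://online.wsj.com/xml/rss/3_7207.xml'),
    # ("Today's Newspaper - Personal Journal", 'http://online.wsj.com/xml/rss/3_7208.xml'),
    # ("Today's Newspaper - Weekend Journal", 'http://online.wsj.com/xml/rss/3_7209.xml'),
]

# Only the uncommented entries remain in the list.
feed_titles = [title for title, url in feeds]
```

Commenting a line out rather than deleting it makes it easy to bring a section back later.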
Old 12-17-2009, 07:43 PM   #5
frankbaozhu
Thanks again for the reply. I customized the feeds as follows (I just removed the feeds I don't want):

#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

from calibre.web.feeds.news import BasicNewsRecipe

# http://online.wsj.com/page/us_in_todays_paper.html

class WallStreetJournal(BasicNewsRecipe):

    title = 'The Wall Street Journal'
    __author__ = 'Kovid Goyal and Sujata Raman'
    description = 'News and current affairs.'
    needs_subscription = True
    language = 'en'

    max_articles_per_feed = 10
    timefmt = ' [%a, %b %d, %Y]'
    no_stylesheets = True

    extra_css = '''h1{color:#093D72 ; font-size:large ; font-family:Georgia,"Century Schoolbook","Times New Roman",Times,serif; }
                   h2{color:#474537; font-family:Georgia,"Century Schoolbook","Times New Roman",Times,serif; font-size:small; font-style:italic;}
                   .subhead{color:gray; font-family:Georgia,"Century Schoolbook","Times New Roman",Times,serif; font-size:small; font-style:italic;}
                   .insettipUnit {color:#666666; font-family:Arial,Sans-serif;font-size:xx-small }
                   .targetCaption{ font-size:x-small; color:#333333; font-family:Arial,Helvetica,sans-serif}
                   .article{font-family :Arial,Helvetica,sans-serif; font-size:x-small}
                   .tagline {color:#333333; font-size:xx-small}
                   .dateStamp {color:#666666; font-family:Arial,Helvetica,sans-serif}
                   h3{color:blue ;font-family:Arial,Helvetica,sans-serif; font-size:xx-small}
                   .byline{color:blue;font-family:Arial,Helvetica,sans-serif; font-size:xx-small}
                   h6{color:#333333; font-family:Georgia,"Century Schoolbook","Times New Roman",Times,serif; font-size:small;font-style:italic; }
                   .paperLocation{color:#666666; font-size:xx-small}'''

    remove_tags_before = dict(name='h1')
    remove_tags = [
        dict(id=["articleTabs_tab_article", "articleTabs_tab_comments", "articleTabs_tab_interactive", "articleTabs_tab_video", "articleTabs_tab_map", "articleTabs_tab_slideshow"]),
        {'class':['footer_columns', 'network', 'insetCol3wide', 'interactive', 'video', 'slideshow', 'map', 'insettip', 'insetClose', 'more_in', "insetContent", 'articleTools_bottom', 'aTools', "tooltip", "adSummary", "nav-inline"]},
        dict(rel='shortcut icon'),
    ]
    remove_tags_after = [dict(id="article_story_body"), {'class':"article story"},]


    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://commerce.wsj.com/auth/login')
            br.select_form(nr=0)
            br['user'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def postprocess_html(self, soup, first):
        for tag in soup.findAll(name=['table', 'tr', 'td']):
            tag.name = 'div'

        for tag in soup.findAll('div', dict(id=["articleThumbnail_1", "articleThumbnail_2", "articleThumbnail_3", "articleThumbnail_4", "articleThumbnail_5", "articleThumbnail_6", "articleThumbnail_7"])):
            tag.extract()

        return soup

    def get_article_url(self, article):
        try:
            return article.feedburner_origlink.split('?')[0]
        except AttributeError:
            return article.link.split('?')[0]

    def cleanup(self):
        self.browser.open('http://online.wsj.com/logout?url=http://online.wsj.com')

    feeds = [

        ('Today\'s Newspaper - Page One', 'http://online.wsj.com/xml/rss/3_7205.xml'),
        ('Today\'s Newspaper - Marketplace', 'http://online.wsj.com/xml/rss/3_7206.xml'),
        ('Today\'s Newspaper - Money & Investing', 'http://online.wsj.com/xml/rss/3_7207.xml'),
        ('Today\'s Newspaper - Personal Journal', 'http://online.wsj.com/xml/rss/3_7208.xml'),
        ('Today\'s Newspaper - Weekend Journal', 'http://online.wsj.com/xml/rss/3_7209.xml'),

    ]

However, when I tried again with the revised feeds I still got 1600+ pages. Could you please look at it and tell me what went wrong?
Old 12-17-2009, 08:08 PM   #6
kovidgoyal
When you customize a recipe, it creates a new "custom recipe". When you're downloading, make sure you download the custom recipe, not the builtin one.
Old 12-17-2009, 08:29 PM   #7
frankbaozhu
I did create a customized recipe for WSJ. However, it did not ask for username and password so I could not use it to download the content. Could you please point out how to add the user/pass for a customized recipe?
All times are GMT -4.