Old 10-30-2011, 04:45 PM   #1
awitko
Member
awitko began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Oct 2011
Device: Kindle
Wall Street Journal (Free)

I am getting up to speed on using calibre for serving news on a vps.

When I create an e-book for Wall Street Journal (free), it seems to load all the articles, even the blocked ones. This is different from the New York Times recipe, which only loads the articles available for free.

Does the recipe need to be updated? Would you share the updated recipe and, if you don't mind, explain how you fixed it? I'd like to learn how to do more of this myself.

Thanks!

Alex
Old 11-01-2011, 09:36 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by awitko View Post
When I create an e-book for wall street journal (free), it seems to load all the articles, even the blocked articles. This is different than the new york times which only loads the articles available for free.

Does the recipe need to be updated? Would you share the updated recipe and if you don't mind explain how you fixed it. I'd like to learn how to do more of this myself.
It's not clear to me what you're asking for. Do you want an updated recipe for NYT or WSJ? Why? Calibre only gets what the sites make freely available to it. Some sites have complex rules for what they send or don't send. They try to straddle the line of making some material available to draw in readers (particularly for a first search request), while limiting what you can retrieve when just reading the site, so that you will subscribe. Sometimes those rules let Calibre retrieve more than a browser might see; other times they produce less.
Old 11-02-2011, 04:39 PM   #3
awitko
Member
awitko began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Oct 2011
Device: Kindle
I am just noting that the NY Times recipe seems to successfully filter out the blocked articles, so that everything I look at on the Kindle is readable, but the Wall Street Journal recipe includes everything on the Kindle, so that it requires wading through mostly blocked articles to find the readable ones. I am using a free account for both of them.

If a technical limitation means that is the best the Wall Street Journal recipe can do, I guess that will have to be acceptable. But if this is the result of a change to the website, such that a tweak to the recipe would make it as successful as the NY Times recipe at filtering blocked articles, I am asking someone familiar with the recipe to consider fixing it.

Alex
Old 11-02-2011, 04:46 PM   #4
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by awitko View Post
the ny times recipe seems to successfully filter out the blocked articles so that everything I look at on the kindle is readable but the wall street journal article includes everything on the kindle so that it requires wading through mostly blocked articles
Now I understand what you want. When you said it loaded blocked articles, I thought you were saying it actually retrieved them even though you thought they were supposed to be blocked. I've seen that behavior before - blocked articles with a browser were retrieved with the recipe.

Is the blocked article completely blank? Or is there part of it, but not all?
Old 11-02-2011, 05:48 PM   #5
awitko
Member
awitko began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Oct 2011
Device: Kindle
I reran the recipe to get a copy again. The articles show a short portion of the article and then an image that states subscriber only content...

And another minor issue. I was able to stick timefmt = '' in the other recipes so that it would not append the date on the title. But it doesn't seem to work for wsj. If someone could include an option to do that it would be appreciated.

Alex
Old 11-02-2011, 06:48 PM   #6
nickredding
onlinenewsreader.net
nickredding knows the difference between 'who' and 'whom'
 
Posts: 320
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
Quote:
Originally Posted by awitko View Post
I reran the recipe to get a copy again. The articles show a short portion of the article and then an image that states subscriber only content... Alex
I contributed an earlier incarnation of the recipe that had a variable omit_paid_content which, if set to True, would skip the paid content articles. However, the recipe has been rewritten since then and that customization was removed (reason unknown).

You can create a custom version of the WSJ (free) recipe that omits the paid articles by adding the following method to the standard recipe:
Code:
    
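def preprocess_html(self, soup):
    # Add this method inside the recipe class.
    article_title = self.tag_to_string(soup.title)
    # Check whether the article is paid content (look for the subscribe-promo div).
    divtag = soup.find('div', 'adSummary subscribePromo recipeNotABCShopAndBuy')
    if divtag:
        self.log("\nPaid article omitted (%s)" % article_title)
        # Returning None instead of the soup omits this article.
        return None
    return soup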
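
The check above matches the div's whole class string, so it would stop matching if the site reordered or renamed any of those classes. As a way to try the idea outside calibre, here is a minimal standard-library sketch that looks for a single class token instead. The names `PAYWALL_CLASS`, `PaywallDetector` and `is_paid_article` are hypothetical, and picking `subscribePromo` as the token is an assumption, not something the recipe itself does.

```python
from html.parser import HTMLParser

# Assumption: one class token from the div above is enough to flag a paid article.
PAYWALL_CLASS = "subscribePromo"


class PaywallDetector(HTMLParser):
    """Sets self.paid to True if any <div> carries the paywall class token."""

    def __init__(self):
        super().__init__()
        self.paid = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or valueless, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and PAYWALL_CLASS in classes:
            self.paid = True


def is_paid_article(html):
    """Return True when the page contains the subscribe-promo div."""
    detector = PaywallDetector()
    detector.feed(html)
    return detector.paid
```

Feeding it a page containing `<div class="adSummary subscribePromo recipeNotABCShopAndBuy">` flags the article as paid, while a page without that token does not.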
Old 11-02-2011, 10:45 PM   #7
awitko
Member
awitko began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Oct 2011
Device: Kindle
Thank you Nick. I inserted that function and it works - no more blocked articles!

In order to remove the date appended after the title (this is redundant on the Kindle, since it attaches the date itself), I replaced the following code

if date is not None:
    self.timefmt = ' [%s]' % self.tag_to_string(date)

with

self.timefmt = ''
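
For anyone wondering why the empty string works: as far as I understand it, calibre formats the issue date with `timefmt` and appends the result to the periodical title, so an empty format adds nothing. The sketch below only imitates that with the standard library. The helper `periodical_title` and the example date are hypothetical, and this is not calibre's own code.

```python
import time


def periodical_title(title, timefmt, when):
    """Append the date, formatted with timefmt, to the title (imitation only)."""
    return title + time.strftime(timefmt, when)


# Assumed example date; any struct_time would do.
when = time.strptime("2011-11-02", "%Y-%m-%d")

# A date format in the style of a recipe default adds a bracketed date.
print(periodical_title("Wall Street Journal (Free)", " [%d %b %Y]", when))

# An empty timefmt leaves the title untouched.
print(periodical_title("Wall Street Journal (Free)", "", when))
```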

I noticed that the other date next to the title on my Kindle (the one not previously appended to the title) is later. It shows tomorrow's date even though it is 4-5 hours before midnight here (Pacific Standard Time). How is this controlled, and how would I make it consistent with my local time? It seems to be using a date based on a different time zone than mine or my computers'. It is even different from the NY Times time zone, presumably EST.

Last edited by awitko; 11-03-2011 at 12:20 AM.
Old 11-02-2011, 11:18 PM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 25,620
Karma: 4998447
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@nickredding: the reason it was removed is that with it I was getting bug reports about missing articles.
Old 11-02-2011, 11:40 PM   #9
nickredding
onlinenewsreader.net
nickredding knows the difference between 'who' and 'whom'
 
Posts: 320
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
Quote:
Originally Posted by kovidgoyal View Post
@nickredding: the reason it was removed is that with it I was getting bug reports about missing articles.
No problem. I maintain my own custom WSJ recipe, and it's true that recipes need care and feeding as the publishers adjust their HTML formats.
Old 11-03-2011, 12:22 AM   #10
awitko
Member
awitko began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Oct 2011
Device: Kindle
@nickredding If there are other differences, would you mind sharing your custom WSJ recipe so that I and others could try it?
Old 11-03-2011, 12:43 AM   #11
awitko
Member
awitko began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Oct 2011
Device: Kindle
And I still don't understand how it comes up with the date that is assigned on the Kindle. It was about 7PM PST when I ran the script. The server date was correct. My client computer was correct. The NY Times server was probably at 10PM EST. Why did the date on my Kindle use the next day? It's almost as if it is using a later time zone like GMT to determine the date. Is there any way to correct for this? Note that I use timefmt='' to remove the date appended to the title; here I am talking about the other date associated with the periodical.
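
To show why the GMT guess is at least plausible, here is a small sketch of the arithmetic: 7PM at a fixed UTC-8 offset already falls on the next calendar day in UTC. The date and the fixed `PST` offset are assumptions taken from this post, and nothing here confirms how calibre or the Kindle actually picks the date.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset standing in for "PST" as described above.
PST = timezone(timedelta(hours=-8), "PST")

# "About 7PM PST" on the evening the script was run (assumed date).
run_time = datetime(2011, 11, 2, 19, 0, tzinfo=PST)
as_utc = run_time.astimezone(timezone.utc)

print(run_time.date())  # local calendar date: 2011-11-02
print(as_utc.date())    # calendar date in UTC: 2011-11-03
```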

Also, while running the scripts several times during testing, I noticed that my Kindle sometimes puts the new version of the periodical in the back issues and leaves the old one in place. I haven't paid enough attention to spot a pattern in these first couple of days of experimenting, but it has happened. I think other periodicals don't go into a back-issue collection at all, so multiple copies exist in the directory. Does anyone have experience with this, and can you suggest things to look into? It would be helpful if the old issues for all the calibre recipes went into back issues automatically, if possible. It would be even better if there were some way to set up automatic deletion (on the Kindle and on the personal document server). For example, it would be great to have a rule such as: only 3 back issues are kept, and after that the oldest one is deleted as each new one is generated. I suspect this is not currently possible, but it doesn't hurt to ask...
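
On the server side at least, the "keep only the newest few" rule is easy to sketch, assuming the generated issues land as files in one folder. The function name `prune_back_issues`, the `*.mobi` pattern and the folder in the usage comment are all hypothetical, and this does nothing about copies already on the Kindle.

```python
from pathlib import Path


def prune_back_issues(folder, pattern="*.mobi", keep=3):
    """Delete all but the `keep` most recently modified files matching `pattern`.

    Returns the names of the files that were removed.
    """
    issues = sorted(
        Path(folder).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = []
    for old in issues[keep:]:
        old.unlink()
        removed.append(old.name)
    return removed


# Hypothetical usage after each recipe run:
# prune_back_issues("/path/to/wsj_issues", keep=3)
```

Sorting by modification time avoids having to parse dates out of file names, at the cost of misbehaving if old files are touched later.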

Last edited by awitko; 11-03-2011 at 12:48 AM.
Old 11-03-2011, 12:45 AM   #12
nickredding
onlinenewsreader.net
nickredding knows the difference between 'who' and 'whom'
 
Posts: 320
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
My recipe skirts the paywall, so I don't want to post it here and risk getting the site into a pickle.
All times are GMT -4. The time now is 08:58 PM.


MobileRead.com is a privately owned, operated and funded community.