03-12-2011, 10:58 AM   #1
grizliez
Junior Member

Posts: 3
Karma: 10
Join Date: Mar 2011
Device: Nook
Politifact Recipe Problems

I'm having problems with the Politifact recipe. The short descriptions show up fine in the section listing, but some of the actual articles are just a mess of symbols and special characters; many others come out fine. I've tried to figure out what in the Python code is causing this, so that I could make it remove the offending tags, but without success. I'm new at this, so I'm probably missing something. Ideas?

03-13-2011, 11:11 AM   #2
Starson17
Wizard

Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by grizliez
Ideas?

I've seen this when ads are being randomly inserted; see if that's happening. I've also seen it when redirects occur and the processing doesn't follow them quickly enough. Try adding a delay and running a single-threaded download:
Code:
    simultaneous_downloads = 1  # fetch one article at a time
    delay = 5                   # wait 5 seconds between downloads
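Both settings are class attributes of the recipe: one article is fetched at a time, with a pause between fetches. As a rough illustration of that idea only (this is not calibre's download code), here is a stdlib-only sketch, assuming a hypothetical `fetch` callable that returns a page's bytes:

```python
import time
from typing import Callable, Iterable, List


def fetch_sequentially(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Fetch URLs one at a time, pausing `delay` seconds between requests.

    Illustrates the idea behind `simultaneous_downloads = 1` plus
    `delay = 5`; `fetch` is a hypothetical stand-in for a real download.
    """
    results: List[bytes] = []
    for i, url in enumerate(urls):
        if i > 0 and delay > 0:
            sleep(delay)  # pause between consecutive requests, never before the first
        results.append(fetch(url))
    return results
```

Passing `sleep` in as a parameter keeps the sketch testable without actually waiting; with the real `time.sleep`, three URLs take at least 10 seconds (two 5-second pauses).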

03-13-2011, 09:34 PM   #3
grizliez
Junior Member
Quote:
Originally Posted by Starson17
I've seen this when ads are being randomly inserted; see if that's happening. I've also seen it when redirects occur and the processing doesn't follow them quickly enough. Try adding a delay and running a single-threaded download:
Code:
    simultaneous_downloads = 1
    delay = 5

I gave the suggestion a try and it didn't work. It does seem to happen to the same stories each time, so it's repeatable. Still not sure what's going on.

03-13-2011, 09:47 PM   #4
grizliez
Junior Member
Just found the problem after searching further and following one of the ideas in the reusable code section. It was all about the links in certain stories. I used the code segment that converts links to text, and now there are no more problems. Code reprinted here for the next person:

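Code:
    def preprocess_html(self, soup):
        # Replace each <a> whose only content is plain text with that text,
        # so the clickable link disappears but its words stay in the article.
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup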
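To see what that snippet does outside calibre, here is the same idea (drop each `<a>` tag, keep the text inside it) sketched with Python's standard `html.parser` instead of the BeautifulSoup `soup` a recipe receives. It is an illustration under that substitution, not recipe code, and `unwrap_links` is a made-up name. Unlike the snippet, it also unwraps links that contain nested tags.

```python
from html import escape
from html.parser import HTMLParser
from typing import List


class LinkUnwrapper(HTMLParser):
    """Re-serialize HTML, dropping <a> tags but keeping everything inside them."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []

    @staticmethod
    def _attrs(attrs) -> str:
        # Rebuild attributes; valueless ones (e.g. "hidden") stay bare.
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # the link's opening tag is dropped
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag != "a":
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag != "a":  # the link's closing tag is dropped
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text (including the former link text) is kept, re-escaped for HTML.
        self.out.append(escape(data, quote=False))


def unwrap_links(html: str) -> str:
    parser = LinkUnwrapper()
    parser.feed(html)
    parser.close()  # flush any trailing text still buffered by the parser
    return "".join(parser.out)
```

For example, `unwrap_links('<p>See <a href="/x">this claim</a> now.</p>')` gives `'<p>See this claim now.</p>'`. Limitations: it drops comments and doctypes and re-escapes `<script>`/`<style>` contents, so treat it as a demonstration on article markup only.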

08-23-2011, 09:49 PM   #5
KNickerson
Member

Posts: 12
Karma: 10
Join Date: Aug 2011
Device: Nook
Is there a way to get this into the Calibre release? I'm seeing the same issue with the latest version.

08-24-2011, 09:40 AM   #6
Starson17
Wizard
Quote:
Originally Posted by KNickerson
Is there a way to get this into the Calibre release? I'm seeing the same issue with the latest version.

The "fix" posted isn't designed to be a fix for this problem, even though it seems to have solved it in one case in the past. The code is designed to remove clickable links while keeping their text. Many devices can use clickable links, so that isn't a desirable solution in many cases.

I checked it out by adding that code to the Politifact recipe, and it doesn't solve the problem.

08-24-2011, 04:36 PM   #7
KNickerson
Member
Quote:
Originally Posted by Starson17
I checked it out by adding that code to the Politifact recipe, and it doesn't solve the problem.

(Disclaimer: This is the first time I've looked into the recipes.)

I'm thinking it's a download problem. I copied the script off and ran:
ebook-convert PolitifactKJN.recipe .epub -vv --debug-pipeline debug

Then I found a bad section and hunted it down in debug\input.
The index.html there is just garbage. Isn't that the raw stuff downloaded before the recipe kicks in? If not, how do I get to the raw stuff?

08-24-2011, 05:00 PM   #8
Starson17
Wizard
Quote:
Originally Posted by KNickerson
(Disclaimer: This is the first time I've looked into the recipes.)

I'm thinking it's a download problem. I copied the script off and ran:
ebook-convert PolitifactKJN.recipe .epub -vv --debug-pipeline debug

Then I found a bad section and hunted it down in debug\input.
The index.html there is just garbage. Isn't that the raw stuff downloaded before the recipe kicks in? If not, how do I get to the raw stuff?

That should be the raw stuff, but to be sure you can do this:
Code:
    def preprocess_html(self, soup):
        # Print each downloaded page's parsed HTML so you can inspect it
        print('The raw stuff is:', soup)
        return soup
Is it always the same crud at the same point?

08-25-2011, 10:18 AM   #9
KNickerson
Member
Quote:
Originally Posted by Starson17
Is it always the same crud at the same point?

It's not clear. Yesterday I was looking at a fact-check on Krugman; it was bad in several runs, but then it came out OK. Today that one is still OK, but I've had four runs where a fact-check on farm tractors is garbage.

It is the raw data, though: the debug snippet you gave me shows the crud. I also see that every page of crud comes with "WARNING: Encoding detection confidence 0%".

I captured the complete fetch with Wireshark, and I can't find any garbage in the capture. I did find at least one reply that came in gzipped, though, and I don't know whether Calibre can handle a gzipped response.
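One way to test the gzip theory on raw bytes (for example, response bodies exported from a Wireshark capture) is to look for the two-byte signature `1f 8b` that every gzip stream starts with (RFC 1952). A stdlib sketch follows; the function names are made up for illustration, and files that have already been decoded and re-saved as text may no longer start with those bytes.

```python
import gzip
import os
import tempfile
from typing import List

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """Return True if `data` starts like a gzip stream rather than HTML text."""
    return data[:2] == GZIP_MAGIC


def find_gzipped_files(folder: str) -> List[str]:
    """Return the sorted paths of files under `folder` that start with the gzip signature."""
    hits = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                if looks_gzipped(f.read(2)):
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Demo: one plain HTML body and the same body gzip-compressed.
    page = b"<html><body>Half True</body></html>"
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "plain.html"), "wb") as f:
            f.write(page)
        with open(os.path.join(folder, "compressed.html"), "wb") as f:
            f.write(gzip.compress(page))
        # Only the compressed file is reported.
        print(find_gzipped_files(folder))
```

Decoding such bytes as text (say, with `data.decode('latin-1')`) yields a jumble of symbols, which would be consistent with the "Encoding detection confidence 0%" warning: there is no real text encoding to detect.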

08-25-2011, 10:40 AM   #10
kovidgoyal
creator of calibre

Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
A server should never send gzip if the client doesn't say it accepts it. But you can add gzip support to a particular recipe by adding:

Code:
    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        # Let the browser transparently decompress gzip-encoded responses
        br.set_handle_gzip(True)
        return br
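Roughly speaking, gzip handling at the HTTP level means the client advertises `Accept-Encoding: gzip` and decompresses any body whose `Content-Encoding` header says `gzip`. The sketch below shows that exchange with Python's standard `urllib`; it stands in for calibre's browser object and is not its actual implementation of `set_handle_gzip`. The names `build_request`, `decode_body` and `fetch` are illustrative.

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str) -> urllib.request.Request:
    """Create a request that tells the server we can handle gzip bodies."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip compression if the response's Content-Encoding header says so."""
    if content_encoding and content_encoding.strip().lower() in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body  # identity or unrecognized encoding: return bytes unchanged


def fetch(url: str) -> bytes:
    """Download `url` and return the decompressed body (requires network access)."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Because `decode_body` keys on the response's `Content-Encoding` header rather than on what was requested, it would also cope with a server that sends gzip unasked, provided the server labels it.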

08-25-2011, 10:57 AM   #11
KNickerson
Member
Quote:
Originally Posted by kovidgoyal
A server should never send gzip if the client doesn't say it accepts it. But you can add gzip support to a particular recipe by adding:

Code:
def get_browser(self):
   br = BasicNewsRecipe.get_browser(self)
   br.set_handle_gzip(True)
   return br

That looks good. I added it and got a perfect EPUB output. Granted, given how much the results vary between runs, it might just be luck, but I'm hopeful.

Given that gzip responses can show up anyway, is there any reason not to decode gzip even if it wasn't requested?

08-25-2011, 11:26 AM   #12
Starson17
Wizard
If it continues to work, give us an update. I can't recall seeing anything quite like this before, but it's a handy tool to know about.

08-26-2011, 11:33 AM   #13
KNickerson
Member
Quote:
Originally Posted by Starson17
If it continues to work, give us an update. I can't recall seeing anything quite like this before, but it's a handy tool to know about.

I'm totally convinced this is the fix. I reverted to the original recipe, did another packet capture with Wireshark, and searched that capture for gzipped data. There was a perfect one-to-one match: all gzipped data was garbage in the EPUB, and all garbage in the EPUB was gzipped data.

I also see that the Obamameter feed isn't right; it somehow needs to follow one more link to reach the content, but that's not a very interesting feed to me.

Anyone know what procedural hoops I need to go through to get this into the official release? (Yes, I should just RTM.)

08-26-2011, 02:25 PM   #14
Starson17
Wizard
Quote:
Originally Posted by KNickerson
Anyone know what procedural hoops I need to go through to get this into the official release? (Yes, I should just RTM.)

Kovid will usually pick it up here. He probably prefers a complete, tested recipe rather than a code chunk to add in, which may require more testing from him, but this one's pretty simple.

08-26-2011, 02:57 PM   #15
kovidgoyal
creator of calibre
This is already in 0.8.16

Similar Threads
Thread Thread Starter Forum Replies Last Post
Nook (classic) problems with Sports Illustrated Recipe spedinfargo Recipes 2 02-03-2011 06:41 PM
Recipe problems aessedai44 Recipes 0 10-27-2010 12:17 AM
Problems with economist recipe lady kay Calibre 1 08-06-2010 07:49 AM
Problems with Economist recipe 0.5.1 MTBSJC Calibre 7 03-23-2009 01:54 PM
Problems writing recipe kiklop74 Calibre 9 10-28-2008 06:58 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.