Old 03-10-2010, 08:29 PM   #1
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Simulate Calibre recipe with browser?

I've got a recipe that works, but it pulls an image that is slightly different from the image I see when I go to the page the image came from. The log shows that the recipe is actually fetching the image I want, but Calibre doesn't get that image. It gets one similar, but different. When I use a browser to directly view the URL of the image that the recipe is fetching, I see the image I want, not the one that Calibre gets.

I've tried turning off cookies in the browser, clearing cookies, clearing the cache and fetching the image again, but I always get the image I want in the browser, and never the image that Calibre's recipe gets. I've tried changing the user-agent string, blocking the referrer, etc., but I can't seem to simulate Calibre with the browser closely enough to get the same image that Calibre gets.

What am I missing? How does the site know that a Calibre recipe is grabbing the image? Comments? Thanks.
Old 03-11-2010, 12:26 AM   #2
kovidgoyal
creator of calibre
Posts: 25,445
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Try using the TamperData firefox extension to see exactly what happens when you fetch with a browser.
Old 03-11-2010, 10:59 AM   #3
Starson17
Wizard
Quote:
Originally Posted by kovidgoyal View Post
Try using the TamperData firefox extension to see exactly what happens when you fetch with a browser.
I've looked with Live HTTP Headers, but since I'm not sure what's happening with Calibre, it's hard to spot a difference. I suppose I could set up a packet sniffer, but that's a bit more effort than I want to expend. Alternatively, I may just use wget to see whether it pulls the same images that the browser pulls or the images that Calibre gets. That may give me a clue.
Old 03-11-2010, 11:53 AM   #4
kovidgoyal
creator of calibre
You can have calibre dump the headers of the requests it sends as well. I don't recall the exact commands for that off the top of my head, but just google python mechanize.
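For reference, a minimal sketch of what that search turns up: mechanize browsers have debug switches that print the traffic they send. The helper name enable_http_debugging is hypothetical, and whether calibre's browser wrapper passes these switches through unchanged is an assumption, not something confirmed in this thread.

```python
import logging
import sys


def enable_http_debugging(br):
    """Switch on mechanize's debug output for the browser instance `br`."""
    br.set_debug_http(True)       # print HTTP request and response headers
    br.set_debug_redirects(True)  # log redirects and refreshes
    br.set_debug_responses(True)  # log response bodies

    # mechanize reports through the "mechanize" logger, so give it a handler.
    logger = logging.getLogger("mechanize")
    logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.setLevel(logging.INFO)
    return br


# Hypothetical use inside a recipe (requires calibre, so left as a comment):
# def get_browser(self):
#     br = BasicNewsRecipe.get_browser(self)
#     return enable_http_debugging(br)
```

With this in place, the headers of each request should show up in the recipe's log output, which makes it possible to compare them against what Live HTTP Headers shows for the browser.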
Old 03-11-2010, 04:01 PM   #5
Starson17
Wizard
Quote:
Originally Posted by kovidgoyal View Post
You can have calibre dump the headers of the requests it sends as well. I don't recall the exact commands for that off the top of my head, but just google python mechanize.
Thanks!
Old 03-12-2010, 02:05 PM   #6
Starson17
Wizard
Quote:
Originally Posted by kovidgoyal View Post
Try using the TamperData firefox extension to see exactly what happens when you fetch with a browser.
OK, the puzzle is solved - sort of. There were a couple of twists and turns. First, the difference was in the referrer. The site sends the image I don't want when the referrer is missing or not pointed to at least the root of the site.

I had tried blocking the referrer with the RefControl plugin, but it turns out that Firefox (or the plugin) will still send the referrer unless you shut it down first. That's why I was having trouble getting Firefox to emulate Calibre's recipe and that was tricky part #1.

The second tricky part was that TamperData seems to lie about the referrer. Apparently, it was showing the referrer FF would have sent, if not for the blocking of RefControl. Live HTTP Headers, however, was showing what was actually being sent.

For Firefox to get the same images that Calibre was getting, I had to clear the cache, <block> referrer with RefControl, then close FF and restart. (I was also removing cookies, but I'm not sure if that was necessary). To see what was really happening in FF, I had to watch with Live HTTP Headers.

What I'm not sure about is what referrer, if any, Calibre sends as a default. I haven't yet figured out how to watch the handshaking with mechanize. I tried some get_browser mods in the recipe to use the correct referrer, but so far it hasn't worked.
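To make the behaviour described in this post concrete, here is a small sketch that needs no network access. The function image_for_request only imitates what was observed (the wanted image comes back only when the Referer points at the site); the site's real logic is unknown, and the host name, file names and helper names are all hypothetical.

```python
import urllib.request
from urllib.parse import urlsplit

SITE_HOST = "example.com"  # hypothetical stand-in for the real site


def image_for_request(headers):
    """Imitate the observed server behaviour, keyed on the Referer header."""
    referer = headers.get("Referer", "")
    if urlsplit(referer).hostname == SITE_HOST:
        return "wanted.png"
    return "substitute.png"  # missing Referer, or one from another site


def build_request(url, referer=None):
    """Build a request the way a script would, optionally with a Referer."""
    headers = {"Referer": referer} if referer else {}
    return urllib.request.Request(url, headers=headers)


results = {}
for referer in (None, "http://other.example.org/", "http://example.com/"):
    req = build_request("http://example.com/images/today.png", referer)
    # header_items() lists the headers set on the request object.
    results[referer] = image_for_request(dict(req.header_items()))
    print(referer, "->", results[referer])
```

Only the last request, whose Referer points at the site itself, gets the wanted image. A script that sets no Referer behaves like the browser with the referrer blocked.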
Old 03-12-2010, 02:14 PM   #7
kovidgoyal
creator of calibre
When downloading articles, calibre doesn't send any referrer. Each request leaves the browser state unchanged (this is so that the download can happen in multiple threads while using the same browser instance).

One possibility is to monkey patch the open_novisit method on the browser instance to send the required referrer, so something like this:

Code:
def get_browser(self):
    br = BasicNewsRecipe.get_browser(self)
    orig_open_novisit = br.open_novisit

    def my_open_no_visit(self, url, **kwargs):
        data = # add the referrer to the header
        return orig_open_novisit(url, data=data)

    br.open_novisit = my_open_no_visit
    return br
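One Python detail matters when patching a method on an instance like this: a plain function stored as an instance attribute is not bound, so it never receives self. The sketch below shows this with a stand-in Browser class (hypothetical, not calibre's or mechanize's), along with two ways around it.

```python
import types


class Browser:
    """Stand-in for the real browser; only the patched method is modelled."""

    def open_novisit(self, url):
        return 'fetched ' + url


br = Browser()
orig_open_novisit = br.open_novisit  # bound method: already remembers br


def with_self(self, url):
    return orig_open_novisit(url)


def without_self(url):
    return orig_open_novisit(url)


# A plain function stored on an instance is NOT bound, so no `self` is passed.
br.open_novisit = with_self
try:
    br.open_novisit('http://example.com/a.png')
    with_self_failed = False
except TypeError:
    with_self_failed = True  # the URL was taken as `self`, so `url` is missing

# Option 1: drop the `self` parameter.
br.open_novisit = without_self
result_plain = br.open_novisit('http://example.com/a.png')

# Option 2: keep `self` and bind the function to the instance explicitly.
br.open_novisit = types.MethodType(with_self, br)
result_bound = br.open_novisit('http://example.com/a.png')

print(with_self_failed, result_plain, result_bound)
```

Either option works here because the saved orig_open_novisit is already bound to the browser, so the wrapper has no real use for self.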
Old 03-13-2010, 04:42 PM   #8
Starson17
Wizard
Quote:
Originally Posted by kovidgoyal View Post
One possibility is to monkey patch the open_novisit method on the browser instance to send the required referrer, so something like this:

Code:
def get_browser(self):
    br = BasicNewsRecipe.get_browser(self)
    orig_open_novisit = br.open_novisit

    def my_open_no_visit(self, url, **kwargs):
        data = # add the referrer to the header
        return orig_open_novisit(url, data=data)

    br.open_novisit = my_open_no_visit
    return br
Thanks. I could never have solved this without your pseudocode tip. It still took a while to figure out, but it was fun. The line:

def my_open_no_visit(self, url, **kwargs):

had to be changed to:
def my_open_no_visit(url, **kwargs):

(there were complaints about the number of arguments),

and the lines:

data = # add the referrer to the header
return orig_open_novisit(url, data=data)

were changed to:

req = mechanize.Request(url, headers={'Referer': 'http://referer_site.com/'})
return orig_open_novisit(req)

At least I got a chance to learn a bit more about mechanize. Thanks again for the tip, and enjoy your return home.
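Putting the changes described above together, a sketch of the patch could look like the following. It is not the exact recipe from this thread: the helper add_referer and its request_factory parameter are hypothetical, and urllib.request.Request plus a stub browser stand in for mechanize.Request and calibre's browser so that the snippet runs on its own. Unlike the two-line version above, it also passes any extra arguments through to the original method.

```python
import urllib.request  # stdlib stand-in so the sketch runs without calibre


def add_referer(br, referer, request_factory):
    """Patch br.open_novisit so every request carries a Referer header.

    request_factory is a hypothetical parameter: a real recipe would pass
    mechanize.Request; urllib.request.Request is used below for the demo.
    """
    orig_open_novisit = br.open_novisit  # bound method, so no `self` needed

    def open_novisit_with_referer(url, *args, **kwargs):
        # Assumes `url` is a plain string, as in the thread.
        req = request_factory(url, headers={'Referer': referer})
        return orig_open_novisit(req, *args, **kwargs)

    br.open_novisit = open_novisit_with_referer
    return br


class StubBrowser:
    """Minimal stand-in for calibre's browser, used only to try the patch."""

    def open_novisit(self, request, *args, **kwargs):
        # Report the Referer the patched call attached to the request.
        return request.get_header('Referer')


br = add_referer(StubBrowser(), 'http://referer_site.com/',
                 urllib.request.Request)
sent_referer = br.open_novisit('http://referer_site.com/images/today.png')
print(sent_referer)

# Hypothetical use inside a recipe (needs `import mechanize` and calibre):
# def get_browser(self):
#     br = BasicNewsRecipe.get_browser(self)
#     return add_referer(br, 'http://referer_site.com/', mechanize.Request)
```

The referrer URL is the placeholder used in the post above; a real recipe would substitute the root of the site being fetched.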

Last edited by Starson17; 03-13-2010 at 05:33 PM.