Old 03-01-2011, 10:06 PM   #31
kovidgoyal
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Sure, go ahead. I'll be interested to see what you come up with.

I'm afraid I've only ever used addheaders; if that isn't working, I have no clue what else you could do.
Old 03-01-2011, 10:18 PM   #32
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by kovidgoyal View Post
I'm afraid I've only ever used addheaders; if that isn't working, I have no clue what else you could do.
I'll keep digging then - they also have a legacy search interface that might work; I just liked the JSON option as no scraping was required.
Old 03-01-2011, 10:34 PM   #33
kovidgoyal
creator of calibre
Looking at the mechanize source code, all you have to do is construct a Request object and manually add the content-type header to it. If the request object has the content-type header, it will not be overridden.
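A minimal sketch of that idea, using the standard library's `urllib.request.Request` as a stand-in for `mechanize.Request`. mechanize's request class comes from a fork of urllib2, so `add_header` should behave the same way there, but that is an assumption rather than something checked against mechanize itself. The URL and the JSON payload are hypothetical.

```python
import json
from urllib.request import Request

# Hypothetical endpoint and payload, for illustration only.
URL = "http://example.com/search"
payload = json.dumps({"title": "Dune"}).encode("utf-8")

# Build the request by hand and set Content-Type on it explicitly,
# instead of relying on the browser object's default headers.
req = Request(URL, data=payload)
req.add_header("Content-Type", "application/json; charset=utf-8")

# urllib normalizes stored header names with str.capitalize(),
# so the header is looked up as "Content-type".
print(req.has_header("Content-type"))
print(req.get_header("Content-type"))
```

With a browser object, the prepared request would then be passed to its open() call in place of a plain URL.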
Old 03-01-2011, 11:05 PM   #34
ldolse
Wizard
Quote:
Originally Posted by kovidgoyal View Post
Looking at the mechanize source code, all you have to do is construct a Request object and manually add the content-type header to it. If the request object has the content-type header, it will not be overridden.
By construct a request object, you mean something roughly equivalent to this? (from gui2.update.py):
Code:
br = browser()
req = mechanize.Request(URL)
req.add_header('CALIBRE_VERSION', __version__)
req.add_header('CALIBRE_OS',
        'win' if iswindows else 'osx' if isosx else 'oth')
req.add_header('CALIBRE_INSTALL_UUID', prefs['installation_uuid'])
version = br.open(req).read().strip()
Old 03-01-2011, 11:11 PM   #35
kovidgoyal
creator of calibre
Yes.
Old 03-02-2011, 03:24 AM   #36
ldolse
Wizard
That did the trick for the JSON query. On to the next, and hopefully final, major stumbling block.

Edit: I think the best way to fix the problem below may be to delete the last cookie in the cookiejar, br._ua_handlers['_cookies'].cookiejar. Printed as a string, it looks like this:
Code:
<cookielib.CookieJar[<Cookie ASP.NET_SessionId=jfvfj1554sbio555e3nrfwjd for search.overdrive.com/>, <Cookie expires=1298969952 for search.overdrive.com/>]>
Not sure how to go about actually doing that though, as it's an instance and not a list object. I tried to use cookielib's clear() function, but it doesn't seem to work, probably because this cookie is corrupted in the first place and doesn't use the structure mechanize/cookielib expects.
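For what it's worth, a cookie jar can be iterated, and a single cookie can be removed by passing domain, path and name to clear(). The sketch below uses Python 3's `http.cookiejar` (the successor to `cookielib`) and assumes the stray cookie ended up stored under the name `expires` with a non-string value, which is only a reading of the printed jar above and of the TypeError. `make_cookie` and `drop_malformed_cookies` are hypothetical helper names, and whether this works against the jar inside the browser object is untested.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain="search.overdrive.com"):
    """Build a bare Cookie by hand (illustration helper, not a library call)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=False,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def drop_malformed_cookies(jar):
    """Remove cookies whose value is set but is not a string."""
    removed = []
    # Iterate over a copy so the jar is not mutated mid-iteration.
    for cookie in list(jar):
        if cookie.value is not None and not isinstance(cookie.value, str):
            jar.clear(cookie.domain, cookie.path, cookie.name)
            removed.append(cookie.name)
    return removed


jar = CookieJar()
jar.set_cookie(make_cookie("ASP.NET_SessionId", "jfvfj1554sbio555e3nrfwjd"))
jar.set_cookie(make_cookie("expires", 1298969952))  # simulated bad cookie

print(drop_malformed_cookies(jar))
print([c.name for c in jar])
```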

The other option would be to create a separate copy of the cookiejar and use a separate browser object to load the bad page. But I'm struggling to figure out how to duplicate a cookiejar object as well. I've got the separate page loader working with urllib2.


== original description ==

Weird problem, not sure how to fix it. Basically one of the pages I have to retrieve sets a cookie with no name:
Code:
Set-Cookie: ; expires=Tue, 01-Mar-2011 08:15:21 GMT; path=/
And this causes mechanize to barf when it moves on to the next request:
Code:
Traceback (most recent call last):
  File "/Users/ldolse/calibredev/heuristics/src/calibre/ebooks/metadata/overdrive.py", line 112, in to_ovrdrv_data
    ovrdrv_data = find_ovrdrv_data(br, title, author, isbn)
  File "/Users/ldolse/calibredev/heuristics/src/calibre/ebooks/metadata/overdrive.py", line 95, in find_ovrdrv_data
    return overdrive_search(br, q, title, author)
  File "/Users/ldolse/calibredev/heuristics/src/calibre/ebooks/metadata/overdrive.py", line 53, in overdrive_search
    raw = br.open_novisit(xreq).read()
  File "site-packages/mechanize/_mechanize.py", line 199, in open_novisit
  File "site-packages/mechanize/_mechanize.py", line 230, in _mech_open
  File "site-packages/mechanize/_opener.py", line 188, in open
  File "site-packages/mechanize/_urllib2_fork.py", line 1188, in http_request
  File "lib/python2.7/cookielib.py", line 1331, in add_cookie_header
  File "lib/python2.7/cookielib.py", line 1290, in _cookie_attrs
TypeError: expected string or buffer

At least that's my assumption - this is the only page that sets a cookie header like that, and it sets it for a regular browser as well - it's not related to the plugin. Any way to get Mechanize to ignore the garbage set-cookie header?
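One possible way to ignore such a cookie at the point where it is stored is a custom cookie policy. This is a sketch against Python 3's `http.cookiejar`, not against mechanize: `StringValuePolicy` and `make_cookie` are hypothetical names, and it assumes the bad cookie arrives with a non-string value. Whether the same subclass can be attached to the calibre browser's jar is not confirmed here.

```python
from http.cookiejar import Cookie, CookieJar, DefaultCookiePolicy
from urllib.request import Request


class StringValuePolicy(DefaultCookiePolicy):
    """Refuse to store cookies whose value is neither None nor a string."""

    def set_ok(self, cookie, request):
        if cookie.value is not None and not isinstance(cookie.value, str):
            return False
        # Fall back to the normal checks for everything else.
        return super().set_ok(cookie, request)


def make_cookie(name, value, domain="search.overdrive.com"):
    """Build a bare Cookie by hand (illustration helper, not a library call)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=False,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


policy = StringValuePolicy()
jar = CookieJar(policy=policy)  # the jar consults the policy before storing
request = Request("http://search.overdrive.com/")

print(policy.set_ok(make_cookie("expires", 1298969952), request))
print(policy.set_ok(make_cookie("ASP.NET_SessionId", "abc123"), request))
```

The jar asks the policy's set_ok() for each cookie it extracts from a response, so a rejected cookie never reaches the code that later builds the Cookie header.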

Last edited by ldolse; 03-02-2011 at 06:28 AM.
Old 03-02-2011, 07:01 AM   #37
ldolse
Wizard
I found a solution; I'm not sure it's the best one, but it works. I figured out how to initialize a new cookiejar and copied the good cookie into it, then opened the bad page (which corrupts the live cookiejar) and replaced the corrupted cookiejar with the clean one.

Code:
import copy
import mechanize

# Snapshot the cookies that are still valid (including the session cookie).
goodcookies = br._ua_handlers['_cookies'].cookiejar
clean_cj = mechanize.CookieJar()
for cookie in goodcookies:
    clean_cj.set_cookie(copy.deepcopy(cookie))

# This request makes the server send the nameless cookie,
# which corrupts the browser's live cookiejar.
br.open(q_init_search)

# Swap the corrupted cookiejar for the clean snapshot.
br.set_cookiejar(clean_cj)

Last edited by ldolse; 03-03-2011 at 01:33 AM. Reason: latest fix
Old 03-02-2011, 11:26 AM   #38
kovidgoyal
creator of calibre
Why not just set a new cookiejar on the browser object with set_cookiejar?
Old 03-02-2011, 11:37 AM   #39
ldolse
Wizard
Primarily because I didn't see that example in the Googling I was doing for possible solutions - 'copy cookiejar', 'new cookiejar', 'initialize', etc didn't return useful results. I'm not sure it would result in much less code though - part of what needs to happen is that the original session cookie needs to be maintained across all the requests. So it would still need to be copied into the new cookiejar. That said, I think that should let me avoid importing urllib2, so I'll give it a shot.
Old 03-02-2011, 11:40 AM   #40
kovidgoyal
creator of calibre
You don't want to use urllib2, as the calibre browser object automatically supports proxies and various other niceties.
Old 03-02-2011, 11:44 AM   #41
ldolse
Wizard
Yeah - set_cookiejar worked fine - only eliminated one line of code, but it does let me re-use the browser session and avoid urllib2.
Old 03-02-2011, 03:09 PM   #42
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by ldolse View Post
Yeah - set_cookiejar worked fine - only eliminated one line of code, but it does let me re-use the browser session and avoid urllib2.
I don't know if you will find it to be of any value, but a variety of recipes define Request objects, use cookiejars, and call addheader and add_header. Off the top of my head, the Economist and my Skeptic and GoComic recipes do some of those things.
Old 03-02-2011, 08:31 PM   #43
ldolse
Wizard
Quote:
Originally Posted by Starson17 View Post
I don't know if you will find it to be of any value, but a variety of recipes define Request objects, use cookiejars, and call addheader and add_header. Off the top of my head, the Economist and my Skeptic and GoComic recipes do some of those things.
I had assumed my searches of the source tree were including recipes when I was working through it. You just prompted me to double-check, and I see the .recipe extension wasn't considered a text file type to search... I've fixed that, and I do see some useful examples there for general scraping code now. A couple initialize their own cookie jars, but apparently this website is fairly unique in its ability to trip up mechanize, because none delete cookies or manipulate them the way I'm trying to do.
Old 03-02-2011, 09:11 PM   #44
ldolse
Wizard
I'm nearly done with the plugin now; I basically just need to clean things up for more robust string handling.

However, what I've got working makes me wonder whether I should drop all the work I did with the library scraping. Basically, three HTTP requests directly to overdrive.com provide a list object that contains Title, Author, Series info, Publisher, Cover URL, Overdrive ID, ebook edition ISBN, and more.

The plugin doesn't work off of ISBN; it can't really, as Googlebooks/ISBNDB only seem to provide ISBNs for printed editions. Thus far in my testing there has never been an ISBN in their databases which matches the Overdrive ebook edition ISBN - I'm thinking now that this makes the xisbn cross-referencing moot, correct? In that case Amazon's ASIN to ISBN combo matches one of the Googlebooks/ISBNDB records, but in this case it never does.

Since finding the record relies on Title/Author, and returns a fairly comprehensive list of metadata, would this plugin be more appropriate to use in the discovery phase?

Last edited by ldolse; 03-02-2011 at 09:19 PM.
Old 03-02-2011, 10:39 PM   #45
kovidgoyal
creator of calibre
Yes, it sounds like a good match for discovery. I'd suggest you hold on for a bit: one of the goals of the new metadata infrastructure is to support the case of ebooks with a dedicated ISBN or no ISBN.