Calibre StorePlugin: Download Chunked Response

itsWeller · 12-30-2014, 08:11 PM

I'm developing a StorePlugin for Calibre and I've managed to get the bulk of the parsing completed, but when it gets down to the actual downloading of the ebook, I'm running into some issues.

The page I'm scraping from to retrieve download links supplies them in the format "http://server.com/get.php?fileID=XXXX". Checking the headers, it's giving a chunked response. Here's the info:

Connection → keep-alive
Content-Encoding → gzip
Transfer-Encoding → chunked

Passing any download link like this to Calibre raises a ValueError (I'm assuming because it's just saving an empty page as the file, not the chunked data referred to by the header).

Any ideas on how to tackle this so I can either provide a proper link or somehow patch in chunked response support?

kovidgoyal · 12-30-2014, 08:55 PM

As far as I know, httplib/urllib/mechanize all support chunked transfer encoding. While I am not the maintainer for Get Books, IIRC Get Books uses mechanize, so there should be no problem with chunked transfer encoding.
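A quick way to check this claim without touching the real server is to serve a chunked response locally and read it back. The sketch below is only an illustration: it uses the Python 3 standard library (http.client/urllib.request, the successors of httplib/urllib) rather than mechanize, and the handler, function name, and payload are made up for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ChunkedHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that answers every GET with a chunked body."""

    protocol_version = "HTTP/1.1"  # chunked transfer encoding needs HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Transfer-Encoding", "chunked")
        self.send_header("Connection", "close")
        self.end_headers()
        for part in (b"fake ", b"ebook ", b"bytes"):
            # Each chunk is: hex length, CRLF, data, CRLF
            self.wfile.write(b"%x\r\n%s\r\n" % (len(part), part))
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the body

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_chunked():
    """Start the server on a free port, download once, return the body."""
    server = HTTPServer(("127.0.0.1", 0), ChunkedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Empty ProxyHandler so a configured system proxy is not used for localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = "http://127.0.0.1:%d/get.php?fileID=1" % server.server_port
        with opener.open(url) as resp:
            return resp.read()  # chunks are reassembled by the HTTP client
    finally:
        server.shutdown()
        server.server_close()
```

The client never sees the chunk framing; read() returns the joined payload. Note that Content-Encoding: gzip is a separate matter from chunking, and plain urllib does not decompress it on its own.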

itsWeller · 12-31-2014, 12:54 AM

Interesting, so something like

from calibre.gui2.store.search_result import SearchResult

s = SearchResult()
s.downloads = { 'FORMAT_HERE': 'http://server.com/get.php?fileID=FILE_ID' }

should be running along its merry way? I'll poke around a bit more and see if I can't get it working, and I'll get in contact with the Get Books maintainer if all else fails. Thanks.

kovidgoyal · 12-31-2014, 01:11 AM

As far as I know, it should. Look at ebook_download.py for details.

And note that if the server requires authentication, then you will need to provide a cookie file as well.

itsWeller · 01-02-2015, 07:34 PM

Sure enough, after a bit of probing, you were right - the issue isn't with chunked transfer. The server actually appears to be checking the Referer header before authorizing the download, and dropping

br.addheaders = [("Referer", "http://servername.com")]

in ebook_download.py is enough to get the plugin off the ground and enable downloads. Now the next challenge is, how can I specify headers from within my plugin so I can do this the *right* way? I see I can supply cookies for authentication, but I don't see any way to change the headers of the download request.
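The behaviour described in this post can be reproduced against a local stand-in server. This is a sketch under assumptions: the handler below only imitates a server that checks Referer, and it uses a urllib.request opener, whose addheaders attribute works like the one on the mechanize browser patched above. All names are invented for the example.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUIRED_REFERER = "http://servername.com"


class RefererCheckingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: serves the file only with the expected Referer."""

    def do_GET(self):
        ok = self.headers.get("Referer") == REQUIRED_REFERER
        body = b"ebook data" if ok else b""
        self.send_response(200 if ok else 403)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def try_download(extra_headers):
    """Return (status, body) for one download using the given headers."""
    server = HTTPServer(("127.0.0.1", 0), RefererCheckingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    opener.addheaders = list(extra_headers)  # same idea as br.addheaders
    try:
        url = "http://127.0.0.1:%d/get.php?fileID=1" % server.server_port
        with opener.open(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, b""  # request refused by the server
    finally:
        server.shutdown()
        server.server_close()
```

Calling try_download([]) is refused, while try_download([("Referer", REQUIRED_REFERER)]) gets the file, which matches the symptom: the link is fine, but the request is rejected without the header.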

kovidgoyal · 01-02-2015, 09:50 PM

Like I said, I'm not the maintainer of Get Books, so I can't say. To me, the best way to proceed is to allow store plugins to specify a function that returns the browser object to use for downloads. The default (base class) implementation of this function should just do what is done currently.

You can try contacting John and asking him for his opinion; his email is at the top of ebook_download.py.

Or open a bug report on Launchpad, which will notify him when I assign it to him.
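The design suggested in this post (let each store plugin supply the browser used for downloads, with a base-class default that keeps the current behaviour) could look roughly like the sketch below. Everything here is an assumption made for illustration: the class names, the create_browser method name, and the use of a urllib.request opener in place of calibre's mechanize browser. It is not the code from calibre itself.

```python
import urllib.request


class BaseStorePlugin:
    """Hypothetical stand-in for the store plugin base class."""

    def create_browser(self):
        # Default: a plain browser with no extra headers, i.e. what the
        # downloader would have built on its own before this hook existed.
        return urllib.request.build_opener()


class RefererStorePlugin(BaseStorePlugin):
    """Hypothetical plugin for a store that checks the Referer header."""

    referer = "http://servername.com"

    def create_browser(self):
        br = super().create_browser()
        # Keep the default headers and add the one this store insists on.
        br.addheaders = list(br.addheaders) + [("Referer", self.referer)]
        return br


def browser_for_download(plugin):
    """Hypothetical downloader hook: ask the plugin for its browser."""
    return plugin.create_browser()
```

With this shape, plugins that need nothing special inherit the default and are unaffected, while a plugin like the one in this thread overrides one method instead of patching ebook_download.py.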

kovidgoyal · 01-02-2015, 10:44 PM

I had a few minutes, so I implemented it:

https://github.com/kovidgoyal/calibr...2d4c2bac0ddc52
