Old 09-10-2018, 06:38 PM   #46
gbm
Wizard
 
Posts: 2,082
Karma: 8796704
Join Date: Jun 2010
Device: Kobo Clara HD, Hisense Sero 7 Pro RIP, Nook STR, jetbook lite
Quote:
Originally Posted by Corpsegoddess
I started experiencing this last night, and I didn't install the 3.31 patch until about an hour ago. Deleting the cache worked for the first book I tried downloading metadata for, but doesn't work with any subsequent attempts, even if I have an ISBN for them.
See these two posts.


https://www.mobileread.com/forums/sh...90&postcount=8

https://www.mobileread.com/forums/sh...0&postcount=14

bernie
Old 09-10-2018, 06:39 PM   #47
Corpsegoddess
Groupie
 
Posts: 190
Karma: 168826
Join Date: Jul 2011
Location: Vancouver, BC
Device: Kobo Aura One
Thank you! I missed those.

I always appreciate the help and patience on these boards. You guys rock.
Old 09-10-2018, 06:43 PM   #48
gbm
Wizard
 
Posts: 2,082
Karma: 8796704
Join Date: Jun 2010
Device: Kobo Clara HD, Hisense Sero 7 Pro RIP, Nook STR, jetbook lite
Quote:
Originally Posted by BetterRed
But that change was made after 3.31 was released, in fact only a few hours ago. How could it immediately affect everyone on every platform?

Surely Kovid wouldn't be pushing changes. A timebomb? The metadata-sources-cache.json contains code that appears to import from polyglot.builtins.

It would be interesting to know whether anyone who runs calibre from the very latest source is getting the problem.

BR
Take a look at my debug logs.

bernie
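For anyone who wants to check BetterRed's diagnosis on their own machine, a quick scan of the cache file for the offending import is enough. The sketch below is only a rough helper, not part of calibre: it assumes the file lives in calibre's cache directory, which I believe is exposed as the internal calibre.constants.cache_dir() helper, so run it from calibre-debug (calibre-debug -c "<code>") or simply open the file in a text editor and search for polyglot.builtins.

[CODE]
# Rough helper, not part of calibre: report whether the cached metadata
# source code references polyglot.builtins. The cache location is an
# assumption based on calibre's internal cache_dir() helper.
import os

from calibre.constants import cache_dir

path = os.path.join(cache_dir(), 'metadata-sources-cache.json')
with open(path, 'rb') as f:
    raw = f.read().decode('utf-8', 'replace')

if 'polyglot.builtins' in raw:
    print('Cached plugin code references polyglot.builtins:', path)
else:
    print('No reference to polyglot.builtins found in:', path)
[/CODE]

If the reference is there but the installed calibre predates the polyglot package, that matches the traceback rdorton posts further down.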
Old 09-10-2018, 06:50 PM   #49
BetterRed
null operator (he/him)
 
Posts: 20,567
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Corpsegoddess
I started experiencing this last night, and I didn't install the 3.31 patch until about an hour ago. Deleting the cache worked for the first book I tried downloading metadata for, but doesn't work with any subsequent attempts, even if I have an ISBN for them.
Close calibre and try try replacing it with this one ==>> http://www.mediafire.com/file/q7yaba...ache.json/file

BR
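An alternative to downloading a replacement file is to delete the stale cache and let calibre rebuild it, although, as Corpsegoddess reported above, that may only help until calibre fetches the problematic sources again. The sketch below is a hedged example, not an official procedure: it assumes the file sits in the directory returned by calibre's internal calibre.constants.cache_dir() helper and it keeps a backup copy before removing anything. Run it with the calibre GUI closed, for example via calibre-debug -c.

[CODE]
# Hedged sketch: back up and remove the stale metadata source cache so
# calibre regenerates it on the next metadata download. The path is an
# assumption based on calibre's internal cache_dir() helper.
import os
import shutil

from calibre.constants import cache_dir

path = os.path.join(cache_dir(), 'metadata-sources-cache.json')
if os.path.exists(path):
    shutil.copy2(path, path + '.bak')  # keep a copy, just in case
    os.remove(path)
    print('Removed', path)
else:
    print('No cache file found at', path)
[/CODE]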
Old 09-10-2018, 06:56 PM   #50
Corpsegoddess
Groupie
 
Posts: 190
Karma: 168826
Join Date: Jul 2011
Location: Vancouver, BC
Device: Kobo Aura One
Thank you!
Old 09-10-2018, 07:04 PM   #51
BetterRed
null operator (he/him)
 
Posts: 20,567
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by gbm
Take a look at my debug logs.

bernie
Thanks, but whilst they may mean something to a programmer, they mean nothing to an ordinary person like me.

BR
Old 09-10-2018, 07:48 PM   #52
rdorton
Junior Member
 
Posts: 2
Karma: 10
Join Date: Aug 2012
Device: Kindle
Can't download metadata

Hello, I'm Ray. I check this forum occasionally when I'm searching for information and solutions. I've found this site to be a great source. Thanks!

I am having trouble downloading metadata. I experienced this problem using V3.30, and after updating to V3.31 I have the same problem. Any help would be appreciated.
I get the following error: Failed to download metadata. Click Show details to see details.

DETAILS:

calibre, version 3.31.0
ERROR: Download failed: Failed to download metadata. Click Show Details to see details

Traceback (most recent call last):
File "site-packages\calibre\utils\ipc\simple_worker.py", line 289, in main
File "site-packages\calibre\ebooks\metadata\sources\worker.py ", line 102, in single_identify
File "site-packages\calibre\ebooks\metadata\sources\update.py ", line 79, in patch_plugins
File "site-packages\calibre\ebooks\metadata\sources\update.py ", line 62, in patch_search_engines
File "<string>", line 11, in <module>
ImportError: No module named polyglot.builtins
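The frames in that traceback hint at the mechanism: the patching step in update.py (patch_plugins / patch_search_engines) executes code stored in metadata-sources-cache.json, hence the File "<string>" frame, and that cached source imports polyglot.builtins, which the installed 3.31 code does not ship. Below is a rough, self-contained illustration of that failure mode; it is not calibre's actual implementation.

[CODE]
# Rough illustration only, not calibre's code: executing cached plugin
# source fails as soon as it imports a module the installed calibre does
# not provide, surfacing as the ImportError shown in the traceback above.
cached_source = (
    "from polyglot.builtins import iteritems\n"  # only exists in newer calibre
    "def patched_plugin():\n"
    "    pass\n"
)

try:
    exec(cached_source, {})  # roughly what the patch step does with the cache
except ImportError as err:
    print('Failed to download metadata:', err)
[/CODE]

That is why replacing or deleting the cache file, as suggested earlier in the thread, makes the error go away until a compatible cache is fetched.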
Old 09-10-2018, 08:14 PM   #53
Chris_Snow
Zealot
 
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
I've also been hit by this issue. Guess I'll wait for a calibre update.
Old 09-10-2018, 08:21 PM   #54
PacificNW
Junior Mint
 
Posts: 5
Karma: 10
Join Date: Sep 2018
Device: Kindle of DOOM
Quote:
Originally Posted by BetterRed
Close calibre and try try replacing it with this one ==>> http://www.mediafire.com/file/q7yaba...ache.json/file

BR
Hey! That was the solution I posted several hours ago in the other thread.

https://www.mobileread.com/forums/sh...0&postcount=14

Last edited by PacificNW; 09-10-2018 at 08:26 PM. Reason: URL of plagiarised fix fix. Try try!
Old 09-10-2018, 08:32 PM   #55
kenmac999
Member
 
Posts: 22
Karma: 10
Join Date: Sep 2013
Location: Oklahoma
Device: FBReader running on Android phone(s), Chromebook, Kindle Fire 7
I have the same problem. After reading this and the other threads, I tried replacing my original metadata-sources-cache.json.
Spoiler:
[CODE]
{
"amazon": "#!/usr/bin/env python2\n# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai\n# License: GPLv3 Copyright: 2011, Kovid Goyal <kovid at kovidgoyal.net>\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport re\nimport socket\nimport time\nfrom functools import partial\nfrom Queue import Empty, Queue\nfrom threading import Thread\nfrom urlparse import urlparse\n\nfrom calibre import as_unicode, browser, random_user_agent\nfrom calibre.ebooks.metadata import check_isbn\nfrom calibre.ebooks.metadata.book.base import Metadata\nfrom calibre.ebooks.metadata.sources.base import Option, Source, fixauthors, fixcase\nfrom calibre.utils.localization import canonicalize_lang\nfrom calibre.utils.random_ua import accept_header_for_ua, all_user_agents\n\n\nclass CaptchaError(Exception):\n pass\n\n\nclass SearchFailed(ValueError):\n pass\n\n\nua_index = -1\n\n\ndef parse_html(raw):\n try:\n from html5_parser import parse\n except ImportError:\n # Old versions of calibre\n import html5lib\n return html5lib.parse(raw, treebuilder='lxml', namespaceHTMLElements=False)\n else:\n return parse(raw)\n\n\ndef parse_details_page(url, log, timeout, browser, domain):\n from calibre.utils.cleantext import clean_ascii_chars\n from calibre.ebooks.chardet import xml_to_unicode\n from lxml.html import tostring\n log('Getting details from:', url)\n try:\n raw = browser.open_novisit(url, timeout=timeout).read().strip()\n except Exception as e:\n if callable(getattr(e, 'getcode', None)) and \\\n e.getcode() == 404:\n log.error('URL malformed: %r' % url)\n return\n attr = getattr(e, 'args', [None])\n attr = attr if attr else [None]\n if isinstance(attr[0], socket.timeout):\n msg = 'Details page timed out. Try again later.'\n log.error(msg)\n else:\n msg = 'Failed to make details query: %r' % url\n log.exception(msg)\n return\n\n oraw = raw\n if 'amazon.com.br' in url:\n # amazon.com.br serves utf-8 but has an incorrect latin1 <meta> tag\n raw = raw.decode('utf-8')\n raw = xml_to_unicode(raw, strip_encoding_pats=True,\n resolve_entities=True)[0]\n if '<title>404 - ' in raw:\n raise ValueError('URL malformed: %r' % url)\n if '>Could not find the requested document in the cache.<' in raw:\n raise ValueError('No cached entry for %s found' % url)\n\n try:\n root = parse_html(clean_ascii_chars(raw))\n except Exception:\n msg = 'Failed to parse amazon details page: %r' % url\n log.exception(msg)\n return\n if domain == 'jp':\n for a in root.xpath('//a[@href]'):\n if 'black-curtain-redirect.html' in a.get('href'):\n url = 'https://amazon.co.jp' + a.get('href')\n log('Black curtain redirect found, following')\n return parse_details_page(url, log, timeout, browser, domain)\n\n errmsg = root.xpath('//*[@id=\"errorMessage\"]')\n if errmsg:\n msg = 'Failed to parse amazon details page: %r' % url\n msg += tostring(errmsg, method='text', encoding=unicode).strip()\n log.error(msg)\n return\n\n from css_selectors import Select\n selector = Select(root)\n return oraw, root, selector\n\n\ndef parse_asin(root, log, url):\n try:\n link = root.xpath('//link[@rel=\"canonical\" and @href]')\n for l in link:\n return l.get('href').rpartition('/')[-1]\n except Exception:\n log.exception('Error parsing ASIN for url: %r' % url)\n\n\nclass Worker(Thread): # Get details {{{\n\n '''\n Get book details from amazons book page in a separate thread\n '''\n\n def __init__(self, url, result_queue, browser, log, relevance, domain,\n plugin, timeout=20, testing=False, preparsed_root=None,\n cover_url_processor=None, 
filter_result=None):\n Thread.__init__(self)\n self.cover_url_processor = cover_url_processor\n self.preparsed_root = preparsed_root\n self.daemon = True\n self.testing = testing\n self.url, self.result_queue = url, result_queue\n self.log, self.timeout = log, timeout\n self.filter_result = filter_result or (lambda x, log: True)\n self.relevance, self.plugin = relevance, plugin\n self.browser = browser\n self.cover_url = self.amazon_id = self.isbn = None\n self.domain = domain\n from lxml.html import tostring\n self.tostring = tostring\n\n months = { # {{{\n 'de': {\n 1: ['j\u00e4n', 'januar'],\n 2: ['februar'],\n 3: ['m\u00e4rz'],\n 5: ['mai'],\n 6: ['juni'],\n 7: ['juli'],\n 10: ['okt', 'oktober'],\n 12: ['dez', 'dezember']\n },\n 'it': {\n 1: ['gennaio', 'enn'],\n 2: ['febbraio', 'febbr'],\n 3: ['marzo'],\n 4: ['aprile'],\n 5: ['maggio', 'magg'],\n 6: ['giugno'],\n 7: ['luglio'],\n 8: ['agosto', 'ag'],\n 9: ['settembre', 'sett'],\n 10: ['ottobre', 'ott'],\n 11: ['novembre'],\n 12: ['dicembre', 'dic'],\n },\n 'fr': {\n 1: ['janv'],\n 2: ['f\u00e9vr'],\n 3: ['mars'],\n 4: ['avril'],\n 5: ['mai'],\n 6: ['juin'],\n 7: ['juil'],\n 8: ['ao\u00fbt'],\n 9: ['sept'],\n 12: ['d\u00e9c'],\n },\n 'br': {\n 1: ['janeiro'],\n 2: ['fevereiro'],\n 3: ['mar\u00e7o'],\n 4: ['abril'],\n 5: ['maio'],\n 6: ['junho'],\n 7: ['julho'],\n 8: ['agosto'],\n 9: ['setembro'],\n 10: ['outubro'],\n 11: ['novembro'],\n 12: ['dezembro'],\n },\n 'es': {\n 1: ['enero'],\n 2: ['febrero'],\n 3: ['marzo'],\n 4: ['abril'],\n 5: ['mayo'],\n 6: ['junio'],\n 7: ['julio'],\n 8: ['agosto'],\n 9: ['septiembre', 'setiembre'],\n 10: ['octubre'],\n 11: ['noviembre'],\n 12: ['diciembre'],\n },\n 'jp': {\n 1: [u'1\u6708'],\n 2: [u'2\u6708'],\n 3: [u'3\u6708'],\n 4: [u'4\u6708'],\n 5: [u'5\u6708'],\n 6: [u'6\u6708'],\n 7: [u'7\u6708'],\n 8: [u'8\u6708'],\n 9: [u'9\u6708'],\n 10: [u'10\u6708'],\n 11: [u'11\u6708'],\n 12: [u'12\u6708'],\n },\n 'nl': {\n 1: ['januari'], 2: ['februari'], 3: ['maart'], 5: ['mei'], 6: ['juni'], 7: ['juli'], 8: ['augustus'], 10: ['oktober'],\n }\n\n } # }}}\n\n self.english_months = [None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',\n 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']\n self.months = months.get(self.domain, {})\n\n self.pd_xpath = '''\n //h2[text()=\"Product Details\" or \\\n text()=\"Produktinformation\" or \\\n text()=\"Dettagli prodotto\" or \\\n text()=\"Product details\" or \\\n text()=\"D\u00e9tails sur le produit\" or \\\n text()=\"Detalles del producto\" or \\\n text()=\"Detalhes do produto\" or \\\n text()=\"Productgegevens\" or \\\n text()=\"\u57fa\u672c\u4fe1\u606f\" or \\\n starts-with(text(), \"\u767b\u9332\u60c5\u5831\")]/../div[@class=\"content\"]\n '''\n # Editor: is for Spanish\n self.publisher_xpath = '''\n descendant::*[starts-with(text(), \"Publisher:\") or \\\n starts-with(text(), \"Verlag:\") or \\\n starts-with(text(), \"Editore:\") or \\\n starts-with(text(), \"Editeur\") or \\\n starts-with(text(), \"Editor:\") or \\\n starts-with(text(), \"Editora:\") or \\\n starts-with(text(), \"Uitgever:\") or \\\n starts-with(text(), \"\u51fa\u7248\u793e:\")]\n '''\n self.publisher_names = {'Publisher', 'Uitgever', 'Verlag',\n 'Editore', 'Editeur', 'Editor', 'Editora', '\u51fa\u7248\u793e'}\n\n self.language_xpath = '''\n descendant::*[\n starts-with(text(), \"Language:\") \\\n or text() = \"Language\" \\\n or text() = \"Sprache:\" \\\n or text() = \"Lingua:\" \\\n or text() = \"Idioma:\" \\\n or starts-with(text(), \"Langue\") \\\n or starts-with(text(), \"\u8a00\u8a9e\") \\\n or 
starts-with(text(), \"\u8bed\u79cd\")\n ]\n '''\n self.language_names = {'Language', 'Sprache',\n 'Lingua', 'Idioma', 'Langue', '\u8a00\u8a9e', 'Taal', '\u8bed\u79cd'}\n\n self.tags_xpath = '''\n descendant::h2[\n text() = \"Look for Similar Items by Category\" or\n text() = \"\u00c4hnliche Artikel finden\" or\n text() = \"Buscar productos similares por categor\u00eda\" or\n text() = \"Ricerca articoli simili per categoria\" or\n text() = \"Rechercher des articles similaires par rubrique\" or\n text() = \"Procure por itens similares por categoria\" or\n text() = \"\u95a2\u9023\u5546\u54c1\u3092\u63a2\u3059\"\ n ]/../descendant::ul/li\n '''\n\n self.ratings_pat = re.compile(\n r'([0-9.]+) ?(out of|von|van|su|\u00e9toiles sur|\u3064\u661f\u306e\u3046\u3061|de un m\u00e1ximo de|de) ([\\d\\.]+)( (stars|Sternen|stelle|estrellas|estrelas|sterren)) {0,1}')\n self.ratings_pat_cn = re.compile('\u5e73\u5747([0-9.]+)')\n\n lm = {\n 'eng': ('English', 'Englisch', 'Engels'),\n 'fra': ('French', 'Fran\u00e7ais'),\n 'ita': ('Italian', 'Italiano'),\n 'deu': ('German', 'Deutsch'),\n 'spa': ('Spanish', 'Espa\\xf1ol', 'Espaniol'),\n 'jpn': ('Japanese', u'\u65e5\u672c\u8a9e'),\n 'por': ('Portuguese', 'Portugu\u00eas'),\n 'nld': ('Dutch', 'Nederlands',),\n 'chs': ('Chinese', u'\u4e2d\u6587', u'\u7b80\u4f53\u4e2d\u6587'),\n }\n self.lang_map = {}\n for code, names in lm.iteritems():\n for name in names:\n self.lang_map[name] = code\n\n self.series_pat = re.compile(\n r'''\n \\|\\s* # Prefix\n (Series)\\s*:\\s* # Series declaration\n (?P<series>.+?)\\s+ # The series name\n \\((Book)\\s* # Book declaration\n (?P<index>[0-9.]+) # Series index\n \\s*\\)\n ''', re.X)\n\n def delocalize_datestr(self, raw):\n if self.domain == 'cn':\n return raw.replace('\u5e74', '-').replace('\u6708', '-').replace('\u65e5', '')\n if not self.months:\n return raw\n ans = raw.lower()\n for i, vals in self.months.iteritems():\n for x in vals:\n ans = ans.replace(x, self.english_months[i])\n ans = ans.replace(' de ', ' ')\n return ans\n\n def run(self):\n try:\n self.get_details()\n except:\n self.log.exception('get_details failed for url: %r' % self.url)\n\n def get_details(self):\n if self.preparsed_root is None:\n raw, root, selector = parse_details_page(\n self.url, self.log, self.timeout, self.browser, self.domain)\n else:\n raw, root, selector = self.preparsed_root\n\n from css_selectors import Select\n self.selector = Select(root)\n self.parse_details(raw, root)\n\n def parse_details(self, raw, root):\n asin = parse_asin(root, self.log, self.url)\n if not asin and root.xpath('//form[@action=\"/errors/validateCaptcha\"]'):\n raise CaptchaError(\n 'Amazon returned a CAPTCHA page, probably because you downloaded too many books. 
Wait for some time and try again.')\n if self.testing:\n import tempfile\n import uuid\n with tempfile.NamedTemporaryFile(prefix=(asin or str(uuid.uuid4())) + '_',\n suffix='.html', delete=False) as f:\n f.write(raw)\n print ('Downloaded html for', asin, 'saved in', f.name)\n\n try:\n title = self.parse_title(root)\n except:\n self.log.exception('Error parsing title for url: %r' % self.url)\n title = None\n\n try:\n authors = self.parse_authors(root)\n except:\n self.log.exception('Error parsing authors for url: %r' % self.url)\n authors = []\n\n if not title or not authors or not asin:\n self.log.error(\n 'Could not find title/authors/asin for %r' % self.url)\n self.log.error('ASIN: %r Title: %r Authors: %r' % (asin, title,\n authors))\n return\n\n mi = Metadata(title, authors)\n idtype = 'amazon' if self.domain == 'com' else 'amazon_' + self.domain\n mi.set_identifier(idtype, asin)\n self.amazon_id = asin\n\n try:\n mi.rating = self.parse_rating(root)\n except:\n self.log.exception('Error parsing ratings for url: %r' % self.url)\n\n try:\n mi.comments = self.parse_comments(root, raw)\n except:\n self.log.exception('Error parsing comments for url: %r' % self.url)\n\n try:\n series, series_index = self.parse_series(root)\n if series:\n mi.series, mi.series_index = series, series_index\n elif self.testing:\n mi.series, mi.series_index = 'Dummy series for testing', 1\n except:\n self.log.exception('Error parsing series for url: %r' % self.url)\n\n try:\n mi.tags = self.parse_tags(root)\n except:\n self.log.exception('Error parsing tags for url: %r' % self.url)\n\n try:\n self.cover_url = self.parse_cover(root, raw)\n except:\n self.log.exception('Error parsing cover for url: %r' % self.url)\n if self.cover_url_processor is not None and self.cover_url.startswith('/'):\n self.cover_url = self.cover_url_processor(self.cover_url)\n mi.has_cover = bool(self.cover_url)\n\n non_hero = tuple(self.selector(\n 'div#bookDetails_container_div div#nonHeroSection'))\n if non_hero:\n # New style markup\n try:\n self.parse_new_details(root, mi, non_hero[0])\n except:\n self.log.exception(\n 'Failed to parse new-style book details section')\n else:\n pd = root.xpath(self.pd_xpath)\n if pd:\n pd = pd[0]\n\n try:\n isbn = self.parse_isbn(pd)\n if isbn:\n self.isbn = mi.isbn = isbn\n except:\n self.log.exception(\n 'Error parsing ISBN for url: %r' % self.url)\n\n try:\n mi.publisher = self.parse_publisher(pd)\n except:\n self.log.exception(\n 'Error parsing publisher for url: %r' % self.url)\n\n try:\n mi.pubdate = self.parse_pubdate(pd)\n except:\n self.log.exception(\n 'Error parsing publish date for url: %r' % self.url)\n\n try:\n lang = self.parse_language(pd)\n if lang:\n mi.language = lang\n except:\n self.log.exception(\n 'Error parsing language for url: %r' % self.url)\n\n else:\n self.log.warning(\n 'Failed to find product description for url: %r' % self.url)\n\n mi.source_relevance = self.relevance\n\n if self.amazon_id:\n if self.isbn:\n self.plugin.cache_isbn_to_identifier(self.isbn, self.amazon_id)\n if self.cover_url:\n self.plugin.cache_identifier_to_cover_url(self.ama zon_id,\n self.cover_url)\n\n self.plugin.clean_downloaded_metadata(mi)\n\n if self.filter_result(mi, self.log):\n self.result_queue.put(mi)\n\n def totext(self, elem):\n return self.tostring(elem, encoding=unicode, method='text').strip()\n\n def parse_title(self, root):\n h1 = root.xpath('//h1[@id=\"title\"]')\n if h1:\n h1 = h1[0]\n for child in h1.xpath('./*[contains(@class, \"a-color-secondary\")]'):\n h1.remove(child)\n return 
self.totext(h1)\n tdiv = root.xpath('//h1[contains(@class, \"parseasinTitle\")]')[0]\n actual_title = tdiv.xpath('descendant::*[@id=\"btAsinTitle\"]')\n if actual_title:\n title = self.tostring(actual_title[0], encoding=unicode,\n method='text').strip()\n else:\n title = self.tostring(tdiv, encoding=unicode,\n method='text').strip()\n ans = re.sub(r'[(\\[].*[)\\]]', '', title).strip()\n if not ans:\n ans = title.rpartition('[')[0].strip()\n return ans\n\n def parse_authors(self, root):\n for sel in (\n '#byline .author .contributorNameID',\n '#byline .author a.a-link-normal',\n '#bylineInfo .author .contributorNameID',\n '#bylineInfo .author a.a-link-normal'\n ):\n matches = tuple(self.selector(sel))\n if matches:\n authors = [self.totext(x) for x in matches]\n return [a for a in authors if a]\n\n x = '//h1[contains(@class, \"parseasinTitle\")]/following-sibling::span/*[(name()=\"a\" and @href) or (name()=\"span\" and @class=\"contributorNameTrigger\")]'\n aname = root.xpath(x)\n if not aname:\n aname = root.xpath('''\n //h1[contains(@class, \"parseasinTitle\")]/following-sibling::*[(name()=\"a\" and @href) or (name()=\"span\" and @class=\"contributorNameTrigger\")]\n ''')\n for x in aname:\n x.tail = ''\n authors = [self.tostring(x, encoding=unicode, method='text').strip() for x\n in aname]\n authors = [a for a in authors if a]\n return authors\n\n def parse_rating(self, root):\n for x in root.xpath('//div[@id=\"cpsims-feature\" or @id=\"purchase-sims-feature\" or @id=\"rhf\"]'):\n # Remove the similar books section as it can cause spurious\n # ratings matches\n x.getparent().remove(x)\n\n rating_paths = ('//div[@data-feature-name=\"averageCustomerReviews\" or @id=\"averageCustomerReviews\"]',\n '//div[@class=\"jumpBar\"]/descendant::span[contains(@class,\"asinReviewsSummary\")]',\n '//div[@class=\"buying\"]/descendant::span[contains(@class,\"asinReviewsSummary\")]',\n '//span[@class=\"crAvgStars\"]/descendant::span[contains(@class,\"asinReviewsSummary\")]')\n ratings = None\n for p in rating_paths:\n ratings = root.xpath(p)\n if ratings:\n break\n if ratings:\n for elem in ratings[0].xpath('descendant::*[@title]'):\n t = elem.get('title').strip()\n if self.domain == 'cn':\n m = self.ratings_pat_cn.match(t)\n if m is not None:\n return float(m.group(1))\n else:\n m = self.ratings_pat.match(t)\n if m is not None:\n return float(m.group(1)) / float(m.group(3)) * 5\n\n def _render_comments(self, desc):\n from calibre.library.comments import sanitize_comments_html\n\n for c in desc.xpath('descendant::noscript'):\n c.getparent().remove(c)\n for c in desc.xpath('descendant::*[@class=\"seeAll\" or'\n ' @class=\"emptyClear\" or @id=\"collapsePS\" or'\n ' @id=\"expandPS\"]'):\n c.getparent().remove(c)\n for b in desc.xpath('descendant::b[@style]'):\n # Bing highlights search results\n s = b.get('style', '')\n if 'color' in s:\n b.tag = 'span'\n del b.attrib['style']\n\n for a in desc.xpath('descendant::a[@href]'):\n del a.attrib['href']\n a.tag = 'span'\n desc = self.tostring(desc, method='html', encoding=unicode).strip()\n\n # Encoding bug in Amazon data U+fffd (replacement char)\n # in some examples it is present in place of '\n desc = desc.replace('\\ufffd', \"'\")\n # remove all attributes from tags\n desc = re.sub(r'<([a-zA-Z0-9]+)\\s[^>]+>', r'<\\1>', desc)\n # Collapse whitespace\n # desc = re.sub('\\n+', '\\n', desc)\n # desc = re.sub(' +', ' ', desc)\n # Remove the notice about text referring to out of print editions\n desc = re.sub(r'(?s)<em>--This text ref.*?</em>', '', desc)\n # Remove 
comments\n desc = re.sub(r'(?s)<!--.*?-->', '', desc)\n return sanitize_comments_html(desc)\n\n def parse_comments(self, root, raw):\n from urllib import unquote\n ans = ''\n ns = tuple(self.selector('#bookDescription_feature_div noscript'))\n if ns:\n ns = ns[0]\n if len(ns) == 0 and ns.text:\n import html5lib\n # html5lib parsed noscript as CDATA\n ns = html5lib.parseFragment(\n '<div>%s</div>' % (ns.text), treebuilder='lxml', namespaceHTMLElements=False)[0]\n else:\n ns.tag = 'div'\n ans = self._render_comments(ns)\n else:\n desc = root.xpath('//div[@id=\"ps-content\"]/div[@class=\"content\"]')\n if desc:\n ans = self._render_comments(desc[0])\n\n desc = root.xpath(\n '//div[@id=\"productDescription\"]/*[@class=\"content\"]')\n if desc:\n ans += self._render_comments(desc[0])\n else:\n # Idiot chickens from amazon strike again. This data is now stored\n # in a JS variable inside a script tag URL encoded.\n m = re.search(br'var\\s+iframeContent\\s*=\\s*\"([^\"]+)\"', raw)\n if m is not None:\n try:\n text = unquote(m.group(1)).decode('utf-8')\n nr = parse_html(text)\n desc = nr.xpath(\n '//div[@id=\"productDescription\"]/*[@class=\"content\"]')\n if desc:\n ans += self._render_comments(desc[0])\n except Exception as e:\n self.log.warn(\n 'Parsing of obfuscated product description failed with error: %s' % as_unicode(e))\n\n return ans\n\n def parse_series(self, root):\n ans = (None, None)\n\n # This is found on the paperback/hardback pages for books on amazon.com\n series = root.xpath('//div[@data-feature-name=\"seriesTitle\"]')\n if series:\n series = series[0]\n spans = series.xpath('./span')\n if spans:\n raw = self.tostring(\n spans[0], encoding=unicode, method='text', with_tail=False).strip()\n m = re.search(r'\\s+([0-9.]+)$', raw.strip())\n if m is not None:\n series_index = float(m.group(1))\n s = series.xpath('./a[@id=\"series-page-link\"]')\n if s:\n series = self.tostring(\n s[0], encoding=unicode, method='text', with_tail=False).strip()\n if series:\n ans = (series, series_index)\n # This is found on Kindle edition pages on amazon.com\n if ans == (None, None):\n for span in root.xpath('//div[@id=\"aboutEbooksSection\"]//li/span'):\n text = (span.text or '').strip()\n m = re.match(r'Book\\s+([0-9.]+)', text)\n if m is not None:\n series_index = float(m.group(1))\n a = span.xpath('./a[@href]')\n if a:\n series = self.tostring(\n a[0], encoding=unicode, method='text', with_tail=False).strip()\n if series:\n ans = (series, series_index)\n # This is found on newer Kindle edition pages on amazon.com\n if ans == (None, None):\n for b in root.xpath('//div[@id=\"reviewFeatureGroup\"]/span/b'):\n text = (b.text or '').strip()\n m = re.match(r'Book\\s+([0-9.]+)', text)\n if m is not None:\n series_index = float(m.group(1))\n a = b.getparent().xpath('./a[@href]')\n if a:\n series = self.tostring(\n a[0], encoding=unicode, method='text', with_tail=False).partition('(')[0].strip()\n if series:\n ans = series, series_index\n\n if ans == (None, None):\n desc = root.xpath('//div[@id=\"ps-content\"]/div[@class=\"buying\"]')\n if desc:\n raw = self.tostring(desc[0], method='text', encoding=unicode)\n raw = re.sub(r'\\s+', ' ', raw)\n match = self.series_pat.search(raw)\n if match is not None:\n s, i = match.group('series'), float(match.group('index'))\n if s:\n ans = (s, i)\n if ans[0]:\n ans = (re.sub(r'\\s+Series$', '', ans[0]).strip(), ans[1])\n ans = (re.sub(r'\\(.+?\\s+Series\\)$', '', ans[0]).strip(), ans[1])\n return ans\n\n def parse_tags(self, root):\n ans = []\n exclude_tokens = 
{'kindle', 'a-z'}\n exclude = {'special features', 'by authors',\n 'authors & illustrators', 'books', 'new; used & rental textbooks'}\n seen = set()\n for li in root.xpath(self.tags_xpath):\n for i, a in enumerate(li.iterdescendants('a')):\n if i > 0:\n # we ignore the first category since it is almost always\n # too broad\n raw = (a.text or '').strip().replace(',', ';')\n lraw = icu_lower(raw)\n tokens = frozenset(lraw.split())\n if raw and lraw not in exclude and not tokens.intersection(exclude_tokens) and lraw not in seen:\n ans.append(raw)\n seen.add(lraw)\n return ans\n\n def parse_cover(self, root, raw=b\"\"):\n # Look for the image URL in javascript, using the first image in the\n # image gallery as the cover\n import json\n imgpat = re.compile(r\"\"\"'imageGalleryData'\\s*:\\s*(\\[\\s*{.+])\"\"\")\n for script in root.xpath('//script'):\n m = imgpat.search(script.text or '')\n if m is not None:\n try:\n return json.loads(m.group(1))[0]['mainUrl']\n except Exception:\n continue\n\n def clean_img_src(src):\n parts = src.split('/')\n if len(parts) > 3:\n bn = parts[-1]\n sparts = bn.split('_')\n if len(sparts) > 2:\n bn = re.sub(r'\\.\\.jpg$', '.jpg', (sparts[0] + sparts[-1]))\n return ('/'.join(parts[:-1])) + '/' + bn\n\n imgpat2 = re.compile(r'var imageSrc = \"([^\"]+)\"')\n for script in root.xpath('//script'):\n m = imgpat2.search(script.text or '')\n if m is not None:\n src = m.group(1)\n url = clean_img_src(src)\n if url:\n return url\n\n imgs = root.xpath(\n '//img[(@id=\"prodImage\" or @id=\"original-main-image\" or @id=\"main-image\" or @id=\"main-image-nonjs\") and @src]')\n if not imgs:\n imgs = (\n root.xpath('//div[@class=\"main-image-inner-wrapper\"]/img[@src]') or\n root.xpath('//div[@id=\"main-image-container\" or @id=\"ebooks-main-image-container\"]//img[@src]') or\n root.xpath(\n '//div[@id=\"mainImageContainer\"]//img[@data-a-dynamic-image]')\n )\n for img in imgs:\n try:\n idata = json.loads(img.get('data-a-dynamic-image'))\n except Exception:\n imgs = ()\n else:\n mwidth = 0\n try:\n url = None\n for iurl, (width, height) in idata.iteritems():\n if width > mwidth:\n mwidth = width\n url = iurl\n return url\n except Exception:\n pass\n\n for img in imgs:\n src = img.get('src')\n if 'data:' in src:\n continue\n if 'loading-' in src:\n js_img = re.search(br'\"largeImage\":\"(https?://[^\"]+)\",', raw)\n if js_img:\n src = js_img.group(1).decode('utf-8')\n if ('/no-image-avail' not in src and 'loading-' not in src and '/no-img-sm' not in src):\n self.log('Found image: %s' % src)\n url = clean_img_src(src)\n if url:\n return url\n\n def parse_new_details(self, root, mi, non_hero):\n table = non_hero.xpath('descendant::table')[0]\n for tr in table.xpath('descendant::tr'):\n cells = tr.xpath('descendant::td')\n if len(cells) == 2:\n name = self.totext(cells[0])\n val = self.totext(cells[1])\n if not val:\n continue\n if name in self.language_names:\n ans = self.lang_map.get(val, None)\n if not ans:\n ans = canonicalize_lang(val)\n if ans:\n mi.language = ans\n elif name in self.publisher_names:\n pub = val.partition(';')[0].partition('(')[0].strip()\n if pub:\n mi.publisher = pub\n date = val.rpartition('(')[-1].replace(')', '').strip()\n try:\n from calibre.utils.date import parse_only_date\n date = self.delocalize_datestr(date)\n mi.pubdate = parse_only_date(date, assume_utc=True)\n except:\n self.log.exception('Failed to parse pubdate: %s' % val)\n elif name in {'ISBN', 'ISBN-10', 'ISBN-13'}:\n ans = check_isbn(val)\n if ans:\n self.isbn = mi.isbn = ans\n\n def 
parse_isbn(self, pd):\n items = pd.xpath(\n 'descendant::*[starts-with(text(), \"ISBN\")]')\n if not items:\n items = pd.xpath(\n 'descendant::b[contains(text(), \"ISBN:\")]')\n for x in reversed(items):\n if x.tail:\n ans = check_isbn(x.tail.strip())\n if ans:\n return ans\n\n def parse_publisher(self, pd):\n for x in reversed(pd.xpath(self.publisher_xpath)):\n if x.tail:\n ans = x.tail.partition(';')[0]\n return ans.partition('(')[0].strip()\n\n def parse_pubdate(self, pd):\n for x in reversed(pd.xpath(self.publisher_xpath)):\n if x.tail:\n from calibre.utils.date import parse_only_date\n ans = x.tail\n date = ans.rpartition('(')[-1].replace(')', '').strip()\n date = self.delocalize_datestr(date)\n return parse_only_date(date, assume_utc=True)\n\n def parse_language(self, pd):\n for x in reversed(pd.xpath(self.language_xpath)):\n if x.tail:\n raw = x.tail.strip().partition(',')[0].strip()\n ans = self.lang_map.get(raw, None)\n if ans:\n return ans\n ans = canonicalize_lang(ans)\n if ans:\n return ans\n# }}}\n\n\nclass Amazon(Source):\n\n name = 'Amazon.com'\n version = (1, 2, 3)\n minimum_calibre_version = (2, 82, 0)\n description = _('Downloads metadata and covers from Amazon')\n\n capabilities = frozenset(['identify', 'cover'])\n touched_fields = frozenset(['title', 'authors', 'identifier:amazon',\n 'rating', 'comments', 'publisher', 'pubdate',\n 'languages', 'series', 'tags'])\n has_html_comments = True\n supports_gzip_transfer_encoding = True\n prefer_results_with_isbn = False\n\n AMAZON_DOMAINS = {\n 'com': _('US'),\n 'fr': _('France'),\n 'de': _('Germany'),\n 'uk': _('UK'),\n 'au': _('Australia'),\n 'it': _('Italy'),\n 'jp': _('Japan'),\n 'es': _('Spain'),\n 'br': _('Brazil'),\n 'nl': _('Netherlands'),\n 'cn': _('China'),\n 'ca': _('Canada'),\n }\n\n SERVERS = {\n 'auto': _('Choose server automatically'),\n 'amazon': _('Amazon servers'),\n 'bing': _('Bing search cache'),\n 'google': _('Google search cache'),\n 'wayback': _('Wayback machine cache (slow)'),\n }\n\n options = (\n Option('domain', 'choices', 'com', _('Amazon country website to use:'),\n _('Metadata from Amazon will be fetched using this '\n 'country\\'s Amazon website.'), choices=AMAZON_DOMAINS),\n Option('server', 'choices', 'auto', _('Server to get data from:'),\n _(\n 'Amazon has started blocking attempts to download'\n ' metadata from its servers. To get around this problem,'\n ' calibre can fetch the Amazon data from many different'\n ' places where it is cached. 
Choose the source you prefer.'\n ), choices=SERVERS),\n )\n\n def __init__(self, *args, **kwargs):\n Source.__init__(self, *args, **kwargs)\n self.set_amazon_id_touched_fields()\n\n def test_fields(self, mi):\n '''\n Return the first field from self.touched_fields that is null on the\n mi object\n '''\n for key in self.touched_fields:\n if key.startswith('identifier:'):\n key = key.partition(':')[-1]\n if key == 'amazon':\n if self.domain != 'com':\n key += '_' + self.domain\n if not mi.has_identifier(key):\n return 'identifier: ' + key\n elif mi.is_null(key):\n return key\n\n @property\n def browser(self):\n global ua_index\n if self.use_search_engine:\n if self._browser is None:\n ua = random_user_agent(allow_ie=False)\n self._browser = br = browser(user_agent=ua)\n br.set_handle_gzip(True)\n br.addheaders += [\n ('Accept', accept_header_for_ua(ua)),\n ('Upgrade-insecure-requests', '1'),\n ]\n br = self._browser\n else:\n all_uas = all_user_agents()\n ua_index = (ua_index + 1) % len(all_uas)\n ua = all_uas[ua_index]\n self._browser = br = browser(user_agent=ua)\n br.set_handle_gzip(True)\n br.addheaders += [\n ('Accept', accept_header_for_ua(ua)),\n ('Upgrade-insecure-requests', '1'),\n ('Referer', self.referrer_for_domain()),\n ]\n return br\n\n def save_settings(self, *args, **kwargs):\n Source.save_settings(self, *args, **kwargs)\n self.set_amazon_id_touched_fields()\n\n def set_amazon_id_touched_fields(self):\n ident_name = \"identifier:amazon\"\n if self.domain != 'com':\n ident_name += '_' + self.domain\n tf = [x for x in self.touched_fields if not\n x.startswith('identifier:amazon')] + [ident_name]\n self.touched_fields = frozenset(tf)\n\n def get_domain_and_asin(self, identifiers, extra_domains=()):\n for key, val in identifiers.iteritems():\n key = key.lower()\n if key in ('amazon', 'asin'):\n return 'com', val\n if key.startswith('amazon_'):\n domain = key.partition('_')[-1]\n if domain and (domain in self.AMAZON_DOMAINS or domain in extra_domains):\n return domain, val\n return None, None\n\n def referrer_for_domain(self, domain=None):\n domain = domain or self.domain\n return {\n 'uk': 'https://www.amazon.co.uk/',\n 'au': 'https://www.amazon.com.au/',\n 'br': 'https://www.amazon.com.br/',\n }.get(domain, 'https://www.amazon.%s/' % domain)\n\n def _get_book_url(self, identifiers): # {{{\n domain, asin = self.get_domain_and_asin(\n identifiers, extra_domains=('in', 'au', 'ca'))\n if domain and asin:\n url = None\n r = self.referrer_for_domain(domain)\n if r is not None:\n url = r + 'dp/' + asin\n if url:\n idtype = 'amazon' if domain == 'com' else 'amazon_' + domain\n return domain, idtype, asin, url\n\n def get_book_url(self, identifiers):\n ans = self._get_book_url(identifiers)\n if ans is not None:\n return ans[1:]\n\n def get_book_url_name(self, idtype, idval, url):\n if idtype == 'amazon':\n return self.name\n return 'A' + idtype.replace('_', '.')[1:]\n # }}}\n\n @property\n def domain(self):\n x = getattr(self, 'testing_domain', None)\n if x is not None:\n return x\n domain = self.prefs['domain']\n if domain not in self.AMAZON_DOMAINS:\n domain = 'com'\n\n return domain\n\n @property\n def server(self):\n x = getattr(self, 'testing_server', None)\n if x is not None:\n return x\n server = self.prefs['server']\n if server not in self.SERVERS:\n server = 'auto'\n return server\n\n @property\n def use_search_engine(self):\n return self.server != 'amazon'\n\n def clean_downloaded_metadata(self, mi):\n docase = (\n mi.language == 'eng' or\n (mi.is_null('language') and 
self.domain in {'com', 'uk', 'au'})\n )\n if mi.title and docase:\n # Remove series information from title\n m = re.search(r'\\S+\\s+(\\(.+?\\s+Book\\s+\\d+\\))$', mi.title)\n if m is not None:\n mi.title = mi.title.replace(m.group(1), '').strip()\n mi.title = fixcase(mi.title)\n mi.authors = fixauthors(mi.authors)\n if mi.tags and docase:\n mi.tags = list(map(fixcase, mi.tags))\n mi.isbn = check_isbn(mi.isbn)\n if mi.series and docase:\n mi.series = fixcase(mi.series)\n if mi.title and mi.series:\n for pat in (r':\\s*Book\\s+\\d+\\s+of\\s+%s$', r'\\(%s\\)$', r':\\s*%s\\s+Book\\s+\\d+$'):\n pat = pat % re.escape(mi.series)\n q = re.sub(pat, '', mi.title, flags=re.I).strip()\n if q and q != mi.title:\n mi.title = q\n break\n\n def get_website_domain(self, domain):\n return {'uk': 'co.uk', 'jp': 'co.jp', 'br': 'com.br', 'au': 'com.au'}.get(domain, domain)\n\n def create_query(self, log, title=None, authors=None, identifiers={}, # {{{\n domain=None, for_amazon=True):\n from urllib import urlencode\n if domain is None:\n domain = self.domain\n\n idomain, asin = self.get_domain_and_asin(identifiers)\n if idomain is not None:\n domain = idomain\n\n # See the amazon detailed search page to get all options\n terms = []\n q = {'search-alias': 'aps',\n 'unfiltered': '1',\n }\n\n if domain == 'com':\n q['sort'] = 'relevanceexprank'\n else:\n q['sort'] = 'relevancerank'\n\n isbn = check_isbn(identifiers.get('isbn', None))\n\n if asin is not None:\n q['field-keywords'] = asin\n terms.append(asin)\n elif isbn is not None:\n q['field-isbn'] = isbn\n if len(isbn) == 13:\n terms.extend('({} OR {}-{})'.format(isbn, isbn[:3], isbn[3:]).split())\n else:\n terms.append(isbn)\n else:\n # Only return book results\n q['search-alias'] = {'br': 'digital-text',\n 'nl': 'aps'}.get(domain, 'stripbooks')\n if title:\n title_tokens = list(self.get_title_tokens(title))\n if title_tokens:\n q['field-title'] = ' '.join(title_tokens)\n terms.extend(title_tokens)\n if authors:\n author_tokens = list(self.get_author_tokens(authors,\n only_first_author=True))\n if author_tokens:\n q['field-author'] = ' '.join(author_tokens)\n terms.extend(author_tokens)\n\n if not ('field-keywords' in q or 'field-isbn' in q or\n ('field-title' in q)):\n # Insufficient metadata to make an identify query\n return None, None\n\n if not for_amazon:\n return terms, domain\n\n # magic parameter to enable Japanese Shift_JIS encoding.\n if domain == 'jp':\n q['__mk_ja_JP'] = u'\u30ab\u30bf\u30ab\u30ca'\n if domain == 'nl':\n q['__mk_nl_NL'] = u'\u00c5M\u00c5\u017d\u00d5\u00d1'\n if 'field-keywords' not in q:\n q['field-keywords'] = ''\n for f in 'field-isbn field-title field-author'.split():\n q['field-keywords'] += ' ' + q.pop(f, '')\n q['field-keywords'] = q['field-keywords'].strip()\n\n if domain == 'jp':\n encode_to = 'Shift_JIS'\n elif domain == 'nl' or domain == 'cn':\n encode_to = 'utf-8'\n else:\n encode_to = 'latin1'\n encoded_q = dict([(x.encode(encode_to, 'ignore'), y.encode(encode_to,\n 'ignore')) for x, y in\n q.iteritems()])\n url = 'https://www.amazon.%s/s/?' 
% self.get_website_domain(\n domain) + urlencode(encoded_q)\n return url, domain\n\n # }}}\n\n def get_cached_cover_url(self, identifiers): # {{{\n url = None\n domain, asin = self.get_domain_and_asin(identifiers)\n if asin is None:\n isbn = identifiers.get('isbn', None)\n if isbn is not None:\n asin = self.cached_isbn_to_identifier(isbn)\n if asin is not None:\n url = self.cached_identifier_to_cover_url(asin)\n\n return url\n # }}}\n\n def parse_results_page(self, root, domain): # {{{\n from lxml.html import tostring\n\n matches = []\n\n def title_ok(title):\n title = title.lower()\n bad = ['bulk pack', '[audiobook]', '[audio cd]',\n '(a book companion)', '( slipcase with door )', ': free sampler']\n if self.domain == 'com':\n bad.extend(['(%s edition)' % x for x in ('spanish', 'german')])\n for x in bad:\n if x in title:\n return False\n if title and title[0] in '[{' and re.search(r'\\(\\s*author\\s*\\)', title) is not None:\n # Bad entries in the catalog\n return False\n return True\n\n for a in root.xpath(r'//li[starts-with(@id, \"result_\")]//a[@href and contains(@class, \"s-access-detail-page\")]'):\n title = tostring(a, method='text', encoding=unicode)\n if title_ok(title):\n url = a.get('href')\n if url.startswith('/'):\n url = 'https://www.amazon.%s%s' % (\n self.get_website_domain(domain), url)\n matches.append(url)\n\n if not matches:\n # Previous generation of results page markup\n for div in root.xpath(r'//div[starts-with(@id, \"result_\")]'):\n links = div.xpath(r'descendant::a[@class=\"title\" and @href]')\n if not links:\n # New amazon markup\n links = div.xpath('descendant::h3/a[@href]')\n for a in links:\n title = tostring(a, method='text', encoding=unicode)\n if title_ok(title):\n url = a.get('href')\n if url.startswith('/'):\n url = 'https://www.amazon.%s%s' % (\n self.get_website_domain(domain), url)\n matches.append(url)\n break\n\n if not matches:\n # This can happen for some user agents that Amazon thinks are\n # mobile/less capable\n for td in root.xpath(\n r'//div[@id=\"Results\"]/descendant::td[starts-with(@id, \"search:Td:\")]'):\n for a in td.xpath(r'descendant::td[@class=\"dataColumn\"]/descendant::a[@href]/span[@class=\"srTitle\"]/..'):\n title = tostring(a, method='text', encoding=unicode)\n if title_ok(title):\n url = a.get('href')\n if url.startswith('/'):\n url = 'https://www.amazon.%s%s' % (\n self.get_website_domain(domain), url)\n matches.append(url)\n break\n if not matches and root.xpath('//form[@action=\"/errors/validateCaptcha\"]'):\n raise CaptchaError('Amazon returned a CAPTCHA page. Recently Amazon has begun using statistical'\n ' profiling to block access to its website. 
As such this metadata plugin is'\n ' unlikely to ever work reliably.')\n\n # Keep only the top 3 matches as the matches are sorted by relevance by\n # Amazon so lower matches are not likely to be very relevant\n return matches[:3]\n # }}}\n\n def search_amazon(self, br, testing, log, abort, title, authors, identifiers, timeout): # {{{\n from calibre.utils.cleantext import clean_ascii_chars\n from calibre.ebooks.chardet import xml_to_unicode\n matches = []\n query, domain = self.create_query(log, title=title, authors=authors,\n identifiers=identifiers)\n if query is None:\n log.error('Insufficient metadata to construct query')\n raise SearchFailed()\n try:\n raw = br.open_novisit(query, timeout=timeout).read().strip()\n except Exception as e:\n if callable(getattr(e, 'getcode', None)) and \\\n e.getcode() == 404:\n log.error('Query malformed: %r' % query)\n raise SearchFailed()\n attr = getattr(e, 'args', [None])\n attr = attr if attr else [None]\n if isinstance(attr[0], socket.timeout):\n msg = _('Amazon timed out. Try again later.')\n log.error(msg)\n else:\n msg = 'Failed to make identify query: %r' % query\n log.exception(msg)\n raise SearchFailed()\n\n raw = clean_ascii_chars(xml_to_unicode(raw,\n strip_encoding_pats=True, resolve_entities=True)[0])\n\n if testing:\n import tempfile\n with tempfile.NamedTemporaryFile(prefix='amazon_results _',\n suffix='.html', delete=False) as f:\n f.write(raw.encode('utf-8'))\n print ('Downloaded html for results page saved in', f.name)\n\n matches = []\n found = '<title>404 - ' not in raw\n\n if found:\n try:\n root = parse_html(raw)\n except Exception:\n msg = 'Failed to parse amazon page for query: %r' % query\n log.exception(msg)\n raise SearchFailed()\n\n matches = self.parse_results_page(root, domain)\n\n return matches, query, domain, None\n # }}}\n\n def search_search_engine(self, br, testing, log, abort, title, authors, identifiers, timeout, override_server=None): # {{{\n from calibre.ebooks.metadata.sources.update import search_engines_module\n terms, domain = self.create_query(log, title=title, authors=authors,\n identifiers=identifiers, for_amazon=False)\n site = self.referrer_for_domain(\n domain)[len('https://'):].partition('/')[0]\n matches = []\n se = search_engines_module()\n server = override_server or self.server\n if server in ('bing',):\n urlproc, sfunc = se.bing_url_processor, se.bing_search\n elif server in ('auto', 'google'):\n urlproc, sfunc = se.google_url_processor, se.google_search\n elif server == 'wayback':\n urlproc, sfunc = se.wayback_url_processor, se.ddg_search\n results, qurl = sfunc(terms, site, log=log, br=br, timeout=timeout)\n br.set_current_header('Referer', qurl)\n for result in results:\n if abort.is_set():\n return matches, terms, domain, None\n\n purl = urlparse(result.url)\n if '/dp/' in purl.path and site in purl.netloc:\n url = result.cached_url\n if url is None:\n url = se.wayback_machine_cached_url(\n result.url, br, timeout=timeout)\n if url is None:\n log('Failed to find cached page for:', result.url)\n continue\n if url not in matches:\n matches.append(url)\n if len(matches) >= 3:\n break\n else:\n log('Skipping non-book result:', result)\n if not matches:\n log('No search engine results for terms:', ' '.join(terms))\n if urlproc is se.google_url_processor:\n # Google does not cache adult titles\n log('Trying the bing search engine instead')\n return self.search_search_engine(br, testing, log, abort, title, authors, identifiers, timeout, 'bing')\n return matches, terms, domain, urlproc\n # }}}\n\n 
def identify(self, log, result_queue, abort, title=None, authors=None, # {{{\n identifiers={}, timeout=60):\n '''\n Note this method will retry without identifiers automatically if no\n match is found with identifiers.\n '''\n\n testing = getattr(self, 'running_a_test', False)\n\n udata = self._get_book_url(identifiers)\n br = self.browser\n log('User-agent:', br.current_user_agent())\n log('Server:', self.server)\n if testing:\n print('User-agent:', br.current_user_agent())\n if udata is not None and not self.use_search_engine:\n # Try to directly get details page instead of running a search\n # Cannot use search engine as the directly constructed URL is\n # usually redirected to a full URL by amazon, and is therefore\n # not cached\n domain, idtype, asin, durl = udata\n if durl is not None:\n preparsed_root = parse_details_page(\n durl, log, timeout, br, domain)\n if preparsed_root is not None:\n qasin = parse_asin(preparsed_root[1], log, durl)\n if qasin == asin:\n w = Worker(durl, result_queue, br, log, 0, domain,\n self, testing=testing, preparsed_root=preparsed_root, timeout=timeout)\n try:\n w.get_details()\n return\n except Exception:\n log.exception(\n 'get_details failed for url: %r' % durl)\n func = self.search_search_engine if self.use_search_engine else self.search_amazon\n try:\n matches, query, domain, cover_url_processor = func(\n br, testing, log, abort, title, authors, identifiers, timeout)\n except SearchFailed:\n return\n\n if abort.is_set():\n return\n\n if not matches:\n if identifiers and title and authors:\n log('No matches found with identifiers, retrying using only'\n ' title and authors. Query: %r' % query)\n time.sleep(1)\n return self.identify(log, result_queue, abort, title=title,\n authors=authors, timeout=timeout)\n log.error('No matches found with query: %r' % query)\n return\n\n workers = [Worker(\n url, result_queue, br, log, i, domain, self, testing=testing, timeout=timeout,\n cover_url_processor=cover_url_processor, filter_result=partial(\n self.filter_result, title, authors, identifiers)) for i, url in enumerate(matches)]\n\n for w in workers:\n # Don't send all requests at the same time\n time.sleep(1)\n w.start()\n if abort.is_set():\n return\n\n while not abort.is_set():\n a_worker_is_alive = False\n for w in workers:\n w.join(0.2)\n if abort.is_set():\n break\n if w.is_alive():\n a_worker_is_alive = True\n if not a_worker_is_alive:\n break\n\n return None\n # }}}\n\n def filter_result(self, title, authors, identifiers, mi, log): # {{{\n if not self.use_search_engine:\n return True\n if title is not None:\n tokens = {icu_lower(x) for x in title.split() if len(x) > 3}\n if tokens:\n result_tokens = {icu_lower(x) for x in mi.title.split()}\n if not tokens.intersection(result_tokens):\n log('Ignoring result:', mi.title, 'as its title does not match')\n return False\n if authors:\n author_tokens = set()\n for author in authors:\n author_tokens |= {icu_lower(x) for x in author.split() if len(x) > 2}\n result_tokens = set()\n for author in mi.authors:\n result_tokens |= {icu_lower(x) for x in author.split() if len(x) > 2}\n if author_tokens and not author_tokens.intersection(result_tokens):\n log('Ignoring result:', mi.title, 'by', ' & '.join(mi.authors), 'as its author does not match')\n return False\n return True\n # }}}\n\n def download_cover(self, log, result_queue, abort, # {{{\n title=None, authors=None, identifiers={}, timeout=60, get_best_cover=False):\n cached_url = self.get_cached_cover_url(identifiers)\n if cached_url is None:\n log.info('No 
cached cover found, running identify')\n rq = Queue()\n self.identify(log, rq, abort, title=title, authors=authors,\n identifiers=identifiers)\n if abort.is_set():\n return\n if abort.is_set():\n return\n results = []\n while True:\n try:\n results.append(rq.get_nowait())\n except Empty:\n break\n results.sort(key=self.identify_results_keygen(\n title=title, authors=authors, identifiers=identifiers))\n for mi in results:\n cached_url = self.get_cached_cover_url(mi.identifiers)\n if cached_url is not None:\n break\n if cached_url is None:\n log.info('No cover found')\n return\n\n if abort.is_set():\n return\n log('Downloading cover from:', cached_url)\n br = self.browser\n if self.use_search_engine:\n br = br.clone_browser()\n br.set_current_header('Referer', self.referrer_for_domain(self.domain))\n try:\n time.sleep(1)\n cdata = br.open_novisit(\n cached_url, timeout=timeout).read()\n result_queue.put((self, cdata))\n except:\n log.exception('Failed to download cover from:', cached_url)\n # }}}\n\n\nif __name__ == '__main__': # tests {{{\n # To run these test use: calibre-debug\n # src/calibre/ebooks/metadata/sources/amazon.py\n from calibre.ebooks.metadata.sources.test import (test_identify_plugin,\n isbn_test, title_test, authors_test, comments_test, series_test)\n com_tests = [ # {{{\n\n ( # Paperback with series\n {'identifiers': {'amazon': '1423146786'}},\n [title_test('The Heroes of Olympus, Book Five The Blood of Olympus',\n exact=True), series_test('Heroes of Olympus', 5)]\n ),\n\n ( # Kindle edition with series\n {'identifiers': {'amazon': 'B0085UEQDO'}},\n [title_test('Three Parts Dead', exact=True),\n series_test('Craft Sequence', 1)]\n ),\n\n ( # + in title and uses id=\"main-image\" for cover\n {'identifiers': {'amazon': '1933988770'}},\n [title_test(\n 'C++ Concurrency in Action: Practical Multithreading', exact=True)]\n ),\n\n\n ( # Different comments markup, using Book Description section\n {'identifiers': {'amazon': '0982514506'}},\n [title_test(\n \"Griffin's Destiny: Book Three: The Griffin's Daughter Trilogy\",\n exact=True),\n comments_test('Jelena'), comments_test('Ashinji'),\n ]\n ),\n\n ( # # in title\n {'title': 'Expert C# 2008 Business Objects',\n 'authors': ['Lhotka']},\n [title_test('Expert C#'),\n authors_test(['Rockford Lhotka'])\n ]\n ),\n\n ( # No specific problems\n {'identifiers': {'isbn': '0743273567'}},\n [title_test('The great gatsby', exact=True),\n authors_test(['F. 
Scott Fitzgerald'])]\n ),\n\n ]\n\n # }}}\n\n de_tests = [ # {{{\n (\n {'identifiers': {'isbn': '9783453314979'}},\n [title_test('Die letzten W\u00e4chter: Roman',\n exact=False), authors_test(['Sergej Lukianenko'])\n ]\n\n ),\n\n (\n {'identifiers': {'isbn': '3548283519'}},\n [title_test('Wer Wind S\u00e4t: Der F\u00fcnfte Fall F\u00fcr Bodenstein Und Kirchhoff',\n exact=False), authors_test(['Nele Neuhaus'])\n ]\n\n ),\n ] # }}}\n\n it_tests = [ # {{{\n (\n {'identifiers': {'isbn': '8838922195'}},\n [title_test('La briscola in cinque',\n exact=True), authors_test(['Marco Malvaldi'])\n ]\n\n ),\n ] # }}}\n\n fr_tests = [ # {{{\n (\n {'identifiers': {'isbn': '2221116798'}},\n [title_test('L\\'\u00e9trange voyage de Monsieur Daldry',\n exact=True), authors_test(['Marc Levy'])\n ]\n\n ),\n ] # }}}\n\n es_tests = [ # {{{\n (\n {'identifiers': {'isbn': '8483460831'}},\n [title_test('Tiempos Interesantes',\n exact=False), authors_test(['Terry Pratchett'])\n ]\n\n ),\n ] # }}}\n\n jp_tests = [ # {{{\n ( # Adult filtering test\n {'identifiers': {'isbn': '4799500066'}},\n [title_test(u'\uff22\uff49\uff54\uff43\uff48 \uff34\uff52\uff41\uff50'), ]\n ),\n\n ( # isbn -> title, authors\n {'identifiers': {'isbn': '9784101302720'}},\n [title_test(u'\u7cbe\u970a\u306e\u5b88\u308a\u4eba' ,\n exact=True), authors_test([u'\u4e0a\u6a4b \u83dc\u7a42\u5b50'])\n ]\n ),\n ( # title, authors -> isbn (will use Shift_JIS encoding in query.)\n {'title': u'\u8003\u3048\u306a\u3044\u7df4\u7fd2',\n 'authors': [u'\u5c0f\u6c60 \u9f8d\u4e4b\u4ecb']},\n [isbn_test('9784093881067'), ]\n ),\n ] # }}}\n\n br_tests = [ # {{{\n (\n {'title': 'Guerra dos Tronos'},\n [title_test('A Guerra dos Tronos - As Cr\u00f4nicas de Gelo e Fogo',\n exact=True), authors_test(['George R. R. Martin'])\n ]\n\n ),\n ] # }}}\n\n nl_tests = [ # {{{\n (\n {'title': 'Freakonomics'},\n [title_test('Freakonomics',\n exact=True), authors_test(['Steven Levitt & Stephen Dubner & R. Kuitenbrouwer & O. Brenninkmeijer & A. van Den Berg'])\n ]\n\n ),\n ] # }}}\n\n cn_tests = [ # {{{\n (\n {'identifiers': {'isbn': '9787115369512'}},\n [title_test('\u82e5\u4e3a\u81ea\u7531\u6545 \u81ea\u7531\u8f6f\u4ef6\u4e4b\u7236\u7406\u67e5\u 5fb7\u65af\u6258\u66fc\u4f20', exact=True),\n authors_test(['[\u7f8e]sam Williams', '\u9093\u6960\uff0c\u674e\u51e1\u5e0c'])]\n ),\n (\n {'title': '\u7231\u4e0aRaspberry Pi'},\n [title_test('\u7231\u4e0aRaspberry Pi',\n exact=True), authors_test(['Matt Richardson', 'Shawn Wallace', '\u674e\u51e1\u5e0c'])\n ]\n\n ),\n ] # }}}\n\n ca_tests = [ # {{{\n ( # Paperback with series\n {'identifiers': {'isbn': '9781623808747'}},\n [title_test('Parting Shot', exact=True),\n authors_test(['Mary Calmes'])]\n ),\n ( # # in title\n {'title': 'Expert C# 2008 Business Objects',\n 'authors': ['Lhotka']},\n [title_test('Expert C# 2008 Business Objects'),\n authors_test(['Rockford Lhotka'])]\n ),\n ( # noscript description\n {'identifiers': {'amazon_ca': '162380874X'}},\n [title_test('Parting Shot', exact=True), authors_test(['Mary Calmes'])\n ]\n ),\n ] # }}}\n\n def do_test(domain, start=0, stop=None, server='auto'):\n tests = globals().get(domain + '_tests')\n if stop is None:\n stop = len(tests)\n tests = tests[start:stop]\n test_identify_plugin(Amazon.name, tests, modify_plugin=lambda p: (\n setattr(p, 'testing_domain', domain),\n setattr(p, 'touched_fields', p.touched_fields - {'tags'}),\n setattr(p, 'testing_server', server),\n ))\n\n do_test('com')\n # do_test('de')\n# }}}\n",
"overdrive": "#!/usr/bin/env python2\nfrom __future__ import (unicode_literals, division, absolute_import,\n print_function)\n\n__license__ = 'GPL v3'\n__copyright__ = '2011, Kovid Goyal kovid@kovidgoyal.net'\n__docformat__ = 'restructuredtext en'\n\n'''\nFetch metadata using Overdrive Content Reserve\n'''\nimport re, random, copy, json\nfrom threading import RLock\nfrom Queue import Queue, Empty\n\n\nfrom calibre.ebooks.metadata import check_isbn\nfrom calibre.ebooks.metadata.sources.base import Source, Option\nfrom calibre.ebooks.metadata.book.base import Metadata\n\novrdrv_data_cache = {}\ncache_lock = RLock()\nbase_url = 'https://search.overdrive.com/'\n\n\nclass OverDrive(Source):\n\n name = 'Overdrive'\n version = (1, 0, 0)\n minimum_calibre_version = (2, 80, 0)\n description = _('Downloads metadata and covers from Overdrive\\'s Content Reserve')\n\n capabilities = frozenset(['identify', 'cover'])\n touched_fields = frozenset(['title', 'authors', 'tags', 'pubdate',\n 'comments', 'publisher', 'identifier:isbn', 'series', 'series_index',\n 'languages', 'identifierverdrive'])\n has_html_comments = True\n supports_gzip_transfer_encoding = False\n cached_cover_url_is_reliable = True\n\n options = (\n Option('get_full_metadata', 'bool', True,\n _('Download all metadata (slow)'),\n _('Enable this option to gather all metadata available from Overdrive.')),\n )\n\n config_help_message = '<p>'+_('Additional metadata can be taken from Overdrive\\'s book detail'\n ' page. This includes a limited set of tags used by libraries, comments, language,'\n ' and the e-book ISBN. Collecting this data is disabled by default due to the extra'\n ' time required. Check the download all metadata option below to'\n ' enable downloading this data.')\n\n def identify(self, log, result_queue, abort, title=None, authors=None, # {{{\n identifiers={}, timeout=30):\n ovrdrv_id = identifiers.get('overdrive', None)\n isbn = identifiers.get('isbn', None)\n\n br = self.browser\n ovrdrv_data = self.to_ovrdrv_data(br, log, title, authors, ovrdrv_id)\n if ovrdrv_data:\n title = ovrdrv_data[8]\n authors = ovrdrv_data[6]\n mi = Metadata(title, authors)\n self.parse_search_results(ovrdrv_data, mi)\n if ovrdrv_id is None:\n ovrdrv_id = ovrdrv_data[7]\n\n if self.prefs['get_full_metadata']:\n self.get_book_detail(br, ovrdrv_data[1], mi, ovrdrv_id, log)\n\n if isbn is not None:\n self.cache_isbn_to_identifier(isbn, ovrdrv_id)\n\n result_queue.put(mi)\n\n return None\n # }}}\n\n def download_cover(self, log, result_queue, abort, # {{{\n title=None, authors=None, identifiers={}, timeout=30, get_best_cover=False):\n import mechanize\n cached_url = self.get_cached_cover_url(identifiers)\n if cached_url is None:\n log.info('No cached cover found, running identify')\n rq = Queue()\n self.identify(log, rq, abort, title=title, authors=authors,\n identifiers=identifiers)\n if abort.is_set():\n return\n results = []\n while True:\n try:\n results.append(rq.get_nowait())\n except Empty:\n break\n results.sort(key=self.identify_results_keygen(\n title=title, authors=authors, identifiers=identifiers))\n for mi in results:\n cached_url = self.get_cached_cover_url(mi.identifiers)\n if cached_url is not None:\n break\n if cached_url is None:\n log.info('No cover found')\n return\n\n if abort.is_set():\n return\n\n ovrdrv_id = identifiers.get('overdrive', None)\n br = self.browser\n req = mechanize.Request(cached_url)\n if ovrdrv_id is not None:\n referer = self.get_base_referer()+'ContentDetails-Cover.htm?ID='+ovrdrv_id\n 
req.add_header('referer', referer)\n\n log('Downloading cover from:', cached_url)\n try:\n cdata = br.open_novisit(req, timeout=timeout).read()\n result_queue.put((self, cdata))\n except:\n log.exception('Failed to download cover from:', cached_url)\n # }}}\n\n def get_cached_cover_url(self, identifiers): # {{{\n url = None\n ovrdrv_id = identifiers.get('overdrive', None)\n if ovrdrv_id is None:\n isbn = identifiers.get('isbn', None)\n if isbn is not None:\n ovrdrv_id = self.cached_isbn_to_identifier(isbn)\n if ovrdrv_id is not None:\n url = self.cached_identifier_to_cover_url(ovrdrv_id)\n\n return url\n # }}}\n\n def get_base_referer(self): # to be used for passing referrer headers to cover download\n choices = [\n 'https://overdrive.chipublib.org/82DC601D-7DDE-4212-B43A-09D821935B01/10/375/en/',\n 'https://emedia.clevnet.org/9D321DAD-EC0D-490D-BFD8-64AE2C96ECA8/10/241/en/',\n 'https://singapore.lib.overdrive.com/F11D55BE-A917-4D63-8111-318E88B29740/10/382/en/',\n 'https://ebooks.nypl.org/20E48048-A377-4520-BC43-F8729A42A424/10/257/en/',\n 'https://spl.lib.overdrive.com/5875E082-4CB2-4689-9426-8509F354AFEF/10/335/en/'\n ]\n return choices[random.randint(0, len(choices)-1)]\n\n def format_results(self, reserveid, od_title, subtitle, series, publisher, creators, thumbimage, worldcatlink, formatid):\n fix_slashes = re.compile(r'\\\\/')\n thumbimage = fix_slashes.sub('/', thumbimage)\n worldcatlink = fix_slashes.sub('/', worldcatlink)\n cover_url = re.sub('(?P<img>(Ima?g(eType-)?))200', '\\g<img>100', thumbimage)\n social_metadata_url = base_url+'TitleInfo.aspx?ReserveID='+reserveid+'&F ormatID='+formatid\n series_num = ''\n if not series:\n if subtitle:\n title = od_title+': '+subtitle\n else:\n title = od_title\n else:\n title = od_title\n m = re.search(\"([0-9]+$)\", subtitle)\n if m:\n series_num = float(m.group(1))\n return [cover_url, social_metadata_url, worldcatlink, series, series_num, publisher, creators, reserveid, title]\n\n def safe_query(self, br, query_url, post=''):\n '''\n The query must be initialized by loading an empty search results page\n this page attempts to set a cookie that Mechanize doesn't like\n copy the cookiejar to a separate instance and make a one-off request with the temp cookiejar\n '''\n import mechanize\n goodcookies = br._ua_handlers['_cookies'].cookiejar\n clean_cj = mechanize.CookieJar()\n cookies_to_copy = []\n for cookie in goodcookies:\n copied_cookie = copy.deepcopy(cookie)\n cookies_to_copy.append(copied_cookie)\n for copied_cookie in cookies_to_copy:\n clean_cj.set_cookie(copied_cookie)\n\n if post:\n br.open_novisit(query_url, post)\n else:\n br.open_novisit(query_url)\n\n br.set_cookiejar(clean_cj)\n\n def overdrive_search(self, br, log, q, title, author):\n import mechanize\n # re-initialize the cookiejar to so that it's clean\n clean_cj = mechanize.CookieJar()\n br.set_cookiejar(clean_cj)\n q_query = q+'default.aspx/SearchByKeyword'\n q_init_search = q+'SearchResults.aspx'\n # get first author as string - convert this to a proper cleanup function later\n author_tokens = list(self.get_author_tokens(author,\n only_first_author=True))\n title_tokens = list(self.get_title_tokens(title,\n strip_joiners=False, strip_subtitle=True))\n\n xref_q = ''\n if len(author_tokens) <= 1:\n initial_q = ' '.join(title_tokens)\n xref_q = '+'.join(author_tokens)\n else:\n initial_q = ' '.join(author_tokens)\n for token in title_tokens:\n if len(xref_q) < len(token):\n xref_q = token\n\n log.error('Initial query is %s'%initial_q)\n log.error('Cross reference query 
is %s'%xref_q)\n\n q_xref = q+'SearchResults.svc/GetResults?iDisplayLength=50&sSearch='+xref_q\n query = '{\"szKeyword\":\"'+initial_q+'\"}'\n\n # main query, requires specific Content Type header\n req = mechanize.Request(q_query)\n req.add_header('Content-Type', 'application/json; charset=utf-8')\n br.open_novisit(req, query)\n\n # initiate the search without messing up the cookiejar\n self.safe_query(br, q_init_search)\n\n # get the search results object\n results = False\n iterations = 0\n while results is False:\n iterations += 1\n xreq = mechanize.Request(q_xref)\n xreq.add_header('X-Requested-With', 'XMLHttpRequest')\n xreq.add_header('Referer', q_init_search)\n xreq.add_header('Accept', 'application/json, text/javascript, */*')\n raw = br.open_novisit(xreq).read()\n for m in re.finditer(unicode(r'\"iTotalDisplayRecords\"?P <displayrecords>\\d+).*?\"iTotalRecords\"?P<tota lrecords>\\d+)'), raw):\n if int(m.group('totalrecords')) == 0:\n return ''\n elif int(m.group('displayrecords')) >= 1:\n results = True\n elif int(m.group('totalrecords')) >= 1 and iterations < 3:\n if xref_q.find('+') != -1:\n xref_tokens = xref_q.split('+')\n xref_q = xref_tokens[0]\n for token in xref_tokens:\n if len(xref_q) < len(token):\n xref_q = token\n # log.error('rewrote xref_q, new query is '+xref_q)\n else:\n xref_q = ''\n q_xref = q+'SearchResults.svc/GetResults?iDisplayLength=50&sSearch='+xref_q\n\n return self.sort_ovrdrv_results(raw, log, title, title_tokens, author, author_tokens)\n\n def sort_ovrdrv_results(self, raw, log, title=None, title_tokens=None, author=None, author_tokens=None, ovrdrv_id=None):\n close_matches = []\n raw = re.sub('.*?\\[\\[(?P<content>.*?)\\]\\].*', '[[\\g<content>]]', raw)\n results = json.loads(raw)\n # log.error('raw results are:'+str(results))\n # The search results are either from a keyword search or a multi-format list from a single ID,\n # sort through the results for closest match/format\n if results:\n for reserveid, od_title, subtitle, edition, series, publisher, format, formatid, creators, \\\n thumbimage, shortdescription, worldcatlink, excerptlink, creatorfile, sorttitle, \\\n availabletolibrary, availabletoretailer, relevancyrank, unknown1, unknown2, unknown3 in results:\n # log.error(\"this record's title is \"+od_title+\", subtitle is \"+subtitle+\", author[s] are \"+creators+\", series is \"+series)\n if ovrdrv_id is not None and int(formatid) in [1, 50, 410, 900]:\n # log.error('overdrive id is not None, searching based on format type priority')\n return self.format_results(reserveid, od_title, subtitle, series, publisher,\n creators, thumbimage, worldcatlink, formatid)\n else:\n if creators:\n creators = creators.split(', ')\n\n # if an exact match in a preferred format occurs\n if ((author and creators and creators[0] == author[0]) or (not author and not creators)) and \\\n od_title.lower() == title.lower() and int(formatid) in [1, 50, 410, 900] and thumbimage:\n return self.format_results(reserveid, od_title, subtitle, series, publisher,\n creators, thumbimage, worldcatlink, formatid)\n else:\n close_title_match = False\n close_author_match = False\n for token in title_tokens:\n if od_title.lower().find(token.lower()) != -1:\n close_title_match = True\n else:\n close_title_match = False\n break\n for author in creators:\n for token in author_tokens:\n if author.lower().find(token.lower()) != -1:\n close_author_match = True\n else:\n close_author_match = False\n break\n if close_author_match:\n break\n if close_title_match and close_author_match and 
int(formatid) in [1, 50, 410, 900] and thumbimage:\n if subtitle and series:\n close_matches.insert(0, self.format_results(reserveid, od_title, subtitle, series,\n publisher, creators, thumbimage, worldcatlink, formatid))\n else:\n close_matches.append(self.format_results(reserveid , od_title, subtitle, series,\n publisher, creators, thumbimage, worldcatlink, formatid))\n\n elif close_title_match and close_author_match and int(formatid) in [1, 50, 410, 900]:\n close_matches.append(self.format_results(reserveid , od_title, subtitle, series,\n publisher, creators, thumbimage, worldcatlink, formatid))\n\n if close_matches:\n return close_matches[0]\n else:\n return ''\n else:\n return ''\n\n def overdrive_get_record(self, br, log, q, ovrdrv_id):\n import mechanize\n search_url = q+'SearchResults.aspx?ReserveID={'+ovrdrv_id+'}'\n results_url = q+'SearchResults.svc/GetResults?sEcho=1&iColumns=18&sColumns=ReserveID% 2CTitle%2CSubtitle%2CEdition%2CSeries%2CPublisher% 2CFormat%2CFormatID%2CCreators%2CThumbImage%2CShor tDescription%2CWorldCatLink%2CExcerptLink%2CCreato rFile%2CSortTitle%2CAvailableToLibrary%2CAvailable ToRetailer%2CRelevancyRank&iDisplayStart=0&iDispla yLength=10&sSearch=&bEscapeRegex=true&iSortingCols =1&iSortCol_0=17&sSortDir_0=asc' # noqa\n\n # re-initialize the cookiejar to so that it's clean\n clean_cj = mechanize.CookieJar()\n br.set_cookiejar(clean_cj)\n # get the base url to set the proper session cookie\n br.open_novisit(q)\n\n # initialize the search\n self.safe_query(br, search_url)\n\n # get the results\n req = mechanize.Request(results_url)\n req.add_header('X-Requested-With', 'XMLHttpRequest')\n req.add_header('Referer', search_url)\n req.add_header('Accept', 'application/json, text/javascript, */*')\n raw = br.open_novisit(req)\n raw = str(list(raw))\n clean_cj = mechanize.CookieJar()\n br.set_cookiejar(clean_cj)\n return self.sort_ovrdrv_results(raw, log, None, None, None, ovrdrv_id)\n\n def find_ovrdrv_data(self, br, log, title, author, isbn, ovrdrv_id=None):\n q = base_url\n if ovrdrv_id is None:\n return self.overdrive_search(br, log, q, title, author)\n else:\n return self.overdrive_get_record(br, log, q, ovrdrv_id)\n\n def to_ovrdrv_data(self, br, log, title=None, author=None, ovrdrv_id=None):\n '''\n Takes either a title/author combo or an Overdrive ID. 
One of these\n two must be passed to this function.\n '''\n if ovrdrv_id is not None:\n with cache_lock:\n ans = ovrdrv_data_cache.get(ovrdrv_id, None)\n if ans:\n return ans\n elif ans is False:\n return None\n else:\n ovrdrv_data = self.find_ovrdrv_data(br, log, title, author, ovrdrv_id)\n else:\n try:\n ovrdrv_data = self.find_ovrdrv_data(br, log, title, author, ovrdrv_id)\n except:\n import traceback\n traceback.print_exc()\n ovrdrv_data = None\n with cache_lock:\n ovrdrv_data_cache[ovrdrv_id] = ovrdrv_data if ovrdrv_data else False\n\n return ovrdrv_data if ovrdrv_data else False\n\n def parse_search_results(self, ovrdrv_data, mi):\n '''\n Parse the formatted search results from the initial Overdrive query and\n add the values to the metadta.\n\n The list object has these values:\n [cover_url[0], social_metadata_url[1], worldcatlink[2], series[3], series_num[4],\n publisher[5], creators[6], reserveid[7], title[8]]\n\n '''\n ovrdrv_id = ovrdrv_data[7]\n mi.set_identifier('overdrive', ovrdrv_id)\n\n if len(ovrdrv_data[3]) > 1:\n mi.series = ovrdrv_data[3]\n if ovrdrv_data[4]:\n try:\n mi.series_index = float(ovrdrv_data[4])\n except:\n pass\n mi.publisher = ovrdrv_data[5]\n mi.authors = ovrdrv_data[6]\n mi.title = ovrdrv_data[8]\n cover_url = ovrdrv_data[0]\n if cover_url:\n self.cache_identifier_to_cover_url(ovrdrv_id,\n cover_url)\n\n def get_book_detail(self, br, metadata_url, mi, ovrdrv_id, log):\n from lxml import html\n from calibre.ebooks.chardet import xml_to_unicode\n from calibre.utils.soupparser import fromstring\n from calibre.library.comments import sanitize_comments_html\n\n try:\n raw = br.open_novisit(metadata_url).read()\n except Exception as e:\n if callable(getattr(e, 'getcode', None)) and \\\n e.getcode() == 404:\n return False\n raise\n raw = xml_to_unicode(raw, strip_encoding_pats=True,\n resolve_entities=True)[0]\n try:\n root = fromstring(raw)\n except:\n return False\n\n pub_date = root.xpath(\"//div/label[@id='ctl00_ContentPlaceHolder1_lblPubDate']/text()\")\n lang = root.xpath(\"//div/label[@id='ctl00_ContentPlaceHolder1_lblLanguage']/text()\")\n subjects = root.xpath(\"//div/label[@id='ctl00_ContentPlaceHolder1_lblSubjects']/text()\")\n ebook_isbn = root.xpath(\"//td/label[@id='ctl00_ContentPlaceHolder1_lblIdentifier']/text()\")\n desc = root.xpath(\"//div/label[@id='ctl00_ContentPlaceHolder1_lblDescription']/ancestor::div[1]\")\n\n if pub_date:\n from calibre.utils.date import parse_date\n try:\n mi.pubdate = parse_date(pub_date[0].strip())\n except:\n pass\n if lang:\n lang = lang[0].strip().lower()\n lang = {'english':'eng', 'french':'fra', 'german':'deu',\n 'spanish':'spa'}.get(lang, None)\n if lang:\n mi.language = lang\n\n if ebook_isbn:\n # print \"ebook isbn is \"+str(ebook_isbn[0])\n isbn = check_isbn(ebook_isbn[0].strip())\n if isbn:\n self.cache_isbn_to_identifier(isbn, ovrdrv_id)\n mi.isbn = isbn\n if subjects:\n mi.tags = [tag.strip() for tag in subjects[0].split(',')]\n\n if desc:\n desc = desc[0]\n desc = html.tostring(desc, method='html', encoding=unicode).strip()\n # remove all attributes from tags\n desc = re.sub(r'<([a-zA-Z0-9]+)\\s[^>]+>', r'<\\1>', desc)\n # Remove comments\n desc = re.sub(r'(?s)<!--.*?-->', '', desc)\n mi.comments = sanitize_comments_html(desc)\n\n return None\n\n\nif __name__ == '__main__':\n # To run these test use:\n # calibre-debug -e src/calibre/ebooks/metadata/sources/overdrive.py\n from calibre.ebooks.metadata.sources.test import (test_identify_plugin,\n title_test, authors_test)\n 
test_identify_plugin(OverDrive.name,\n [\n\n (\n {'title':'The Sea Kings Daughter',\n 'authors':['Elizabeth Peters']},\n [title_test('The Sea Kings Daughter', exact=False),\n authors_test(['Elizabeth Peters'])]\n ),\n\n (\n {'title': 'Elephants', 'authors':['Agatha']},\n [title_test('Elephants Can Remember', exact=False),\n authors_test(['Agatha Christie'])]\n ),\n ])\n",
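Before the next entry, one detail of the overdrive source worth calling out: safe_query() copies every cookie out of the live mechanize jar into a fresh jar so that a one-off request cannot pollute the session the search depends on. Here is a minimal sketch of that copy step, written against the standard library's http.cookiejar (mechanize's CookieJar exposes the same set_cookie and iteration API), so it is only an approximation of the plugin's mechanize-specific code.

Code:
import copy
from http.cookiejar import CookieJar   # mechanize.CookieJar in the plugin

def cloned_jar(jar):
    # Deep-copy each cookie into a fresh jar; the original jar is untouched,
    # so a throwaway request made with the clone cannot disturb the session.
    clean = CookieJar()
    for cookie in jar:
        clean.set_cookie(copy.deepcopy(cookie))
    return clean

if __name__ == '__main__':
    live = CookieJar()            # in the plugin this is br._ua_handlers['_cookies'].cookiejar
    print(len(cloned_jar(live)))  # 0 here; with a populated jar, every cookie is copied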
"big_book_search": "#!/usr/bin/env python2\n# vim:fileencoding=UTF-8\nfrom __future__ import (unicode_literals, division, absolute_import,\n print_function)\n\n__license__ = 'GPL v3'\n__copyright__ = '2013, Kovid Goyal <kovid@kovidgoyal.net>'\n__docformat__ = 'restructuredtext en'\n\nfrom calibre.ebooks.metadata.sources.base import Source, Option\n\n\ndef get_urls(br, tokens):\n from urllib import quote_plus\n from mechanize import Request\n from lxml import html\n escaped = [quote_plus(x.encode('utf-8')) for x in tokens if x and x.strip()]\n q = b'+'.join(escaped)\n url = 'http://bigbooksearch.com/books/'+q\n br.open(url).read()\n req = Request('http://bigbooksearch.com/query.php?SearchIndex=books&Keywords=%s&ItemPage=1 '%q)\n req.add_header('X-Requested-With', 'XMLHttpRequest')\n req.add_header('Referer', url)\n raw = br.open(req).read()\n root = html.fromstring(raw.decode('utf-8'))\n urls = [i.get('src') for i in root.xpath('//img[@src]')]\n return urls\n\n\nclass BigBookSearch(Source):\n\n name = 'Big Book Search'\n version = (1, 0, 0)\n minimum_calibre_version = (2, 80, 0)\n description = _('Downloads multiple book covers from Amazon. Useful to find alternate covers.')\n capabilities = frozenset(['cover'])\n can_get_multiple_covers = True\n options = (Option('max_covers', 'number', 5, _('Maximum number of covers to get'),\n _('The maximum number of covers to process from the search result')),\n )\n supports_gzip_transfer_encoding = True\n\n def download_cover(self, log, result_queue, abort,\n title=None, authors=None, identifiers={}, timeout=30, get_best_cover=False):\n if not title:\n return\n br = self.browser\n tokens = tuple(self.get_title_tokens(title)) + tuple(self.get_author_tokens(authors))\n urls = get_urls(br, tokens)\n self.download_multiple_covers(title, authors, urls, get_best_cover, timeout, result_queue, abort, log)\n\n\ndef test():\n from calibre import browser\n import pprint\n br = browser()\n urls = get_urls(br, ['consider', 'phlebas', 'banks'])\n pprint.pprint(urls)\n\n\nif __name__ == '__main__':\n test()\n",
"ozon": "#!/usr/bin/env python2\n# -*- coding: utf-8 -*-\n# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai\nfrom __future__ import (unicode_literals, division, absolute_import,\n print_function)\n\n__license__ = 'GPL 3'\n__copyright__ = '2011-2013 Roman Mukhin <ramses_ru at hotmail.com>'\n__docformat__ = 'restructuredtext en'\n\n# To ensure bugfix and development of this metadata source please donate\n# bitcoins to 1E6CRSLY1uNstcZjLYZBHRVs1CPKbdi4ep\n\nimport re\nfrom Queue import Queue, Empty\n\nfrom calibre import as_unicode, replace_entities\nfrom calibre.ebooks.metadata import check_isbn\nfrom calibre.ebooks.metadata.sources.base import Source, Option\nfrom calibre.ebooks.metadata.book.base import Metadata\n\n\nclass Ozon(Source):\n name = 'OZON.ru'\n minimum_calibre_version = (2, 80, 0)\n version = (1, 1, 0)\n description = _('Downloads metadata and covers from OZON.ru (updated)')\n\n capabilities = frozenset(['identify', 'cover'])\n\n touched_fields = frozenset(['title', 'authors', 'identifier:isbn', 'identifierzon',\n 'publisher', 'pubdate', 'comments', 'series', 'rating', 'languages'])\n # Test purpose only, test function does not like when sometimes some filed are empty\n # touched_fields = frozenset(['title', 'authors', 'identifier:isbn', 'identifierzon',\n # 'publisher', 'pubdate', 'comments'])\n\n supports_gzip_transfer_encoding = True\n has_html_comments = True\n\n ozon_url = 'https://www.ozon.ru'\n\n # match any ISBN10/13. From \"Regular Expressions Cookbook\"\n isbnPattern = r'(?:ISBN(?:-1[03])?:? )?(?=[-0-9 ]{17}|' \\\n '[-0-9X ]{13}|[0-9X]{10})(?:97[89][- ]?)?[0-9]{1,5}[- ]?' \\\n '(?:[0-9]+[- ]?){2}[0-9X]'\n isbnRegex = re.compile(isbnPattern)\n\n optkey_strictmatch = 'strict_result_match'\n options = (\n Option(optkey_strictmatch, 'bool', False,\n _('Filter out less relevant hits from the search results'),\n _(\n 'Improve search result by removing less relevant hits. It can be useful to refine the search when there are many matches')),\n )\n\n def get_book_url(self, identifiers): # {{{\n import urllib2\n ozon_id = identifiers.get('ozon', None)\n res = None\n if ozon_id:\n # no affiliateId is used in search/detail\n url = '{}/context/detail/id/{}'.format(self.ozon_url, urllib2.quote(ozon_id), _get_affiliateId())\n res = ('ozon', ozon_id, url)\n return res\n\n # }}}\n\n def create_query(self, log, title=None, authors=None, identifiers={}): # {{{\n from urllib import quote_plus\n\n # div_book -> search only books, ebooks and audio books\n search_url = self.ozon_url + '/?context=search&group=div_book&text='\n\n # for ozon.ru search we have to format ISBN with '-'\n isbn = _format_isbn(log, identifiers.get('isbn', None))\n if isbn and '-' not in isbn:\n log.error(\n \"%s requires formatted ISBN for search. %s cannot be formated - removed. 
(only Russian ISBN format is supported now)\"\n % (self.name, isbn))\n isbn = None\n\n ozonid = identifiers.get('ozon', None)\n\n qItems = {ozonid, isbn}\n\n # Added Russian variant of 'Unknown'\n unk = [_('Unknown').upper(), '\u041d\u0435\u0438\u0437\u0432.'.upper(), icu_upper('\u041d\u0435\u0438\u0437\u0432.')]\n\n # use only ozonid if specified otherwise ozon.ru does not like a combination\n if not ozonid:\n if title and title not in unk:\n qItems.add(title)\n\n if authors:\n for auth in authors:\n if icu_upper(auth) not in unk:\n qItems.add(auth)\n\n qItems.discard(None)\n qItems.discard('')\n searchText = u' '.join(qItems).strip()\n\n if isinstance(searchText, unicode):\n searchText = searchText.encode('utf-8')\n if not searchText:\n return None\n\n search_url += quote_plus(searchText)\n log.debug(u'search url: %s' % search_url)\n return search_url\n\n # }}}\n\n def identify(self, log, result_queue, abort, title=None, authors=None,\n identifiers={}, timeout=90): # {{{\n from calibre.ebooks.chardet import xml_to_unicode\n from HTMLParser import HTMLParser\n from lxml import etree, html\n import json\n\n if not self.is_configured():\n return\n query = self.create_query(log, title=title, authors=authors, identifiers=identifiers)\n if not query:\n err = u'Insufficient metadata to construct query'\n log.error(err)\n return err\n\n try:\n raw = self.browser.open_novisit(query).read()\n except Exception as e:\n log.exception(u'Failed to make identify query: %r' % query)\n return as_unicode(e)\n\n try:\n doc = html.fromstring(xml_to_unicode(raw, verbose=True)[0])\n entries_block = doc.xpath(u'//div[@class=\"bSearchResult\"]')\n\n # log.debug(u'HTML: %s' % xml_to_unicode(raw, verbose=True)[0])\n\n if entries_block:\n entries = doc.xpath(u'//div[contains(@itemprop, \"itemListElement\")]')\n # log.debug(u'entries_block')\n # for entry in entries:\n # log.debug('entries %s' % entree.tostring(entry))\n metadata = self.get_metadata(log, entries, title, authors, identifiers)\n self.get_all_details(log, metadata, abort, result_queue, identifiers, timeout)\n else:\n # Redirect page: trying to extract ozon_id from javascript data\n h = HTMLParser()\n entry_string = (h.unescape(etree.tostring(doc, pretty_print=True, encoding=unicode)))\n json_pat = re.compile(u'dataLayer\\s*=\\s*(.+)?;')\n json_info = re.search(json_pat, entry_string)\n jsondata = json_info.group(1) if json_info else None\n if jsondata:\n idx = jsondata.rfind('}]')\n if idx > 0:\n jsondata = jsondata[:idx + 2]\n\n # log.debug(u'jsondata: %s' % jsondata)\n dataLayer = json.loads(jsondata) if jsondata else None\n\n ozon_id = None\n if dataLayer and dataLayer[0] and 'ecommerce' in dataLayer[0]:\n jsproduct = dataLayer[0]['ecommerce']['detail']['products'][0]\n ozon_id = as_unicode(jsproduct['id'])\n entry_title = as_unicode(jsproduct['name'])\n\n log.debug(u'ozon_id %s' % ozon_id)\n log.debug(u'entry_title %s' % entry_title)\n\n if ozon_id:\n metadata = self.to_metadata_for_single_entry(log, ozon_id, entry_title, authors)\n identifiers['ozon'] = ozon_id\n self.get_all_details(log, [metadata], abort, result_queue, identifiers, timeout, cachedPagesDict={})\n\n if not ozon_id:\n log.error('No SearchResults in Ozon.ru response found!')\n\n except Exception as e:\n log.exception('Failed to parse identify results')\n return as_unicode(e)\n\n # }}}\n\n def to_metadata_for_single_entry(self, log, ozon_id, title, authors): # {{{\n\n # parsing javascript data from the redirect page\n mi = Metadata(title, authors)\n mi.identifiers = {'ozon': 
ozon_id}\n\n return mi\n\n # }}}\n\n def get_metadata(self, log, entries, title, authors, identifiers): # {{{\n # some book titles have extra characters like this\n\n reRemoveFromTitle = re.compile(r'[?!:.,;+-/&%\"\\'=]')\n\n title = unicode(title).upper() if title else ''\n if reRemoveFromTitle:\n title = reRemoveFromTitle.sub('', title)\n authors = map(_normalizeAuthorNameWithInitials,\n map(unicode.upper, map(unicode, authors))) if authors else None\n\n ozon_id = identifiers.get('ozon', None)\n # log.debug(u'ozonid: ', ozon_id)\n\n unk = unicode(_('Unknown')).upper()\n\n if title == unk:\n title = None\n\n if authors == [unk] or authors == []:\n authors = None\n\n def in_authors(authors, miauthors):\n for author in authors:\n for miauthor in miauthors:\n # log.debug(u'=> %s <> %s'%(author, miauthor))\n if author in miauthor:\n return True\n return None\n\n def calc_source_relevance(mi): # {{{\n relevance = 0\n if title:\n mititle = unicode(mi.title).upper() if mi.title else ''\n\n if reRemoveFromTitle:\n mititle = reRemoveFromTitle.sub('', mititle)\n\n if title in mititle:\n relevance += 3\n elif mititle:\n # log.debug(u'!!%s!'%mititle)\n relevance -= 3\n else:\n relevance += 1\n\n if authors:\n miauthors = map(unicode.upper, map(unicode, mi.authors)) if mi.authors else []\n # log.debug('Authors %s vs miauthors %s'%(','.join(authors), ','.join(miauthors)))\n\n if (in_authors(authors, miauthors)):\n relevance += 3\n elif u''.join(miauthors):\n # log.debug(u'!%s!'%u'|'.join(miauthors))\n relevance -= 3\n else:\n relevance += 1\n\n if ozon_id:\n mozon_id = mi.identifiers['ozon']\n if ozon_id == mozon_id:\n relevance += 100\n\n if relevance < 0:\n relevance = 0\n return relevance\n\n # }}}\n\n strict_match = self.prefs[self.optkey_strictmatch]\n metadata = []\n for entry in entries:\n\n mi = self.to_metadata(log, entry)\n relevance = calc_source_relevance(mi)\n # TODO findout which is really used\n mi.source_relevance = relevance\n mi.relevance_in_source = relevance\n\n if not strict_match or relevance > 0:\n # getting rid of a random book that shows up in results\n if not (mi.title == 'Unknown'):\n metadata.append(mi)\n # log.debug(u'added metadata %s %s.'%(mi.title, mi.authors))\n else:\n log.debug(u'skipped metadata title: %s, authors: %s. 
(does not match the query - relevance score: %s)'\n % (mi.title, u' '.join(mi.authors), relevance))\n return metadata\n\n # }}}\n\n def get_all_details(self, log, metadata, abort, result_queue, identifiers, timeout, cachedPagesDict={}): # {{{\n\n req_isbn = identifiers.get('isbn', None)\n\n for mi in metadata:\n if abort.is_set():\n break\n try:\n ozon_id = mi.identifiers['ozon']\n\n try:\n self.get_book_details(log, mi, timeout, cachedPagesDict[\n ozon_id] if cachedPagesDict and ozon_id in cachedPagesDict else None)\n except:\n log.exception(u'Failed to get details for metadata: %s' % mi.title)\n\n all_isbns = getattr(mi, 'all_isbns', [])\n if req_isbn and all_isbns and check_isbn(req_isbn) not in all_isbns:\n log.debug(u'skipped, no requested ISBN %s found' % req_isbn)\n continue\n\n for isbn in all_isbns:\n self.cache_isbn_to_identifier(isbn, ozon_id)\n\n if mi.ozon_cover_url:\n self.cache_identifier_to_cover_url(ozon_id, mi.ozon_cover_url)\n\n self.clean_downloaded_metadata(mi)\n result_queue.put(mi)\n\n except:\n log.exception(u'Failed to get details for metadata: %s' % mi.title)\n\n # }}}\n\n def to_metadata(self, log, entry): # {{{\n title = unicode(entry.xpath(u'normalize-space(.//div[@itemprop=\"name\"][1]/text())'))\n # log.debug(u'Title: -----> %s' % title)\n\n author = unicode(entry.xpath(u'normalize-space(.//div[contains(@class, \"mPerson\")])'))\n # log.debug(u'Author: -----> %s' % author)\n\n norm_authors = map(_normalizeAuthorNameWithInitials, map(unicode.strip, unicode(author).split(u',')))\n mi = Metadata(title, norm_authors)\n\n ozon_id = entry.get('data-href').split('/')[-2]\n\n if ozon_id:\n mi.identifiers = {'ozon': ozon_id}\n # log.debug(u'ozon_id: -----> %s' % ozon_id)\n\n mi.ozon_cover_url = None\n cover = entry.xpath(u'normalize-space(.//img[1]/@src)')\n log.debug(u'cover: -----> %s' % cover)\n if cover:\n mi.ozon_cover_url = _translateToBigCoverUrl(cover)\n # log.debug(u'mi.ozon_cover_url: -----> %s' % mi.ozon_cover_url)\n\n pub_year = None\n pub_year_block = entry.xpath(u'.//div[@class=\"bOneTileProperty\"]/text()')\n year_pattern = re.compile('\\d{4}')\n if pub_year_block:\n pub_year = re.search(year_pattern, pub_year_block[0])\n if pub_year:\n mi.pubdate = toPubdate(log, pub_year.group())\n # log.debug('pubdate %s' % mi.pubdate)\n\n mi.rating = self.get_rating(log, entry)\n # if not mi.rating:\n # log.debug('No rating found. ozon_id:%s'%ozon_id)\n\n return mi\n\n # }}}\n\n def get_rating(self, log, entry): # {{{\n # log.debug(entry)\n ozon_rating = None\n try:\n xp_rating_template = u'boolean(.//div[contains(@class, \"bStars\") and contains(@class, \"%s\")])'\n rating = None\n if entry.xpath(xp_rating_template % 'm5'):\n rating = 5.\n elif entry.xpath(xp_rating_template % 'm4'):\n rating = 4.\n elif entry.xpath(xp_rating_template % 'm3'):\n rating = 3.\n elif entry.xpath(xp_rating_template % 'm2'):\n rating = 2.\n elif entry.xpath(xp_rating_template % 'm1'):\n rating = 1.\n if rating:\n # 'rating', A floating point number between 0 and 10\n # OZON raion N of 5, calibre of 10, but there is a bug? 
in identify\n ozon_rating = float(rating)\n except:\n pass\n return ozon_rating\n\n # }}}\n\n def get_cached_cover_url(self, identifiers): # {{{\n url = None\n ozon_id = identifiers.get('ozon', None)\n if ozon_id is None:\n isbn = identifiers.get('isbn', None)\n if isbn is not None:\n ozon_id = self.cached_isbn_to_identifier(isbn)\n if ozon_id is not None:\n url = self.cached_identifier_to_cover_url(ozon_id)\n return url\n\n # }}}\n\n def download_cover(self, log, result_queue, abort, title=None, authors=None, identifiers={}, timeout=30,\n get_best_cover=False): # {{{\n\n cached_url = self.get_cached_cover_url(identifiers)\n if cached_url is None:\n log.debug('No cached cover found, running identify')\n rq = Queue()\n self.identify(log, rq, abort, title=title, authors=authors, identifiers=identifiers)\n if abort.is_set():\n return\n results = []\n while True:\n try:\n results.append(rq.get_nowait())\n except Empty:\n break\n results.sort(key=self.identify_results_keygen(titl e=title, authors=authors, identifiers=identifiers))\n for mi in results:\n cached_url = self.get_cached_cover_url(mi.identifiers)\n if cached_url is not None:\n break\n\n if cached_url is None:\n log.info('No cover found')\n return\n\n if abort.is_set():\n return\n\n log.debug('Downloading cover from:', cached_url)\n try:\n cdata = self.browser.open_novisit(cached_url, timeout=timeout).read()\n if cdata:\n result_queue.put((self, cdata))\n except Exception as e:\n log.exception(u'Failed to download cover from: %s' % cached_url)\n return as_unicode(e)\n\n # }}}\n\n def get_book_details(self, log, metadata, timeout, cachedPage): # {{{\n from lxml import etree, html\n from calibre.ebooks.chardet import xml_to_unicode\n\n if not cachedPage:\n url = self.get_book_url(metadata.get_identifiers())[2]\n # log.debug(u'book_details_url', url)\n\n raw = self.browser.open_novisit(url, timeout=timeout).read()\n fulldoc = html.fromstring(xml_to_unicode(raw, verbose=True)[0])\n else:\n fulldoc = cachedPage\n log.debug(u'book_details -> using cached page')\n\n fullString = etree.tostring(fulldoc)\n doc = fulldoc.xpath(u'//div[@class=\"bDetailPage\"][1]')[0]\n\n # series \u0421\u0435\u0440\u0438\u044f/\u0421\u0435\u0440\u0438\u0438\n series_elem = doc.xpath(u'//div[contains(text(), \"\u0421\u0435\u0440\u0438\")]')\n if series_elem:\n series_text_elem = series_elem[0].getnext()\n metadata.series = series_text_elem.xpath(u'.//a/text()')[0]\n log.debug(u'**Seria: ', metadata.series)\n\n isbn = None\n isbn_elem = doc.xpath(u'//div[contains(text(), \"ISBN\")]')\n if isbn_elem:\n isbn = isbn_elem[0].getnext().xpath(u'normalize-space(./text())')\n metadata.identifiers['isbn'] = isbn\n\n # get authors/editors if no authors are available\n authors_joined = ','.join(metadata.authors)\n\n if authors_joined == '' or authors_joined == \"Unknown\":\n authors_from_detail = []\n editor_elem = doc.xpath(u'//div[contains(text(), \"\u0420\u0435\u0434\u0430\u043a\u0442\u043e\u0440 \")]')\n if editor_elem:\n editor = editor_elem[0].getnext().xpath(u'.//a/text()')[0]\n authors_from_detail.append(editor + u' (\u0440\u0435\u0434.)')\n authors_elem = doc.xpath(u'//div[contains(text(), \"\u0410\u0432\u0442\u043e\u0440\")]')\n if authors_elem:\n authors = authors_elem[0].getnext().xpath(u'.//a/text()') # list\n authors_from_detail.extend(authors)\n if len(authors_from_detail) > 0:\n metadata.authors = authors_from_detail\n\n cover = doc.xpath('.//img[contains(@class, \"fullImage\")]/@src')[0]\n metadata.ozon_cover_url = _translateToBigCoverUrl(cover)\n\n 
publishers = None\n publishers_elem = doc.xpath(u'//div[contains(text(), \"\u0418\u0437\u0434\u0430\u0442\u0435\u043b\u044c \")]')\n if publishers_elem:\n publishers_elem = publishers_elem[0].getnext()\n publishers = publishers_elem.xpath(u'.//a/text()')[0]\n\n if publishers:\n metadata.publisher = publishers\n\n displ_lang = None\n langs = None\n langs_elem = doc.xpath(u'//div[contains(text(), \"\u0437\u044b\u043a\")]')\n if langs_elem:\n langs_elem = langs_elem[0].getnext()\n langs = langs_elem.xpath(u'text()')[0].strip() if langs_elem else None\n if langs:\n lng_splt = langs.split(u',')\n if lng_splt:\n displ_lang = lng_splt[0].strip()\n # log.debug(u'displ_lang1: ', displ_lang)\n metadata.language = _translageLanguageToCode(displ_lang)\n # log.debug(u'Language: ', metadata.language)\n\n # can be set before from xml search response\n if not metadata.pubdate:\n pubdate_elem = doc.xpath(u'//div[contains(text(), \"\u0413\u043e\u0434 \u0432\u044b\u043f\u0443\u0441\u043a\u0430\")]')\n if pubdate_elem:\n pubYear = pubdate_elem[0].getnext().xpath(u'text()')[0].strip()\n if pubYear:\n matcher = re.search(r'\\d{4}', pubYear)\n if matcher:\n metadata.pubdate = toPubdate(log, matcher.group(0))\n # log.debug(u'Pubdate: ', metadata.pubdate)\n\n # comments, from Javascript data\n beginning = fullString.find(u'FirstBlock')\n end = fullString.find(u'}', beginning)\n comments = unicode(fullString[beginning + 75:end - 1]).decode(\"unicode-escape\")\n metadata.comments = replace_entities(comments, 'utf-8')\n # }}}\n\n\ndef _verifyISBNIntegrity(log, isbn): # {{{\n # Online ISBN-Check http://www.isbn-check.de/\n res = check_isbn(isbn)\n if not res:\n log.error(u'ISBN integrity check failed for \"%s\"' % isbn)\n return res is not None\n\n\n# }}}\n\n# TODO: make customizable\ndef _translateToBigCoverUrl(coverUrl): # {{{\n # //static.ozone.ru/multimedia/c200/1005748980.jpg\n # http://www.ozon.ru/multimedia/books_covers/1009493080.jpg\n m = re.match(r'.+\\/([^\\.\\\\]+).+$', coverUrl)\n if m:\n coverUrl = 'https://www.ozon.ru/multimedia/books_covers/' + m.group(1) + '.jpg'\n return coverUrl\n\n\n# }}}\n\ndef _get_affiliateId(): # {{{\n import random\n\n aff_id = 'romuk'\n # Use Kovid's affiliate id 30% of the time.\n if random.randint(1, 10) in (1, 2, 3):\n aff_id = 'kovidgoyal'\n return aff_id\n\n\n# }}}\n\ndef _format_isbn(log, isbn): # {{{\n # for now only RUS ISBN are supported\n # http://ru.wikipedia.org/wiki/ISBN_\u0440\u043e\u0441\u0441\u0438\u0439\u0441\u0 43a\u0438\u0445_\u0438\u0437\u0434\u0430\u0442\u04 35\u043b\u044c\u0441\u0442\u0432\n isbn_pat = re.compile(r\"\"\"\n ^\n (\\d{3})? # match GS1 Prefix for ISBN13\n (5) # group identifier for Russian-speaking countries\n ( # begin variable length for Publisher\n [01]\\d{1}| # 2x\n [2-6]\\d{2}| # 3x\n 7\\d{3}| # 4x (starting with 7)\n 8[0-4]\\d{2}| # 4x (starting with 8)\n 9[2567]\\d{2}| # 4x (starting with 9)\n 99[26]\\d{1}| # 4x (starting with 99)\n 8[5-9]\\d{3}| # 5x (starting with 8)\n 9[348]\\d{3}| # 5x (starting with 9)\n 900\\d{2}| # 5x (starting with 900)\n 91[0-8]\\d{2}| # 5x (starting with 91)\n 90[1-9]\\d{3}| # 6x (starting with 90)\n 919\\d{3}| # 6x (starting with 919)\n 99[^26]\\d{4} # 7x (starting with 99)\n ) # end variable length for Publisher\n (\\d+) # Title\n ([\\dX]) # Check digit\n $\n \"\"\", re.VERBOSE)\n\n res = check_isbn(isbn)\n if res:\n m = isbn_pat.match(res)\n if m:\n res = '-'.join([g for g in m.groups() if g])\n else:\n log.error('cannot format ISBN %s. 
Fow now only russian ISBNs are supported' % isbn)\n return res\n\n# }}}\n\n\ndef _translageLanguageToCode(displayLang): # {{{\n displayLang = unicode(displayLang).strip() if displayLang else None\n langTbl = {None: 'ru',\n u'\u0420\u0443\u0441\u0441\u043a\u0438\u0439': 'ru',\n u'\u041d\u0435\u043c\u0435\u0446\u043a\u0438\u0439 ': 'de',\n u'\u0410\u043d\u0433\u043b\u0438\u0439\u0441\u043a \u0438\u0439': 'en',\n u'\u0424\u0440\u0430\u043d\u0446\u0443\u0437\u0441 \u043a\u0438\u0439': 'fr',\n u'\u0418\u0442\u0430\u043b\u044c\u044f\u043d\u0441 \u043a\u0438\u0439': 'it',\n u'\u0418\u0441\u043f\u0430\u043d\u0441\u043a\u0438 \u0439': 'es',\n u'\u041a\u0438\u0442\u0430\u0439\u0441\u043a\u0438 \u0439': 'zh',\n u'\u042f\u043f\u043e\u043d\u0441\u043a\u0438\u0439 ': 'ja',\n u'\u0424\u0438\u043d\u0441\u043a\u0438\u0439': 'fi',\n u'\u041f\u043e\u043b\u044c\u0441\u043a\u0438\u0439 ': 'pl',\n u'\u0423\u043a\u0440\u0430\u0438\u043d\u0441\u043a \u0438\u0439': 'uk',}\n return langTbl.get(displayLang, None)\n\n\n# }}}\n\n# [\u0412.\u041f. \u041a\u043e\u043b\u0435\u0441\u043d\u0438\u043a\u 043e\u0432 | \u041a\u043e\u043b\u0435\u0441\u043d\u0438\u043a\u 043e\u0432 \u0412.\u041f.]-> \u0412. \u041f. B\u041a\u043e\u043b\u0435\u0441\u043d\u0438\u043a\ u043e\u0432\ndef _normalizeAuthorNameWithInitials(name): # {{{\n res = name\n if name:\n re1 = u'^(?P<lname>\\S+)\\s+(?P<fname>[^\\d\\W]\\.)(?:\\s*(?P<mname>[^\\d\\W]\\.))?$'\n re2 = u'^(?P<fname>[^\\d\\W]\\.)(?:\\s*(?P<mname>[^\\d\\W]\\.))?\\s+(?P<lname>\\S+)$'\n matcher = re.match(re1, unicode(name), re.UNICODE)\n if not matcher:\n matcher = re.match(re2, unicode(name), re.UNICODE)\n\n if matcher:\n d = matcher.groupdict()\n res = ' '.join(x for x in (d['fname'], d['mname'], d['lname']) if x)\n return res\n\n\n# }}}\n\ndef toPubdate(log, yearAsString): # {{{\n from calibre.utils.date import parse_only_date\n res = None\n if yearAsString:\n try:\n res = parse_only_date(u\"01.01.\" + yearAsString)\n except:\n log.error('cannot parse to date %s' % yearAsString)\n return res\n\n\n# }}}\n\ndef _listToUnicodePrintStr(lst): # {{{\n return u'[' + u', '.join(unicode(x) for x in lst) + u']'\n\n\n# }}}\n\nif __name__ == '__main__': # tests {{{\n # To run these test use: calibre-debug src/calibre/ebooks/metadata/sources/ozon.py\n # comment some touched_fields before run thoses tests\n from calibre.ebooks.metadata.sources.test import (test_identify_plugin,\n title_test, authors_test, isbn_test)\n\n test_identify_plugin(Ozon.name, [\n # (\n # {'identifiers':{}, 'title':u'\u041d\u043e\u0440\u0432\u0435\u0436\u04 41\u043a\u0438\u0439 \u044f\u0437\u044b\u043a: \u041f\u0440\u0430\u043a\u0442\u0438\u0447\u0435\u 0441\u043a\u0438\u0439 \u043a\u0443\u0440\u0441',\n # 'authors':[u'\u041a\u043e\u043b\u0435\u0441\u043d\u0438\u043a \u043e\u0432 \u0412.\u041f.', u'\u0413.\u0412. \u0428\u0430\u0442\u043a\u043e\u0432']},\n # [title_test(u'\u041d\u043e\u0440\u0432\u0435\u0436\ u0441\u043a\u0438\u0439 \u044f\u0437\u044b\u043a: \u041f\u0440\u0430\u043a\u0442\u0438\u0447\u0435\u 0441\u043a\u0438\u0439 \u043a\u0443\u0440\u0441', exact=True),\n # authors_test([u'\u0412. \u041f. \u041a\u043e\u043b\u0435\u0441\u043d\u0438\u043a\u 043e\u0432', u'\u0413. \u0412. \u0428\u0430\u0442\u043a\u043e\u0432'])]\n # ),\n (\n {'identifiers': {'isbn': '9785916572629'}},\n [title_test(u'\u041d\u0430 \u0432\u0441\u0435 \u0447\u0435\u0442\u044b\u0440\u0435 \u0441\u0442\u043e\u0440\u043e\u043d\u044b', exact=True),\n authors_test([u'\u0410. \u0410. 
\u0413\u0438\u043b\u043b'])]\n ),\n (\n {'identifiers': {}, 'title': u'Der Himmel Kennt Keine Gunstlinge',\n 'authors': [u'Erich Maria Remarque']},\n [title_test(u'Der Himmel Kennt Keine Gunstlinge', exact=True),\n authors_test([u'Erich Maria Remarque'])]\n ),\n (\n {'identifiers': {}, 'title': u'\u041c\u0435\u0442\u0440\u043e 2033',\n 'authors': [u'\u0414\u043c\u0438\u0442\u0440\u0438\u0439 \u0413\u043b\u0443\u0445\u043e\u0432\u0441\u043a\u 0438\u0439']},\n [title_test(u'\u041c\u0435\u0442\u0440\u043e 2033', exact=False)]\n ),\n (\n {'identifiers': {'isbn': '9785170727209'}, 'title': u'\u041c\u0435\u0442\u0440\u043e 2033',\n 'authors': [u'\u0414\u043c\u0438\u0442\u0440\u0438\u0439 \u0413\u043b\u0443\u0445\u043e\u0432\u0441\u043a\u 0438\u0439']},\n [title_test(u'\u041c\u0435\u0442\u0440\u043e 2033', exact=True),\n authors_test([u'\u0414\u043c\u0438\u0442\u0440\u0438\u0439 \u0413\u043b\u0443\u0445\u043e\u0432\u0441\u043a\u 0438\u0439']),\n isbn_test('9785170727209')]\n ),\n (\n {'identifiers': {'isbn': '5-699-13613-4'}, 'title': u'\u041c\u0435\u0442\u0440\u043e 2033',\n 'authors': [u'\u0414\u043c\u0438\u0442\u0440\u0438\u0439 \u0413\u043b\u0443\u0445\u043e\u0432\u0441\u043a\u 0438\u0439']},\n [title_test(u'\u041c\u0435\u0442\u0440\u043e 2033', exact=True),\n authors_test([u'\u0414\u043c\u0438\u0442\u0440\u0438\u0439 \u0413\u043b\u0443\u0445\u043e\u0432\u0441\u043a\u 0438\u0439'])]\n ),\n (\n {'identifiers': {}, 'title': u'\u041c\u0435\u0442\u0440\u043e',\n 'authors': [u'\u0413\u043b\u0443\u0445\u043e\u0432\u0441\u043a \u0438\u0439']},\n [title_test(u'\u041c\u0435\u0442\u0440\u043e', exact=False)]\n ),\n])\n# }}}\n",
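One small piece of the OZON.ru source that is easy to check in isolation is _translateToBigCoverUrl(): it pulls the bare file name out of the small search-result thumbnail and re-bases it onto the books_covers path that serves the full-size image. The sketch below uses the same regular expression and the sample URL from the comment in the source.

Code:
import re

def big_cover_url(cover_url):
    # Grab the file name (without extension) from the thumbnail URL and
    # point it at ozon.ru's full-size books_covers location instead.
    m = re.match(r'.+\/([^\.\\]+).+$', cover_url)
    if m:
        return 'https://www.ozon.ru/multimedia/books_covers/' + m.group(1) + '.jpg'
    return cover_url

if __name__ == '__main__':
    print(big_cover_url('//static.ozone.ru/multimedia/c200/1005748980.jpg'))
    # -> https://www.ozon.ru/multimedia/books_covers/1005748980.jpg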
"google": "#!/usr/bin/env python2\n# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai\n# License: GPLv3 Copyright: 2011, Kovid Goyal <kovid at kovidgoyal.net>\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport hashlib\nimport re\nimport time\nfrom Queue import Empty, Queue\n\nfrom calibre import as_unicode\nfrom calibre.ebooks.chardet import xml_to_unicode\nfrom calibre.ebooks.metadata import check_isbn\nfrom calibre.ebooks.metadata.book.base import Metadata\nfrom calibre.ebooks.metadata.sources.base import Source\nfrom calibre.utils.cleantext import clean_ascii_chars\nfrom calibre.utils.localization import canonicalize_lang\n\nNAMESPACES = {\n 'openSearch': 'http://a9.com/-/spec/opensearchrss/1.0/',\n 'atom': 'http://www.w3.org/2005/Atom',\n 'dc': 'http://purl.org/dc/terms',\n 'gd': 'http://schemas.google.com/g/2005'\n}\n\n\ndef get_details(browser, url, timeout): # {{{\n try:\n raw = browser.open_novisit(url, timeout=timeout).read()\n except Exception as e:\n gc = getattr(e, 'getcode', lambda: -1)\n if gc() != 403:\n raise\n # Google is throttling us, wait a little\n time.sleep(2)\n raw = browser.open_novisit(url, timeout=timeout).read()\n\n return raw\n\n\n# }}}\n\nxpath_cache = {}\n\n\ndef XPath(x):\n ans = xpath_cache.get(x)\n if ans is None:\n from lxml import etree\n ans = xpath_cache[x] = etree.XPath(x, namespaces=NAMESPACES)\n return ans\n\n\ndef cleanup_title(title):\n if ':' in title:\n return title.partition(':')[0]\n return re.sub(r'(.+?) \\(.+\\)', r'\\1', title)\n\n\ndef to_metadata(browser, log, entry_, timeout): # {{{\n from lxml import etree\n\n # total_results = XPath('//openSearch:totalResults')\n # start_index = XPath('//openSearch:startIndex')\n # items_per_page = XPath('//openSearch:itemsPerPage')\n entry = XPath('//atom:entry')\n entry_id = XPath('descendant::atom:id')\n creator = XPath('descendant::dc:creator')\n identifier = XPath('descendant::dc:identifier')\n title = XPath('descendant::dc:title')\n date = XPath('descendant::dc:date')\n publisher = XPath('descendant::dcublisher')\n subject = XPath('descendant::dc:subject')\n description = XPath('descendant::dc:description')\n language = XPath('descendant::dc:language')\n\n # print(etree.tostring(entry_, pretty_print=True))\n\n def get_text(extra, x):\n try:\n ans = x(extra)\n if ans:\n ans = ans[0].text\n if ans and ans.strip():\n return ans.strip()\n except:\n log.exception('Programming error:')\n return None\n\n id_url = entry_id(entry_)[0].text\n google_id = id_url.split('/')[-1]\n title_ = ': '.join([x.text for x in title(entry_)]).strip()\n authors = [x.text.strip() for x in creator(entry_) if x.text]\n if not authors:\n authors = [_('Unknown')]\n if not id_url or not title:\n # Silently discard this entry\n return None\n\n mi = Metadata(title_, authors)\n mi.identifiers = {'google': google_id}\n try:\n raw = get_details(browser, id_url, timeout)\n feed = etree.fromstring(\n xml_to_unicode(clean_ascii_chars(raw), strip_encoding_pats=True)[0]\n )\n extra = entry(feed)[0]\n except:\n log.exception('Failed to get additional details for', mi.title)\n return mi\n\n mi.comments = get_text(extra, description)\n lang = canonicalize_lang(get_text(extra, language))\n if lang:\n mi.language = lang\n mi.publisher = get_text(extra, publisher)\n\n # ISBN\n isbns = []\n for x in identifier(extra):\n t = str(x.text).strip()\n if t[:5].upper() in ('ISBN:', 'LCCN:', 'OCLC:'):\n if t[:5].upper() == 'ISBN:':\n t = check_isbn(t[5:])\n if t:\n isbns.append(t)\n if isbns:\n mi.isbn = 
sorted(isbns, key=len)[-1]\n mi.all_isbns = isbns\n\n # Tags\n try:\n btags = [x.text for x in subject(extra) if x.text]\n tags = []\n for t in btags:\n atags = [y.strip() for y in t.split('/')]\n for tag in atags:\n if tag not in tags:\n tags.append(tag)\n except:\n log.exception('Failed to parse tags:')\n tags = []\n if tags:\n mi.tags = [x.replace(',', ';') for x in tags]\n\n # pubdate\n pubdate = get_text(extra, date)\n if pubdate:\n from calibre.utils.date import parse_date, utcnow\n try:\n default = utcnow().replace(day=15)\n mi.pubdate = parse_date(pubdate, assume_utc=True, default=default)\n except:\n log.error('Failed to parse pubdate %r' % pubdate)\n\n # Cover\n mi.has_google_cover = None\n for x in extra.xpath(\n '//*[@href and @rel=\"http://schemas.google.com/books/2008/thumbnail\"]'\n ):\n mi.has_google_cover = x.get('href')\n break\n\n return mi\n\n\n# }}}\n\n\nclass GoogleBooks(Source):\n\n name = 'Google'\n version = (1, 0, 0)\n minimum_calibre_version = (2, 80, 0)\n description = _('Downloads metadata and covers from Google Books')\n\n capabilities = frozenset({'identify', 'cover'})\n touched_fields = frozenset({\n 'title', 'authors', 'tags', 'pubdate', 'comments', 'publisher',\n 'identifier:isbn', 'identifier:google', 'languages'\n })\n supports_gzip_transfer_encoding = True\n cached_cover_url_is_reliable = False\n\n GOOGLE_COVER = 'https://books.google.com/books?id=%s&printsec=frontcover&img=1'\n\n DUMMY_IMAGE_MD5 = frozenset(\n {'0de4383ebad0adad5eeb8975cd796657', 'a64fa89d7ebc97075c1d363fc5fea71f'}\n )\n\n def get_book_url(self, identifiers): # {{{\n goog = identifiers.get('google', None)\n if goog is not None:\n return ('google', goog, 'https://books.google.com/books?id=%s' % goog)\n\n # }}}\n\n def create_query(self, log, title=None, authors=None, identifiers={}): # {{{\n from urllib import urlencode\n BASE_URL = 'https://books.google.com/books/feeds/volumes?'\n isbn = check_isbn(identifiers.get('isbn', None))\n q = ''\n if isbn is not None:\n q += 'isbn:' + isbn\n elif title or authors:\n\n def build_term(prefix, parts):\n return ' '.join('in' + prefix + ':' + x for x in parts)\n\n title_tokens = list(self.get_title_tokens(title))\n if title_tokens:\n q += build_term('title', title_tokens)\n author_tokens = list(self.get_author_tokens(authors, only_first_author=True))\n if author_tokens:\n q += ('+' if q else '') + build_term('author', author_tokens)\n\n if isinstance(q, unicode):\n q = q.encode('utf-8')\n if not q:\n return None\n return BASE_URL + urlencode({\n 'q': q,\n 'max-results': 20,\n 'start-index': 1,\n 'min-viewability': 'none',\n })\n\n # }}}\n\n def download_cover( # {{{\n self,\n log,\n result_queue,\n abort,\n title=None,\n authors=None,\n identifiers={},\n timeout=30,\n get_best_cover=False\n ):\n cached_url = self.get_cached_cover_url(identifiers)\n if cached_url is None:\n log.info('No cached cover found, running identify')\n rq = Queue()\n self.identify(\n log,\n rq,\n abort,\n title=title,\n authors=authors,\n identifiers=identifiers\n )\n if abort.is_set():\n return\n results = []\n while True:\n try:\n results.append(rq.get_nowait())\n except Empty:\n break\n results.sort(\n key=self.identify_results_keygen(\n title=title, authors=authors, identifiers=identifiers\n )\n )\n for mi in results:\n cached_url = self.get_cached_cover_url(mi.identifiers)\n if cached_url is not None:\n break\n if cached_url is None:\n log.info('No cover found')\n return\n\n br = self.browser\n for candidate in (0, 1):\n if abort.is_set():\n return\n url = cached_url + 
'&zoom={}'.format(candidate)\n log('Downloading cover from:', cached_url)\n try:\n cdata = br.open_novisit(url, timeout=timeout).read()\n if cdata:\n if hashlib.md5(cdata).hexdigest() in self.DUMMY_IMAGE_MD5:\n log.warning('Google returned a dummy image, ignoring')\n else:\n result_queue.put((self, cdata))\n break\n except Exception:\n log.exception('Failed to download cover from:', cached_url)\n\n # }}}\n\n def get_cached_cover_url(self, identifiers): # {{{\n url = None\n goog = identifiers.get('google', None)\n if goog is None:\n isbn = identifiers.get('isbn', None)\n if isbn is not None:\n goog = self.cached_isbn_to_identifier(isbn)\n if goog is not None:\n url = self.cached_identifier_to_cover_url(goog)\n\n return url\n\n # }}}\n\n def get_all_details( # {{{\n self,\n br,\n log,\n entries,\n abort,\n result_queue,\n timeout\n ):\n from lxml import etree\n for relevance, i in enumerate(entries):\n try:\n ans = to_metadata(br, log, i, timeout)\n if isinstance(ans, Metadata):\n ans.source_relevance = relevance\n goog = ans.identifiers['google']\n for isbn in getattr(ans, 'all_isbns', []):\n self.cache_isbn_to_identifier(isbn, goog)\n if getattr(ans, 'has_google_cover', False):\n self.cache_identifier_to_cover_url(\n goog, self.GOOGLE_COVER % goog\n )\n self.clean_downloaded_metadata(ans)\n result_queue.put(ans)\n except:\n log.exception(\n 'Failed to get metadata for identify entry:', etree.tostring(i)\n )\n if abort.is_set():\n break\n\n # }}}\n\n def identify( # {{{\n self,\n log,\n result_queue,\n abort,\n title=None,\n authors=None,\n identifiers={},\n timeout=30\n ):\n from lxml import etree\n entry = XPath('//atom:entry')\n\n query = self.create_query(\n log, title=title, authors=authors, identifiers=identifiers\n )\n if not query:\n log.error('Insufficient metadata to construct query')\n return\n br = self.browser\n log('Making query:', query)\n try:\n raw = br.open_novisit(query, timeout=timeout).read()\n except Exception as e:\n log.exception('Failed to make identify query: %r' % query)\n return as_unicode(e)\n\n try:\n parser = etree.XMLParser(recover=True, no_network=True)\n feed = etree.fromstring(\n xml_to_unicode(clean_ascii_chars(raw), strip_encoding_pats=True)[0],\n parser=parser\n )\n entries = entry(feed)\n except Exception as e:\n log.exception('Failed to parse identify results')\n return as_unicode(e)\n\n if not entries and title and not abort.is_set():\n if identifiers:\n log('No results found, retrying without identifiers')\n return self.identify(\n log,\n result_queue,\n abort,\n title=title,\n authors=authors,\n timeout=timeout\n )\n ntitle = cleanup_title(title)\n if ntitle and ntitle != title:\n log('No results found, retrying without sub-title')\n return self.identify(\n log,\n result_queue,\n abort,\n title=ntitle,\n authors=authors,\n timeout=timeout\n )\n\n # There is no point running these queries in threads as google\n # throttles requests returning 403 Forbidden errors\n self.get_all_details(br, log, entries, abort, result_queue, timeout)\n\n # }}}\n\n\nif __name__ == '__main__': # tests {{{\n # To run these test use: calibre-debug\n # src/calibre/ebooks/metadata/sources/google.py\n from calibre.ebooks.metadata.sources.test import (\n test_identify_plugin, title_test, authors_test\n )\n tests = [({\n 'identifiers': {\n 'isbn': '0743273567'\n },\n 'title': 'Great Gatsby',\n 'authors': ['Fitzgerald']\n }, [\n title_test('The great gatsby', exact=True),\n authors_test(['F. 
Scott Fitzgerald'])\n ]), ({\n 'title': 'Flatland',\n 'authors': ['Abbott']\n }, [title_test('Flatland', exact=False)]), ({\n 'title':\n 'The Blood Red Indian Summer: A Berger and Mitry Mystery',\n 'authors': ['David Handler'],\n }, [title_test('The Blood Red Indian Summer: A Berger and Mitry Mystery')])]\n test_identify_plugin(GoogleBooks.name, tests[:])\n\n# }}}\n",
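The Google source has one wrinkle the others do not: Google sometimes answers a cover request with a generic "image not available" placeholder, so download_cover() hashes the downloaded bytes and drops them if the MD5 matches one of two known placeholders. A minimal sketch of that filter follows; the two digests are copied from DUMMY_IMAGE_MD5 in the source and the sample bytes are obviously fake.

Code:
import hashlib

# Copied from GoogleBooks.DUMMY_IMAGE_MD5 in the source above.
DUMMY_IMAGE_MD5 = {
    '0de4383ebad0adad5eeb8975cd796657',
    'a64fa89d7ebc97075c1d363fc5fea71f',
}

def is_real_cover(cdata):
    # True unless the bytes hash to one of Google's placeholder images.
    return hashlib.md5(cdata).hexdigest() not in DUMMY_IMAGE_MD5

if __name__ == '__main__':
    print(is_real_cover(b'pretend image bytes'))   # -> True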
"hashes": {
"amazon": "b4e05a23d5977a29413f8c31a5d4221b139ed18d",
"overdrive": "2e9fced7c6f8d8778ddfd30bca4dafce07e29667",
"big_book_search": "be5d30f0338d7218ccc9ce789bc0c1abab782d20",
"ozon": "7c0227525310f7b2cb09df4406a8c403ec12a908",
"google": "9c0b40f729cfc7166015c5058730814832b6d4c5",
"search_engines": "64c567211638569b273f38a309b640a4b6c94584",
"edelweiss": "d16963f0cd71f91b620303660f51420cbec06097",
"google_images": "e7e815ad0d8cafd3782cda61a4fbad0bb54f6518",
"douban": "7b23c5f63e17c65f80cc630e22dd68b2342ba8ba",
"openlibrary": "ad68135f861170468aab5fcb2a6a33e697c21459"
},
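The hashes block maps each plugin name to a 40-character hex digest, which looks like SHA-1. If you want to check whether a cache entry still matches its recorded digest, something like the sketch below will do it, on the assumption (not confirmed by the file itself) that the digest is a plain SHA-1 of the stored source text encoded as UTF-8; the path shown is the usual Linux location of the file, so adjust it for your platform.

Code:
import hashlib
import json
import os

def check_entry(cache_path, name):
    # Compare the stored source text for one plugin against its recorded digest.
    with open(cache_path, 'r', encoding='utf-8') as f:
        cache = json.load(f)
    digest = hashlib.sha1(cache[name].encode('utf-8')).hexdigest()
    return digest == cache['hashes'][name]

if __name__ == '__main__':
    path = os.path.expanduser('~/.config/calibre/metadata-sources-cache.json')
    print(check_entry(path, 'google'))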
"edelweiss": "#!/usr/bin/env python2\n# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai\nfrom __future__ import (unicode_literals, division, absolute_import,\n print_function)\n\n__license__ = 'GPL v3'\n__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'\n__docformat__ = 'restructuredtext en'\n\nimport time, re\nfrom threading import Thread\nfrom Queue import Queue, Empty\n\nfrom calibre import as_unicode, random_user_agent\nfrom calibre.ebooks.metadata import check_isbn\nfrom calibre.ebooks.metadata.sources.base import Source\n\n\ndef clean_html(raw):\n from calibre.ebooks.chardet import xml_to_unicode\n from calibre.utils.cleantext import clean_ascii_chars\n return clean_ascii_chars(xml_to_unicode(raw, strip_encoding_pats=True,\n resolve_entities=True, assume_utf8=True)[0])\n\n\ndef parse_html(raw):\n raw = clean_html(raw)\n from html5_parser import parse\n return parse(raw)\n\n\ndef astext(node):\n from lxml import etree\n return etree.tostring(node, method='text', encoding=unicode,\n with_tail=False).strip()\n\n\nclass Worker(Thread): # {{{\n\n def __init__(self, basic_data, relevance, result_queue, br, timeout, log, plugin):\n Thread.__init__(self)\n self.daemon = True\n self.basic_data = basic_data\n self.br, self.log, self.timeout = br, log, timeout\n self.result_queue, self.plugin, self.sku = result_queue, plugin, self.basic_data['sku']\n self.relevance = relevance\n\n def run(self):\n url = ('https://www.edelweiss.plus/GetTreelineControl.aspx?controlName=/uc/product/two_Enhanced.ascx&'\n 'sku={0}&idPrefix=content_1_{0}&mode=0'.format(sel f.sku))\n try:\n raw = self.br.open_novisit(url, timeout=self.timeout).read()\n except:\n self.log.exception('Failed to load comments page: %r'%url)\n return\n\n try:\n mi = self.parse(raw)\n mi.source_relevance = self.relevance\n self.plugin.clean_downloaded_metadata(mi)\n self.result_queue.put(mi)\n except:\n self.log.exception('Failed to parse details for sku: %s'%self.sku)\n\n def parse(self, raw):\n from calibre.ebooks.metadata.book.base import Metadata\n from calibre.utils.date import UNDEFINED_DATE\n root = parse_html(raw)\n mi = Metadata(self.basic_data['title'], self.basic_data['authors'])\n\n # Identifiers\n if self.basic_data['isbns']:\n mi.isbn = self.basic_data['isbns'][0]\n mi.set_identifier('edelweiss', self.sku)\n\n # Tags\n if self.basic_data['tags']:\n mi.tags = self.basic_data['tags']\n mi.tags = [t[1:].strip() if t.startswith('&') else t for t in mi.tags]\n\n # Publisher\n mi.publisher = self.basic_data['publisher']\n\n # Pubdate\n if self.basic_data['pubdate'] and self.basic_data['pubdate'].year != UNDEFINED_DATE:\n mi.pubdate = self.basic_data['pubdate']\n\n # Rating\n if self.basic_data['rating']:\n mi.rating = self.basic_data['rating']\n\n # Comments\n comments = ''\n for cid in ('summary', 'contributorbio', 'quotes_reviews'):\n cid = 'desc_{}{}-content'.format(cid, self.sku)\n div = root.xpath('//*[@id=\"{}\"]'.format(cid))\n if div:\n comments += self.render_comments(div[0])\n if comments:\n mi.comments = comments\n\n mi.has_cover = self.plugin.cached_identifier_to_cover_url(self.sk u) is not None\n return mi\n\n def render_comments(self, desc):\n from lxml import etree\n from calibre.library.comments import sanitize_comments_html\n for c in desc.xpath('descendant::noscript'):\n c.getparent().remove(c)\n for a in desc.xpath('descendant::a[@href]'):\n del a.attrib['href']\n a.tag = 'span'\n desc = etree.tostring(desc, method='html', encoding=unicode).strip()\n\n # remove all attributes from tags\n 
desc = re.sub(r'<([a-zA-Z0-9]+)\\s[^>]+>', r'<\\1>', desc)\n # Collapse whitespace\n # desc = re.sub('\\n+', '\\n', desc)\n # desc = re.sub(' +', ' ', desc)\n # Remove comments\n desc = re.sub(r'(?s)<!--.*?-->', '', desc)\n return sanitize_comments_html(desc)\n# }}}\n\n\ndef get_basic_data(browser, log, *skus):\n from calibre.utils.date import parse_only_date\n from mechanize import Request\n zeroes = ','.join('0' for sku in skus)\n data = {\n 'skus': ','.join(skus),\n 'drc': zeroes,\n 'startPosition': '0',\n 'sequence': '1',\n 'selected': zeroes,\n 'itemID': '0',\n 'orderID': '0',\n 'mailingID': '',\n 'tContentWidth': '926',\n 'originalOrder': ','.join(str(i) for i in range(len(skus))),\n 'selectedOrderID': '0',\n 'selectedSortColumn': '0',\n 'listType': '1',\n 'resultType': '32',\n 'blockView': '1',\n }\n items_data_url = 'https://www.edelweiss.plus/GetTreelineControl.aspx?controlName=/uc/listviews/ListView_Title_Multi.ascx'\n req = Request(items_data_url, data)\n response = browser.open_novisit(req)\n raw = response.read()\n root = parse_html(raw)\n for item in root.xpath('//div[@data-priority]'):\n row = item.getparent().getparent()\n sku = item.get('id').split('-')[-1]\n isbns = [x.strip() for x in row.xpath('descendant::*[contains(@class, \"pev_sku\")]/text()')[0].split(',') if check_isbn(x.strip())]\n isbns.sort(key=len, reverse=True)\n try:\n tags = [x.strip() for x in astext(row.xpath('descendant::*[contains(@class, \"pev_categories\")]')[0]).split('/')]\n except IndexError:\n tags = []\n rating = 0\n for bar in row.xpath('descendant::*[contains(@class, \"bgdColorCommunity\")]/@style'):\n m = re.search('width: (\\d+)px;.*max-width: (\\d+)px', bar)\n if m is not None:\n rating = float(m.group(1)) / float(m.group(2))\n break\n try:\n pubdate = parse_only_date(astext(row.xpath('descendant::*[contains(@class, \"pev_shipDate\")]')[0]\n ).split(':')[-1].split(u'\\xa0')[-1].strip(), assume_utc=True)\n except Exception:\n log.exception('Error parsing published date')\n pubdate = None\n authors = []\n for x in [x.strip() for x in row.xpath('descendant::*[contains(@class, \"pev_contributor\")]/@title')]:\n authors.extend(a.strip() for a in x.split(','))\n entry = {\n 'sku': sku,\n 'cover': row.xpath('descendant::img/@src')[0].split('?')[0],\n 'publisher': astext(row.xpath('descendant::*[contains(@class, \"headerPublisher\")]')[0]),\n 'title': astext(row.xpath('descendant::*[@id=\"title_{}\"]'.format(sku))[0]),\n 'authors': authors,\n 'isbns': isbns,\n 'tags': tags,\n 'pubdate': pubdate,\n 'format': ' '.join(row.xpath('descendant::*[contains(@class, \"pev_format\")]/text()')).strip(),\n 'rating': rating,\n }\n if entry['cover'].startswith('/'):\n entry['cover'] = None\n yield entry\n\n\nclass Edelweiss(Source):\n\n name = 'Edelweiss'\n version = (2, 0, 1)\n minimum_calibre_version = (3, 6, 0)\n description = _('Downloads metadata and covers from Edelweiss - A catalog updated by book publishers')\n\n capabilities = frozenset(['identify', 'cover'])\n touched_fields = frozenset([\n 'title', 'authors', 'tags', 'pubdate', 'comments', 'publisher',\n 'identifier:isbn', 'identifier:edelweiss', 'rating'])\n supports_gzip_transfer_encoding = True\n has_html_comments = True\n\n @property\n def user_agent(self):\n # Pass in an index to random_user_agent() to test with a particular\n # user agent\n return random_user_agent(allow_ie=False)\n\n def _get_book_url(self, sku):\n if sku:\n return 'https://www.edelweiss.plus/#sku={}&page=1'.format(sku)\n\n def get_book_url(self, identifiers): # {{{\n sku = 
identifiers.get('edelweiss', None)\n if sku:\n return 'edelweiss', sku, self._get_book_url(sku)\n\n # }}}\n\n def get_cached_cover_url(self, identifiers): # {{{\n sku = identifiers.get('edelweiss', None)\n if not sku:\n isbn = identifiers.get('isbn', None)\n if isbn is not None:\n sku = self.cached_isbn_to_identifier(isbn)\n return self.cached_identifier_to_cover_url(sku)\n # }}}\n\n def create_query(self, log, title=None, authors=None, identifiers={}):\n from urllib import urlencode\n import time\n BASE_URL = ('https://www.edelweiss.plus/GetTreelineControl.aspx?'\n 'controlName=/uc/listviews/controls/ListView_data.ascx&itemID=0&resultType=32&dashboar dType=8&itemType=1&dataType=products&keywordSearch &')\n keywords = []\n isbn = check_isbn(identifiers.get('isbn', None))\n if isbn is not None:\n keywords.append(isbn)\n elif title:\n title_tokens = list(self.get_title_tokens(title))\n if title_tokens:\n keywords.extend(title_tokens)\n author_tokens = self.get_author_tokens(authors, only_first_author=True)\n if author_tokens:\n keywords.extend(author_tokens)\n if not keywords:\n return None\n params = {\n 'q': (' '.join(keywords)).encode('utf-8'),\n '_': str(int(time.time()))\n }\n return BASE_URL+urlencode(params)\n\n # }}}\n\n def identify(self, log, result_queue, abort, title=None, authors=None, # {{{\n identifiers={}, timeout=30):\n import json\n\n br = self.browser\n br.addheaders = [\n ('Referer', 'https://www.edelweiss.plus/'),\n ('X-Requested-With', 'XMLHttpRequest'),\n ('Cache-Control', 'no-cache'),\n ('Pragma', 'no-cache'),\n ]\n if 'edelweiss' in identifiers:\n items = [identifiers['edelweiss']]\n else:\n log.error('Currently Edelweiss returns random books for search queries')\n return\n query = self.create_query(log, title=title, authors=authors,\n identifiers=identifiers)\n if not query:\n log.error('Insufficient metadata to construct query')\n return\n log('Using query URL:', query)\n try:\n raw = br.open(query, timeout=timeout).read().decode('utf-8')\n except Exception as e:\n log.exception('Failed to make identify query: %r'%query)\n return as_unicode(e)\n items = re.search('window[.]items\\s*=\\s*(.+?);', raw)\n if items is None:\n log.error('Failed to get list of matching items')\n log.debug('Response text:')\n log.debug(raw)\n return\n items = json.loads(items.group(1))\n\n if (not items and identifiers and title and authors and\n not abort.is_set()):\n return self.identify(log, result_queue, abort, title=title,\n authors=authors, timeout=timeout)\n\n if not items:\n return\n\n workers = []\n items = items[:5]\n for i, item in enumerate(get_basic_data(self.browser, log, *items)):\n sku = item['sku']\n for isbn in item['isbns']:\n self.cache_isbn_to_identifier(isbn, sku)\n if item['cover']:\n self.cache_identifier_to_cover_url(sku, item['cover'])\n fmt = item['format'].lower()\n if 'audio' in fmt or 'mp3' in fmt:\n continue # Audio-book, ignore\n workers.append(Worker(item, i, result_queue, br.clone_browser(), timeout, log, self))\n\n if not workers:\n return\n\n for w in workers:\n w.start()\n # Don't send all requests at the same time\n time.sleep(0.1)\n\n while not abort.is_set():\n a_worker_is_alive = False\n for w in workers:\n w.join(0.2)\n if abort.is_set():\n break\n if w.is_alive():\n a_worker_is_alive = True\n if not a_worker_is_alive:\n break\n\n # }}}\n\n def download_cover(self, log, result_queue, abort, # {{{\n title=None, authors=None, identifiers={}, timeout=30, get_best_cover=False):\n cached_url = self.get_cached_cover_url(identifiers)\n if cached_url is 
None:\n log.info('No cached cover found, running identify')\n rq = Queue()\n self.identify(log, rq, abort, title=title, authors=authors,\n identifiers=identifiers)\n if abort.is_set():\n return\n results = []\n while True:\n try:\n results.append(rq.get_nowait())\n except Empty:\n break\n results.sort(key=self.identify_results_keygen(\n title=title, authors=authors, identifiers=identifiers))\n for mi in results:\n cached_url = self.get_cached_cover_url(mi.identifiers)\n if cached_url is not None:\n break\n if cached_url is None:\n log.info('No cover found')\n return\n\n if abort.is_set():\n return\n br = self.browser\n log('Downloading cover from:', cached_url)\n try:\n cdata = br.open_novisit(cached_url, timeout=timeout).read()\n result_queue.put((self, cdata))\n except:\n log.exception('Failed to download cover from:', cached_url)\n # }}}\n\n\nif __name__ == '__main__':\n from calibre.ebooks.metadata.sources.test import (\n test_identify_plugin, title_test, authors_test, comments_test, pubdate_test)\n tests = [\n ( # A title and author search\n {'title': 'The Husband\\'s Secret', 'authors':['Liane Moriarty']},\n [title_test('The Husband\\'s Secret', exact=True),\n authors_test(['Liane Moriarty'])]\n ),\n\n ( # An isbn present in edelweiss\n {'identifiers':{'isbn': '9780312621360'}, },\n [title_test('Flame: A Sky Chasers Novel', exact=True),\n authors_test(['Amy Kathleen Ryan'])]\n ),\n\n # Multiple authors and two part title and no general description\n ({'identifiers':{'edelweiss':'0321180607'}},\n [title_test(\n \"XQuery From the Experts:\u00a0A Guide to the W3C XML Query Language\"\n , exact=True), authors_test([\n 'Howard Katz', 'Don Chamberlin', 'Denise Draper', 'Mary Fernandez',\n 'Michael Kay', 'Jonathan Robie', 'Michael Rys', 'Jerome Simeon',\n 'Jim Tivy', 'Philip Wadler']), pubdate_test(2003, 8, 22),\n comments_test('J\u00e9r\u00f4me Sim\u00e9on'), lambda mi: bool(mi.comments and 'No title summary' not in mi.comments)\n ]),\n ]\n start, stop = 0, len(tests)\n\n tests = tests[start:stop]\n test_identify_plugin(Edelweiss.name, tests)\n",
"google_images": "#!/usr/bin/env python2\n# vim:fileencoding=UTF-8\nfrom __future__ import (unicode_literals, division, absolute_import,\n print_function)\n\n__license__ = 'GPL v3'\n__copyright__ = '2013, Kovid Goyal <kovid@kovidgoyal.net>'\n__docformat__ = 'restructuredtext en'\n\nfrom collections import OrderedDict\n\nfrom calibre import random_user_agent\nfrom calibre.ebooks.metadata.sources.base import Source, Option\n\n\ndef parse_html(raw):\n try:\n from html5_parser import parse\n except ImportError:\n # Old versions of calibre\n import html5lib\n return html5lib.parse(raw, treebuilder='lxml', namespaceHTMLElements=False)\n else:\n return parse(raw)\n\n\nclass GoogleImages(Source):\n\n name = 'Google Images'\n version = (1, 0, 0)\n minimum_calibre_version = (2, 80, 0)\n description = _('Downloads covers from a Google Image search. Useful to find larger/alternate covers.')\n capabilities = frozenset(['cover'])\n can_get_multiple_covers = True\n supports_gzip_transfer_encoding = True\n options = (Option('max_covers', 'number', 5, _('Maximum number of covers to get'),\n _('The maximum number of covers to process from the Google search result')),\n Option('size', 'choices', 'svga', _('Cover size'),\n _('Search for covers larger than the specified size'),\n choices=OrderedDict((\n ('any', _('Any size'),),\n ('l', _('Large'),),\n ('qsvga', _('Larger than %s')%'400x300',),\n ('vga', _('Larger than %s')%'640x480',),\n ('svga', _('Larger than %s')%'600x800',),\n ('xga', _('Larger than %s')%'1024x768',),\n ('2mp', _('Larger than %s')%'2 MP',),\n ('4mp', _('Larger than %s')%'4 MP',),\n ))),\n )\n\n def download_cover(self, log, result_queue, abort,\n title=None, authors=None, identifiers={}, timeout=30, get_best_cover=False):\n if not title:\n return\n timeout = max(60, timeout) # Needs at least a minute\n title = ' '.join(self.get_title_tokens(title))\n author = ' '.join(self.get_author_tokens(authors))\n urls = self.get_image_urls(title, author, log, abort, timeout)\n self.download_multiple_covers(title, authors, urls, get_best_cover, timeout, result_queue, abort, log)\n\n @property\n def user_agent(self):\n return random_user_agent(allow_ie=False)\n\n def get_image_urls(self, title, author, log, abort, timeout):\n from calibre.utils.cleantext import clean_ascii_chars\n from urllib import urlencode\n import json\n from collections import OrderedDict\n ans = OrderedDict()\n br = self.browser\n q = urlencode({'as_q': ('%s %s'%(title, author)).encode('utf-8')}).decode('utf-8')\n sz = self.prefs['size']\n if sz == 'any':\n sz = ''\n elif sz == 'l':\n sz = 'isz:l,'\n else:\n sz = 'isz:lt,islt:%s,' % sz\n # See https://www.google.com/advanced_image_search to understand this\n # URL scheme\n url = 'https://www.google.com/search?as_st=y&tbm=isch&{}&as_epq=&as_oq=&as_eq=&c r=&as_sitesearch=&safe=images&tbs={}iar:t,ift:jpg' .format(q, sz)\n log('Search URL: ' + url)\n raw = clean_ascii_chars(br.open(url).read().decode('utf-8'))\n root = parse_html(raw)\n for div in root.xpath('//div[@class=\"rg_meta notranslate\"]'):\n try:\n data = json.loads(div.text)\n except Exception:\n continue\n if 'ou' in data:\n ans[data['ou']] = True\n return list(ans.iterkeys())\n\n\ndef test():\n from Queue import Queue\n from threading import Event\n from calibre.utils.logging import default_log\n p = GoogleImages(None)\n p.log = default_log\n rq = Queue()\n p.download_cover(default_log, rq, Event(), title='The Heroes',\n authors=('Joe Abercrombie',))\n print('Downloaded', rq.qsize(), 'covers')\n\n\nif __name__ == 
'__main__':\n test()\n",
"douban": "#!/usr/bin/env python2\n# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai\nfrom __future__ import (unicode_literals, division, absolute_import,\n print_function)\n\n__license__ = 'GPL v3'\n__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>; 2011, Li Fanxi <lifanxi@freemindworld.com>'\n__docformat__ = 'restructuredtext en'\n\nimport time\nfrom functools import partial\nfrom Queue import Queue, Empty\n\n\nfrom calibre.ebooks.metadata import check_isbn\nfrom calibre.ebooks.metadata.sources.base import Source\nfrom calibre.ebooks.metadata.book.base import Metadata\nfrom calibre import as_unicode\n\nNAMESPACES = {\n 'openSearch':'http://a9.com/-/spec/opensearchrss/1.0/',\n 'atom' : 'http://www.w3.org/2005/Atom',\n 'db': 'https://www.douban.com/xmlns/',\n 'gd': 'http://schemas.google.com/g/2005'\n }\n\n\ndef get_details(browser, url, timeout): # {{{\n try:\n if Douban.DOUBAN_API_KEY and Douban.DOUBAN_API_KEY != '':\n url = url + \"?apikey=\" + Douban.DOUBAN_API_KEY\n raw = browser.open_novisit(url, timeout=timeout).read()\n except Exception as e:\n gc = getattr(e, 'getcode', lambda : -1)\n if gc() != 403:\n raise\n # Douban is throttling us, wait a little\n time.sleep(2)\n raw = browser.open_novisit(url, timeout=timeout).read()\n\n return raw\n# }}}\n\n\ndef to_metadata(browser, log, entry_, timeout): # {{{\n from lxml import etree\n from calibre.ebooks.chardet import xml_to_unicode\n from calibre.utils.date import parse_date, utcnow\n from calibre.utils.cleantext import clean_ascii_chars\n\n XPath = partial(etree.XPath, namespaces=NAMESPACES)\n entry = XPath('//atom:entry')\n entry_id = XPath('descendant::atom:id')\n title = XPath('descendant::atom:title')\n description = XPath('descendant::atom:summary')\n publisher = XPath(\"descendant::db:attribute[@name='publisher']\")\n isbn = XPath(\"descendant::db:attribute[@name='isbn13']\")\n date = XPath(\"descendant::db:attribute[@name='pubdate']\")\n creator = XPath(\"descendant::db:attribute[@name='author']\")\n booktag = XPath(\"descendant::db:tag/attribute::name\")\n rating = XPath(\"descendant::gd:rating/attribute::average\")\n cover_url = XPath(\"descendant::atom:link[@rel='image']/attribute::href\")\n\n def get_text(extra, x):\n try:\n ans = x(extra)\n if ans:\n ans = ans[0].text\n if ans and ans.strip():\n return ans.strip()\n except:\n log.exception('Programming error:')\n return None\n\n id_url = entry_id(entry_)[0].text.replace('http://', 'https://')\n douban_id = id_url.split('/')[-1]\n title_ = ': '.join([x.text for x in title(entry_)]).strip()\n authors = [x.text.strip() for x in creator(entry_) if x.text]\n if not authors:\n authors = [_('Unknown')]\n if not id_url or not title:\n # Silently discard this entry\n return None\n\n mi = Metadata(title_, authors)\n mi.identifiers = {'douban':douban_id}\n try:\n raw = get_details(browser, id_url, timeout)\n feed = etree.fromstring(xml_to_unicode(clean_ascii_chars( raw),\n strip_encoding_pats=True)[0])\n extra = entry(feed)[0]\n except:\n log.exception('Failed to get additional details for', mi.title)\n return mi\n mi.comments = get_text(extra, description)\n mi.publisher = get_text(extra, publisher)\n\n # ISBN\n isbns = []\n for x in [t.text for t in isbn(extra)]:\n if check_isbn(x):\n isbns.append(x)\n if isbns:\n mi.isbn = sorted(isbns, key=len)[-1]\n mi.all_isbns = isbns\n\n # Tags\n try:\n btags = [x for x in booktag(extra) if x]\n tags = []\n for t in btags:\n atags = [y.strip() for y in t.split('/')]\n for tag in atags:\n if tag not in tags:\n tags.append(tag)\n 
except:\n log.exception('Failed to parse tags:')\n tags = []\n if tags:\n mi.tags = [x.replace(',', ';') for x in tags]\n\n # pubdate\n pubdate = get_text(extra, date)\n if pubdate:\n try:\n default = utcnow().replace(day=15)\n mi.pubdate = parse_date(pubdate, assume_utc=True, default=default)\n except:\n log.error('Failed to parse pubdate %r'%pubdate)\n\n # Ratings\n if rating(extra):\n try:\n mi.rating = float(rating(extra)[0]) / 2.0\n except:\n log.exception('Failed to parse rating')\n mi.rating = 0\n\n # Cover\n mi.has_douban_cover = None\n u = cover_url(extra)\n if u:\n u = u[0].replace('/spic/', '/lpic/')\n # If URL contains \"book-default\", the book doesn't have a cover\n if u.find('book-default') == -1:\n mi.has_douban_cover = u\n return mi\n# }}}\n\n\nclass Douban(Source):\n\n name = 'Douban Books'\n author = 'Li Fanxi'\n version = (2, 1, 0)\n minimum_calibre_version = (2, 80, 0)\n\n description = _('Downloads metadata and covers from Douban.com. '\n 'Useful only for Chinese language books.')\n\n capabilities = frozenset(['identify', 'cover'])\n touched_fields = frozenset(['title', 'authors', 'tags',\n 'pubdate', 'comments', 'publisher', 'identifier:isbn', 'rating',\n 'identifier:douban']) # language currently disabled\n supports_gzip_transfer_encoding = True\n cached_cover_url_is_reliable = True\n\n DOUBAN_API_KEY = '0bd1672394eb1ebf2374356abec15c3d'\n DOUBAN_BOOK_URL = 'https://book.douban.com/subject/%s/'\n\n def get_book_url(self, identifiers): # {{{\n db = identifiers.get('douban', None)\n if db is not None:\n return ('douban', db, self.DOUBAN_BOOK_URL%db)\n # }}}\n\n def create_query(self, log, title=None, authors=None, identifiers={}): # {{{\n from urllib import urlencode\n SEARCH_URL = 'https://api.douban.com/book/subjects?'\n ISBN_URL = 'https://api.douban.com/book/subject/isbn/'\n SUBJECT_URL = 'https://api.douban.com/book/subject/'\n\n q = ''\n t = None\n isbn = check_isbn(identifiers.get('isbn', None))\n subject = identifiers.get('douban', None)\n if isbn is not None:\n q = isbn\n t = 'isbn'\n elif subject is not None:\n q = subject\n t = 'subject'\n elif title or authors:\n def build_term(prefix, parts):\n return ' '.join(x for x in parts)\n title_tokens = list(self.get_title_tokens(title))\n if title_tokens:\n q += build_term('title', title_tokens)\n author_tokens = list(self.get_author_tokens(authors,\n only_first_author=True))\n if author_tokens:\n q += ((' ' if q != '' else '') +\n build_term('author', author_tokens))\n t = 'search'\n q = q.strip()\n if isinstance(q, unicode):\n q = q.encode('utf-8')\n if not q:\n return None\n url = None\n if t == \"isbn\":\n url = ISBN_URL + q\n elif t == 'subject':\n url = SUBJECT_URL + q\n else:\n url = SEARCH_URL + urlencode({\n 'q': q,\n })\n if self.DOUBAN_API_KEY and self.DOUBAN_API_KEY != '':\n if t == \"isbn\" or t == \"subject\":\n url = url + \"?apikey=\" + self.DOUBAN_API_KEY\n else:\n url = url + \"&apikey=\" + self.DOUBAN_API_KEY\n return url\n # }}}\n\n def download_cover(self, log, result_queue, abort, # {{{\n title=None, authors=None, identifiers={}, timeout=30, get_best_cover=False):\n cached_url = self.get_cached_cover_url(identifiers)\n if cached_url is None:\n log.info('No cached cover found, running identify')\n rq = Queue()\n self.identify(log, rq, abort, title=title, authors=authors,\n identifiers=identifiers)\n if abort.is_set():\n return\n results = []\n while True:\n try:\n results.append(rq.get_nowait())\n except Empty:\n break\n results.sort(key=self.identify_results_keygen(\n title=title, 
authors=authors, identifiers=identifiers))\n for mi in results:\n cached_url = self.get_cached_cover_url(mi.identifiers)\n if cached_url is not None:\n break\n if cached_url is None:\n log.info('No cover found')\n return\n\n if abort.is_set():\n return\n br = self.browser\n log('Downloading cover from:', cached_url)\n try:\n cdata = br.open_novisit(cached_url, timeout=timeout).read()\n if cdata:\n result_queue.put((self, cdata))\n except:\n log.exception('Failed to download cover from:', cached_url)\n\n # }}}\n\n def get_cached_cover_url(self, identifiers): # {{{\n url = None\n db = identifiers.get('douban', None)\n if db is None:\n isbn = identifiers.get('isbn', None)\n if isbn is not None:\n db = self.cached_isbn_to_identifier(isbn)\n if db is not None:\n url = self.cached_identifier_to_cover_url(db)\n\n return url\n # }}}\n\n def get_all_details(self, br, log, entries, abort, # {{{\n result_queue, timeout):\n from lxml import etree\n for relevance, i in enumerate(entries):\n try:\n ans = to_metadata(br, log, i, timeout)\n if isinstance(ans, Metadata):\n ans.source_relevance = relevance\n db = ans.identifiers['douban']\n for isbn in getattr(ans, 'all_isbns', []):\n self.cache_isbn_to_identifier(isbn, db)\n if ans.has_douban_cover:\n self.cache_identifier_to_cover_url(db,\n ans.has_douban_cover)\n self.clean_downloaded_metadata(ans)\n result_queue.put(ans)\n except:\n log.exception(\n 'Failed to get metadata for identify entry:',\n etree.tostring(i))\n if abort.is_set():\n break\n # }}}\n\n def identify(self, log, result_queue, abort, title=None, authors=None, # {{{\n identifiers={}, timeout=30):\n from lxml import etree\n from calibre.ebooks.chardet import xml_to_unicode\n from calibre.utils.cleantext import clean_ascii_chars\n\n XPath = partial(etree.XPath, namespaces=NAMESPACES)\n entry = XPath('//atom:entry')\n\n query = self.create_query(log, title=title, authors=authors,\n identifiers=identifiers)\n if not query:\n log.error('Insufficient metadata to construct query')\n return\n br = self.browser\n try:\n raw = br.open_novisit(query, timeout=timeout).read()\n except Exception as e:\n log.exception('Failed to make identify query: %r'%query)\n return as_unicode(e)\n try:\n parser = etree.XMLParser(recover=True, no_network=True)\n feed = etree.fromstring(xml_to_unicode(clean_ascii_chars( raw),\n strip_encoding_pats=True)[0], parser=parser)\n entries = entry(feed)\n except Exception as e:\n log.exception('Failed to parse identify results')\n return as_unicode(e)\n if not entries and identifiers and title and authors and \\\n not abort.is_set():\n return self.identify(log, result_queue, abort, title=title,\n authors=authors, timeout=timeout)\n\n # There is no point running these queries in threads as douban\n # throttles requests returning 403 Forbidden errors\n self.get_all_details(br, log, entries, abort, result_queue, timeout)\n\n return None\n # }}}\n\n\nif __name__ == '__main__': # tests {{{\n # To run these test use: calibre-debug -e src/calibre/ebooks/metadata/sources/douban.py\n from calibre.ebooks.metadata.sources.test import (test_identify_plugin,\n title_test, authors_test)\n test_identify_plugin(Douban.name,\n [\n\n\n (\n {'identifiers':{'isbn': '9787536692930'}, 'title':'\u4e09\u4f53',\n 'authors':['\u5218\u6148\u6b23']},\n [title_test('\u4e09\u4f53', exact=True),\n authors_test(['\u5218\u6148\u6b23'])]\n ),\n\n (\n {'title': 'Linux\u5185\u6838\u4fee\u70bc\u4e4b\u9053', 'authors':['\u4efb\u6865\u4f1f']},\n [title_test('Linux\u5185\u6838\u4fee\u70bc\u4e4b\u9 053', 
exact=False)]\n ),\n ])\n# }}}\n",
"openlibrary": "#!/usr/bin/env python2\n# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai\nfrom __future__ import (unicode_literals, division, absolute_import,\n print_function)\n\n__license__ = 'GPL v3'\n__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'\n__docformat__ = 'restructuredtext en'\n\nfrom calibre.ebooks.metadata.sources.base import Source\n\n\nclass OpenLibrary(Source):\n\n name = 'Open Library'\n version = (1, 0, 0)\n minimum_calibre_version = (2, 80, 0)\n description = _('Downloads covers from The Open Library')\n\n capabilities = frozenset(['cover'])\n\n OPENLIBRARY = 'https://covers.openlibrary.org/b/isbn/%s-L.jpg?default=false'\n\n def download_cover(self, log, result_queue, abort,\n title=None, authors=None, identifiers={}, timeout=30, get_best_cover=False):\n if 'isbn' not in identifiers:\n return\n isbn = identifiers['isbn']\n br = self.browser\n try:\n ans = br.open_novisit(self.OPENLIBRARY%isbn, timeout=timeout).read()\n result_queue.put((self, ans))\n except Exception as e:\n if callable(getattr(e, 'getcode', None)) and e.getcode() == 404:\n log.error('No cover for ISBN: %r found'%isbn)\n else:\n log.exception('Failed to download cover for ISBN:', isbn)\n",
"search_engines": "#!/usr/bin/env python2\n# vim:fileencoding=utf-8\n# License: GPLv3 Copyright: 2017, Kovid Goyal <kovid at kovidgoyal.net>\n\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport json\nimport re\nimport time\nfrom collections import defaultdict, namedtuple\nfrom polyglot.builtins import map\nfrom urllib import quote_plus, urlencode\nfrom urlparse import parse_qs\n\nfrom lxml import etree\n\nfrom calibre import browser as _browser, prints, random_user_agent\nfrom calibre.utils.monotonic import monotonic\nfrom calibre.utils.random_ua import accept_header_for_ua\n\ncurrent_version = (1, 0, 1)\nminimum_calibre_version = (2, 80, 0)\n\n\nlast_visited = defaultdict(lambda: 0)\nResult = namedtuple('Result', 'url title cached_url')\n\n\ndef tostring(elem):\n return etree.tostring(elem, encoding=unicode, method='text', with_tail=False)\n\n\ndef browser():\n ua = random_user_agent(allow_ie=False)\n br = _browser(user_agent=ua)\n br.set_handle_gzip(True)\n br.addheaders += [\n ('Accept', accept_header_for_ua(ua)),\n ('Upgrade-insecure-requests', '1'),\n ]\n return br\n\n\ndef encode_query(**query):\n q = {k.encode('utf-8'): v.encode('utf-8') for k, v in query.iteritems()}\n return urlencode(q).decode('utf-8')\n\n\ndef parse_html(raw):\n try:\n from html5_parser import parse\n except ImportError:\n # Old versions of calibre\n import html5lib\n return html5lib.parse(raw, treebuilder='lxml', namespaceHTMLElements=False)\n else:\n return parse(raw)\n\n\ndef query(br, url, key, dump_raw=None, limit=1, parser=parse_html, timeout=60):\n delta = monotonic() - last_visited[key]\n if delta < limit and delta > 0:\n time.sleep(delta)\n try:\n raw = br.open_novisit(url, timeout=timeout).read()\n finally:\n last_visited[key] = monotonic()\n if dump_raw is not None:\n with open(dump_raw, 'wb') as f:\n f.write(raw)\n return parser(raw)\n\n\ndef quote_term(x):\n return quote_plus(x.encode('utf-8')).decode('utf-8')\n\n\n# DDG + Wayback machine {{{\n\ndef ddg_term(t):\n t = t.replace('\"', '')\n if t.lower() in {'map', 'news'}:\n t = '\"' + t + '\"'\n if t in {'OR', 'AND', 'NOT'}:\n t = t.lower()\n return t\n\n\ndef ddg_href(url):\n if url.startswith('/'):\n q = url.partition('?')[2]\n url = parse_qs(q.encode('utf-8'))['uddg'][0].decode('utf-8')\n return url\n\n\ndef wayback_machine_cached_url(url, br=None, log=prints, timeout=60):\n q = quote_term(url)\n br = br or browser()\n data = query(br, 'https://archive.org/wayback/available?url=' +\n q, 'wayback', parser=json.loads, limit=0.25, timeout=timeout)\n try:\n closest = data['archived_snapshots']['closest']\n if closest['available']:\n return closest['url'].replace('http:', 'https:')\n except Exception:\n pass\n from pprint import pformat\n log('Response from wayback machine:', pformat(data))\n\n\ndef wayback_url_processor(url):\n if url.startswith('/'):\n # Use original URL instead of absolutizing to wayback URL as wayback is\n # slow\n m = re.search('https?:', url)\n if m is None:\n url = 'https://web.archive.org' + url\n else:\n url = url[m.start():]\n return url\n\n\ndef ddg_search(terms, site=None, br=None, log=prints, safe_search=False, dump_raw=None, timeout=60):\n # https://duck.co/help/results/syntax\n terms = map(ddg_term, terms)\n terms = [quote_term(t) for t in terms]\n if site is not None:\n terms.append(quote_term(('site:' + site)))\n q = '+'.join(terms)\n url = 'https://duckduckgo.com/html/?q={q}&kp={kp}'.format(\n q=q, kp=1 if safe_search else -1)\n log('Making ddg query: ' + url)\n br = br or 
browser()\n root = query(br, url, 'ddg', dump_raw, timeout=timeout)\n ans = []\n for a in root.xpath('//*[@class=\"results\"]//*[@class=\"result__title\"]/a[@href and @class=\"result__a\"]'):\n ans.append(Result(ddg_href(a.get('href')), tostring(a), None))\n return ans, url\n\n\ndef ddg_develop():\n br = browser()\n for result in ddg_search('heroes abercrombie'.split(), 'www.amazon.com', dump_raw='/t/raw.html', br=br)[0]:\n if '/dp/' in result.url:\n print(result.title)\n print(' ', result.url)\n print(' ', wayback_machine_cached_url(result.url, br))\n print()\n# }}}\n\n# Bing {{{\n\n\ndef bing_term(t):\n t = t.replace('\"', '')\n if t in {'OR', 'AND', 'NOT'}:\n t = t.lower()\n return t\n\n\ndef bing_url_processor(url):\n return url\n\n\ndef bing_search(terms, site=None, br=None, log=prints, safe_search=False, dump_raw=None, timeout=60):\n # http://vlaurie.com/computers2/Articles/bing_advanced_search.htm\n terms = map(bing_term, terms)\n terms = [quote_term(t) for t in terms]\n if site is not None:\n terms.append(quote_term(('site:' + site)))\n q = '+'.join(terms)\n url = 'https://www.bing.com/search?q={q}'.format(q=q)\n log('Making bing query: ' + url)\n br = br or browser()\n root = query(br, url, 'bing', dump_raw, timeout=timeout)\n ans = []\n for li in root.xpath('//*[@id=\"b_results\"]/li[@class=\"b_algo\"]'):\n a = li.xpath('descendant::h2/a[@href]')[0]\n title = tostring(a)\n try:\n div = li.xpath('descendant::div[@class=\"b_attribution\" and @u]')[0]\n except IndexError:\n log('Ignoring {!r} as it has no cached page'.format(title))\n continue\n d, w = div.get('u').split('|')[-2:]\n # The bing cache does not have a valid https certificate currently\n # (March 2017)\n cached_url = 'http://cc.bingj.com/cache.aspx?q={q}&d={d}&mkt=en-US&setlang=en-US&w={w}'.format(\n q=q, d=d, w=w)\n ans.append(Result(a.get('href'), title, cached_url))\n if not ans:\n title = ' '.join(root.xpath('//title/text()'))\n log('Failed to find any results on results page, with title:', title)\n return ans, url\n\n\ndef bing_develop():\n br = browser()\n for result in bing_search('heroes abercrombie'.split(), 'www.amazon.com', dump_raw='/t/raw.html', br=br)[0]:\n if '/dp/' in result.url:\n print(result.title)\n print(' ', result.url)\n print(' ', result.cached_url)\n print()\n# }}}\n\n# Google {{{\n\n\ndef google_term(t):\n t = t.replace('\"', '')\n if t in {'OR', 'AND', 'NOT'}:\n t = t.lower()\n return t\n\n\ndef google_url_processor(url):\n return url\n\n\ndef google_search(terms, site=None, br=None, log=prints, safe_search=False, dump_raw=None, timeout=60):\n terms = map(google_term, terms)\n terms = [quote_term(t) for t in terms]\n if site is not None:\n terms.append(quote_term(('site:' + site)))\n q = '+'.join(terms)\n url = 'https://www.google.com/search?q={q}'.format(q=q)\n log('Making google query: ' + url)\n br = br or browser()\n root = query(br, url, 'google', dump_raw, timeout=timeout)\n ans = []\n for div in root.xpath('//*[@id=\"search\"]//*[@id=\"rso\"]//*[@class=\"g\"]'):\n try:\n a = div.xpath('descendant::h3[@class=\"r\"]/a[@href]')[0]\n except IndexError:\n log('Ignoring div with no descendant')\n continue\n title = tostring(a)\n try:\n c = div.xpath('descendant::div[@class=\"s\"]//a[@class=\"fl\"]')[0]\n except IndexError:\n log('Ignoring {!r} as it has no cached page'.format(title))\n continue\n cached_url = c.get('href')\n ans.append(Result(a.get('href'), title, cached_url))\n if not ans:\n title = ' '.join(root.xpath('//title/text()'))\n log('Failed to find any results on results page, 
with title:', title)\n return ans, url\n\n\ndef google_develop():\n br = browser()\n for result in google_search('1423146786'.split(), 'www.amazon.com', dump_raw='/t/raw.html', br=br)[0]:\n if '/dp/' in result.url:\n print(result.title)\n print(' ', result.url)\n print(' ', result.cached_url)\n print()\n# }}}\n\n\ndef resolve_url(url):\n prefix, rest = url.partition(':')[::2]\n if prefix == 'bing':\n return bing_url_processor(rest)\n if prefix == 'wayback':\n return wayback_url_processor(rest)\n return url\n"
}
which caused the following error:
Spoiler:
Code:
calibre, version 3.31.0
ERROR: Download failed: Failed to download metadata. Click Show Details to see details

Traceback (most recent call last):
  File "site-packages/calibre/utils/ipc/simple_worker.py", line 289, in main
  File "site-packages/calibre/ebooks/metadata/sources/worker.py", line 102, in single_identify
  File "site-packages/calibre/ebooks/metadata/sources/update.py", line 79, in patch_plugins
  File "site-packages/calibre/ebooks/metadata/sources/update.py", line 62, in patch_search_engines
  File "<string>", line 11, in <module>
ImportError: No module named polyglot.builtins
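For context, calibre applies these downloaded metadata-source updates by compiling and exec'ing the cached source text inside the running process; that is what the File "<string>", line 11 frame above refers to, and line 11 of the cached search_engines source shown earlier is from polyglot.builtins import map, a module the error shows does not exist in a 3.31 install. A rough, illustrative sketch of that failure mode (not calibre's actual patch_plugins() code):

Code:
# Illustrative sketch only, not calibre's actual update code: a cached
# plugin source is exec'd in-process, so every import it makes has to
# resolve against the locally installed calibre.
from __future__ import print_function

cached_source = '\n'.join([
    'from polyglot.builtins import map  # module only present in newer calibre',
    'print(list(map(str, range(3))))',
])

namespace = {}
try:
    exec(cached_source, namespace)  # on calibre 3.31 this raises ImportError
except ImportError as e:
    print('Cached source is too new for this calibre:', e)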
With metadata-sources-cache.json replaced by an empty JSON dataset
Code:
{}
I then ran calibre-debug and got
Spoiler:
Code:
calibre Debug log
calibre 3.31  embedded-python: True is64bit: True
Linux-4.15.0-33-lowlatency-x86_64-with-debian-stretch-sid Linux ('64bit', 'ELF')
('Linux', '4.15.0-33-lowlatency', '#36~16.04.1-Ubuntu SMP PREEMPT Wed Aug 15 19:09:25 UTC 2018')
Python 2.7.12
Linux: ('debian', 'stretch/sid', '')
Interface language: None
Successfully initialized third party plugins: FictionDB (1, 0, 10) && Read MP3 AudioBook metadata (1, 0, 79) && Goodreads (1, 1, 14) && Job Spy (1, 0, 132) && View Manager (1, 4, 3) && Barnes & Noble (1, 2, 15) && Overdrive Link (2, 29, 0) && Goodreads Sync (1, 12, 0) && Find Duplicates (1, 6, 3) && EpubSplit (2, 4, 0)
calibre 3.31  embedded-python: True is64bit: True
Linux-4.15.0-33-lowlatency-x86_64-with-debian-stretch-sid Linux ('64bit', 'ELF')
('Linux', '4.15.0-33-lowlatency', '#36~16.04.1-Ubuntu SMP PREEMPT Wed Aug 15 19:09:25 UTC 2018')
Python 2.7.12
Linux: ('debian', 'stretch/sid', '')
Interface language: None
Successfully initialized third party plugins: FictionDB (1, 0, 10) && Read MP3 AudioBook metadata (1, 0, 79) && Goodreads (1, 1, 14) && Job Spy (1, 0, 132) && View Manager (1, 4, 3) && Barnes & Noble (1, 2, 15) && Overdrive Link (2, 29, 0) && Goodreads Sync (1, 12, 0) && Find Duplicates (1, 6, 3) && EpubSplit (2, 4, 0)
Turning on automatic hidpi scaling
devicePixelRatio: 1.0
logicalDpi: 96.0 x 96.0
physicalDpi: 98.5496535797 x 98.4132841328
Using calibre Qt style: True
[0.00] Starting up...
[0.05] Showing splash screen...
[0.41] splash screen shown
[0.41] Initializing db...
[0.63] db initialized
[0.63] Constructing main UI...
Job Spy has begun initialization...
Calibre, and hence Job Spy, was gracefully shut down last time?  True
Last time daemon started:  never
Last time daemon failed:  never
Total daemon starts inception_to_date:  0
Total daemon failures inception-to-date:  0
libpng warning: iCCP: known incorrect sRGB profile
Job Spy has finished initialization...
DEBUG:    0.0 HttpHelper::__init__: proxy=None
[6.05] main UI initialized...
[6.05] Hiding splash screen
[70.18] splash screen hidden
[70.18] Started up in 70.18 seconds with 1551 books
Metadata sources cache was recently updated not updating again
Metadata sources cache was recently updated not updating again

(mousepad:14125): GtkSourceView-CRITICAL **: gtk_source_style_scheme_get_id: assertion 'GTK_SOURCE_IS_STYLE_SCHEME (scheme)' failed

(mousepad:14125): GLib-CRITICAL **: g_variant_new_string: assertion 'string != NULL' failed

(mousepad:14125): GtkSourceView-CRITICAL **: gtk_source_style_scheme_get_id: assertion 'GTK_SOURCE_IS_STYLE_SCHEME (scheme)' failed

(mousepad:14125): GLib-CRITICAL **: g_variant_new_string: assertion 'string != NULL' failed

(mousepad:14125): GtkSourceView-CRITICAL **: gtk_source_style_scheme_get_id: assertion 'GTK_SOURCE_IS_STYLE_SCHEME (scheme)' failed
and this time there were no errors while downloading metadata. It kept working after several restarts.

Hope this information helps.
kenmac999 is offline   Reply With Quote
Old 09-10-2018, 08:41 PM   #56
BetterRed
null operator (he/him)
BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.BetterRed ought to be getting tired of karma fortunes by now.
 
Posts: 20,567
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by PacificNW View Post
Hey! That was the solution I posted several hours ago in the other thread.
Moderator Notice

You neglected to slap a copyright notice on it

BR
BetterRed is offline   Reply With Quote
Old 09-10-2018, 09:17 PM   #57
nwgal
Member
nwgal began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Sep 2018
Location: Pacific Northwest
Device: Nook
Sorry, but I'm not understanding what the solution is. I tried deleting the file, which is only a temporary fix. I don't understand how to "change the file to zero bytes", and as for the post from kenmac999, how do you replace the file with an "empty JSON dataset"?

My apologies. I can follow directions, but I guess I'm just clueless on this.

Thanks.
nwgal is offline   Reply With Quote
Old 09-10-2018, 09:27 PM   #58
AlisaP
Junior Member
AlisaP has learned how to buy an e-book online
 
Posts: 3
Karma: 80
Join Date: Sep 2018
Device: none
Quote:
Originally Posted by nwgal View Post
Sorry, but I'm not understanding what the solution is. I tried deleting the file, which is only a temporary fix. I don't understand how to "change the file to zero bytes", and as for the post from kenmac999, how do you replace the file with an "empty JSON dataset"?

My apologies. I can follow directions, but I guess I'm just clueless on this.

Thanks.
Open the file in a text editor (I use EditPad Lite).

Either delete everything in the file or replace its contents with this:

{}

Save.

Hope that helps!
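For anyone who would rather script this than edit the file by hand, here is a minimal sketch that should do the same thing when run with calibre-debug. It assumes the file lives in calibre's cache directory (exposed as calibre.constants.cache_dir); adjust the path if your install keeps it elsewhere, and close calibre first so the file is not rewritten behind your back. The file name reset_md_cache.py is just an example.

Code:
# reset_md_cache.py -- run with: calibre-debug -e reset_md_cache.py
# Minimal sketch: overwrite metadata-sources-cache.json with an empty
# JSON dataset ({}), assuming it lives in calibre's cache directory.
from __future__ import print_function
import os

from calibre.constants import cache_dir

path = os.path.join(cache_dir(), 'metadata-sources-cache.json')
if os.path.exists(path):
    with open(path, 'wb') as f:
        f.write(b'{}')  # same effect as editing the file by hand
    print('Reset', path)
else:
    print('No metadata sources cache found at', path)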
AlisaP is offline   Reply With Quote
Old 09-10-2018, 09:52 PM   #59
Kathleen1810
Junior Member
Kathleen1810 began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Sep 2018
Location: Canada
Device: kindle paperwhite
Quote:
Originally Posted by GalacticHull View Post
This seems to work, though the file did reappear on one occasion without my closing Calibre. Strange stuff.
Thank you! I ran into this problem this evening and have been searching for a solution. This worked for me: I deleted the file and was then able to download the metadata. I then closed Calibre and reopened it, tried downloading the metadata for another book, and it worked just fine.
Kathleen1810 is offline   Reply With Quote
Old 09-10-2018, 10:00 PM   #60
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.
 
kovidgoyal's Avatar
 
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Apologies, should be fine now. You might need to restart calibre, then start a metadata download. The first one might still fail, but after that it should be fine.
kovidgoyal is online now   Reply With Quote