#46
Wizard
Posts: 2,189
Karma: 8888888
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
Quote:
https://www.mobileread.com/forums/sh...90&postcount=8
https://www.mobileread.com/forums/sh...0&postcount=14

bernie
#47
Groupie
Posts: 190
Karma: 168826
Join Date: Jul 2011
Location: Vancouver, BC
Device: Kobo Aura One
Thank you! I missed those.
I always appreciate the help and patience on these boards. You guys rock.
#48
Wizard
Posts: 2,189
Karma: 8888888
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
Quote:
bernie
#49
null operator (he/him)
Posts: 21,744
Karma: 30237526
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
BR
#50
Groupie
Posts: 190
Karma: 168826
Join Date: Jul 2011
Location: Vancouver, BC
Device: Kobo Aura One
Thank you!
#51
null operator (he/him)
Posts: 21,744
Karma: 30237526
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
BR
#52
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2012
Device: Kindle
Hello, I'm Ray. I check this forum occasionally when I'm searching for information and solutions. I've found this site to be a great source, thanks!
I am having trouble downloading metadata. I ran into this problem with V3.30 and updated to V3.31, but the problem remains. Any help would be appreciated. I get the following error: Failed to download metadata. Show details to see details.

DETAILS:
calibre, version 3.31.0
ERROR: Download failed: Failed to download metadata. Click Show Details to see details

Traceback (most recent call last):
  File "site-packages\calibre\utils\ipc\simple_worker.py", line 289, in main
  File "site-packages\calibre\ebooks\metadata\sources\worker.py", line 102, in single_identify
  File "site-packages\calibre\ebooks\metadata\sources\update.py", line 79, in patch_plugins
  File "site-packages\calibre\ebooks\metadata\sources\update.py", line 62, in patch_search_engines
  File "<string>", line 11, in <module>
ImportError: No module named polyglot.builtins
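The final frame, File "<string>", line 11, is calibre executing the newer metadata-source code cached in metadata-sources-cache.json; that code imports polyglot.builtins, which does not exist in calibre 3.31, hence the ImportError. A minimal sketch of the failure mode (illustrative only; the string below merely stands in for the cached plugin code, it is not calibre's actual loader):

[CODE]
# Minimal sketch of the failure in the traceback above. Assumption: "cached_source"
# stands in for the newer plugin code stored in metadata-sources-cache.json, which
# calibre compiles and executes when patching its metadata sources.
cached_source = "from polyglot.builtins import iteritems\n"

try:
    # calibre 3.31 ships no "polyglot" package, so executing the cached code fails here.
    exec(compile(cached_source, "<string>", "exec"), {})
except ImportError as err:
    print("Same error as in the details above:", err)
[/CODE]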
#53
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
I've also been bitten by this issue. Guess I'll wait for a calibre update.
#54
Junior Mint
Posts: 5
Karma: 10
Join Date: Sep 2018
Device: Kindle of DOOM
Quote:
https://www.mobileread.com/forums/sh...0&postcount=14

Last edited by PacificNW; 09-10-2018 at 08:26 PM. Reason: URL of plagiarised fix. Try try!
#55
Member
Posts: 22
Karma: 10
Join Date: Sep 2013
Location: Oklahoma
Device: Motorola G Power 5G, ASUS C436F Chromebook, Kindle Fire 7" Ap
I have the same problem. After reading this and the other threads, I tried replacing my original metadata-sources-cache.json that caused the error:

Spoiler:
[CODE]
"amazon": "#!/usr/bin/env python2\n# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai\n# License: GPLv3 Copyright: 2011, Kovid Goyal <kovid at kovidgoyal.net>\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport re\nimport socket\nimport time\nfrom functools import partial\nfrom Queue import Empty, Queue\nfrom threading import Thread\nfrom urlparse import urlparse\n\nfrom calibre import as_unicode, browser, random_user_agent\nfrom calibre.ebooks.metadata import check_isbn\nfrom calibre.ebooks.metadata.book.base import Metadata\nfrom calibre.ebooks.metadata.sources.base import Option, Source, fixauthors, fixcase\nfrom calibre.utils.localization import canonicalize_lang\nfrom calibre.utils.random_ua import accept_header_for_ua, all_user_agents\n\n\nclass CaptchaError(Exception):\n pass\n\n\nclass SearchFailed(ValueError):\n pass\n\n\nua_index = -1\n\n\ndef parse_html(raw):\n try:\n from html5_parser import parse\n except ImportError:\n # Old versions of calibre\n import html5lib\n return html5lib.parse(raw, treebuilder='lxml', namespaceHTMLElements=False)\n else:\n return parse(raw)\n\n\ndef parse_details_page(url, log, timeout, browser, domain):\n from calibre.utils.cleantext import clean_ascii_chars\n from calibre.ebooks.chardet import xml_to_unicode\n from lxml.html import tostring\n log('Getting details from:', url)\n try:\n raw = browser.open_novisit(url, timeout=timeout).read().strip()\n except Exception as e:\n if callable(getattr(e, 'getcode', None)) and \\\n e.getcode() == 404:\n log.error('URL malformed: %r' % url)\n return\n attr = getattr(e, 'args', [None])\n attr = attr if attr else [None]\n if isinstance(attr[0], socket.timeout):\n msg = 'Details page timed out. Try again later.'\n log.error(msg)\n else:\n msg = 'Failed to make details query: %r' % url\n log.exception(msg)\n return\n\n oraw = raw\n if 'amazon.com.br' in url:\n # amazon.com.br serves utf-8 but has an incorrect latin1 <meta> tag\n raw = raw.decode('utf-8')\n raw = xml_to_unicode(raw, strip_encoding_pats=True,\n resolve_entities=True)[0]\n if '<title>404 - ' in raw:\n raise ValueError('URL malformed: %r' % url)\n if '>Could not find the requested document in the cache.<' in raw:\n raise ValueError('No cached entry for %s found' % url)\n\n try:\n root = parse_html(clean_ascii_chars(raw))\n except Exception:\n msg = 'Failed to parse amazon details page: %r' % url\n log.exception(msg)\n return\n if domain == 'jp':\n for a in root.xpath('//a[@href]'):\n if 'black-curtain-redirect.html' in a.get('href'):\n url = 'https://amazon.co.jp' + a.get('href')\n log('Black curtain redirect found, following')\n return parse_details_page(url, log, timeout, browser, domain)\n\n errmsg = root.xpath('//*[@id=\"errorMessage\"]')\n if errmsg:\n msg = 'Failed to parse amazon details page: %r' % url\n msg += tostring(errmsg, method='text', encoding=unicode).strip()\n log.error(msg)\n return\n\n from css_selectors import Select\n selector = Select(root)\n return oraw, root, selector\n\n\ndef parse_asin(root, log, url):\n try:\n link = root.xpath('//link[@rel=\"canonical\" and @href]')\n for l in link:\n return l.get('href').rpartition('/')[-1]\n except Exception:\n log.exception('Error parsing ASIN for url: %r' % url)\n\n\nclass Worker(Thread): # Get details {{{\n\n '''\n Get book details from amazons book page in a separate thread\n '''\n\n def __init__(self, url, result_queue, browser, log, relevance, domain,\n plugin, timeout=20, testing=False, preparsed_root=None,\n cover_url_processor=None, 
filter_result=None):\n Thread.__init__(self)\n self.cover_url_processor = cover_url_processor\n self.preparsed_root = preparsed_root\n self.daemon = True\n self.testing = testing\n self.url, self.result_queue = url, result_queue\n self.log, self.timeout = log, timeout\n self.filter_result = filter_result or (lambda x, log: True)\n self.relevance, self.plugin = relevance, plugin\n self.browser = browser\n self.cover_url = self.amazon_id = self.isbn = None\n self.domain = domain\n from lxml.html import tostring\n self.tostring = tostring\n\n months = { # {{{\n 'de': {\n 1: ['j\u00e4n', 'januar'],\n 2: ['februar'],\n 3: ['m\u00e4rz'],\n 5: ['mai'],\n 6: ['juni'],\n 7: ['juli'],\n 10: ['okt', 'oktober'],\n 12: ['dez', 'dezember']\n },\n 'it': {\n 1: ['gennaio', 'enn'],\n 2: ['febbraio', 'febbr'],\n 3: ['marzo'],\n 4: ['aprile'],\n 5: ['maggio', 'magg'],\n 6: ['giugno'],\n 7: ['luglio'],\n 8: ['agosto', 'ag'],\n 9: ['settembre', 'sett'],\n 10: ['ottobre', 'ott'],\n 11: ['novembre'],\n 12: ['dicembre', 'dic'],\n },\n 'fr': {\n 1: ['janv'],\n 2: ['f\u00e9vr'],\n 3: ['mars'],\n 4: ['avril'],\n 5: ['mai'],\n 6: ['juin'],\n 7: ['juil'],\n 8: ['ao\u00fbt'],\n 9: ['sept'],\n 12: ['d\u00e9c'],\n },\n 'br': {\n 1: ['janeiro'],\n 2: ['fevereiro'],\n 3: ['mar\u00e7o'],\n 4: ['abril'],\n 5: ['maio'],\n 6: ['junho'],\n 7: ['julho'],\n 8: ['agosto'],\n 9: ['setembro'],\n 10: ['outubro'],\n 11: ['novembro'],\n 12: ['dezembro'],\n },\n 'es': {\n 1: ['enero'],\n 2: ['febrero'],\n 3: ['marzo'],\n 4: ['abril'],\n 5: ['mayo'],\n 6: ['junio'],\n 7: ['julio'],\n 8: ['agosto'],\n 9: ['septiembre', 'setiembre'],\n 10: ['octubre'],\n 11: ['noviembre'],\n 12: ['diciembre'],\n },\n 'jp': {\n 1: [u'1\u6708'],\n 2: [u'2\u6708'],\n 3: [u'3\u6708'],\n 4: [u'4\u6708'],\n 5: [u'5\u6708'],\n 6: [u'6\u6708'],\n 7: [u'7\u6708'],\n 8: [u'8\u6708'],\n 9: [u'9\u6708'],\n 10: [u'10\u6708'],\n 11: [u'11\u6708'],\n 12: [u'12\u6708'],\n },\n 'nl': {\n 1: ['januari'], 2: ['februari'], 3: ['maart'], 5: ['mei'], 6: ['juni'], 7: ['juli'], 8: ['augustus'], 10: ['oktober'],\n }\n\n } # }}}\n\n self.english_months = [None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',\n 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']\n self.months = months.get(self.domain, {})\n\n self.pd_xpath = '''\n //h2[text()=\"Product Details\" or \\\n text()=\"Produktinformation\" or \\\n text()=\"Dettagli prodotto\" or \\\n text()=\"Product details\" or \\\n text()=\"D\u00e9tails sur le produit\" or \\\n text()=\"Detalles del producto\" or \\\n text()=\"Detalhes do produto\" or \\\n text()=\"Productgegevens\" or \\\n text()=\"\u57fa\u672c\u4fe1\u606f\" or \\\n starts-with(text(), \"\u767b\u9332\u60c5\u5831\")]/../div[@class=\"content\"]\n '''\n # Editor: is for Spanish\n self.publisher_xpath = '''\n descendant::*[starts-with(text(), \"Publisher:\") or \\\n starts-with(text(), \"Verlag:\") or \\\n starts-with(text(), \"Editore:\") or \\\n starts-with(text(), \"Editeur\") or \\\n starts-with(text(), \"Editor:\") or \\\n starts-with(text(), \"Editora:\") or \\\n starts-with(text(), \"Uitgever:\") or \\\n starts-with(text(), \"\u51fa\u7248\u793e:\")]\n '''\n self.publisher_names = {'Publisher', 'Uitgever', 'Verlag',\n 'Editore', 'Editeur', 'Editor', 'Editora', '\u51fa\u7248\u793e'}\n\n self.language_xpath = '''\n descendant::*[\n starts-with(text(), \"Language:\") \\\n or text() = \"Language\" \\\n or text() = \"Sprache:\" \\\n or text() = \"Lingua:\" \\\n or text() = \"Idioma:\" \\\n or starts-with(text(), \"Langue\") \\\n or starts-with(text(), \"\u8a00\u8a9e\") \\\n or 
starts-with(text(), \"\u8bed\u79cd\")\n ]\n '''\n self.language_names = {'Language', 'Sprache',\n 'Lingua', 'Idioma', 'Langue', '\u8a00\u8a9e', 'Taal', '\u8bed\u79cd'}\n\n self.tags_xpath = '''\n descendant::h2[\n text() = \"Look for Similar Items by Category\" or\n text() = \"\u00c4hnliche Artikel finden\" or\n text() = \"Buscar productos similares por categor\u00eda\" or\n text() = \"Ricerca articoli simili per categoria\" or\n text() = \"Rechercher des articles similaires par rubrique\" or\n text() = \"Procure por itens similares por categoria\" or\n text() = \"\u95a2\u9023\u5546\u54c1\u3092\u63a2\u3059\"\ n ]/../descendant::ul/li\n '''\n\n self.ratings_pat = re.compile(\n r'([0-9.]+) ?(out of|von|van|su|\u00e9toiles sur|\u3064\u661f\u306e\u3046\u3061|de un m\u00e1ximo de|de) ([\\d\\.]+)( (stars|Sternen|stelle|estrellas|estrelas|sterren)) {0,1}')\n self.ratings_pat_cn = re.compile('\u5e73\u5747([0-9.]+)')\n\n lm = {\n 'eng': ('English', 'Englisch', 'Engels'),\n 'fra': ('French', 'Fran\u00e7ais'),\n 'ita': ('Italian', 'Italiano'),\n 'deu': ('German', 'Deutsch'),\n 'spa': ('Spanish', 'Espa\\xf1ol', 'Espaniol'),\n 'jpn': ('Japanese', u'\u65e5\u672c\u8a9e'),\n 'por': ('Portuguese', 'Portugu\u00eas'),\n 'nld': ('Dutch', 'Nederlands',),\n 'chs': ('Chinese', u'\u4e2d\u6587', u'\u7b80\u4f53\u4e2d\u6587'),\n }\n self.lang_map = {}\n for code, names in lm.iteritems():\n for name in names:\n self.lang_map[name] = code\n\n self.series_pat = re.compile(\n r'''\n \\|\\s* # Prefix\n (Series)\\s*:\\s* # Series declaration\n (?P<series>.+?)\\s+ # The series name\n \\((Book)\\s* # Book declaration\n (?P<index>[0-9.]+) # Series index\n \\s*\\)\n ''', re.X)\n\n def delocalize_datestr(self, raw):\n if self.domain == 'cn':\n return raw.replace('\u5e74', '-').replace('\u6708', '-').replace('\u65e5', '')\n if not self.months:\n return raw\n ans = raw.lower()\n for i, vals in self.months.iteritems():\n for x in vals:\n ans = ans.replace(x, self.english_months[i])\n ans = ans.replace(' de ', ' ')\n return ans\n\n def run(self):\n try:\n self.get_details()\n except:\n self.log.exception('get_details failed for url: %r' % self.url)\n\n def get_details(self):\n if self.preparsed_root is None:\n raw, root, selector = parse_details_page(\n self.url, self.log, self.timeout, self.browser, self.domain)\n else:\n raw, root, selector = self.preparsed_root\n\n from css_selectors import Select\n self.selector = Select(root)\n self.parse_details(raw, root)\n\n def parse_details(self, raw, root):\n asin = parse_asin(root, self.log, self.url)\n if not asin and root.xpath('//form[@action=\"/errors/validateCaptcha\"]'):\n raise CaptchaError(\n 'Amazon returned a CAPTCHA page, probably because you downloaded too many books. 
Wait for some time and try again.')\n if self.testing:\n import tempfile\n import uuid\n with tempfile.NamedTemporaryFile(prefix=(asin or str(uuid.uuid4())) + '_',\n suffix='.html', delete=False) as f:\n f.write(raw)\n print ('Downloaded html for', asin, 'saved in', f.name)\n\n try:\n title = self.parse_title(root)\n except:\n self.log.exception('Error parsing title for url: %r' % self.url)\n title = None\n\n try:\n authors = self.parse_authors(root)\n except:\n self.log.exception('Error parsing authors for url: %r' % self.url)\n authors = []\n\n if not title or not authors or not asin:\n self.log.error(\n 'Could not find title/authors/asin for %r' % self.url)\n self.log.error('ASIN: %r Title: %r Authors: %r' % (asin, title,\n authors))\n return\n\n mi = Metadata(title, authors)\n idtype = 'amazon' if self.domain == 'com' else 'amazon_' + self.domain\n mi.set_identifier(idtype, asin)\n self.amazon_id = asin\n\n try:\n mi.rating = self.parse_rating(root)\n except:\n self.log.exception('Error parsing ratings for url: %r' % self.url)\n\n try:\n mi.comments = self.parse_comments(root, raw)\n except:\n self.log.exception('Error parsing comments for url: %r' % self.url)\n\n try:\n series, series_index = self.parse_series(root)\n if series:\n mi.series, mi.series_index = series, series_index\n elif self.testing:\n mi.series, mi.series_index = 'Dummy series for testing', 1\n except:\n self.log.exception('Error parsing series for url: %r' % self.url)\n\n try:\n mi.tags = self.parse_tags(root)\n except:\n self.log.exception('Error parsing tags for url: %r' % self.url)\n\n try:\n self.cover_url = self.parse_cover(root, raw)\n except:\n self.log.exception('Error parsing cover for url: %r' % self.url)\n if self.cover_url_processor is not None and self.cover_url.startswith('/'):\n self.cover_url = self.cover_url_processor(self.cover_url)\n mi.has_cover = bool(self.cover_url)\n\n non_hero = tuple(self.selector(\n 'div#bookDetails_container_div div#nonHeroSection'))\n if non_hero:\n # New style markup\n try:\n self.parse_new_details(root, mi, non_hero[0])\n except:\n self.log.exception(\n 'Failed to parse new-style book details section')\n else:\n pd = root.xpath(self.pd_xpath)\n if pd:\n pd = pd[0]\n\n try:\n isbn = self.parse_isbn(pd)\n if isbn:\n self.isbn = mi.isbn = isbn\n except:\n self.log.exception(\n 'Error parsing ISBN for url: %r' % self.url)\n\n try:\n mi.publisher = self.parse_publisher(pd)\n except:\n self.log.exception(\n 'Error parsing publisher for url: %r' % self.url)\n\n try:\n mi.pubdate = self.parse_pubdate(pd)\n except:\n self.log.exception(\n 'Error parsing publish date for url: %r' % self.url)\n\n try:\n lang = self.parse_language(pd)\n if lang:\n mi.language = lang\n except:\n self.log.exception(\n 'Error parsing language for url: %r' % self.url)\n\n else:\n self.log.warning(\n 'Failed to find product description for url: %r' % self.url)\n\n mi.source_relevance = self.relevance\n\n if self.amazon_id:\n if self.isbn:\n self.plugin.cache_isbn_to_identifier(self.isbn, self.amazon_id)\n if self.cover_url:\n self.plugin.cache_identifier_to_cover_url(self.ama zon_id,\n self.cover_url)\n\n self.plugin.clean_downloaded_metadata(mi)\n\n if self.filter_result(mi, self.log):\n self.result_queue.put(mi)\n\n def totext(self, elem):\n return self.tostring(elem, encoding=unicode, method='text').strip()\n\n def parse_title(self, root):\n h1 = root.xpath('//h1[@id=\"title\"]')\n if h1:\n h1 = h1[0]\n for child in h1.xpath('./*[contains(@class, \"a-color-secondary\")]'):\n h1.remove(child)\n return 
self.totext(h1)\n tdiv = root.xpath('//h1[contains(@class, \"parseasinTitle\")]')[0]\n actual_title = tdiv.xpath('descendant::*[@id=\"btAsinTitle\"]')\n if actual_title:\n title = self.tostring(actual_title[0], encoding=unicode,\n method='text').strip()\n else:\n title = self.tostring(tdiv, encoding=unicode,\n method='text').strip()\n ans = re.sub(r'[(\\[].*[)\\]]', '', title).strip()\n if not ans:\n ans = title.rpartition('[')[0].strip()\n return ans\n\n def parse_authors(self, root):\n for sel in (\n '#byline .author .contributorNameID',\n '#byline .author a.a-link-normal',\n '#bylineInfo .author .contributorNameID',\n '#bylineInfo .author a.a-link-normal'\n ):\n matches = tuple(self.selector(sel))\n if matches:\n authors = [self.totext(x) for x in matches]\n return [a for a in authors if a]\n\n x = '//h1[contains(@class, \"parseasinTitle\")]/following-sibling::span/*[(name()=\"a\" and @href) or (name()=\"span\" and @class=\"contributorNameTrigger\")]'\n aname = root.xpath(x)\n if not aname:\n aname = root.xpath('''\n //h1[contains(@class, \"parseasinTitle\")]/following-sibling::*[(name()=\"a\" and @href) or (name()=\"span\" and @class=\"contributorNameTrigger\")]\n ''')\n for x in aname:\n x.tail = ''\n authors = [self.tostring(x, encoding=unicode, method='text').strip() for x\n in aname]\n authors = [a for a in authors if a]\n return authors\n\n def parse_rating(self, root):\n for x in root.xpath('//div[@id=\"cpsims-feature\" or @id=\"purchase-sims-feature\" or @id=\"rhf\"]'):\n # Remove the similar books section as it can cause spurious\n # ratings matches\n x.getparent().remove(x)\n\n rating_paths = ('//div[@data-feature-name=\"averageCustomerReviews\" or @id=\"averageCustomerReviews\"]',\n '//div[@class=\"jumpBar\"]/descendant::span[contains(@class,\"asinReviewsSummary\")]',\n '//div[@class=\"buying\"]/descendant::span[contains(@class,\"asinReviewsSummary\")]',\n '//span[@class=\"crAvgStars\"]/descendant::span[contains(@class,\"asinReviewsSummary\")]')\n ratings = None\n for p in rating_paths:\n ratings = root.xpath(p)\n if ratings:\n break\n if ratings:\n for elem in ratings[0].xpath('descendant::*[@title]'):\n t = elem.get('title').strip()\n if self.domain == 'cn':\n m = self.ratings_pat_cn.match(t)\n if m is not None:\n return float(m.group(1))\n else:\n m = self.ratings_pat.match(t)\n if m is not None:\n return float(m.group(1)) / float(m.group(3)) * 5\n\n def _render_comments(self, desc):\n from calibre.library.comments import sanitize_comments_html\n\n for c in desc.xpath('descendant::noscript'):\n c.getparent().remove(c)\n for c in desc.xpath('descendant::*[@class=\"seeAll\" or'\n ' @class=\"emptyClear\" or @id=\"collapsePS\" or'\n ' @id=\"expandPS\"]'):\n c.getparent().remove(c)\n for b in desc.xpath('descendant::b[@style]'):\n # Bing highlights search results\n s = b.get('style', '')\n if 'color' in s:\n b.tag = 'span'\n del b.attrib['style']\n\n for a in desc.xpath('descendant::a[@href]'):\n del a.attrib['href']\n a.tag = 'span'\n desc = self.tostring(desc, method='html', encoding=unicode).strip()\n\n # Encoding bug in Amazon data U+fffd (replacement char)\n # in some examples it is present in place of '\n desc = desc.replace('\\ufffd', \"'\")\n # remove all attributes from tags\n desc = re.sub(r'<([a-zA-Z0-9]+)\\s[^>]+>', r'<\\1>', desc)\n # Collapse whitespace\n # desc = re.sub('\\n+', '\\n', desc)\n # desc = re.sub(' +', ' ', desc)\n # Remove the notice about text referring to out of print editions\n desc = re.sub(r'(?s)<em>--This text ref.*?</em>', '', desc)\n # Remove 
comments\n desc = re.sub(r'(?s)<!--.*?-->', '', desc)\n return sanitize_comments_html(desc)\n\n def parse_comments(self, root, raw):\n from urllib import unquote\n ans = ''\n ns = tuple(self.selector('#bookDescription_feature_div noscript'))\n if ns:\n ns = ns[0]\n if len(ns) == 0 and ns.text:\n import html5lib\n # html5lib parsed noscript as CDATA\n ns = html5lib.parseFragment(\n '<div>%s</div>' % (ns.text), treebuilder='lxml', namespaceHTMLElements=False)[0]\n else:\n ns.tag = 'div'\n ans = self._render_comments(ns)\n else:\n desc = root.xpath('//div[@id=\"ps-content\"]/div[@class=\"content\"]')\n if desc:\n ans = self._render_comments(desc[0])\n\n desc = root.xpath(\n '//div[@id=\"productDescription\"]/*[@class=\"content\"]')\n if desc:\n ans += self._render_comments(desc[0])\n else:\n # Idiot chickens from amazon strike again. This data is now stored\n # in a JS variable inside a script tag URL encoded.\n m = re.search(br'var\\s+iframeContent\\s*=\\s*\"([^\"]+)\"', raw)\n if m is not None:\n try:\n text = unquote(m.group(1)).decode('utf-8')\n nr = parse_html(text)\n desc = nr.xpath(\n '//div[@id=\"productDescription\"]/*[@class=\"content\"]')\n if desc:\n ans += self._render_comments(desc[0])\n except Exception as e:\n self.log.warn(\n 'Parsing of obfuscated product description failed with error: %s' % as_unicode(e))\n\n return ans\n\n def parse_series(self, root):\n ans = (None, None)\n\n # This is found on the paperback/hardback pages for books on amazon.com\n series = root.xpath('//div[@data-feature-name=\"seriesTitle\"]')\n if series:\n series = series[0]\n spans = series.xpath('./span')\n if spans:\n raw = self.tostring(\n spans[0], encoding=unicode, method='text', with_tail=False).strip()\n m = re.search(r'\\s+([0-9.]+)$', raw.strip())\n if m is not None:\n series_index = float(m.group(1))\n s = series.xpath('./a[@id=\"series-page-link\"]')\n if s:\n series = self.tostring(\n s[0], encoding=unicode, method='text', with_tail=False).strip()\n if series:\n ans = (series, series_index)\n # This is found on Kindle edition pages on amazon.com\n if ans == (None, None):\n for span in root.xpath('//div[@id=\"aboutEbooksSection\"]//li/span'):\n text = (span.text or '').strip()\n m = re.match(r'Book\\s+([0-9.]+)', text)\n if m is not None:\n series_index = float(m.group(1))\n a = span.xpath('./a[@href]')\n if a:\n series = self.tostring(\n a[0], encoding=unicode, method='text', with_tail=False).strip()\n if series:\n ans = (series, series_index)\n # This is found on newer Kindle edition pages on amazon.com\n if ans == (None, None):\n for b in root.xpath('//div[@id=\"reviewFeatureGroup\"]/span/b'):\n text = (b.text or '').strip()\n m = re.match(r'Book\\s+([0-9.]+)', text)\n if m is not None:\n series_index = float(m.group(1))\n a = b.getparent().xpath('./a[@href]')\n if a:\n series = self.tostring(\n a[0], encoding=unicode, method='text', with_tail=False).partition('(')[0].strip()\n if series:\n ans = series, series_index\n\n if ans == (None, None):\n desc = root.xpath('//div[@id=\"ps-content\"]/div[@class=\"buying\"]')\n if desc:\n raw = self.tostring(desc[0], method='text', encoding=unicode)\n raw = re.sub(r'\\s+', ' ', raw)\n match = self.series_pat.search(raw)\n if match is not None:\n s, i = match.group('series'), float(match.group('index'))\n if s:\n ans = (s, i)\n if ans[0]:\n ans = (re.sub(r'\\s+Series$', '', ans[0]).strip(), ans[1])\n ans = (re.sub(r'\\(.+?\\s+Series\\)$', '', ans[0]).strip(), ans[1])\n return ans\n\n def parse_tags(self, root):\n ans = []\n exclude_tokens = 
{'kindle', 'a-z'}\n exclude = {'special features', 'by authors',\n 'authors & illustrators', 'books', 'new; used & rental textbooks'}\n seen = set()\n for li in root.xpath(self.tags_xpath):\n for i, a in enumerate(li.iterdescendants('a')):\n if i > 0:\n # we ignore the first category since it is almost always\n # too broad\n raw = (a.text or '').strip().replace(',', ';')\n lraw = icu_lower(raw)\n tokens = frozenset(lraw.split())\n if raw and lraw not in exclude and not tokens.intersection(exclude_tokens) and lraw not in seen:\n ans.append(raw)\n seen.add(lraw)\n return ans\n\n def parse_cover(self, root, raw=b\"\"):\n # Look for the image URL in javascript, using the first image in the\n # image gallery as the cover\n import json\n imgpat = re.compile(r\"\"\"'imageGalleryData'\\s*:\\s*(\\[\\s*{.+])\"\"\")\n for script in root.xpath('//script'):\n m = imgpat.search(script.text or '')\n if m is not None:\n try:\n return json.loads(m.group(1))[0]['mainUrl']\n except Exception:\n continue\n\n def clean_img_src(src):\n parts = src.split('/')\n if len(parts) > 3:\n bn = parts[-1]\n sparts = bn.split('_')\n if len(sparts) > 2:\n bn = re.sub(r'\\.\\.jpg$', '.jpg', (sparts[0] + sparts[-1]))\n return ('/'.join(parts[:-1])) + '/' + bn\n\n imgpat2 = re.compile(r'var imageSrc = \"([^\"]+)\"')\n for script in root.xpath('//script'):\n m = imgpat2.search(script.text or '')\n if m is not None:\n src = m.group(1)\n url = clean_img_src(src)\n if url:\n return url\n\n imgs = root.xpath(\n '//img[(@id=\"prodImage\" or @id=\"original-main-image\" or @id=\"main-image\" or @id=\"main-image-nonjs\") and @src]')\n if not imgs:\n imgs = (\n root.xpath('//div[@class=\"main-image-inner-wrapper\"]/img[@src]') or\n root.xpath('//div[@id=\"main-image-container\" or @id=\"ebooks-main-image-container\"]//img[@src]') or\n root.xpath(\n '//div[@id=\"mainImageContainer\"]//img[@data-a-dynamic-image]')\n )\n for img in imgs:\n try:\n idata = json.loads(img.get('data-a-dynamic-image'))\n except Exception:\n imgs = ()\n else:\n mwidth = 0\n try:\n url = None\n for iurl, (width, height) in idata.iteritems():\n if width > mwidth:\n mwidth = width\n url = iurl\n return url\n except Exception:\n pass\n\n for img in imgs:\n src = img.get('src')\n if 'data:' in src:\n continue\n if 'loading-' in src:\n js_img = re.search(br'\"largeImage\":\"(https?://[^\"]+)\",', raw)\n if js_img:\n src = js_img.group(1).decode('utf-8')\n if ('/no-image-avail' not in src and 'loading-' not in src and '/no-img-sm' not in src):\n self.log('Found image: %s' % src)\n url = clean_img_src(src)\n if url:\n return url\n\n def parse_new_details(self, root, mi, non_hero):\n table = non_hero.xpath('descendant::table')[0]\n for tr in table.xpath('descendant::tr'):\n cells = tr.xpath('descendant::td')\n if len(cells) == 2:\n name = self.totext(cells[0])\n val = self.totext(cells[1])\n if not val:\n continue\n if name in self.language_names:\n ans = self.lang_map.get(val, None)\n if not ans:\n ans = canonicalize_lang(val)\n if ans:\n mi.language = ans\n elif name in self.publisher_names:\n pub = val.partition(';')[0].partition('(')[0].strip()\n if pub:\n mi.publisher = pub\n date = val.rpartition('(')[-1].replace(')', '').strip()\n try:\n from calibre.utils.date import parse_only_date\n date = self.delocalize_datestr(date)\n mi.pubdate = parse_only_date(date, assume_utc=True)\n except:\n self.log.exception('Failed to parse pubdate: %s' % val)\n elif name in {'ISBN', 'ISBN-10', 'ISBN-13'}:\n ans = check_isbn(val)\n if ans:\n self.isbn = mi.isbn = ans\n\n def 
parse_isbn(self, pd):\n items = pd.xpath(\n 'descendant::*[starts-with(text(), \"ISBN\")]')\n if not items:\n items = pd.xpath(\n 'descendant::b[contains(text(), \"ISBN:\")]')\n for x in reversed(items):\n if x.tail:\n ans = check_isbn(x.tail.strip())\n if ans:\n return ans\n\n def parse_publisher(self, pd):\n for x in reversed(pd.xpath(self.publisher_xpath)):\n if x.tail:\n ans = x.tail.partition(';')[0]\n return ans.partition('(')[0].strip()\n\n def parse_pubdate(self, pd):\n for x in reversed(pd.xpath(self.publisher_xpath)):\n if x.tail:\n from calibre.utils.date import parse_only_date\n ans = x.tail\n date = ans.rpartition('(')[-1].replace(')', '').strip()\n date = self.delocalize_datestr(date)\n return parse_only_date(date, assume_utc=True)\n\n def parse_language(self, pd):\n for x in reversed(pd.xpath(self.language_xpath)):\n if x.tail:\n raw = x.tail.strip().partition(',')[0].strip()\n ans = self.lang_map.get(raw, None)\n if ans:\n return ans\n ans = canonicalize_lang(ans)\n if ans:\n return ans\n# }}}\n\n\nclass Amazon(Source):\n\n name = 'Amazon.com'\n version = (1, 2, 3)\n minimum_calibre_version = (2, 82, 0)\n description = _('Downloads metadata and covers from Amazon')\n\n capabilities = frozenset(['identify', 'cover'])\n touched_fields = frozenset(['title', 'authors', 'identifier:amazon',\n 'rating', 'comments', 'publisher', 'pubdate',\n 'languages', 'series', 'tags'])\n has_html_comments = True\n supports_gzip_transfer_encoding = True\n prefer_results_with_isbn = False\n\n AMAZON_DOMAINS = {\n 'com': _('US'),\n 'fr': _('France'),\n 'de': _('Germany'),\n 'uk': _('UK'),\n 'au': _('Australia'),\n 'it': _('Italy'),\n 'jp': _('Japan'),\n 'es': _('Spain'),\n 'br': _('Brazil'),\n 'nl': _('Netherlands'),\n 'cn': _('China'),\n 'ca': _('Canada'),\n }\n\n SERVERS = {\n 'auto': _('Choose server automatically'),\n 'amazon': _('Amazon servers'),\n 'bing': _('Bing search cache'),\n 'google': _('Google search cache'),\n 'wayback': _('Wayback machine cache (slow)'),\n }\n\n options = (\n Option('domain', 'choices', 'com', _('Amazon country website to use:'),\n _('Metadata from Amazon will be fetched using this '\n 'country\\'s Amazon website.'), choices=AMAZON_DOMAINS),\n Option('server', 'choices', 'auto', _('Server to get data from:'),\n _(\n 'Amazon has started blocking attempts to download'\n ' metadata from its servers. To get around this problem,'\n ' calibre can fetch the Amazon data from many different'\n ' places where it is cached. 
Choose the source you prefer.'\n ), choices=SERVERS),\n )\n\n def __init__(self, *args, **kwargs):\n Source.__init__(self, *args, **kwargs)\n self.set_amazon_id_touched_fields()\n\n def test_fields(self, mi):\n '''\n Return the first field from self.touched_fields that is null on the\n mi object\n '''\n for key in self.touched_fields:\n if key.startswith('identifier:'):\n key = key.partition(':')[-1]\n if key == 'amazon':\n if self.domain != 'com':\n key += '_' + self.domain\n if not mi.has_identifier(key):\n return 'identifier: ' + key\n elif mi.is_null(key):\n return key\n\n @property\n def browser(self):\n global ua_index\n if self.use_search_engine:\n if self._browser is None:\n ua = random_user_agent(allow_ie=False)\n self._browser = br = browser(user_agent=ua)\n br.set_handle_gzip(True)\n br.addheaders += [\n ('Accept', accept_header_for_ua(ua)),\n ('Upgrade-insecure-requests', '1'),\n ]\n br = self._browser\n else:\n all_uas = all_user_agents()\n ua_index = (ua_index + 1) % len(all_uas)\n ua = all_uas[ua_index]\n self._browser = br = browser(user_agent=ua)\n br.set_handle_gzip(True)\n br.addheaders += [\n ('Accept', accept_header_for_ua(ua)),\n ('Upgrade-insecure-requests', '1'),\n ('Referer', self.referrer_for_domain()),\n ]\n return br\n\n def save_settings(self, *args, **kwargs):\n Source.save_settings(self, *args, **kwargs)\n self.set_amazon_id_touched_fields()\n\n def set_amazon_id_touched_fields(self):\n ident_name = \"identifier:amazon\"\n if self.domain != 'com':\n ident_name += '_' + self.domain\n tf = [x for x in self.touched_fields if not\n x.startswith('identifier:amazon')] + [ident_name]\n self.touched_fields = frozenset(tf)\n\n def get_domain_and_asin(self, identifiers, extra_domains=()):\n for key, val in identifiers.iteritems():\n key = key.lower()\n if key in ('amazon', 'asin'):\n return 'com', val\n if key.startswith('amazon_'):\n domain = key.partition('_')[-1]\n if domain and (domain in self.AMAZON_DOMAINS or domain in extra_domains):\n return domain, val\n return None, None\n\n def referrer_for_domain(self, domain=None):\n domain = domain or self.domain\n return {\n 'uk': 'https://www.amazon.co.uk/',\n 'au': 'https://www.amazon.com.au/',\n 'br': 'https://www.amazon.com.br/',\n }.get(domain, 'https://www.amazon.%s/' % domain)\n\n def _get_book_url(self, identifiers): # {{{\n domain, asin = self.get_domain_and_asin(\n identifiers, extra_domains=('in', 'au', 'ca'))\n if domain and asin:\n url = None\n r = self.referrer_for_domain(domain)\n if r is not None:\n url = r + 'dp/' + asin\n if url:\n idtype = 'amazon' if domain == 'com' else 'amazon_' + domain\n return domain, idtype, asin, url\n\n def get_book_url(self, identifiers):\n ans = self._get_book_url(identifiers)\n if ans is not None:\n return ans[1:]\n\n def get_book_url_name(self, idtype, idval, url):\n if idtype == 'amazon':\n return self.name\n return 'A' + idtype.replace('_', '.')[1:]\n # }}}\n\n @property\n def domain(self):\n x = getattr(self, 'testing_domain', None)\n if x is not None:\n return x\n domain = self.prefs['domain']\n if domain not in self.AMAZON_DOMAINS:\n domain = 'com'\n\n return domain\n\n @property\n def server(self):\n x = getattr(self, 'testing_server', None)\n if x is not None:\n return x\n server = self.prefs['server']\n if server not in self.SERVERS:\n server = 'auto'\n return server\n\n @property\n def use_search_engine(self):\n return self.server != 'amazon'\n\n def clean_downloaded_metadata(self, mi):\n docase = (\n mi.language == 'eng' or\n (mi.is_null('language') and 
self.domain in {'com', 'uk', 'au'})\n )\n if mi.title and docase:\n # Remove series information from title\n m = re.search(r'\\S+\\s+(\\(.+?\\s+Book\\s+\\d+\\))$', mi.title)\n if m is not None:\n mi.title = mi.title.replace(m.group(1), '').strip()\n mi.title = fixcase(mi.title)\n mi.authors = fixauthors(mi.authors)\n if mi.tags and docase:\n mi.tags = list(map(fixcase, mi.tags))\n mi.isbn = check_isbn(mi.isbn)\n if mi.series and docase:\n mi.series = fixcase(mi.series)\n if mi.title and mi.series:\n for pat in (r':\\s*Book\\s+\\d+\\s+of\\s+%s$', r'\\(%s\\)$', r':\\s*%s\\s+Book\\s+\\d+$'):\n pat = pat % re.escape(mi.series)\n q = re.sub(pat, '', mi.title, flags=re.I).strip()\n if q and q != mi.title:\n mi.title = q\n break\n\n def get_website_domain(self, domain):\n return {'uk': 'co.uk', 'jp': 'co.jp', 'br': 'com.br', 'au': 'com.au'}.get(domain, domain)\n\n def create_query(self, log, title=None, authors=None, identifiers={}, # {{{\n domain=None, for_amazon=True):\n from urllib import urlencode\n if domain is None:\n domain = self.domain\n\n idomain, asin = self.get_domain_and_asin(identifiers)\n if idomain is not None:\n domain = idomain\n\n # See the amazon detailed search page to get all options\n terms = []\n q = {'search-alias': 'aps',\n 'unfiltered': '1',\n }\n\n if domain == 'com':\n q['sort'] = 'relevanceexprank'\n else:\n q['sort'] = 'relevancerank'\n\n isbn = check_isbn(identifiers.get('isbn', None))\n\n if asin is not None:\n q['field-keywords'] = asin\n terms.append(asin)\n elif isbn is not None:\n q['field-isbn'] = isbn\n if len(isbn) == 13:\n terms.extend('({} OR {}-{})'.format(isbn, isbn[:3], isbn[3:]).split())\n else:\n terms.append(isbn)\n else:\n # Only return book results\n q['search-alias'] = {'br': 'digital-text',\n 'nl': 'aps'}.get(domain, 'stripbooks')\n if title:\n title_tokens = list(self.get_title_tokens(title))\n if title_tokens:\n q['field-title'] = ' '.join(title_tokens)\n terms.extend(title_tokens)\n if authors:\n author_tokens = list(self.get_author_tokens(authors,\n only_first_author=True))\n if author_tokens:\n q['field-author'] = ' '.join(author_tokens)\n terms.extend(author_tokens)\n\n if not ('field-keywords' in q or 'field-isbn' in q or\n ('field-title' in q)):\n # Insufficient metadata to make an identify query\n return None, None\n\n if not for_amazon:\n return terms, domain\n\n # magic parameter to enable Japanese Shift_JIS encoding.\n if domain == 'jp':\n q['__mk_ja_JP'] = u'\u30ab\u30bf\u30ab\u30ca'\n if domain == 'nl':\n q['__mk_nl_NL'] = u'\u00c5M\u00c5\u017d\u00d5\u00d1'\n if 'field-keywords' not in q:\n q['field-keywords'] = ''\n for f in 'field-isbn field-title field-author'.split():\n q['field-keywords'] += ' ' + q.pop(f, '')\n q['field-keywords'] = q['field-keywords'].strip()\n\n if domain == 'jp':\n encode_to = 'Shift_JIS'\n elif domain == 'nl' or domain == 'cn':\n encode_to = 'utf-8'\n else:\n encode_to = 'latin1'\n encoded_q = dict([(x.encode(encode_to, 'ignore'), y.encode(encode_to,\n 'ignore')) for x, y in\n q.iteritems()])\n url = 'https://www.amazon.%s/s/?' 
% self.get_website_domain(\n domain) + urlencode(encoded_q)\n return url, domain\n\n # }}}\n\n def get_cached_cover_url(self, identifiers): # {{{\n url = None\n domain, asin = self.get_domain_and_asin(identifiers)\n if asin is None:\n isbn = identifiers.get('isbn', None)\n if isbn is not None:\n asin = self.cached_isbn_to_identifier(isbn)\n if asin is not None:\n url = self.cached_identifier_to_cover_url(asin)\n\n return url\n # }}}\n\n def parse_results_page(self, root, domain): # {{{\n from lxml.html import tostring\n\n matches = []\n\n def title_ok(title):\n title = title.lower()\n bad = ['bulk pack', '[audiobook]', '[audio cd]',\n '(a book companion)', '( slipcase with door )', ': free sampler']\n if self.domain == 'com':\n bad.extend(['(%s edition)' % x for x in ('spanish', 'german')])\n for x in bad:\n if x in title:\n return False\n if title and title[0] in '[{' and re.search(r'\\(\\s*author\\s*\\)', title) is not None:\n # Bad entries in the catalog\n return False\n return True\n\n for a in root.xpath(r'//li[starts-with(@id, \"result_\")]//a[@href and contains(@class, \"s-access-detail-page\")]'):\n title = tostring(a, method='text', encoding=unicode)\n if title_ok(title):\n url = a.get('href')\n if url.startswith('/'):\n url = 'https://www.amazon.%s%s' % (\n self.get_website_domain(domain), url)\n matches.append(url)\n\n if not matches:\n # Previous generation of results page markup\n for div in root.xpath(r'//div[starts-with(@id, \"result_\")]'):\n links = div.xpath(r'descendant::a[@class=\"title\" and @href]')\n if not links:\n # New amazon markup\n links = div.xpath('descendant::h3/a[@href]')\n for a in links:\n title = tostring(a, method='text', encoding=unicode)\n if title_ok(title):\n url = a.get('href')\n if url.startswith('/'):\n url = 'https://www.amazon.%s%s' % (\n self.get_website_domain(domain), url)\n matches.append(url)\n break\n\n if not matches:\n # This can happen for some user agents that Amazon thinks are\n # mobile/less capable\n for td in root.xpath(\n r'//div[@id=\"Results\"]/descendant::td[starts-with(@id, \"search:Td:\")]'):\n for a in td.xpath(r'descendant::td[@class=\"dataColumn\"]/descendant::a[@href]/span[@class=\"srTitle\"]/..'):\n title = tostring(a, method='text', encoding=unicode)\n if title_ok(title):\n url = a.get('href')\n if url.startswith('/'):\n url = 'https://www.amazon.%s%s' % (\n self.get_website_domain(domain), url)\n matches.append(url)\n break\n if not matches and root.xpath('//form[@action=\"/errors/validateCaptcha\"]'):\n raise CaptchaError('Amazon returned a CAPTCHA page. Recently Amazon has begun using statistical'\n ' profiling to block access to its website. 
As such this metadata plugin is'\n ' unlikely to ever work reliably.')\n\n # Keep only the top 3 matches as the matches are sorted by relevance by\n # Amazon so lower matches are not likely to be very relevant\n return matches[:3]\n # }}}\n\n def search_amazon(self, br, testing, log, abort, title, authors, identifiers, timeout): # {{{\n from calibre.utils.cleantext import clean_ascii_chars\n from calibre.ebooks.chardet import xml_to_unicode\n matches = []\n query, domain = self.create_query(log, title=title, authors=authors,\n identifiers=identifiers)\n if query is None:\n log.error('Insufficient metadata to construct query')\n raise SearchFailed()\n try:\n raw = br.open_novisit(query, timeout=timeout).read().strip()\n except Exception as e:\n if callable(getattr(e, 'getcode', None)) and \\\n e.getcode() == 404:\n log.error('Query malformed: %r' % query)\n raise SearchFailed()\n attr = getattr(e, 'args', [None])\n attr = attr if attr else [None]\n if isinstance(attr[0], socket.timeout):\n msg = _('Amazon timed out. Try again later.')\n log.error(msg)\n else:\n msg = 'Failed to make identify query: %r' % query\n log.exception(msg)\n raise SearchFailed()\n\n raw = clean_ascii_chars(xml_to_unicode(raw,\n strip_encoding_pats=True, resolve_entities=True)[0])\n\n if testing:\n import tempfile\n with tempfile.NamedTemporaryFile(prefix='amazon_results _',\n suffix='.html', delete=False) as f:\n f.write(raw.encode('utf-8'))\n print ('Downloaded html for results page saved in', f.name)\n\n matches = []\n found = '<title>404 - ' not in raw\n\n if found:\n try:\n root = parse_html(raw)\n except Exception:\n msg = 'Failed to parse amazon page for query: %r' % query\n log.exception(msg)\n raise SearchFailed()\n\n matches = self.parse_results_page(root, domain)\n\n return matches, query, domain, None\n # }}}\n\n def search_search_engine(self, br, testing, log, abort, title, authors, identifiers, timeout, override_server=None): # {{{\n from calibre.ebooks.metadata.sources.update import search_engines_module\n terms, domain = self.create_query(log, title=title, authors=authors,\n identifiers=identifiers, for_amazon=False)\n site = self.referrer_for_domain(\n domain)[len('https://'):].partition('/')[0]\n matches = []\n se = search_engines_module()\n server = override_server or self.server\n if server in ('bing',):\n urlproc, sfunc = se.bing_url_processor, se.bing_search\n elif server in ('auto', 'google'):\n urlproc, sfunc = se.google_url_processor, se.google_search\n elif server == 'wayback':\n urlproc, sfunc = se.wayback_url_processor, se.ddg_search\n results, qurl = sfunc(terms, site, log=log, br=br, timeout=timeout)\n br.set_current_header('Referer', qurl)\n for result in results:\n if abort.is_set():\n return matches, terms, domain, None\n\n purl = urlparse(result.url)\n if '/dp/' in purl.path and site in purl.netloc:\n url = result.cached_url\n if url is None:\n url = se.wayback_machine_cached_url(\n result.url, br, timeout=timeout)\n if url is None:\n log('Failed to find cached page for:', result.url)\n continue\n if url not in matches:\n matches.append(url)\n if len(matches) >= 3:\n break\n else:\n log('Skipping non-book result:', result)\n if not matches:\n log('No search engine results for terms:', ' '.join(terms))\n if urlproc is se.google_url_processor:\n # Google does not cache adult titles\n log('Trying the bing search engine instead')\n return self.search_search_engine(br, testing, log, abort, title, authors, identifiers, timeout, 'bing')\n return matches, terms, domain, urlproc\n # }}}\n\n 
def identify(self, log, result_queue, abort, title=None, authors=None, # {{{\n identifiers={}, timeout=60):\n '''\n Note this method will retry without identifiers automatically if no\n match is found with identifiers.\n '''\n\n testing = getattr(self, 'running_a_test', False)\n\n udata = self._get_book_url(identifiers)\n br = self.browser\n log('User-agent:', br.current_user_agent())\n log('Server:', self.server)\n if testing:\n print('User-agent:', br.current_user_agent())\n if udata is not None and not self.use_search_engine:\n # Try to directly get details page instead of running a search\n # Cannot use search engine as the directly constructed URL is\n # usually redirected to a full URL by amazon, and is therefore\n # not cached\n domain, idtype, asin, durl = udata\n if durl is not None:\n preparsed_root = parse_details_page(\n durl, log, timeout, br, domain)\n if preparsed_root is not None:\n qasin = parse_asin(preparsed_root[1], log, durl)\n if qasin == asin:\n w = Worker(durl, result_queue, br, log, 0, domain,\n self, testing=testing, preparsed_root=preparsed_root, timeout=timeout)\n try:\n w.get_details()\n return\n except Exception:\n log.exception(\n 'get_details failed for url: %r' % durl)\n func = self.search_search_engine if self.use_search_engine else self.search_amazon\n try:\n matches, query, domain, cover_url_processor = func(\n br, testing, log, abort, title, authors, identifiers, timeout)\n except SearchFailed:\n return\n\n if abort.is_set():\n return\n\n if not matches:\n if identifiers and title and authors:\n log('No matches found with identifiers, retrying using only'\n ' title and authors. Query: %r' % query)\n time.sleep(1)\n return self.identify(log, result_queue, abort, title=title,\n authors=authors, timeout=timeout)\n log.error('No matches found with query: %r' % query)\n return\n\n workers = [Worker(\n url, result_queue, br, log, i, domain, self, testing=testing, timeout=timeout,\n cover_url_processor=cover_url_processor, filter_result=partial(\n self.filter_result, title, authors, identifiers)) for i, url in enumerate(matches)]\n\n for w in workers:\n # Don't send all requests at the same time\n time.sleep(1)\n w.start()\n if abort.is_set():\n return\n\n while not abort.is_set():\n a_worker_is_alive = False\n for w in workers:\n w.join(0.2)\n if abort.is_set():\n break\n if w.is_alive():\n a_worker_is_alive = True\n if not a_worker_is_alive:\n break\n\n return None\n # }}}\n\n def filter_result(self, title, authors, identifiers, mi, log): # {{{\n if not self.use_search_engine:\n return True\n if title is not None:\n tokens = {icu_lower(x) for x in title.split() if len(x) > 3}\n if tokens:\n result_tokens = {icu_lower(x) for x in mi.title.split()}\n if not tokens.intersection(result_tokens):\n log('Ignoring result:', mi.title, 'as its title does not match')\n return False\n if authors:\n author_tokens = set()\n for author in authors:\n author_tokens |= {icu_lower(x) for x in author.split() if len(x) > 2}\n result_tokens = set()\n for author in mi.authors:\n result_tokens |= {icu_lower(x) for x in author.split() if len(x) > 2}\n if author_tokens and not author_tokens.intersection(result_tokens):\n log('Ignoring result:', mi.title, 'by', ' & '.join(mi.authors), 'as its author does not match')\n return False\n return True\n # }}}\n\n def download_cover(self, log, result_queue, abort, # {{{\n title=None, authors=None, identifiers={}, timeout=60, get_best_cover=False):\n cached_url = self.get_cached_cover_url(identifiers)\n if cached_url is None:\n log.info('No 
cached cover found, running identify')\n rq = Queue()\n self.identify(log, rq, abort, title=title, authors=authors,\n identifiers=identifiers)\n if abort.is_set():\n return\n if abort.is_set():\n return\n results = []\n while True:\n try:\n results.append(rq.get_nowait())\n except Empty:\n break\n results.sort(key=self.identify_results_keygen(\n title=title, authors=authors, identifiers=identifiers))\n for mi in results:\n cached_url = self.get_cached_cover_url(mi.identifiers)\n if cached_url is not None:\n break\n if cached_url is None:\n log.info('No cover found')\n return\n\n if abort.is_set():\n return\n log('Downloading cover from:', cached_url)\n br = self.browser\n if self.use_search_engine:\n br = br.clone_browser()\n br.set_current_header('Referer', self.referrer_for_domain(self.domain))\n try:\n time.sleep(1)\n cdata = br.open_novisit(\n cached_url, timeout=timeout).read()\n result_queue.put((self, cdata))\n except:\n log.exception('Failed to download cover from:', cached_url)\n # }}}\n\n\nif __name__ == '__main__': # tests {{{\n # To run these test use: calibre-debug\n # src/calibre/ebooks/metadata/sources/amazon.py\n from calibre.ebooks.metadata.sources.test import (test_identify_plugin,\n isbn_test, title_test, authors_test, comments_test, series_test)\n com_tests = [ # {{{\n\n ( # Paperback with series\n {'identifiers': {'amazon': '1423146786'}},\n [title_test('The Heroes of Olympus, Book Five The Blood of Olympus',\n exact=True), series_test('Heroes of Olympus', 5)]\n ),\n\n ( # Kindle edition with series\n {'identifiers': {'amazon': 'B0085UEQDO'}},\n [title_test('Three Parts Dead', exact=True),\n series_test('Craft Sequence', 1)]\n ),\n\n ( # + in title and uses id=\"main-image\" for cover\n {'identifiers': {'amazon': '1933988770'}},\n [title_test(\n 'C++ Concurrency in Action: Practical Multithreading', exact=True)]\n ),\n\n\n ( # Different comments markup, using Book Description section\n {'identifiers': {'amazon': '0982514506'}},\n [title_test(\n \"Griffin's Destiny: Book Three: The Griffin's Daughter Trilogy\",\n exact=True),\n comments_test('Jelena'), comments_test('Ashinji'),\n ]\n ),\n\n ( # # in title\n {'title': 'Expert C# 2008 Business Objects',\n 'authors': ['Lhotka']},\n [title_test('Expert C#'),\n authors_test(['Rockford Lhotka'])\n ]\n ),\n\n ( # No specific problems\n {'identifiers': {'isbn': '0743273567'}},\n [title_test('The great gatsby', exact=True),\n authors_test(['F. 
Scott Fitzgerald'])]\n ),\n\n ]\n\n # }}}\n\n de_tests = [ # {{{\n (\n {'identifiers': {'isbn': '9783453314979'}},\n [title_test('Die letzten W\u00e4chter: Roman',\n exact=False), authors_test(['Sergej Lukianenko'])\n ]\n\n ),\n\n (\n {'identifiers': {'isbn': '3548283519'}},\n [title_test('Wer Wind S\u00e4t: Der F\u00fcnfte Fall F\u00fcr Bodenstein Und Kirchhoff',\n exact=False), authors_test(['Nele Neuhaus'])\n ]\n\n ),\n ] # }}}\n\n it_tests = [ # {{{\n (\n {'identifiers': {'isbn': '8838922195'}},\n [title_test('La briscola in cinque',\n exact=True), authors_test(['Marco Malvaldi'])\n ]\n\n ),\n ] # }}}\n\n fr_tests = [ # {{{\n (\n {'identifiers': {'isbn': '2221116798'}},\n [title_test('L\\'\u00e9trange voyage de Monsieur Daldry',\n exact=True), authors_test(['Marc Levy'])\n ]\n\n ),\n ] # }}}\n\n es_tests = [ # {{{\n (\n {'identifiers': {'isbn': '8483460831'}},\n [title_test('Tiempos Interesantes',\n exact=False), authors_test(['Terry Pratchett'])\n ]\n\n ),\n ] # }}}\n\n jp_tests = [ # {{{\n ( # Adult filtering test\n {'identifiers': {'isbn': '4799500066'}},\n [title_test(u'\uff22\uff49\uff54\uff43\uff48 \uff34\uff52\uff41\uff50'), ]\n ),\n\n ( # isbn -> title, authors\n {'identifiers': {'isbn': '9784101302720'}},\n [title_test(u'\u7cbe\u970a\u306e\u5b88\u308a\u4eba' ,\n exact=True), authors_test([u'\u4e0a\u6a4b \u83dc\u7a42\u5b50'])\n ]\n ),\n ( # title, authors -> isbn (will use Shift_JIS encoding in query.)\n {'title': u'\u8003\u3048\u306a\u3044\u7df4\u7fd2',\n 'authors': [u'\u5c0f\u6c60 \u9f8d\u4e4b\u4ecb']},\n [isbn_test('9784093881067'), ]\n ),\n ] # }}}\n\n br_tests = [ # {{{\n (\n {'title': 'Guerra dos Tronos'},\n [title_test('A Guerra dos Tronos - As Cr\u00f4nicas de Gelo e Fogo',\n exact=True), authors_test(['George R. R. Martin'])\n ]\n\n ),\n ] # }}}\n\n nl_tests = [ # {{{\n (\n {'title': 'Freakonomics'},\n [title_test('Freakonomics',\n exact=True), authors_test(['Steven Levitt & Stephen Dubner & R. Kuitenbrouwer & O. Brenninkmeijer & A. 
(rest of the cache dump: the cached Python source for the Overdrive, Big Book Search, OZON.ru, Google, search_engines, Edelweiss, Google Images, Douban Books and Open Library modules, plus a "hashes" map with a SHA1 checksum for each entry, several thousand lines in all; the cached search_engines module is the one that contains the line "from polyglot.builtins import map")
With metadata-sources-cache.json containing the cached sources above, downloading metadata failed with:

Code:
calibre, version 3.31.0
ERROR: Download failed: Failed to download metadata. Click Show Details to see details

Traceback (most recent call last):
  File "site-packages/calibre/utils/ipc/simple_worker.py", line 289, in main
  File "site-packages/calibre/ebooks/metadata/sources/worker.py", line 102, in single_identify
  File "site-packages/calibre/ebooks/metadata/sources/update.py", line 79, in patch_plugins
  File "site-packages/calibre/ebooks/metadata/sources/update.py", line 62, in patch_search_engines
  File "<string>", line 11, in <module>
ImportError: No module named polyglot.builtins

With metadata-sources-cache.json containing an empty JSON dataset:

Code:
{}

there was no error while downloading metadata this time, and it kept working after several restarts.

Spoiler:
Code:
calibre Debug log calibre 3.31 embedded-python: True is64bit: True Linux-4.15.0-33-lowlatency-x86_64-with-debian-stretch-sid Linux ('64bit', 'ELF') ('Linux', '4.15.0-33-lowlatency', '#36~16.04.1-Ubuntu SMP PREEMPT Wed Aug 15 19:09:25 UTC 2018') Python 2.7.12 Linux: ('debian', 'stretch/sid', '') Interface language: None Successfully initialized third party plugins: FictionDB (1, 0, 10) && Read MP3 AudioBook metadata (1, 0, 79) && Goodreads (1, 1, 14) && Job Spy (1, 0, 132) && View Manager (1, 4, 3) && Barnes & Noble (1, 2, 15) && Overdrive Link (2, 29, 0) && Goodreads Sync (1, 12, 0) && Find Duplicates (1, 6, 3) && EpubSplit (2, 4, 0) calibre 3.31 embedded-python: True is64bit: True Linux-4.15.0-33-lowlatency-x86_64-with-debian-stretch-sid Linux ('64bit', 'ELF') ('Linux', '4.15.0-33-lowlatency', '#36~16.04.1-Ubuntu SMP PREEMPT Wed Aug 15 19:09:25 UTC 2018') Python 2.7.12 Linux: ('debian', 'stretch/sid', '') Interface language: None Successfully initialized third party plugins: FictionDB (1, 0, 10) && Read MP3 AudioBook metadata (1, 0, 79) && Goodreads (1, 1, 14) && Job Spy (1, 0, 132) && View Manager (1, 4, 3) && Barnes & Noble (1, 2, 15) && Overdrive Link (2, 29, 0) && Goodreads Sync (1, 12, 0) && Find Duplicates (1, 6, 3) && EpubSplit (2, 4, 0) Turning on automatic hidpi scaling devicePixelRatio: 1.0 logicalDpi: 96.0 x 96.0 physicalDpi: 98.5496535797 x 98.4132841328 Using calibre Qt style: True [0.00] Starting up... [0.05] Showing splash screen... [0.41] splash screen shown [0.41] Initializing db... [0.63] db initialized [0.63] Constructing main UI... Job Spy has begun initialization... Calibre, and hence Job Spy, was gracefully shut down last time? True Last time daemon started: never Last time daemon failed: never Total daemon starts inception_to_date: 0 Total daemon failures inception-to-date: 0 libpng warning: iCCP: known incorrect sRGB profile Job Spy has finished initialization... DEBUG: 0.0 HttpHelper::__init__: proxy=None [6.05] main UI initialized... [6.05] Hiding splash screen [70.18] splash screen hidden [70.18] Started up in 70.18 seconds with 1551 books Metadata sources cache was recently updated not updating again Metadata sources cache was recently updated not updating again (mousepad:14125): GtkSourceView-CRITICAL **: gtk_source_style_scheme_get_id: assertion 'GTK_SOURCE_IS_STYLE_SCHEME (scheme)' failed (mousepad:14125): GLib-CRITICAL **: g_variant_new_string: assertion 'string != NULL' failed (mousepad:14125): GtkSourceView-CRITICAL **: gtk_source_style_scheme_get_id: assertion 'GTK_SOURCE_IS_STYLE_SCHEME (scheme)' failed (mousepad:14125): GLib-CRITICAL **: g_variant_new_string: assertion 'string != NULL' failed (mousepad:14125): GtkSourceView-CRITICAL **: gtk_source_style_scheme_get_id: assertion 'GTK_SOURCE_IS_STYLE_SCHEME (scheme)' failed Hope this information helps. |
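For what it's worth, the cached search_engines module shown in the spoiler above is the one that does "from polyglot.builtins import map", and that import is what the File "<string>", line 11 frame in the traceback fails on, since calibre 3.31 does not ship a polyglot module. If you want to confirm that your own cache has the same problem before emptying it, here is a rough standalone sketch (not calibre code; the path is the usual Linux location, so adjust it for your install, roughly %APPDATA%\calibre on Windows and ~/Library/Preferences/calibre on macOS):

Code:
#!/usr/bin/env python
# Standalone sketch: list the entries in calibre's metadata source cache and flag
# any cached module that tries to import polyglot.builtins.
# The path below is an assumed default (Linux); point it at your own config folder.
import json
import os

cache_path = os.path.expanduser('~/.config/calibre/metadata-sources-cache.json')

with open(cache_path, 'rb') as f:
    cache = json.loads(f.read().decode('utf-8'))

for name in sorted(cache):
    if name == 'hashes':
        continue  # 'hashes' holds checksums for the entries, not module source
    note = 'imports polyglot.builtins' if 'polyglot.builtins' in cache[name] else 'looks fine'
    print('%-18s %s' % (name, note))

An empty cache ({}) simply prints nothing, which is what you should see right after applying the fix.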
#56 |
null operator (he/him)
Posts: 21,744
Karma: 30237526
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|
#57 |
Member
Posts: 11
Karma: 10
Join Date: Sep 2018
Location: Pacific Northwest
Device: Nook
|
Sorry, but I'm not understanding what the solution is. I tried deleting the file, which is only a temporary fix. I don't understand how to "change the file to zero bytes", and as for the post from kenmac999, sorry, but how do you replace the file with an "empty JSON dataset"?
My apologies. I can follow directions, but I guess I'm just clueless on this. Thanks. |
#58 | |
Junior Member
Posts: 7
Karma: 80
Join Date: Sep 2018
Device: none
|
Quote:
Either delete everything in the file or replace its contents with just this: {} Then save. Hope that helps! |
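If editing the file by hand feels fiddly, the same fix can be scripted. This is only a sketch of what the post above describes, to be run with calibre closed; the path is an assumed default (Linux), so point it at your own calibre configuration folder (roughly %APPDATA%\calibre on Windows, ~/Library/Preferences/calibre on macOS):

Code:
#!/usr/bin/env python
# Sketch of the fix described above: back up metadata-sources-cache.json and
# replace its contents with an empty JSON object so calibre rebuilds the cache.
# Run with calibre closed; the path is an assumed default location.
import os
import shutil

cache_path = os.path.expanduser('~/.config/calibre/metadata-sources-cache.json')

if os.path.exists(cache_path):
    shutil.copy2(cache_path, cache_path + '.bak')  # keep a backup, just in case
    with open(cache_path, 'wb') as f:
        f.write(b'{}')  # the "empty JSON dataset" mentioned earlier in the thread
    print('Emptied %s (backup kept with a .bak extension)' % cache_path)
else:
    print('No cache file found at %s; check your calibre configuration folder' % cache_path)

After that, restart calibre and run a metadata download; calibre should rebuild the cache on its own.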
|
#59 |
Junior Member
Posts: 1
Karma: 10
Join Date: Sep 2018
Location: Canada
Device: kindle paperwhite
|
Thank you! I ran into this problem this evening and have been searching for a solution. This worked for me: I deleted the file and was then able to download the metadata. I then closed calibre and reopened it. Tried downloading the metadata for another book and it worked just fine.
|
#60 |
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Apologies, should be fine now. You might need to restart calibre, then start a metadata download. The first one might still fail, but after that it should be fine.
|
|
Thread | Thread Starter | Forum | Replies | Last Post |
Unable to download metadata | Audra | Calibre | 8 | 08-26-2017 09:13 PM |
Unable to Download Metadata | Apache | Calibre | 3 | 04-08-2015 10:07 AM |
Unable to Download Metadata | Neptunus | Calibre | 4 | 03-13-2014 06:03 AM |
[Solved] How come I am unable to download metadata and covers? | schmuck281 | Calibre | 15 | 12-25-2013 12:01 AM |
Unable to Download Metadata | Nalgarryn | Library Management | 5 | 01-04-2013 12:31 PM |