MobileRead Forums - View Single Post

davidfor · 09-27-2020, 06:06 AM

Quote:

Originally Posted by scorpion2782

i have an error when i try to download metadata from goodreads.
this is the log:

Spoiler:

Code:

Count Page/Word Statistics
do_count_statistics - book_path=C:\Users\dma02\AppData\Local\Temp\calibre_s0fbfhfj\oppqayyg_count_pages\1539.epub, pages_algorithm=2, page_count_mode=Download, statistics_to_run=['PageCount', 'WordCount', 'FleschReading', 'FleschGrade', 'GunningFog'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=C:\Users\dma02\AppData\Local\Temp\calibre_s0fbfhfj\oppqayyg_count_pages\1539.epub
-------------------------------
Logfile for book ID 1539 (Ninfee nere)
	Method of counting _page_count_mode=Download _download_sources=[('goodreads', '30831231')]
	results= {'PageCount': None, 'WordCount': 100640, 'FleschReading': 57.49500249521196, 'FleschGrade': 7.180003232503573, 'GunningFog': 12.521758139287016}
	FAILED TO GET PAGE COUNT FROM WEBSITE
	Found 100640 words
	Computed 57.5 Flesch Reading
	Computed 7.2 Flesch-Kincaid Grade
	Computed 12.5 Gunning Fog Index
1539
do_statistics_for_book:  C:\Users\dma02\AppData\Local\Temp\calibre_s0fbfhfj\oppqayyg_count_pages\1539.epub 2 Download [('goodreads', '30831231')] ['PageCount', 'WordCount', 'FleschReading', 'FleschGrade', 'GunningFog'] 1500 True
DownloadPagesWorker::run - source_id=30831231, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'http://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@id="details"]/div[@class="row"]/span[@itemprop="numberOfPages"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True}
DownloadPagesWorker::run - self.pages_regex=None
Download source book url: 'http://www.goodreads.com/book/show/30831231'
Failed to parse download source details page: 'http://www.goodreads.com/book/show/30831231'
	Word count using icu_wordcount - trying to count_words
	Word count - used count_words: 100640
	Word count: 100640
	Results of NLTK text analysis:
	  Number of characters: 545458
	  Number of words: 111245
	  Number of sentences: 14245
	  Number of syllables: 185952
	  Number of complex words: 26137
	  Average words per sentence: 7.809406809406809
For this book, using language=ita
	Flesch Reading Ease: 57.49500249521196
	Flesch Kincade Grade: 7.180003232503573
	Gunning Fog: 12.521758139287016
Traceback (most recent call last):
  File "calibre_plugins.count_pages.download", line 77, in _get_details
  File "site-packages\lxml\html\__init__.py", line 875, in fromstring
  File "site-packages\lxml\html\__init__.py", line 761, in document_fromstring
  File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1777, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "", line 1
lxml.etree.XMLSyntaxError: encoding not supported USC4 little endian, line 1, column 1

That is an encoding or language issue. I have attached a beta that should fix it. I have also added the Czech support that @seeder supplied to me last month, which I hadn't had a chance to integrate until now.
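For anyone hitting the same traceback in their own code: it passes through `_parseUnicodeDoc`, which suggests the page reached lxml as an already-decoded string. One common workaround is to hand the parser bytes in a known encoding instead. The sketch below only illustrates that idea and is not necessarily what the beta does; `to_utf8_bytes` and its fallback list of charsets are hypothetical names and choices made for this example.

```python
from typing import Optional


def to_utf8_bytes(raw: bytes, declared: Optional[str] = None) -> bytes:
    """Re-encode a downloaded page as UTF-8 bytes.

    Hypothetical helper: tries the charset the server declared (if any),
    then UTF-8, then cp1252, and finally falls back to a lossy UTF-8 decode.
    """
    for name in (declared, "utf-8", "cp1252"):
        if not name:
            continue
        try:
            text = raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or the bytes do not fit it: try the next one.
            continue
        return text.encode("utf-8")
    # Last resort: keep what can be decoded and replace the rest.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


if __name__ == "__main__":
    page = "<html><body>Ninfee nere</body></html>".encode("utf-8")
    print(to_utf8_bytes(page))
```

The resulting bytes could then be passed to `lxml.html.fromstring`, which accepts bytes as well as strings, so the parser works out the encoding itself instead of reading the in-memory string.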

The changes in the beta are:

Fix: Wasn't getting the series info.
New: Czech translation - thanks to seeder
New: Add download page count from databazeknih.cz and cbdb.cz - thanks to seeder

I haven't done a lot of testing of these changes, as the language makes that a little difficult for me. If anyone sees any problems, please report them here with examples so that I can look at them.

Edit:
I have replaced the attachment, as I realised I had left a debug statement in the code that would break on most systems. But I don't think anyone had downloaded the beta.