Old 09-27-2020, 06:06 AM   #1381
davidfor
Quote:
Originally Posted by scorpion2782
I have an error when I try to download metadata from Goodreads.
This is the log:

Spoiler:
Code:
Count Page/Word Statistics
do_count_statistics - book_path=C:\Users\dma02\AppData\Local\Temp\calibre_s0fbfhfj\oppqayyg_count_pages\1539.epub, pages_algorithm=2, page_count_mode=Download, statistics_to_run=['PageCount', 'WordCount', 'FleschReading', 'FleschGrade', 'GunningFog'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=C:\Users\dma02\AppData\Local\Temp\calibre_s0fbfhfj\oppqayyg_count_pages\1539.epub
-------------------------------
Logfile for book ID 1539 (Ninfee nere)
	Method of counting _page_count_mode=Download _download_sources=[('goodreads', '30831231')]
	results= {'PageCount': None, 'WordCount': 100640, 'FleschReading': 57.49500249521196, 'FleschGrade': 7.180003232503573, 'GunningFog': 12.521758139287016}
	FAILED TO GET PAGE COUNT FROM WEBSITE
	Found 100640 words
	Computed 57.5 Flesch Reading
	Computed 7.2 Flesch-Kincaid Grade
	Computed 12.5 Gunning Fog Index
1539
do_statistics_for_book:  C:\Users\dma02\AppData\Local\Temp\calibre_s0fbfhfj\oppqayyg_count_pages\1539.epub 2 Download [('goodreads', '30831231')] ['PageCount', 'WordCount', 'FleschReading', 'FleschGrade', 'GunningFog'] 1500 True
DownloadPagesWorker::run - source_id=30831231, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'http://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@id="details"]/div[@class="row"]/span[@itemprop="numberOfPages"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True}
DownloadPagesWorker::run - self.pages_regex=None
Download source book url: 'http://www.goodreads.com/book/show/30831231'
Failed to parse download source details page: 'http://www.goodreads.com/book/show/30831231'
	Word count using icu_wordcount - trying to count_words
	Word count - used count_words: 100640
	Word count: 100640
	Results of NLTK text analysis:
	  Number of characters: 545458
	  Number of words: 111245
	  Number of sentences: 14245
	  Number of syllables: 185952
	  Number of complex words: 26137
	  Average words per sentence: 7.809406809406809
For this book, using language=ita
	Flesch Reading Ease: 57.49500249521196
	Flesch Kincade Grade: 7.180003232503573
	Gunning Fog: 12.521758139287016
Traceback (most recent call last):
  File "calibre_plugins.count_pages.download", line 77, in _get_details
  File "site-packages\lxml\html\__init__.py", line 875, in fromstring
  File "site-packages\lxml\html\__init__.py", line 761, in document_fromstring
  File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1777, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: encoding not supported USC4 little endian, line 1, column 1

That is an encoding or language issue. I have attached a beta that should fix it. I have also added the Czech support that @seeder supplied to me last month, which I hadn't had a chance to integrate until now.
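
For anyone hitting the same traceback in their own code: lxml typically raises this error when it is handed a decoded Python string on a build whose libxml2 cannot convert from the interpreter's internal string representation. A common workaround is to give the parser UTF-8 bytes and name the encoding explicitly. The following is only a minimal sketch of that idea and is not necessarily what the beta does. The function names (`to_utf8_bytes`, `first_number`, `page_count_from_html`) are made up for illustration, and the XPath is a shortened form of the one shown in the log.

```python
import re
from typing import Optional


def to_utf8_bytes(raw: bytes, charset: Optional[str] = None) -> bytes:
    """Decode a downloaded page and re-encode it as UTF-8 bytes.

    Undecodable bytes are replaced instead of raising, so a wrong charset
    guess degrades the text rather than aborting the whole job.
    """
    text = raw.decode(charset or "utf-8", errors="replace")
    return text.encode("utf-8")


def first_number(text: str) -> Optional[int]:
    """Return the first integer in a string such as '320 pages', else None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def page_count_from_html(raw: bytes, charset: Optional[str] = None) -> Optional[int]:
    """Parse bytes (not str) with lxml and read the numberOfPages span."""
    # lxml is third-party; it is imported here so the helpers above
    # stay usable without it.
    from lxml import html

    # Telling the parser the encoding avoids any guessing on its side.
    parser = html.HTMLParser(encoding="utf-8")
    root = html.fromstring(to_utf8_bytes(raw, charset), parser=parser)
    hits = root.xpath('//span[@itemprop="numberOfPages"]/text()')
    return first_number(hits[0]) if hits else None
```

With a page that contains `<span itemprop="numberOfPages">320 pages</span>`, `page_count_from_html` should return the integer 320. If the span is missing it returns `None`, which corresponds to the "FAILED TO GET PAGE COUNT FROM WEBSITE" case in the log.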

The changes in the beta are:
  • Fix: Wasn't getting the series info.
  • New: Czech translation - thanks to seeder
  • New: Add download page count from databazeknih.cz and cbdb.cz - thanks to seeder

I haven't done a lot of testing of these changes. The language makes it a little difficult for me. If anyone sees any problems, please report them here with examples so that I can look at them.


Edit:
I have replaced the attachment as I realised I had left a debug statement in the code that would break on most systems. However, I don't think anyone had downloaded the beta yet.
Attached Files
File Type: zip Count Pages-beta.zip (307.1 KB, 272 views)

Last edited by davidfor; 09-27-2020 at 06:36 AM. Reason: Updated attachment as I left a debug statement in.