I'm getting poor (or at least erratic) results using the Count Pages plugin (CP) when trying to get Goodreads.com data. I'm using Calibre 6.26.0, Count Pages 1.13.1 and Goodreads 1.7.9. See attachment for my CP control panel.
THANKS to kiwidude for these useful plugins!
In most cases, CP simply fails to get a page count from Goodreads. The log in the FAILURE section below is typical of these attempts.
In other, somewhat perplexing, cases, CP will succeed in getting a page count, IMMEDIATELY AFTER A FAILURE. See SUCCESS below.
By the way, these two logs were from consecutive attempts with CP. The first was after a clean startup in Calibre, and the second immediately followed.
My workflow is to import my EPUB book, get additional metadata using the Goodreads plugin (thanks again, kiwidude), potentially revise the Amazon ID if it does not correspond to the appropriate B* Kindle ID, and then use CP to retrieve page data.
What am I doing wrong? Or is this just the Goodreads site being flaky or recalcitrant?
Thanks for any information.
Willie
========= FAILURE ==========
Count Page/Word Statistics
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
-------------------------------
Logfile for book ID 1738 (The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race - Walter Isaacson)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '55513377')]
results= {'PageCount': None}
FAILED TO GET PAGE COUNT FROM WEBSITE
1738
do_statistics_for_book: None 0 Download [('goodreads', '55513377')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=55513377, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/55513377'
_parse_page_count: start
_parse_page_count: root.__class__= HtmlElement
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= []
_parse_page_count: end
========= SUCCESS =========
Count Page/Word Statistics
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
-------------------------------
Logfile for book ID 1738 (The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race - Walter Isaacson)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '55513377')]
results= {'download_source': 'goodreads', 'PageCount': 552}
Downloaded page count from Goodreads: 552
1738
do_statistics_for_book: None 0 Download [('goodreads', '55513377')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=55513377, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/55513377'
_parse_page_count: start
_parse_page_count: root.__class__= HtmlElement
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= ['552 pages, Kindle Edition']
_parse_page_count: pages[0]= 552 pages, Kindle Edition
_parse_page_count: pages_regex= ([0-9]+) pages
_parse_page_count: pages_text= 552
_parse_page_count: have pages_regex='([0-9]+) pages'
_parse_page_count: result from regex='552'
_parse_page_count: end
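
For anyone comparing the two logs, here is a minimal sketch of what the parse step appears to do, judging only from the log output. The function name `parse_page_count` and its structure are my own assumptions for illustration, not the plugin's actual code; the regex and the sample strings are copied from the logs. It suggests the two runs differ only in what the XPath returned (`pages= []` versus `pages= ['552 pages, Kindle Edition']`), not in the regex step.

```python
import re

# Regex copied verbatim from the logs (pages_regex).
PAGES_REGEX = r'([0-9]+) pages'


def parse_page_count(pages, pages_regex=PAGES_REGEX):
    """Hypothetical stand-in for the step logged as _parse_page_count.

    `pages` is the list of text nodes that the XPath query returned.
    Returns the page count as an int, or None if nothing usable was found.
    """
    if not pages:
        # FAILURE log: the XPath matched nothing, so pages == []
        return None
    match = re.search(pages_regex, pages[0])
    if match is None:
        # Text was found but did not contain "<digits> pages"
        return None
    return int(match.group(1))


# Input seen in the FAILURE log
print(parse_page_count([]))
# Input seen in the SUCCESS log
print(parse_page_count(['552 pages, Kindle Edition']))
```

If this reading is right, the regex is fine in both runs, and the question becomes why the same URL sometimes returns HTML without a matching `FeaturedDetails` / `pagesFormat` element.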