09-13-2023, 10:29 AM   #1677
wanderson
Quote:
Originally Posted by kiwidude
Thanks for the fast response, kiwidude. I've tried 1.13.2 and it seems much improved. I typically still have to run it twice, but it reliably gets a good page count on the second attempt, and occasionally on the first. Here are logs from two back-to-back runs: the first run got one out of five, and the second run got the remaining four.

Thanks again.

======PARTIAL FAILURE======
Spoiler:
Count Page/Word Statistics
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
-------------------------------
Logfile for book ID 1848 (Our Kind of Traitor - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '7839766')]
results= {'PageCount': None}
FAILED TO GET PAGE COUNT FROM WEBSITE
1848
do_statistics_for_book: None 0 Download [('goodreads', '7839766')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=7839766, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/7839766'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= []
_parse_page_count: end

-------------------------------
Logfile for book ID 1847 (Our Game - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '22927861')]
results= {'PageCount': None}
FAILED TO GET PAGE COUNT FROM WEBSITE
1847
do_statistics_for_book: None 0 Download [('goodreads', '22927861')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=22927861, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/22927861'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= []
_parse_page_count: end

-------------------------------
Logfile for book ID 1844 (A Most Wanted Man - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '7006264')]
results= {'download_source': 'goodreads', 'PageCount': 336}
Downloaded page count from Goodreads: 336
1844
do_statistics_for_book: None 0 Download [('goodreads', '7006264')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=7006264, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/7006264'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= ['336 pages, ebook']
_parse_page_count: pages[0]= 336 pages, ebook
_parse_page_count: pages_regex= ([0-9]+) pages
_parse_page_count: pages_text= 336
_parse_page_count: have pages_regex='([0-9]+) pages'
_parse_page_count: result from regex='336'
_parse_page_count: end

-------------------------------
Logfile for book ID 1845 (A Perfect Spy - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '10069474')]
results= {'PageCount': None}
FAILED TO GET PAGE COUNT FROM WEBSITE
1845
do_statistics_for_book: None 0 Download [('goodreads', '10069474')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=10069474, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/10069474'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= []
_parse_page_count: end

-------------------------------
Logfile for book ID 1846 (A Small Town in Germany - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '56485308')]
results= {'PageCount': None}
FAILED TO GET PAGE COUNT FROM WEBSITE
1846
do_statistics_for_book: None 0 Download [('goodreads', '56485308')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=56485308, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/56485308'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= []
_parse_page_count: end
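
For anyone skimming the log above: the four failures all show `pages= []`, meaning the XPath query returned no text nodes, so the regex was never applied. That suggests the downloaded HTML did not contain the element the XPath expects; the log does not say why. Below is a minimal sketch of that parsing step, not the plugin's actual code. The function name `parse_page_count` and its structure are my own assumptions; only the regex is copied from the log.

```python
import re

# Regex copied from the log output; everything else here is illustrative.
PAGES_REGEX = r'([0-9]+) pages'


def parse_page_count(pages, pages_regex=PAGES_REGEX):
    """Hypothetical stand-in for the plugin's parsing step.

    `pages` is the list of text nodes returned by the XPath query,
    e.g. ['336 pages, ebook'] on success or [] on failure.
    """
    if not pages:
        # The XPath matched nothing, so there is no text to run the regex on.
        return None
    match = re.search(pages_regex, pages[0])
    if match is None:
        # Text was found but it does not look like "<number> pages".
        return None
    return int(match.group(1))


print(parse_page_count(['336 pages, ebook']))  # text node as seen for book 1844
print(parse_page_count([]))                    # empty result as seen for book 1848
```

The point of the sketch is that an empty XPath result and a regex mismatch are two different failure modes; the logs in this post only show the first one.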


======SUCCESS======
Spoiler:
Count Page/Word Statistics
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
do_count_statistics - book_path=None, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=None
-------------------------------
Logfile for book ID 1846 (A Small Town in Germany - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '56485308')]
results= {'download_source': 'goodreads', 'PageCount': 336}
Downloaded page count from Goodreads: 336
1846
do_statistics_for_book: None 0 Download [('goodreads', '56485308')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=56485308, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/56485308'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= ['336 pages, ebook']
_parse_page_count: pages[0]= 336 pages, ebook
_parse_page_count: pages_regex= ([0-9]+) pages
_parse_page_count: pages_text= 336
_parse_page_count: have pages_regex='([0-9]+) pages'
_parse_page_count: result from regex='336'
_parse_page_count: end

-------------------------------
Logfile for book ID 1845 (A Perfect Spy - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '10069474')]
results= {'download_source': 'goodreads', 'PageCount': 604}
Downloaded page count from Goodreads: 604
1845
do_statistics_for_book: None 0 Download [('goodreads', '10069474')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=10069474, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/10069474'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= ['604 pages, Paperback']
_parse_page_count: pages[0]= 604 pages, Paperback
_parse_page_count: pages_regex= ([0-9]+) pages
_parse_page_count: pages_text= 604
_parse_page_count: have pages_regex='([0-9]+) pages'
_parse_page_count: result from regex='604'
_parse_page_count: end

-------------------------------
Logfile for book ID 1848 (Our Kind of Traitor - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '7839766')]
results= {'download_source': 'goodreads', 'PageCount': 306}
Downloaded page count from Goodreads: 306
1848
do_statistics_for_book: None 0 Download [('goodreads', '7839766')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=7839766, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/7839766'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= ['306 pages, Hardcover']
_parse_page_count: pages[0]= 306 pages, Hardcover
_parse_page_count: pages_regex= ([0-9]+) pages
_parse_page_count: pages_text= 306
_parse_page_count: have pages_regex='([0-9]+) pages'
_parse_page_count: result from regex='306'
_parse_page_count: end

-------------------------------
Logfile for book ID 1847 (Our Game - John le Carré)
Method of counting _page_count_mode=Download _download_sources=[('goodreads', '22927861')]
results= {'download_source': 'goodreads', 'PageCount': 353}
Downloaded page count from Goodreads: 353
1847
do_statistics_for_book: None 0 Download [('goodreads', '22927861')] ['PageCount'] 1500 True
DownloadPagesWorker::run - source_id=22927861, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'https://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'https://www.goodreads.com/book/show/22927861'
_parse_page_count: start
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= ['353 pages, Kindle Edition']
_parse_page_count: pages[0]= 353 pages, Kindle Edition
_parse_page_count: pages_regex= ([0-9]+) pages
_parse_page_count: pages_text= 353
_parse_page_count: have pages_regex='([0-9]+) pages'
_parse_page_count: result from regex='353'
_parse_page_count: end
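
The two runs above amount to a manual retry: the second run only processed the four books that failed the first time. A rough sketch of automating that is below. This is purely an illustration under my own assumptions; `fetch_with_retries` and `make_flaky_fetcher` are hypothetical names, not part of the plugin, and the fake fetcher just replays the page counts from the logs instead of touching the network.

```python
import time


def fetch_with_retries(fetch, book_ids, attempts=3, delay_seconds=0.0):
    """Call fetch(book_id) per book, retrying only those that returned None.

    Returns (results, failed): a dict of book_id -> page count, and the
    list of book ids that still had no count after all attempts.
    """
    results = {}
    pending = list(book_ids)
    for attempt in range(attempts):
        still_pending = []
        for book_id in pending:
            count = fetch(book_id)
            if count is None:
                still_pending.append(book_id)
            else:
                results[book_id] = count
        pending = still_pending
        if not pending:
            break
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # pause before the next pass
    return results, pending


def make_flaky_fetcher(page_counts, succeed_first_try):
    """Fake fetcher: books outside succeed_first_try fail on their first call."""
    calls = {}

    def fetch(book_id):
        calls[book_id] = calls.get(book_id, 0) + 1
        if calls[book_id] == 1 and book_id not in succeed_first_try:
            return None
        return page_counts[book_id]

    return fetch


# Page counts taken from the logs; only book 1844 succeeded on the first run.
counts = {1844: 336, 1845: 604, 1846: 336, 1847: 353, 1848: 306}
fetch = make_flaky_fetcher(counts, succeed_first_try={1844})
results, failed = fetch_with_retries(fetch, counts)
print(results, failed)
```

A retry like this would only help with intermittent failures of the kind shown in these two runs; it would not help a book that fails every time.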


EDIT: Alas, I am still seeing some persistent failures for books that do have page counts on Goodreads. I'll post additional logs if there is interest.

Last edited by wanderson; 09-13-2023 at 11:11 AM. Reason: Additional runs give some continuing issues.