@Kovid, thanks for the update. I haven't been doing as much work on the plugins, as the beta was getting out of date, and at one point I just had to step away from it.
I have just posted a beta of the Count Pages plugin, thanks to the poke from @JimmXinu. I'd done it ages ago but just hadn't posted it.
I do have one problem left in it. It can get the page count from Goodreads. To do that, it downloads the book's page and parses it for the page count. Most of my test cases work, but for
https://www.goodreads.com/book/show/33701864 it produces an error.
The code that does this is basically:
Code:
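from calibre import browser
from calibre.utils.cleantext import clean_ascii_chars
from lxml.html import fromstring  # lxml.html, going by the traceback below
br = browser()
raw = br.open_novisit(self.url, timeout=self.timeout).read().strip()  # page as bytes
raw = raw.decode('utf-8', errors='replace')  # now a str
root = fromstring(clean_ascii_chars(raw))  # this is the call that raises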
For the link above it produces an error:
Code:
Traceback (most recent call last):
File "calibre_plugins.count_pages.download", line 77, in _get_details
File "__init__.py", line 875, in fromstring
File "__init__.py", line 761, in document_fromstring
File "src/lxml/etree.pyx", line 3222, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
File "src/lxml/parser.pxi", line 1758, in lxml.etree._parseDoc
File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
File "<string>", line 1
lxml.etree.XMLSyntaxError: encoding not supported USC4 little endian, line 1, column 1
I can't find anything useful on the web for dealing with this, or for working out where it is coming from. Does anyone have any ideas?
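One way to narrow it down, as a rough sketch rather than a confirmed diagnosis: the traceback goes through _parseUnicodeDoc, so the parser is being given a decoded str, not bytes. My working assumption is that the failing page contains at least one character above U+FFFF (an emoji, for example), which may be what leads lxml to declare a UCS4 encoding that the bundled libxml2 does not accept. The helper below, find_non_bmp, is a hypothetical name of my own, not part of calibre or lxml; it only reports where such characters sit in the decoded text.

```python
def find_non_bmp(text, context=20):
    """Return (index, codepoint, snippet) for every character above U+FFFF.

    Hypothetical diagnostic helper: it only locates the characters and
    does not change the text or call the parser.
    """
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            start = max(0, i - context)
            snippet = text[start:i + context + 1]
            hits.append((i, 'U+%04X' % ord(ch), snippet))
    return hits


# Stand-in for the decoded page; in the plugin this would be `raw`.
sample = '<p>Pages: 320 \U0001F4D6</p>'
for index, codepoint, snippet in find_non_bmp(sample):
    print(index, codepoint, repr(snippet))
```

If that turns up hits only on the failing page, one thing to try (again an assumption, untested here) would be passing the parser UTF-8 bytes, for example clean_ascii_chars(raw).encode('utf-8'), instead of the str.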