MobileRead Forums - View Single Post

davidfor · 05-02-2020, 04:36 AM

@Kovid, thanks for the update. I haven't been doing as much work on the plugins as the beta was getting out of date. And I just had to step away from it at one point.

I have just posted a beta of the Count Pages plugin, thanks to the poke from @JimmXinu. I'd done it ages ago, I just hadn't posted it.

I do have one problem left in it. It can get the page count from Goodreads. To do that, it downloads the book's page and parses it for the page count. Most of my test cases work, but for https://www.goodreads.com/book/show/33701864 it produces an error.

The code that does this is basically:

Code:

        br = browser()
        raw = br.open_novisit(self.url, timeout=self.timeout).read().strip()
        raw = raw.decode('utf-8', errors='replace')
        root = fromstring(clean_ascii_chars(raw))

For the link above it produces an error:

Code:

        Traceback (most recent call last):
          File "calibre_plugins.count_pages.download", line 77, in _get_details
          File "__init__.py", line 875, in fromstring
          File "__init__.py", line 761, in document_fromstring
          File "src/lxml/etree.pyx", line 3222, in lxml.etree.fromstring
          File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
          File "src/lxml/parser.pxi", line 1758, in lxml.etree._parseDoc
          File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
          File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
          File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
          File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
          File "<string>", line 1
        lxml.etree.XMLSyntaxError: encoding not supported USC4 little endian, line 1, column 1
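
One guess I haven't been able to confirm: the page contains a character outside the Basic Multilingual Plane (an emoji, say), so Python stores the decoded string four bytes per character, and the parser is then handed a buffer in an encoding it can't handle. Below is a small stdlib-only sketch for checking that guess. The helper names (find_non_bmp, to_parser_bytes) and the sample markup are made up for illustration and are not part of the plugin or of lxml. The idea would be to run find_non_bmp on the decoded page, and if it finds anything, try passing bytes to the parser instead of the str.

Code:

```python
def find_non_bmp(text):
    """Return (offset, character) pairs for code points above U+FFFF."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


def to_parser_bytes(text):
    """Encode the text as UTF-8 bytes.

    Assumption: handing bytes to the parser avoids it ever seeing
    Python's internal string layout. I have not verified that this
    actually makes the error go away.
    """
    return text.encode('utf-8')


# Hypothetical sample standing in for the downloaded page.
sample = '<html><body><span>320 pages \U0001F4D6</span></body></html>'

hits = find_non_bmp(sample)
for offset, ch in hits:
    # Print the offset and code point so the character can be located.
    print(offset, 'U+%04X' % ord(ch))

raw_bytes = to_parser_bytes(sample)
```

If the hits list is empty for the failing page, this guess is wrong and the cause is something else.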

I can't find anything useful on the web for dealing with this, or for working out where the problem is. Does anyone have any ideas?