Quote:
Originally Posted by davidfor
@thiago.eec: If you are interested in getting the page count from a site, the Count Pages plugin supports this. The current sites supported use an XPath statement to find the page count. This plus the URL template and some names are all that are needed.
Hi, @davidfor.
I use the Count Pages plugin all the time. Nice tool.
Would it be possible to add skoob.com.br to the list?
I can provide any information you need.
Book URL Template: https://www.skoob.com.br/livro/{skoob_id}
XPath* to the page count:
Code:
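import re
info_nodes = root.xpath('//div[@class="sidebar-desc"]/text()')
match = re.search(r'Páginas: ([0-9]+)', ' '.join(info_nodes))  # search all text nodes at once
pages = match.group(1) if match else None  # None when the page has no page count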
* There is a lot of information in a single DIV. This is how I managed to extract each piece. I'm sure you can find a better way. Take a look at the DIV:
Code:
<div class="sidebar-desc">
ISBN-13: <span>9788528610390</span><br>ISBN-10: <span>852861039X</span><br>
Ano: 2003 / Páginas: 224<br> Idioma: português <br> Editora: <a href="/editora/1-bertrand-brasil">Bertrand Brasil</a><br> </div>
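In case it helps with testing, here is a minimal standalone sketch that pulls the page count out of the DIV above using only the Python standard library. The names SAMPLE, SidebarText and extract_page_count are made up for this example, and it swaps the XPath query for html.parser so it runs without lxml. It is not how the Count Pages plugin does it, just a quick way to check the regex against the sample markup.

```python
import re
from html.parser import HTMLParser

# Sample markup copied from the skoob.com.br book page shown above.
SAMPLE = '''<div class="sidebar-desc">
ISBN-13: <span>9788528610390</span><br>ISBN-10: <span>852861039X</span><br>
Ano: 2003 / Páginas: 224<br> Idioma: português <br> Editora: <a href="/editora/1-bertrand-brasil">Bertrand Brasil</a><br> </div>'''


class SidebarText(HTMLParser):
    """Hypothetical helper: collects the text inside <div class="sidebar-desc">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside the target div
        self.chunks = []  # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        # Start counting at the target div, and keep counting nested divs.
        if tag == "div" and (self.depth or ("class", "sidebar-desc") in attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_page_count(html):
    """Return the number after 'Páginas:' as an int, or None if it is absent."""
    parser = SidebarText()
    parser.feed(html)
    parser.close()
    match = re.search(r"Páginas:\s*([0-9]+)", " ".join(parser.chunks))
    return int(match.group(1)) if match else None


print(extract_page_count(SAMPLE))
```

Joining the text fragments before searching means the fragments that do not mention "Páginas:" are harmless, and a page without a page count gives None instead of an exception.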