06-13-2019, 05:00 PM   #7
thiago.eec
Guru
Quote:
Originally Posted by davidfor
@thiago.eec: If you are interested in getting the page count from a site, the Count Pages plugin supports this. The current sites supported use an XPath statement to find the page count. This plus the URL template and some names are all that are needed.
Hi, @davidfor.
I use the Count Pages plugin all the time. Nice tool.
Would it be possible to add skoob.com.br to the list?
I can provide any information you need.

Book URL Template: https://www.skoob.com.br/livro/{skoob_id}

XPath* to the page count:

Code:
import re
info_nodes = root.xpath('//div[@class="sidebar-desc"]/text()')
for info in info_nodes:
    match = re.search(r'Páginas: ([0-9]+)', info)  # most text nodes will not match
    if match:
        pages = match.group(1)

* There is a lot of information in a single DIV. This is how I managed to extract each piece. I'm sure you can find a better way. Take a look at the DIV:

Code:
<div class="sidebar-desc">

                ISBN-13: <span>9788528610390</span><br>ISBN-10: <span>852861039X</span><br>

        Ano: 2003          / Páginas: 224<br>        Idioma: português  <br>        Editora: <a href="/editora/1-bertrand-brasil">Bertrand Brasil</a><br>    </div>
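
To check the regex against this DIV without a parsed page, here is a minimal sketch that uses only the standard library. It assumes the markup stays as shown in this post. The names SAMPLE_DIV and extract_page_count are hypothetical and are not part of the Count Pages plugin, which works from an XPath statement.

```python
import re
from typing import Optional

# Sample markup copied from the skoob.com.br book page quoted in this post.
SAMPLE_DIV = """<div class="sidebar-desc">

                ISBN-13: <span>9788528610390</span><br>ISBN-10: <span>852861039X</span><br>

        Ano: 2003          / Páginas: 224<br>        Idioma: português  <br>        Editora: <a href="/editora/1-bertrand-brasil">Bertrand Brasil</a><br>    </div>"""


# Hypothetical helper name, used only for illustration.
def extract_page_count(html: str) -> Optional[int]:
    """Return the number that follows 'Páginas:', or None if the label is absent."""
    match = re.search(r'Páginas:\s*([0-9]+)', html)
    return int(match.group(1)) if match else None


print(extract_page_count(SAMPLE_DIV))  # prints 224
```

Returning None instead of raising keeps the lookup safe for books whose page on the site has no page count listed.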

Last edited by thiago.eec; 06-13-2019 at 05:35 PM.