Old 01-15-2019, 12:10 AM   #1
thiago.eec
Guru
Posts: 927
Karma: 1177583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
BeautifulSoup on calibre

Hi, everyone.

How can I get the attribute of a tag using BeautifulSoup?

I was trying this:

Code:
from calibre.ebooks.BeautifulSoup import BeautifulSoup

snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'

soup = BeautifulSoup(snippet)
tag = soup.contents[0]

if 'epub:type' in tag.attrs:
    epub_type = tag['epub:type']
But it doesn't work. The condition is being evaluated as false.

How should I look for the attribute?

Last edited by thiago.eec; 01-15-2019 at 12:15 AM.
Old 01-15-2019, 01:30 AM   #2
kovidgoyal
creator of calibre
Posts: 43,857
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Don't use BeautifulSoup; use lxml.

Code:
from calibre.ebooks.oeb.polish.parsing import parse
# binary_data: the raw bytes of the (X)HTML to parse
root = parse(binary_data)
Old 01-15-2019, 05:46 AM   #3
thiago.eec
Quote:
Originally Posted by kovidgoyal
Don't use BeautifulSoup; use lxml.
Thanks, Kovid.

I tried selecting an attribute, but it still doesn't work:

Code:
from calibre.ebooks.oeb.polish.parsing import parse

snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'
root = parse(snippet)

if 'epub:type' in root.attrib:
    epub_type = root.attrib['epub:type']
I want epub_type to hold the value of the attribute, 'epigraph'.
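
A note on why the prefixed name is not found: lxml-style trees key namespaced attributes by `{namespace-uri}localname`, not by the `prefix:name` spelling used in the markup. The sketch below illustrates this with the standard library's `xml.etree.ElementTree`, which uses the same key notation. It does not use calibre's `parse`, and it closes the `<section>` tag so that the snippet is well-formed XML; both are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "epub" prefix in the snippet.
EPUB_NS = 'http://www.idpf.org/2007/ops'

# Same snippet as in the post, closed so it parses as XML (assumption).
snippet = ('<section xmlns="http://www.w3.org/1999/xhtml" '
           'xmlns:epub="http://www.idpf.org/2007/ops" '
           'epub:type="epigraph"></section>')

root = ET.fromstring(snippet)

# The prefixed spelling is not a key in the attribute mapping.
prefixed_found = 'epub:type' in root.attrib

# The namespaced key is; get() returns None if the attribute is absent.
epub_type = root.get('{%s}type' % EPUB_NS)

print(prefixed_found, epub_type)
```

With a real document, the element carrying the attribute may not be the root, so it would have to be located first.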
Old 01-15-2019, 06:39 AM   #4
thiago.eec
To better explain:

I am reading a snippet like this from a JSON file:

Json:
Code:
"html": "<section xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:epub=\"http://www.idpf.org/2007/ops\" epub:type=\"cover\">"
The value for 'html' varies, so the element is not always 'section'.

Now, in the main script, I want to check whether the html snippet has an 'epub:type' attribute. If it does, I want to save its value to the 'epub_type' variable.
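
One way to handle this without depending on the element name is to read the attributes of whatever start tag comes first. The following is a minimal sketch using only the Python 3 standard library (`json` and `html.parser`), not calibre's own parsers; the names `FirstTagAttrs` and `epub_type_of` are hypothetical helpers introduced for illustration.

```python
import json
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Record the name and attributes of the first start tag seen."""

    def __init__(self):
        super().__init__()
        self.tag = None
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if self.tag is None:
            self.tag = tag
            self.attrs = dict(attrs)


def epub_type_of(snippet):
    """Return the epub:type value of the first tag, or None if absent."""
    parser = FirstTagAttrs()
    parser.feed(snippet)
    parser.close()
    return (parser.attrs or {}).get('epub:type')


# A JSON record shaped like the one in the post.
record = json.loads(
    r'{"html": "<section xmlns=\"http://www.w3.org/1999/xhtml\" '
    r'xmlns:epub=\"http://www.idpf.org/2007/ops\" epub:type=\"cover\">"}'
)

epub_type = epub_type_of(record['html'])
print(epub_type)
```

This approach matches the attribute by its literal `epub:type` spelling, so it assumes the snippets always use the `epub` prefix for that namespace.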

Last edited by thiago.eec; 01-15-2019 at 06:45 AM.
Old 01-15-2019, 06:43 AM   #5
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
@thiago.eec Note that Calibre comes with BeautifulSoup 3.0.5. (The current version is 4.4.)

For BeautifulSoup 3.0.5 you'll have to slightly change your code:

Code:
from calibre.ebooks.BeautifulSoup import BeautifulSoup
snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'
soup = BeautifulSoup(snippet)
if soup.section.has_key('epub:type'):
    epub_type = soup.section['epub:type']
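
A likely explanation for why the original `'epub:type' in tag.attrs` test was false, assuming BeautifulSoup 3 stores `attrs` as a list of `(name, value)` pairs and not as a dictionary: a bare string never equals a tuple, so the membership test fails even though the attribute is present. The sketch below reproduces this with a plain Python list standing in for `tag.attrs`, with no BeautifulSoup involved.

```python
# Stand-in for tag.attrs under the assumption above: a list of pairs.
attrs = [
    ('xmlns', 'http://www.w3.org/1999/xhtml'),
    ('xmlns:epub', 'http://www.idpf.org/2007/ops'),
    ('epub:type', 'epigraph'),
]

# Membership compares the string against each tuple, so it is never true.
found_directly = 'epub:type' in attrs

# Converting the pairs to a dict makes lookup by name work.
epub_type = dict(attrs).get('epub:type')

print(found_directly, epub_type)
```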

Last edited by Doitsu; 01-15-2019 at 08:52 AM.
Old 01-15-2019, 08:24 AM   #6
thiago.eec
Thanks, @Doitsu

This worked!
All times are GMT -4.