BeautifulSoup on calibre

thiago.eec · 01-15-2019, 12:10 AM

Hi, everyone.

How can I get the attribute of a tag using BeautifulSoup?

I was trying this:

Code:

from calibre.ebooks.BeautifulSoup import BeautifulSoup

# Single quotes outside so the double-quoted attribute values do not end the string
snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'

soup = BeautifulSoup(snippet)
tag = soup.contents[0]

if 'epub:type' in tag.attrs:
    epub_type = tag['epub:type']

But it doesn't work. The condition is evaluated as false.

How should I look for the attribute?

kovidgoyal · 01-15-2019, 01:30 AM

Don't use BeautifulSoup; use lxml.

Code:

from calibre.ebooks.oeb.polish.parsing import parse
root = parse(binary_data)

thiago.eec · 01-15-2019, 05:46 AM

Quote:

Originally Posted by kovidgoyal

Don't use BeautifulSoup; use lxml.

Thanks, Kovid.

I tried selecting an attribute, but it still doesn't work:

Code:

from calibre.ebooks.oeb.polish.parsing import parse

snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'
root = parse(snippet)

if 'epub:type' in root.attrib:
    epub_type = root.attrib['epub:type']

I want epub_type to hold the value of the epub:type attribute, which is 'epigraph' here.
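
A likely reason the lookup fails: XML-aware parsers such as lxml key namespaced attributes by the namespace URI in braces (Clark notation), not by the prefixed name. The sketch below shows this with the standard library's xml.etree.ElementTree, which follows the same convention. It is not calibre's parse(); the closing tag is added here as an assumption so a strict XML parser accepts the snippet, and the names EPUB_NS and key are only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: a closing tag is added so the snippet is well-formed XML.
snippet = (
    '<section xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:epub="http://www.idpf.org/2007/ops" '
    'epub:type="epigraph"></section>'
)
root = ET.fromstring(snippet)

# The attribute is stored under "{namespace-uri}localname", not "epub:type".
EPUB_NS = "http://www.idpf.org/2007/ops"
key = "{%s}type" % EPUB_NS

prefixed_found = "epub:type" in root.attrib  # the prefixed name is not a key
epub_type = root.attrib.get(key)             # None if the attribute is absent

print(prefixed_found)
print(epub_type)
```

With lxml the same key should work through root.get(key), but the element that calibre's parse() returns as the root may not be the section itself, so treat this only as an illustration of the key format.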

thiago.eec · 01-15-2019, 06:39 AM

To better explain:

I am reading a snippet like this from a JSON file:

Code:

"html": "<section xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:epub=\"http://www.idpf.org/2007/ops\" epub:type=\"cover\">"

The value for 'html' varies, so the element is not always 'section'.

Now, in the main script, I want to check whether the html snippet has an 'epub:type' attribute. If it does, I want to save its value to the 'epub_type' variable.
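
One way to do this without depending on the element name is to read the attributes of whatever the first start tag is. The following is a minimal sketch using only the Python 3 standard library, not calibre's bundled parsers; FirstTagAttrs and get_epub_type are hypothetical helper names, and the JSON text is inlined instead of being read from a file.

```python
import json
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Hypothetical helper: remember the attributes of the first start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; keep the first tag only.
        if self.attrs is None:
            self.attrs = dict(attrs)


def get_epub_type(html_snippet):
    """Return the epub:type value of the first tag, or None if it has none."""
    parser = FirstTagAttrs()
    parser.feed(html_snippet)
    parser.close()
    return (parser.attrs or {}).get("epub:type")


# Assumption: stands in for the contents of the JSON file.
raw = r'{"html": "<section xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:epub=\"http://www.idpf.org/2007/ops\" epub:type=\"cover\">"}'
entry = json.loads(raw)

epub_type = get_epub_type(entry["html"])
print(epub_type)
```

This parser treats epub:type as a plain attribute name and does not resolve namespaces, which is enough for a check like this one.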

Doitsu · 01-15-2019, 06:43 AM

@thiago.eec Note that Calibre comes with BeautifulSoup 3.0.5. (The current version is 4.4.)

For BeautifulSoup 3.0.5 you'll have to slightly change your code:

Code:

from calibre.ebooks.BeautifulSoup import BeautifulSoup
snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'
soup = BeautifulSoup(snippet)
if soup.section.has_key('epub:type'):
    epub_type = soup.section['epub:type']
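
As far as I know, the earlier test against tag.attrs fails because BeautifulSoup 3 keeps attrs as a list of (name, value) pairs instead of a dict, so a bare name never matches an element of that list, while has_key() and tag['name'] look the name up by key. The plain-Python sketch below only mimics that storage; the attrs list is written out by hand and no BeautifulSoup code is involved.

```python
# Assumption: this list imitates how BeautifulSoup 3 holds a tag's attributes.
attrs = [
    ("xmlns", "http://www.w3.org/1999/xhtml"),
    ("xmlns:epub", "http://www.idpf.org/2007/ops"),
    ("epub:type", "epigraph"),
]

# Membership on the list compares the string against whole tuples.
found_in_list = "epub:type" in attrs

# Converting the pairs to a dict gives the by-name lookup that was intended.
attr_map = dict(attrs)
found_in_map = "epub:type" in attr_map
epub_type = attr_map.get("epub:type")

print(found_in_list, found_in_map, epub_type)
```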

thiago.eec · 01-15-2019, 08:24 AM

Thanks, @Doitsu

This worked!
