01-15-2019, 12:10 AM | #1 |
Guru
Posts: 927
Karma: 1177583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
BeautifulSoup on calibre
Hi, everyone.
How can I get an attribute of a tag using BeautifulSoup? I was trying this:
Code:
from calibre.ebooks.BeautifulSoup import BeautifulSoup

snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'
soup = BeautifulSoup(snippet)
tag = soup.contents[0]
if 'epub:type' in tag.attrs:
    epub_type = tag['epub:type']

How should I look for the attribute?
Last edited by thiago.eec; 01-15-2019 at 12:15 AM.
01-15-2019, 01:30 AM | #2 |
creator of calibre
Posts: 43,857
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Don't use BeautifulSoup; use lxml.
Code:
from calibre.ebooks.oeb.polish.parsing import parse

# binary_data holds the raw markup to be parsed
root = parse(binary_data)
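A minimal sketch of reading a namespaced attribute the lxml way. Assumptions: the markup is a closed, well-formed XHTML fragment, and it is parsed with plain `lxml.etree.fromstring` (falling back to the standard library's `xml.etree.ElementTree`, which shares the calls used here) rather than calibre's `parse`. `EPUB_NS`, `EPUB_TYPE` and `get_epub_type` are hypothetical names. The key point is that lxml does not store `epub:type` under its prefixed name; it uses Clark notation, `{http://www.idpf.org/2007/ops}type`.

```python
# Use lxml if available; the stdlib ElementTree offers the same calls used here.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Namespace URI bound to the "epub" prefix in EPUB 3 content documents.
EPUB_NS = "http://www.idpf.org/2007/ops"
# Clark-notation key under which lxml/ElementTree store epub:type.
EPUB_TYPE = "{%s}type" % EPUB_NS


def get_epub_type(markup):
    """Return the epub:type of the fragment's root element, or None (hypothetical helper)."""
    elem = etree.fromstring(markup)
    # 'epub:type' is NOT a key in elem.attrib; the Clark-notation name is.
    return elem.get(EPUB_TYPE)


snippet = (
    '<section xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph"></section>'
)
print(get_epub_type(snippet))  # epigraph
```

Note that the snippet here includes `</section>`; an XML parser rejects a lone opening tag.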
01-15-2019, 05:46 AM | #3 |
Guru
Posts: 927
Karma: 1177583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
Thanks, Kovid.
I tried selecting the attribute, but it still doesn't work:
Code:
from calibre.ebooks.oeb.polish.parsing import parse

snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'
root = parse(snippet)
if 'epub:type' in root.attrib:
    epub_type = root.attrib['epub:type']
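Two things probably explain why that check fails. First, if I read calibre's `parse` correctly, it parses its input as a complete HTML document, so `root` is the `<html>` element and the `<section>` sits somewhere below it. Second, lxml keys the attribute as `{http://www.idpf.org/2007/ops}type`, not `epub:type`. The sketch below walks the tree to the first element carrying the attribute. Since calibre is not assumed to be importable, the tree is built with `etree.fromstring` from a hypothetical `<html><body>` wrapper that mimics what `parse` is assumed to return; `find_epub_type` is a hypothetical helper.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def find_epub_type(root):
    """Return (element, value) for the first element at or below root that has
    epub:type, or (None, None) if there is none. Hypothetical helper."""
    # iter() yields root itself first, then its descendants in document order.
    for elem in root.iter():
        value = elem.get(EPUB_TYPE)
        if value is not None:
            return elem, value
    return None, None


# Stand-in for the tree calibre's parse() is assumed to return:
# an <html> root with the snippet nested inside <body>.
document = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">'
    '<head><title/></head><body>'
    '<section epub:type="epigraph"></section>'
    '</body></html>'
)
root = etree.fromstring(document)
print(root.get(EPUB_TYPE))  # None: the <html> root has no epub:type
elem, epub_type = find_epub_type(root)
print(epub_type)            # epigraph
```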
01-15-2019, 06:39 AM | #4 |
Guru
Posts: 927
Karma: 1177583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
To better explain:
I am reading a snippet like this from a JSON file:
Code:
"html": "<section xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:epub=\"http://www.idpf.org/2007/ops\" epub:type=\"cover\">"

Now, in the main script, I want to check whether the HTML snippet has an 'epub:type' attribute. If it does, I want to save its value in the 'epub_type' variable.
Last edited by thiago.eec; 01-15-2019 at 06:45 AM.
01-15-2019, 06:43 AM | #5 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
@thiago.eec Note that Calibre comes with BeautifulSoup 3.0.5. (The current version is 4.4.)
For BeautifulSoup 3.0.5, you'll have to change your code slightly:
Code:
from calibre.ebooks.BeautifulSoup import BeautifulSoup

snippet = '<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">'
soup = BeautifulSoup(snippet)
# BeautifulSoup 3 tags provide has_key() for checking whether an attribute exists
if soup.section.has_key('epub:type'):
    epub_type = soup.section['epub:type']

Last edited by Doitsu; 01-15-2019 at 08:52 AM.
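Why the original `'epub:type' in tag.attrs` test failed, as I understand BeautifulSoup 3: `tag.attrs` is a list of `(name, value)` tuples rather than a dict, so `in` compares the string against whole tuples and never matches, while `has_key()` and `tag['epub:type']` look the name up in a mapping built from that list. The sketch only mimics this with a plain list (`bs3_style_attrs` is a hypothetical stand-in, not a BeautifulSoup object), so it runs without calibre.

```python
# Hypothetical stand-in for a BeautifulSoup 3 Tag.attrs value:
# a list of (name, value) pairs in document order.
bs3_style_attrs = [
    ("xmlns", "http://www.w3.org/1999/xhtml"),
    ("xmlns:epub", "http://www.idpf.org/2007/ops"),
    ("epub:type", "epigraph"),
]

# Membership on a list compares against whole tuples, so a bare name never matches.
print("epub:type" in bs3_style_attrs)  # False

# Turning the pairs into a dict gives name-based lookup, roughly what
# has_key() and tag['epub:type'] do internally.
attr_map = dict(bs3_style_attrs)
print("epub:type" in attr_map)         # True
epub_type = attr_map.get("epub:type")
print(epub_type)                       # epigraph
```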
01-15-2019, 08:24 AM | #6 |
Guru
Posts: 927
Karma: 1177583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
Thanks, @Doitsu
This worked!