#1 |
Wizard
Posts: 1,211
Karma: 1419583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite, Kindle Oasis
|
BeautifulSoup on calibre
Hi, everyone.
How can I get the attribute of a tag using BeautifulSoup? I was trying this:
Code:

    from calibre.ebooks.BeautifulSoup import BeautifulStoneSoup
    snippet = "<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">"
    soup = BeautifulSoup(snippet)
    tag = soup.contents[0]
    if 'epub:type' in tag.attrs:
        epub_type = tag['epub:type']

How should I look for the attribute?
Last edited by thiago.eec; 01-15-2019 at 12:15 AM.
#2 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Don't use BeautifulSoup; use lxml.
Code:

    from calibre.ebooks.oeb.polish.parsing import parse
    root = parse(binary_data)

#3 |
Wizard
|
Thanks, Kovid.
I tried selecting an attribute, but it still doesn't work:
Code:

    from calibre.ebooks.oeb.polish.parsing import parse
    snippet = "<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">"
    root = parse(snippet)
    if 'epub:type' in root.attrib:
        epub_type = root.attrib['epub:type']

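A likely reason the lookup in this attempt finds nothing, assuming the parser returns a namespace-aware lxml tree: such trees key namespaced attributes by the full namespace URI in Clark notation (`{uri}name`), not by the `epub:` prefix. It is also possible (an assumption, not confirmed in the thread) that the returned root is a wrapping element rather than the `section` itself. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, since both use the same notation. The snippet is closed with `</section>` so that a strict XML parser accepts it, and `EPUB_NS` is a name introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "epub" prefix in the snippet.
EPUB_NS = "http://www.idpf.org/2007/ops"

# Closed with </section> so a strict XML parser accepts it (an assumption;
# the snippet in the thread is an unclosed start tag).
snippet = (
    '<section xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:epub="http://www.idpf.org/2007/ops" '
    'epub:type="epigraph"></section>'
)

root = ET.fromstring(snippet)

# The prefixed name is not a key in the attribute mapping;
# the Clark-notation name is.
prefixed_found = "epub:type" in root.attrib
epub_type = root.get("{%s}type" % EPUB_NS)  # None when the attribute is absent

print(prefixed_found, epub_type)
```

Using `get()` instead of a membership test plus indexing returns `None` for snippets without the attribute, so the check and the assignment collapse into one step.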
#4 |
Wizard
|
To better explain:
I am reading a snippet like this from a JSON file:
Code:

    "html": "<section xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:epub=\"http://www.idpf.org/2007/ops\" epub:type=\"cover\">"

Now, in the main script, I want to check whether the HTML snippet has an 'epub:type' attribute. If it does, I want to save it to the 'epub_type' variable.
Last edited by thiago.eec; 01-15-2019 at 06:45 AM.
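The task described here (load the JSON entry, then read `epub:type` from the first tag if present) can be sketched with only the Python 3 standard library. This is not calibre's API: `FirstTagAttrs`, `get_epub_type`, and the wrapping JSON object in `raw` are hypothetical names and data made up for illustration. `html.parser` is not namespace-aware, so the attribute keeps its literal `epub:type` name, and an unclosed start tag is accepted.

```python
import json
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Record the attributes of the first start tag seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only the first tag's.
        if self.attrs is None:
            self.attrs = dict(attrs)


def get_epub_type(html_snippet):
    """Return the epub:type value of the first tag, or None if absent."""
    parser = FirstTagAttrs()
    parser.feed(html_snippet)
    parser.close()
    return (parser.attrs or {}).get("epub:type")


# Hypothetical JSON document wrapping the entry quoted in the post.
raw = r'''{"html": "<section xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:epub=\"http://www.idpf.org/2007/ops\" epub:type=\"cover\">"}'''

entry = json.loads(raw)
epub_type = get_epub_type(entry["html"])
print(epub_type)
```

Returning `None` for a missing attribute lets the caller write `if epub_type is not None:` instead of testing for the key first.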
#5 |
Grand Sorcerer
Posts: 5,727
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
@thiago.eec Note that Calibre comes with BeautifulSoup 3.0.5. (The current version is 4.4.)
For BeautifulSoup 3.0.5 you'll have to slightly change your code:
Code:

    from calibre.ebooks.BeautifulSoup import BeautifulSoup
    snippet = '"<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">"'
    soup = BeautifulSoup(snippet)
    if soup.section.has_key('epub:type'):
        epub_type = soup.section['epub:type']

Last edited by Doitsu; 01-15-2019 at 08:52 AM.
#6 |
Wizard
|
Thanks, @Doitsu
This worked! |