#1
Wizard
Posts: 1,294
Karma: 1436993
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite, Kindle Oasis
BeautifulSoup on calibre
Hi, everyone.
How can I get the attribute of a tag using BeautifulSoup? I was trying this:
Code:
from calibre.ebooks.BeautifulSoup import BeautifulStoneSoup
snippet = "<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">"
soup = BeautifulSoup(snippet)
tag = soup.contents[0]
if 'epub:type' in tag.attrs:
    epub_type = tag['epub:type']
How should I look for the attribute?
Last edited by thiago.eec; 01-15-2019 at 01:15 AM.
#2
creator of calibre
Posts: 45,664
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Don't use BeautifulSoup; use lxml.
Code:
from calibre.ebooks.oeb.polish.parsing import parse
root = parse(binary_data)
#3
Thanks, Kovid.
I tried selecting an attribute, but it still doesn't work:
Code:
from calibre.ebooks.oeb.polish.parsing import parse
snippet = "<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">"
root = parse(snippet)
if 'epub:type' in root.attrib:
    epub_type = root.attrib['epub:type']
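A likely reason a check like `'epub:type' in root.attrib` fails: ElementTree-style APIs such as lxml do not store namespaced attributes under their prefixed name. The key is the namespace URI in braces followed by the local name, so `epub:type` becomes `{http://www.idpf.org/2007/ops}type`. Below is a minimal sketch for Python 3 that uses the standard library's `xml.etree.ElementTree` as a stand-in (lxml's `etree` follows the same key convention). The helper name `get_epub_type` and the added closing `</section>` tag are assumptions made for illustration; the closing tag is there only because a strict XML parser rejects an unclosed element.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "epub" prefix in the snippet.
EPUB_NS = "http://www.idpf.org/2007/ops"


def get_epub_type(snippet):
    """Return the epub:type value of the root element, or None if absent."""
    root = ET.fromstring(snippet)
    # Namespaced attributes are keyed as "{namespace-uri}localname".
    return root.get("{%s}type" % EPUB_NS)


# Assumption: the snippet is closed so that it is well-formed XML.
snippet = (
    '<section xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:epub="http://www.idpf.org/2007/ops" '
    'epub:type="epigraph"></section>'
)

epub_type = get_epub_type(snippet)
print(epub_type)  # epigraph
```

With calibre's `parse()`, the returned root may also be the wrapping `html` element rather than the `section` itself, in which case the `section` would have to be located first; that step is not shown here.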
#4
To better explain:
I am reading a snippet like this from a JSON file:
Code:
"html": "<section xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:epub=\"http://www.idpf.org/2007/ops\" epub:type=\"cover\">"
Now, in the main script, I want to check whether the HTML snippet has an 'epub:type' attribute. If it does, I want to save its value to the 'epub_type' variable.
Last edited by thiago.eec; 01-15-2019 at 07:45 AM.
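The JSON workflow described here could be sketched roughly as follows. This is not calibre's API: it uses only the Python 3 standard library (`json` and `html.parser`), which tolerates a lone opening tag like the one stored in the file. The class name `FirstTagAttrs`, the function name `epub_type_from_json`, and the inline JSON text standing in for the real file are all assumptions for illustration.

```python
import json
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Collect the attributes of the first start tag that is seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; keep the first tag only.
        if self.attrs is None:
            self.attrs = dict(attrs)


def epub_type_from_json(text):
    """Return the epub:type of the snippet under the "html" key, or None."""
    data = json.loads(text)
    parser = FirstTagAttrs()
    parser.feed(data["html"])
    parser.close()
    return (parser.attrs or {}).get("epub:type")


# Stand-in for the contents of the JSON file (hypothetical).
raw = json.dumps({
    "html": '<section xmlns="http://www.w3.org/1999/xhtml" '
            'xmlns:epub="http://www.idpf.org/2007/ops" epub:type="cover">'
})

epub_type = epub_type_from_json(raw)
print(epub_type)  # cover
```

Because the parser keeps the prefixed attribute name as written, the lookup key stays `epub:type` here, unlike with a namespace-aware XML parser.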
#5
Grand Sorcerer
Posts: 5,763
Karma: 24088559
Join Date: Dec 2010
Device: Kindle PW2
@thiago.eec Note that Calibre comes with BeautifulSoup 3.0.5. (The current version is 4.4.)
For BeautifulSoup 3.0.5 you'll have to slightly change your code:
Code:
from calibre.ebooks.BeautifulSoup import BeautifulSoup
snippet = '"<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:type="epigraph">"'
soup = BeautifulSoup(snippet)
if soup.section.has_key('epub:type'):
    epub_type = soup.section['epub:type']
Last edited by Doitsu; 01-15-2019 at 09:52 AM.
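Why a plain `in tag.attrs` test does not work on BeautifulSoup 3 while `has_key` does: to my knowledge, BeautifulSoup 3 keeps a tag's attributes as a list of `(name, value)` tuples, whereas BeautifulSoup 4 uses a dict. The sketch below does not import BeautifulSoup at all; it only mimics that list-of-tuples shape with a hand-written list (an assumption about the data layout) to show the difference in plain Python.

```python
# Hand-written stand-in for what a BeautifulSoup 3 tag.attrs list looks like
# (assumption: a list of (name, value) tuples, not a dict).
attrs = [
    ("xmlns", "http://www.w3.org/1999/xhtml"),
    ("xmlns:epub", "http://www.idpf.org/2007/ops"),
    ("epub:type", "epigraph"),
]

# A membership test on the list compares the string against whole tuples,
# so the attribute name alone is never found.
found_in_list = "epub:type" in attrs
print(found_in_list)  # False

# Converting to a dict gives name-based lookup.
attr_map = dict(attrs)
epub_type = attr_map.get("epub:type")
print(epub_type)  # epigraph
```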
#6
Thanks, @Doitsu.
This worked!