06-19-2020, 07:17 PM | #1 |
Member
Posts: 16
Karma: 10
Join Date: Jun 2020
Device: nook simple touch
|
Issue parsing HTML tags
Hi guys,
I am developing an (edit book) plugin, and I don't seem to be able to parse html tags. My `main.py` file looks like so: Code:
import lxml.etree
from PyQt5.Qt import QAction, QInputDialog

# The base class that all tools must inherit from
from calibre.gui2.tweak_book.plugin import Tool
from calibre import force_unicode
from calibre.gui2 import error_dialog
from calibre.ebooks.oeb.polish.container import OEB_DOCS, serialize


class MyTool(Tool):
    name = 'my-tool'
    allowed_in_toolbar = True
    allowed_in_menu = True

    def create_action(self, for_toolbar=True):
        ac = QAction(get_icons('icon/icon.png'), 'My Tool', self.gui)
        if not for_toolbar:
            self.register_shortcut(ac, 'my-tool', default_keys=('Ctrl+Shift+A',))
        ac.triggered.connect(self.run)
        return ac

    def run(self):
        container = self.current_container
        # iterate over book files
        for name, media_type in container.mime_map.items():
            if media_type in OEB_DOCS:
                self.my_method(container.parsed(name))
                container.dirty(name)

    def my_method(self, root):
        for el in root.iter('div'):
            el.attrib['class'] = 'my_class'
            # debug
            print('found a div tag')

What am I getting wrong here? Many thanks! |
06-19-2020, 11:52 PM | #2 |
creator of calibre
Posts: 43,906
Karma: 22666668
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
This is XHTML, so the elements are namespaced and you can't use bare tag names. Use
root.xpath('//*[local-name()="div"]') |
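To illustrate why the bare name matches nothing, here is a minimal sketch, not part of the original reply. It uses the standard library's `xml.etree.ElementTree` so it runs without calibre; lxml's `iter()` accepts the same `{namespace}tag` notation. The sample markup and the helper name `set_div_class` are made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Hypothetical sample document (an assumption, not taken from the thread)
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div>one</div><p>text</p><div>two</div>'
    '</body></html>'
)


def set_div_class(root, class_name):
    """Set the class attribute on every XHTML <div>; return the count."""
    count = 0
    # The parsed tag is '{http://www.w3.org/1999/xhtml}div', not 'div'
    for el in root.iter('{%s}div' % XHTML_NS):
        el.set('class', class_name)
        count += 1
    return count


root = ET.fromstring(SAMPLE)

# A bare tag name only matches elements that have no namespace
bare_matches = len(list(root.iter('div')))

changed = set_div_class(root, 'my_class')
print(bare_matches, changed)
```

Qualifying the tag with the namespace is an alternative to the `local-name()` XPath above. The XPath form has the advantage of matching `div` elements regardless of which namespace they are in.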
06-20-2020, 06:41 AM | #3 |
Member
Posts: 16
Karma: 10
Join Date: Jun 2020
Device: nook simple touch
|
Thanks so much!!
I need to learn about xpath. |