Issue parsing HTML tags

wolf123 · 06-19-2020, 07:17 PM

Hi guys,

I am developing an Edit Book plugin, and I don't seem to be able to parse HTML tags. My `main.py` file looks like this:

Code:

import lxml.etree
from PyQt5.Qt import QAction, QInputDialog

# The base class that all tools must inherit from
from calibre.gui2.tweak_book.plugin import Tool

from calibre import force_unicode
from calibre.gui2 import error_dialog
from calibre.ebooks.oeb.polish.container import OEB_DOCS, serialize


class MyTool(Tool):
    name = 'my-tool'
    allowed_in_toolbar = True
    allowed_in_menu = True

    def create_action(self, for_toolbar=True):
        ac = QAction(get_icons('icon/icon.png'), 'My Tool', self.gui)

        if not for_toolbar:
            self.register_shortcut(ac, 'my-tool',
                                   default_keys=('Ctrl+Shift+A',))

        ac.triggered.connect(self.run)
        return ac

    def run(self):
        container = self.current_container

        # iterate over book files
        for name, media_type in container.mime_map.items():
            if media_type in OEB_DOCS:

                self.my_method(container.parsed(name))

                container.dirty(name)

                
    def my_method(self, root):

        for el in root.iter('div'):
            el.attrib['class'] = 'my_class'

            # debug
            print('found a div tag')

I would expect every div to get the class 'my_class', or at least to see the debug lines printed in the terminal, but neither happens.

What am I getting wrong here?

Many thanks!

kovidgoyal · 06-19-2020, 11:52 PM

This is XHTML, so you can't use bare tag names; use

root.xpath('//*[local-name()="div"]')
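To illustrate why the bare name finds nothing: in XHTML every element lives in the `http://www.w3.org/1999/xhtml` namespace, so the parsed tag is `{http://www.w3.org/1999/xhtml}div` rather than `div`. Here is a minimal sketch of that, assuming the standard library's `xml.etree.ElementTree` as a stand-in for lxml so it runs outside calibre. The sample document and the `XHTML_NS` constant are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: a tiny hand-written XHTML document standing in for a book file.
XHTML_NS = 'http://www.w3.org/1999/xhtml'
doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><div>one</div><p>text</p><div>two</div></body>'
    '</html>'
)
root = ET.fromstring(doc)

# A bare tag name does not match namespaced elements.
bare = list(root.iter('div'))

# The namespace-qualified name ("Clark notation") does match.
qualified = list(root.iter('{%s}div' % XHTML_NS))

for el in qualified:
    el.set('class', 'my_class')

print(len(bare), len(qualified))  # prints: 0 2
print(root[0][0].tag)             # prints: {http://www.w3.org/1999/xhtml}div
```

In the plugin itself, where `root` is an lxml tree, the loop in `my_method` could run over either `root.xpath('//*[local-name()="div"]')` or `root.iter('{http://www.w3.org/1999/xhtml}div')`.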

wolf123 · 06-20-2020, 06:41 AM

Thanks so much!!
I need to learn about xpath.

