06-19-2020, 07:17 PM   #1
wolf123
Member
Posts: 16
Karma: 10
Join Date: Jun 2020
Device: nook simple touch
Issue parsing HTML tags

Hi guys,

I am developing an Edit Book plugin, and I can't seem to match HTML tags in the parsed files. My `main.py` file looks like this:
Code:
import lxml.etree
from PyQt5.Qt import QAction, QInputDialog

# The base class that all tools must inherit from
from calibre.gui2.tweak_book.plugin import Tool

from calibre import force_unicode
from calibre.gui2 import error_dialog
from calibre.ebooks.oeb.polish.container import OEB_DOCS, serialize


class MyTool(Tool):
    name = 'my-tool'
    allowed_in_toolbar = True
    allowed_in_menu = True

    def create_action(self, for_toolbar=True):
        ac = QAction(get_icons('icon/icon.png'), 'My Tool', self.gui)

        if not for_toolbar:
            self.register_shortcut(ac, 'my-tool',
                                   default_keys=('Ctrl+Shift+A',))

        ac.triggered.connect(self.run)
        return ac

    def run(self):
        container = self.current_container

        # iterate over book files
        for name, media_type in container.mime_map.items():
            if media_type in OEB_DOCS:
                self.my_method(container.parsed(name))
                container.dirty(name)

    def my_method(self, root):
        for el in root.iter('div'):
            el.attrib['class'] = 'my_class'

            # debug
            print('found a div tag')
I would expect every div to get the class 'my_class', or at least to see the debug lines printed in the terminal, but neither happens.

What am I getting wrong here?

Many thanks!
06-19-2020, 11:52 PM   #2
kovidgoyal
creator of calibre
Posts: 43,906
Karma: 22666668
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This is XHTML, so the elements are in the XHTML namespace and you can't use bare tag names. Use:

root.xpath('//*[local-name()="div"]')
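To see why the bare name finds nothing, here is a minimal sketch. It assumes the standard library's `xml.etree.ElementTree` as a stand-in for lxml, so that it runs outside calibre. Both libraries store namespaced tags in the same `{namespace}tag` form. The sample markup and the helper name `add_class_to_divs` are made up for illustration and are not part of the original plugin.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace that every element in an XHTML document belongs to.
XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Hypothetical sample document, standing in for one file of the book.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div>one</div><p>text</p><div class="old">two</div>'
    '</body></html>'
)


def add_class_to_divs(root, class_name='my_class'):
    """Set the class on every XHTML div and return how many were changed."""
    count = 0
    # The tag is matched by its fully qualified name: '{namespace}div'.
    for el in root.iter('{%s}div' % XHTML_NS):
        el.set('class', class_name)
        count += 1
    return count


root = ET.fromstring(SAMPLE)

# A bare 'div' does not match '{http://www.w3.org/1999/xhtml}div'.
bare_matches = list(root.iter('div'))

changed = add_class_to_divs(root)
print(len(bare_matches), changed)  # prints "0 2"
```

The `local-name()` XPath above sidesteps the namespace entirely, while the qualified name used in this sketch matches only divs that really are XHTML divs. Either should work in `my_method`.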
06-20-2020, 06:41 AM   #3
wolf123
Thanks so much!!
I need to learn about xpath.
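For anyone else starting out with XPath on namespaced documents, here is a small sketch of the other common approach: binding a prefix to the namespace. It again assumes the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset. The prefix `h` and the sample markup are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

# Map an arbitrary prefix to the XHTML namespace; 'h' is just a local label.
NS = {'h': 'http://www.w3.org/1999/xhtml'}

# Hypothetical sample document.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div class="note">a</div><div>b</div><p class="note">c</p>'
    '</body></html>'
)

# All div elements anywhere below the root.
all_divs = doc.findall('.//h:div', NS)

# Only divs whose class attribute is exactly 'note'.
note_divs = doc.findall(".//h:div[@class='note']", NS)

texts = [el.text for el in note_divs]
print(len(all_divs), texts)  # prints "2 ['a']"
```

With lxml, the same prefix mapping can be passed to `xpath()` through its `namespaces=` keyword, and full XPath 1.0 expressions such as `local-name()` are available there.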
All times are GMT -4.