08-19-2022, 08:20 AM   #13
davidfor
Quote:
Originally Posted by lizzie1170
I tried to simulate the "Count Pages" code to read the content of EPUBs. When I run the code, it tells me: "TypeError: 'function' object is not iterable"

Code:
import re
from calibre_plugins.action_chains.actions.base import ChainAction
RE_HTML_BODY = re.compile(u'<body[^>]*>(.*)</body>', re.UNICODE | re.DOTALL | re.IGNORECASE)
with open("test_dict.txt", "r") as f:
    tags_dict = f.read()

def _extract_body_text(data):
    '''Get the body text of this html content with any html tags stripped'''
    body = RE_HTML_BODY.findall(data)

def tags_from_epub(path_to_epub):
    temp = []
    res = dict()
    for line in _extract_body_text:
        for key,value in tags_dict.items():
         if re.search(rf'{value}', line):
            if value not in temp:
                temp.append(value)
                res[key] = value                
                regex = re.compile(value) 
                match_array = regex.finditer(line) 
                match_list = list(match_array)
                for m in match_list:
                    print(key, ":",m.group())
    
def run(gui, settings, chain):
    db = gui.current_db
    for book_id in chain.scope().get_book_ids():
        fmts = [ fmt.strip() for fmt in db.formats(book_id, index_is_id=True).split(',') ]
        if 'EPUB' in fmts:
            path_to_epub = db.format_abspath(book_id, 'EPUB', index_is_id=True)
            tags_from_epub(path_to_epub)
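Well, you get that error because you never actually called the function. "_extract_body_text" is a function that takes a string of some sort. But in "for line in _extract_body_text:" you iterate over the function object itself instead of calling it with some data and iterating over what it returns. Note also that, as posted, it has no return statement, so even a proper call would give you None.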
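
To make the difference concrete, here is a minimal sketch that runs outside calibre. The names "extract_body_text", "RE_STRIP_MARKUP" and the sample HTML are made up for illustration and are not taken from Count Pages; the function is given a return value so there is something to iterate over.

```python
import re

RE_HTML_BODY = re.compile(r'<body[^>]*>(.*)</body>', re.DOTALL | re.IGNORECASE)
# Hypothetical helper pattern: removes anything that looks like a tag.
RE_STRIP_MARKUP = re.compile(r'<[^>]+>')


def extract_body_text(data):
    """Return the body text of an HTML string with the tags stripped."""
    body = RE_HTML_BODY.findall(data)
    if not body:
        return ''
    return RE_STRIP_MARKUP.sub('', body[0]).strip()


html = '<html><body><p>Hello</p>\n<p>world</p></body></html>'

# Iterating over the function object itself reproduces the reported error.
error_message = None
try:
    for line in extract_body_text:
        pass
except TypeError as err:
    error_message = str(err)

# Calling the function with a string and iterating over the result works.
lines = extract_body_text(html).splitlines()
for line in lines:
    print(line)
```

The only change that matters for the error is the pair of parentheses with an argument: the loop has to run over the value the function returns, not over the function.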

And that doesn't look anything like what Count Pages does. It opens the epub as an iterator, iterates through the files in the spine, extracts the text from each of them and combines it all into one long chunk of text. Then it processes that. You have passed "path_to_epub" into your function but never actually used it. In the Count Pages plugin, you need to look at statistics.py and follow the flow starting with "get_word_count".
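
As a rough illustration of that flow (open the book, walk the spine, pull the text out of each file, join it), here is a sketch that uses only the Python standard library. It is not the Count Pages code and it does not use calibre's own iterator; it is a stand-in that reads the EPUB as a zip file, and the names "spine_text" and "NS" are hypothetical. It assumes UTF-8 content files and plain (not URL-encoded) hrefs.

```python
import io
import posixpath
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {
    'c': 'urn:oasis:names:tc:opendocument:xmlns:container',
    'opf': 'http://www.idpf.org/2007/opf',
}
RE_HTML_BODY = re.compile(r'<body[^>]*>(.*)</body>', re.DOTALL | re.IGNORECASE)
RE_STRIP_MARKUP = re.compile(r'<[^>]+>')


def spine_text(path_to_epub):
    """Join the body text of every spine item, in reading order."""
    chunks = []
    with zipfile.ZipFile(path_to_epub) as zf:
        # container.xml points at the OPF package file.
        container = ET.fromstring(zf.read('META-INF/container.xml'))
        opf_path = container.find('.//c:rootfile', NS).get('full-path')
        opf = ET.fromstring(zf.read(opf_path))
        opf_dir = posixpath.dirname(opf_path)
        # The manifest maps item ids to file names; the spine lists ids in order.
        manifest = {item.get('id'): item.get('href')
                    for item in opf.findall('opf:manifest/opf:item', NS)}
        for itemref in opf.findall('opf:spine/opf:itemref', NS):
            href = manifest[itemref.get('idref')]
            raw = zf.read(posixpath.join(opf_dir, href))
            body = RE_HTML_BODY.findall(raw.decode('utf-8', 'replace'))
            if body:
                chunks.append(RE_STRIP_MARKUP.sub('', body[0]).strip())
    return '\n'.join(chunks)


# Build a tiny two-file EPUB in memory so the sketch can be run as-is.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('META-INF/container.xml',
                '<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
                '<rootfiles><rootfile full-path="OEBPS/content.opf"/></rootfiles>'
                '</container>')
    zf.writestr('OEBPS/content.opf',
                '<package xmlns="http://www.idpf.org/2007/opf"><manifest>'
                '<item id="c1" href="ch1.xhtml"/><item id="c2" href="ch2.xhtml"/>'
                '</manifest><spine><itemref idref="c1"/><itemref idref="c2"/>'
                '</spine></package>')
    zf.writestr('OEBPS/ch1.xhtml', '<html><body><p>First file</p></body></html>')
    zf.writestr('OEBPS/ch2.xhtml', '<html><body><p>Second file</p></body></html>')
buf.seek(0)

text = spine_text(buf)
print(text)
```

In the chain action, the equivalent call would be something like spine_text(path_to_epub), with the regex matching against the tags dictionary done on the returned text rather than on a function name.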