Quote:
Originally Posted by davidfor
You need to have a Python module called "epub_conversion.utils" available to you. You have that in your original script. Where is it coming from? It does not look like a calibre module and I cannot find "convert_epub_to_lines" in the calibre source. You will either need to add this module so that you can see it when running in calibre. Or change the code to use calibre functions. I know the Count Pages plugin does this (extract the text from an epub), so you can look at that for how to do it.
|

I tried to simulate the "Count Pages" code to read the content of Epubs, when I run the code it tells me: "typeerror 'function' object is not iterable"
Code:
import re
from calibre_plugins.action_chains.actions.base import ChainAction
RE_HTML_BODY = re.compile(u'<body[^>]*>(.*)</body>', re.UNICODE | re.DOTALL | re.IGNORECASE)
with open("test_dict.txt", "r") as f:
tags_dict = f.read()
def _extract_body_text(data):
'''Get the body text of this html content wit any html tags stripped'''
body = RE_HTML_BODY.findall(data)
def tags_from_epub(path_to_epub):
temp = []
res = dict()
for line in _extract_body_text:
for key,value in tags_dict.items():
if re.search(rf'{value}', line):
if value not in temp:
temp.append(value)
res[key] = value
regex = re.compile(value)
match_array = regex.finditer(line)
match_list = list(match_array)
for m in match_list:
print(key, ":",m.group())
def run(gui, settings, chain):
db = gui.current_db
for book_id in chain.scope().get_book_ids():
fmts = [ fmt.strip() for fmt in db.formats(book_id, index_is_id=True).split(',') ]
if 'EPUB' in fmts:
path_to_epub = db.format_abspath(book_id, 'EPUB', index_is_id=True)
tags_from_epub(path_to_epub)