Old 08-14-2022, 11:51 PM   #1
lizzie1170
Extract text from selected books, convert them to tags, and add them to metadata.

I wrote a Python script in VS Code that searches the text of an EPUB book with regular expressions. The regular expressions come from patterns I formulated for the tags in my library. I have already collected over 400 tags this way and keep them in a custom column, with an @ symbol at the beginning to distinguish them from tags downloaded from other sources. I have 3000+ books and I want to run all 400+ regular expressions against each of them.
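To make the idea concrete, here is a minimal sketch of the matching step as a reusable function. It assumes the dictionary keys are tag names without the @ prefix and the values are regex patterns; the function name `find_tags` and the sample data are made up for illustration.

```python
import re

def find_tags(lines, patterns, prefix="@"):
    """Return the sorted, prefixed tag names whose pattern matches at least one line."""
    # Compile every pattern once, not once per line.
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    found = set()
    for line in lines:
        for name, rx in compiled.items():
            if name not in found and rx.search(line):
                found.add(name)
    return sorted(prefix + name for name in found)

# Made-up sample data, for illustration only.
sample_patterns = {
    "cosmic horror": r"(?i)\bcthulhu\b",
    "vampires": r"(?i)\bvampires?\b",
}
sample_lines = ["In his house at R'lyeh dead Cthulhu waits dreaming."]
print(find_tags(sample_lines, sample_patterns))
```

Returning a plain list keeps the search separate from the printing, which should make it easier to reuse for many books.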

I need help because my code only handles a single book. What I want is to:
** Run the code on selected books from my library (books_ids).
** Add the found tags to the metadata.
** Add a verification tag confirming that the book was processed.
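The three steps above could be sketched roughly as follows. This is only an outline under stated assumptions: `tag_books`, `load_lines`, and the `@processed` verification tag are hypothetical names, and a plain dict stands in for the library so the example runs on its own. How to obtain the selected book ids and write the result into a custom column depends on Calibre's database API and is not shown.

```python
import re

PROCESSED_TAG = "@processed"  # hypothetical name for the verification tag

def tag_books(book_ids, load_lines, patterns):
    """Map each book id to the tags found in its text, plus the verification tag.

    load_lines is a caller-supplied function: book id -> iterable of text lines.
    """
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    results = {}
    for book_id in book_ids:
        lines = list(load_lines(book_id))
        tags = sorted(
            "@" + name
            for name, rx in compiled.items()
            if any(rx.search(line) for line in lines)
        )
        # Every processed book gets the verification tag, even with no matches.
        results[book_id] = tags + [PROCESSED_TAG]
    return results

# Stand-in library: in real use load_lines would open the book's EPUB file,
# for example with open_book and convert_epub_to_lines.
fake_library = {
    1: ["Dead Cthulhu waits dreaming."],
    2: ["A quiet story about gardening."],
}
patterns = {"cosmic horror": r"(?i)\bcthulhu\b"}
results = tag_books([1, 2], fake_library.__getitem__, patterns)
print(results)
```

The returned mapping of book id to tag list is the shape one would then hand to whatever call writes the metadata.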

To be honest, my knowledge of Python is very limited; I only learned about regular expressions thanks to Calibre.

Code:
import ast
import re

import colorama
from epub_conversion.utils import open_book, convert_epub_to_lines

colorama.init()

book = open_book("Cthulhu Mythos.epub")
lines = convert_epub_to_lines(book)

# test_dict.txt holds a Python dict literal: {tag_name: regex_pattern, ...}
with open("test_dict.txt", "r") as data:
    tags_dict = ast.literal_eval(data.read())

# Compile each pattern once instead of once per line.
compiled = {key: re.compile(value) for key, value in tags_dict.items()}

# Single pass over the book: for each tag, keep the matches
# from the first line in which its pattern is found.
found = {}
for line in lines:
    for key, regex in compiled.items():
        if key in found:
            continue
        matches = [m.group() for m in regex.finditer(line)]
        if matches:
            found[key] = matches

print(colorama.Back.YELLOW + 'Matches(regex - book text):', colorama.Style.RESET_ALL)
for key, matches in found.items():
    for text in matches:
        print(colorama.Fore.MAGENTA + key, ":", colorama.Style.RESET_ALL + text)

print('\n', colorama.Back.YELLOW + 'Found tags:', colorama.Style.RESET_ALL)
for key in found:
    print(colorama.Fore.GREEN + key, end=", ")

print('\n\n' + colorama.Back.YELLOW + "N° found tags:", colorama.Style.RESET_ALL, len(found))
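
Since the script reads its patterns with ast.literal_eval, test_dict.txt presumably holds a Python dict literal mapping tag names to regex patterns. The snippet below sketches that format with made-up entries (the real file's contents are not reproduced here) and checks that every pattern compiles before it is used.

```python
import ast
import re

# Made-up contents standing in for test_dict.txt.
sample_text = r"""{
    "cosmic horror": r"(?i)\bcthulhu\b",
    "vampires": r"(?i)\bvampires?\b",
}"""

# literal_eval only accepts Python literals, so it is safer than eval for data files.
tags_dict = ast.literal_eval(sample_text)

for name, pattern in tags_dict.items():
    re.compile(pattern)  # raises re.error if a pattern is malformed

print(sorted(tags_dict))
```

Writing the patterns as raw strings (r"...") in the file avoids having to double every backslash.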

The attached images show what I need to run.

I'd appreciate any help with the code. Thank you very much.
Attached Thumbnails: print_code.PNG, Execute.png
Attached Files: test_dict.txt (383 Bytes)