MobileRead Forums - View Single Post

lomkiri · 09-11-2025, 03:37 PM

Edit: Ah, I think I missed the point: you are probably using the "count words" feature of the report tool.

In that case, the solution I have in mind is to write a dedicated regex-function that builds a table of all words, modeled on the one in the report tool, but excluding all tagged words as defined above.
The regex selects all the text inside the <body>, and the regex-function builds a dict {'w1': n1, 'w2': n2, ...} of every word found that is not preceded by the defined tag.
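
To make the idea concrete, here is a rough sketch of such a function, not a finished tool. Several things in it are my assumptions: the marker tag (<span class="nocount">) is only a placeholder for whatever tag was defined above, the helper name count_untagged_words is made up, "preceded by" is taken literally (only the first word right after the marker is skipped), and the search expression would be something like (?s)<body.*?</body>. The replace() signature and call_after_last_match are the ones the calibre editor's function mode uses, as far as I know.

```python
import re
from collections import Counter

# Placeholder marker: substitute the real "defined tag" here (assumption).
SKIP_TAG = re.compile(r'<span\s+class="nocount"\s*>', re.IGNORECASE)

# A token is an entity (ignored), a whole tag, or a word made of letters,
# optionally joined by an apostrophe or hyphen.
TOKEN = re.compile(
    r"(?P<entity>&#?\w+;)"
    r"|(?P<tag><[^>]*>)"
    r"|(?P<word>[^\W\d_]+(?:['’-][^\W\d_]+)*)"
)

def count_untagged_words(body_html, counts=None):
    """Count words in body_html, skipping a word directly preceded by SKIP_TAG."""
    counts = Counter() if counts is None else counts
    skip_next = False
    for tok in TOKEN.finditer(body_html):
        kind = tok.lastgroup
        if kind == 'tag':
            # Remember whether the tag just seen is the marker tag.
            skip_next = SKIP_TAG.fullmatch(tok.group('tag')) is not None
        elif kind == 'word':
            if not skip_next:
                counts[tok.group('word').lower()] += 1
            skip_next = False
    return counts

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # Extra call after the last match: print the accumulated table.
        for word, n in sorted(data.get('counts', {}).items()):
            print(word, n)
        return ''
    # Accumulate across files in the shared data dict; leave the text unchanged.
    count_untagged_words(match.group(), data.setdefault('counts', Counter()))
    return match.group()

# Ask for one extra call once all matches have been processed.
replace.call_after_last_match = True
```

Run with "Replace all", this should leave the book untouched and only print the word table at the end. Words are lower-cased before counting, which may or may not match what the report tool does.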

It is not a trivial function to write, but it's not too hard either. You can find a similar idea in my function that counts the occurrences of each tag in an epub: https://www.mobileread.com/forums/sh...4&postcount=79
