Old 01-31-2023, 04:59 AM   #494
Doitsu
Quote:
Originally Posted by philm
I need to count rendered words, not everything from code.
Tools > Reports > HTML Files > All words

Quote:
Originally Posted by philm
Actually I need to spot every 280th rendered word.
You could use BeautifulSoup as a text filter:

Spoiler:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
from sigil_bs4 import BeautifulSoup

# main routine
def run(bk):

    # iterate over all html files
    for html_id, href in bk.text_iter():

        # read original html code from file
        html = bk.readfile(html_id)
        name = os.path.basename(href)

        # use BeautifulSoup to remove the tags
        soup = BeautifulSoup(html, 'html.parser')
        text = soup.get_text()

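        # split on any run of whitespace (spaces, tabs, newlines),
        # so line breaks and repeated spaces don't produce bogus "words"
        words = text.split()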
        
        # get the 280th word
        if len(words) >= 280:
            print(name, '280:', words[279])

    return 0

def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())
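Since the request was for every 280th rendered word rather than only the first one, the word list could also be sliced in steps. This is a minimal sketch, assuming the text has already been extracted (for example with get_text() as in the plugin); every_nth_word is a hypothetical helper name, not part of Sigil or BeautifulSoup.

```python
def every_nth_word(text, n=280):
    """Return (position, word) pairs for every n-th word; positions are 1-based.

    Hypothetical helper for illustration only.
    """
    # split() with no argument splits on any run of whitespace,
    # so newlines and repeated spaces do not create empty entries
    words = text.split()
    # indexes n-1, 2n-1, 3n-1, ... are the n-th, 2n-th, 3n-th words
    return [(i + 1, words[i]) for i in range(n - 1, len(words), n)]


if __name__ == "__main__":
    # made-up sample text: 600 words named w1 ... w600
    sample = " ".join("w%d" % i for i in range(1, 601))
    for pos, word in every_nth_word(sample):
        print(pos, word)
```

Inside run(bk), the same helper could presumably be called on the text of each file and the pairs printed next to the file name.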