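You would have to parse the file using the gumbo bs4 adapter. Each node of the parse tree is then given extra information fields (its original text, its start line, column, and offset, and its end line, column, and offset when an end position is known):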
Code:
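def _add_source_info(obj, original_text, start_pos, end_pos):
    # original source text of this node (_fromutf8 is a helper defined in the adapter)
    obj.original = _fromutf8(bytes(original_text))
    # where the node starts in the source
    obj.line = start_pos.line
    obj.col = start_pos.column
    obj.offset = start_pos.offset
    # where the node ends, only when the parser supplies an end position
    if end_pos:
        obj.end_line = end_pos.line
        obj.end_col = end_pos.column
        obj.end_offset = end_pos.offset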
See:
https://github.com/Sigil-Ebook/Sigil...bs4_adapter.py
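A note on what those numbers mean: upstream gumbo documents its source positions as 1-based line and column numbers plus a 0-based *byte* offset into the UTF-8 input. Assuming Sigil's copy of gumbo keeps that convention (I have not checked), the start fields on a node relate to each other as in this sketch. `offset_to_line_col` is a hypothetical helper for illustration, not part of the adapter, and it ignores the tab expansion gumbo applies to columns.

```python
from typing import Tuple


def offset_to_line_col(source: str, offset: int) -> Tuple[int, int]:
    """Convert a 0-based UTF-8 byte offset into a 1-based (line, column) pair.

    Assumes gumbo-style positions: offset counts bytes, column counts
    characters, and tabs are not expanded (a simplification).
    """
    data = source.encode("utf-8")
    if not 0 <= offset <= len(data):
        raise ValueError(f"offset {offset} outside source of {len(data)} bytes")
    before = data[:offset].decode("utf-8")  # all text preceding the position
    line = before.count("\n") + 1
    last_newline = before.rfind("\n")  # -1 when still on the first line
    col = len(before) - last_newline  # characters since that newline, 1-based
    return line, col


if __name__ == "__main__":
    sample = '<html>\n<body>\n  <p class="x">\u00a9</p>\n</body>\n</html>\n'
    off = sample.encode("utf-8").index(b"</p>")
    print(off, offset_to_line_col(sample, off))  # 31 (3, 17)
```

The point of the example is that byte offsets and character positions drift apart as soon as non-ASCII text appears: the `</p>` above sits at byte 31 but character 30, because the © takes two bytes in UTF-8. So if you slice with `offset`, slice the encoded bytes rather than the Python string.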
And here is how the testme3 plugin posted at the start of this thread uses the gumbo parser:
Code:
# examples for using the bs4/gumbo parser to process xhtml
print("\nExercising: the gumbo bs4 adapter")
import sigil_gumbo_bs4_adapter as gumbo_bs4
samp = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-US">
<head><title>testing & entities</title></head>
<body>
<p class="first second">this is*the*<i><b>copyright</i></b> symbol "©"</p>
<p xmlns:xlink="http://www.w3.org/xlink" class="second" xlink:href="http://www.ggogle.com">this used to test atribute namespaces</p>
</body>
</html>
"""
soup = gumbo_bs4.parse(samp)
# print every element whose class list includes "second"
for node in soup.find_all(attrs={'class':'second'}):
    print(node)
So you should be able to access them via node.line, node.col, and node.offset (plus node.end_line, node.end_col, and node.end_offset when an end position exists), but I cannot confirm that right now since all I have access to is my old iPad.
Please give that a try.
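If it helps, here is an untested sketch of reading those fields back. The helper names `source_position` and `describe_positions` are made up for illustration. One thing to watch: on a BeautifulSoup tag, `tag.col` for an attribute that was never set falls through to the tag-name shortcut (it means `tag.find("col")`), so the sketch reads the instance `__dict__` directly to tell "not set" apart from "found a child tag". It runs here on `SimpleNamespace` stand-ins with invented values; with Sigil you would pass the `soup.find_all(...)` results instead.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Optional

POSITION_KEYS = ("line", "col", "offset", "end_line", "end_col", "end_offset")


def source_position(node: Any) -> Dict[str, Optional[int]]:
    """Return the position fields _add_source_info stores on a node.

    Reads the instance __dict__ so a missing field comes back as None
    instead of triggering BeautifulSoup's tag.<name> child lookup.
    """
    fields = getattr(node, "__dict__", {})
    return {key: fields.get(key) for key in POSITION_KEYS}


def describe_positions(nodes: Iterable[Any]) -> List[str]:
    """Format one 'name line:col @offset' string per node."""
    out = []
    for node in nodes:
        pos = source_position(node)
        name = getattr(node, "name", None)
        out.append(f"{name} {pos['line']}:{pos['col']} @{pos['offset']}")
    return out


if __name__ == "__main__":
    # Invented stand-in values so the sketch runs without Sigil; with the
    # adapter, pass soup.find_all(attrs={'class': 'second'}) instead.
    fake_nodes = [SimpleNamespace(name="p", line=7, col=3, offset=210)]
    for text in describe_positions(fake_nodes):
        print(text)  # p 7:3 @210
```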