MobileRead Forums - View Single Post - Problems with Beautifulsoup with custom tags

ebray187 · 03-27-2021, 11:37 PM

Quote:

Originally Posted by KevinH

What are the double xml escaped "<" as part of the text for?

They link back to the note call in the text, like a back button:

Quote:

[1] This is a note in the notes.xhtml chapter of the book. The arrows on the right return to the note call in chapter.xhtml. <<

In the XHTML they are written in the &lt; form.

Quote:

Originally Posted by KevinH

How are you getting the OUT?

From the output of the print() function shown in the Plugin Runner. It is consistent with what bk.writefile() writes.

Quote:

Originally Posted by KevinH

If you print it from the plugin, it will pass through an xml encode xml decode pass when being returned from the plugin process over stdout as xml. So instead of printing to see this value, simply write to a log file from the plugin so you can see exactly what BeautifulSoup is generating. Here, my guess is that it is exactly identical to what you see outside; it is just getting unencoded when passed back in the stdout xml file from the plugin.
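To make the encode/decode point concrete, here is a minimal sketch using only the standard library. It is not Sigil's actual stdout handling, just an illustration (my assumption) of what a single XML decode pass does to the entity, and of how a matched encode/decode pair leaves the text unchanged:

```python
from xml.sax.saxutils import escape, unescape

# The anchor as it is written in the XHTML source.
generated = '<a href="../Text/Section0001.xhtml#nt3">&lt;&lt;</a>'

# A single XML decode pass turns each &lt; entity into a literal "<".
decoded_once = unescape(generated)

# An encode pass followed by a decode pass gives back the original text.
round_trip = unescape(escape(generated))

print(decoded_once)
print(round_trip == generated)
```

If the text shown in the Plugin Runner looks like decoded_once while the log file looks like generated, the difference comes from the transport and not from BeautifulSoup.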

I'm getting the same wrong output in a log file:

Quote:

OUT:
[3] Ibid. Note 1. <a href="../Text/Section0001.xhtml#nt3"><<</a>

But running it outside Sigil works fine.

Here is the exact code:

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os, re
import xml.etree.ElementTree as ET

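# Prefer the BeautifulSoup bundled with Sigil; fall back to the stock bs4 outside Sigil.
try:
    from sigil_bs4 import BeautifulSoup
except ImportError:
    from bs4 import BeautifulSoup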

def run(bk):
    html = '<p id="nt3"><sup>[3]</sup> Note 1. <a href="../Text/Section0001.xhtml#nt3">&lt;&lt;</a></p>'
    
    ## BeautifulSoup parser
    soup = BeautifulSoup(html, "html.parser")
    orig_soup = str(soup)
    original_tag = soup.p

    dict_atributes = {"xml:lang" : "la"}
    new_tag = soup.new_tag("i", attrs=dict_atributes)
    new_tag.string = "Ibid"
    original_tag.insert(1, " ")
    original_tag.insert(2, new_tag)
    original_tag.insert(3, ".")
    
    output = "OUT:\n" + str(original_tag)

    # Write the result to a log file so it can be read without going through stdout.
    with open("log.txt", "w") as f:
        f.write(output)
    
    print(output)

    return 0

def main():
    html = '<p id="nt3"><sup>[3]</sup> Note 1. <a href="../Text/Section0001.xhtml#nt3">&lt;&lt;</a></p>'
    
    ## BeautifulSoup parser
    soup = BeautifulSoup(html, "html.parser")
    orig_soup = str(soup)
    original_tag = soup.p

    dict_atributes = {"xml:lang" : "la"}
    new_tag = soup.new_tag("i", attrs=dict_atributes)
    new_tag.string = "Ibid"
    original_tag.insert(1, " ")
    original_tag.insert(2, new_tag)
    original_tag.insert(3, ".")
    
    output = "OUT:\n" + str(original_tag)

    # Write the result to a log file so it can be read without going through stdout.
    with open("log.txt", "w") as f:
        f.write(output)
    
    print(output)

if __name__ == "__main__":
    sys.exit(main())
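One thing that may be worth ruling out: because of the try/except import, the same script can end up using sigil_bs4 inside Sigil and the stock bs4 outside it, so the two runs are not necessarily exercising the same library. Below is a small sketch for logging which module was actually picked. The helper name first_available is hypothetical, and "json" is only a stand-in so the snippet runs even where neither BeautifulSoup package is installed:

```python
import importlib


def first_available(module_names):
    """Return (name, module) for the first module in module_names that imports."""
    for name in module_names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %r" % (module_names,))


# Same preference order as the try/except at the top of the plugin.
# "json" is a stand-in fallback for illustration only.
name, module = first_available(["sigil_bs4", "bs4", "json"])
print("using", name, "from", getattr(module, "__file__", "<built-in>"))
```

Writing that line to log.txt in both runs would show whether the difference in output lines up with a difference in the imported module.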

Thanks a lot!