Old 03-27-2021, 09:11 PM   #1
ebray187
Member
Problems with Beautifulsoup with custom tags

Hi! I'm having trouble adding a custom tag from my plugin using BeautifulSoup.
The code:
Code:
    from bs4 import BeautifulSoup

    html = '<p id="nt3"><sup>[3]</sup> Note 1. <a href="../Text/Section0001.xhtml#nt3">&lt;&lt;</a></p>'
    
    ## BeautifulSoup parser
    soup = BeautifulSoup(html, "html.parser")
    orig_soup = str(soup)
    original_tag = soup.p

    dict_atributes = {"xml:lang" : "la"}
    new_tag = soup.new_tag("i", attrs=dict_atributes)
    new_tag.string = "Ibid"
    original_tag.insert(1, " ")
    original_tag.insert(2, new_tag)
    original_tag.insert(3, ".")
    
    print("OUT:\n" + str(original_tag))
Outside Sigil everything works:
Code:
$ python test.py
OUT:
<p id="nt3"><sup>[3]</sup> <i xml:lang="la">Ibid</i>. Note 1. <a href="../Text/Section0001.xhtml#nt3">&lt;&lt;</a></p>
But from Sigil I get:
Code:
OUT:
<p id="nt3"><sup>[3]</sup> <i attrs="{'xml:lang': 'la'}">Ibid</i>. Note 1. <a href="../Text/Section0001.xhtml#nt3"><<</a></p>
Any ideas? I can't find any information about Sigil's BeautifulSoup implementation.
Thanks!

PS: Using Python 3.8 and Sigil 1.4.3
Old 03-27-2021, 10:49 PM   #2
KevinH
Sigil Developer
What are the double XML-escaped "<" characters in the text for?

How are you getting the OUT?

If you print it from the plugin, the text goes through an XML encode and decode pass when it is returned from the plugin process over stdout as XML. So instead of printing to see this value, write it to a log file from the plugin so you can see exactly what BeautifulSoup is generating. My guess is that it is identical to what you see outside Sigil, and that it is just being decoded on its way back through the stdout XML from the plugin.
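
A minimal sketch of the log-file approach, using only the standard library. The file name `plugin_debug.log`, the helper `log_raw`, and the choice of the temp directory are assumptions made for illustration; a plugin could write to any location it has write access to.

```python
import os
import tempfile

# Hypothetical log location (an assumption, not something Sigil prescribes).
LOG_PATH = os.path.join(tempfile.gettempdir(), "plugin_debug.log")


def log_raw(text, path=LOG_PATH):
    """Append text to a UTF-8 log file exactly as the plugin produced it."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text + "\n")


# The string is written byte for byte, without passing through stdout.
markup = '<a href="../Text/Section0001.xhtml#nt3">&lt;&lt;</a>'
log_raw(markup)
```

Opening the log file afterwards shows the string as BeautifulSoup serialized it, which can then be compared with what the Plugin Runner displays.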
Old 03-27-2021, 11:37 PM   #3
ebray187
Quote:
Originally Posted by KevinH View Post
What are the double XML-escaped "<" characters in the text for?
They link back to the note call in the text, like a back button:
Quote:
[1] This is a note in the notes.xhtml chapter of the book. These arrows on the right return to the note call in chapter.xhtml. <<
In the XHTML they are in the &lt; form.

Quote:
Originally Posted by KevinH View Post
How are you getting the OUT?
From the output of the print() function shown in the Plugin Runner. It is consistent with what bk.writefile() writes.

Quote:
Originally Posted by KevinH View Post
If you print it from the plugin, the text goes through an XML encode and decode pass when it is returned from the plugin process over stdout as XML. So instead of printing to see this value, write it to a log file from the plugin so you can see exactly what BeautifulSoup is generating. My guess is that it is identical to what you see outside Sigil, and that it is just being decoded on its way back through the stdout XML from the plugin.
I'm getting the same wrong output on a log file:
Quote:
OUT:
<p id="nt3"><sup>[3]</sup> <i attrs="{'xml:lang': 'la'}">Ibid</i>. Note 1. <a href="../Text/Section0001.xhtml#nt3">&lt;&lt;</a></p>
But running it outside Sigil works fine.

Here is the exact code:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys

try:
    # BeautifulSoup copy bundled with Sigil
    from sigil_bs4 import BeautifulSoup
except ImportError:
    # Stand-alone run outside Sigil
    from bs4 import BeautifulSoup

HTML = '<p id="nt3"><sup>[3]</sup> Note 1. <a href="../Text/Section0001.xhtml#nt3">&lt;&lt;</a></p>'


def build_output():
    """Insert the Ibid tag after the note number, then log and print the result."""
    ## BeautifulSoup parser
    soup = BeautifulSoup(HTML, "html.parser")
    original_tag = soup.p

    dict_atributes = {"xml:lang" : "la"}
    # This attrs= call is the one that behaves differently inside Sigil
    new_tag = soup.new_tag("i", attrs=dict_atributes)
    new_tag.string = "Ibid"
    original_tag.insert(1, " ")
    original_tag.insert(2, new_tag)
    original_tag.insert(3, ".")

    output = "OUT:\n" + str(original_tag)

    # Write the raw string to a log file for comparison with the printed output
    with open("log.txt", "w") as f:
        f.write(output)

    print(output)


def run(bk):
    # Entry point called by Sigil
    build_output()
    return 0


def main():
    # Entry point when run outside Sigil
    build_output()


if __name__ == "__main__":
    sys.exit(main())
Thanks a lot!

Last edited by ebray187; 03-27-2021 at 11:40 PM.
Old 03-28-2021, 11:49 AM   #4
KevinH
If you compare that to your first post, you will see they are not the same. The printed output in the first post shows the &lt;&lt; decoded, which it must not be if the markup is to be used safely.
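
To illustrate the effect only (this is not Sigil's actual transport code, which is not shown in this thread), here is a small sketch using the standard library's `xml.sax.saxutils`. It assumes the text is escaped once for transport and shows what one decode too many does to the entities.

```python
from xml.sax.saxutils import escape, unescape

# What BeautifulSoup serialized: the "<<" is safely escaped as entities.
text_from_soup = '<a href="#nt3">&lt;&lt;</a>'

# Encoding the markup so it can travel inside an XML wrapper ...
encoded = escape(text_from_soup)
# ... and decoding it once on the other side restores the original.
decoded_once = unescape(encoded)
# One decode too many turns the entities into bare "<" characters.
decoded_twice = unescape(decoded_once)

print(decoded_once)
print(decoded_twice)
```

The second result contains a literal `<<` inside the link, which is the unsafe form seen in the printed output of the first post.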

The issue is that you are trying to pass the attributes as a dict. It is converted to what is needed when run outside the plugin environment, but not inside it. My guess is that the default dict type is different: one may be an ordered dict collection while the other is not.

Have you tried assigning that attribute in a different way? Sigil's internal bs4 version has many modifications so that it works on older Python 3 versions back to 3.4, so it may be using different types than a recent BS4 version that only runs on a limited set of Python 3 versions.
Old 03-28-2021, 11:53 AM   #5
KevinH
I did notice this:

Quote:
(This is a new feature in Beautiful Soup 4.4.0.)

What if you need to create a whole new tag? The best solution is to call the factory method BeautifulSoup.new_tag():

soup = BeautifulSoup("<b></b>", 'html.parser')
original_tag = soup.b

new_tag = soup.new_tag("a", href="http://www.example.com")
original_tag.append(new_tag)
original_tag
# <b><a href="http://www.example.com"></a></b>

new_tag.string = "Link text."
original_tag
# <b><a href="http://www.example.com">Link text.</a></b>
So my guess is that Sigil's internal version does not support adding attributes the way you do with that method.

Last edited by KevinH; 03-28-2021 at 11:57 AM.
Old 03-28-2021, 12:03 PM   #6
KevinH
Here are alternative ways to add an attribute ...

Quote:
Attributes
A tag may have any number of attributes. The tag <b id="boldest"> has an attribute “id” whose value is “boldest”. You can access a tag’s attributes by treating the tag like a dictionary:

tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
tag['id']
# 'boldest'
You can access that dictionary directly as .attrs:

tag.attrs
# {'id': 'boldest'}
You can add, remove, and modify a tag’s attributes. Again, this is done by treating the tag as a dictionary:

tag['id'] = 'verybold'
tag['another-attribute'] = 1
tag
# <b another-attribute="1" id="verybold">bold</b>

del tag['id']
del tag['another-attribute']
tag
# <b>bold</b>

tag['id']
# KeyError: 'id'
tag.get('id')
# None
So I would remove the attrs= parameter from the new_tag() call. Instead, create the tag first, then either use the new tag in its dict mode to add the needed attributes one by one, or assign them to the tag's .attrs if possible.
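
A minimal sketch of that two-step approach. The helper name `new_tag_with_attrs` and the sample markup are made up for illustration; only `new_tag()`, item assignment on the tag, and `.string` are BeautifulSoup calls, and the import fallback mirrors the one used earlier in this thread.

```python
try:
    from sigil_bs4 import BeautifulSoup  # copy bundled with Sigil
except ImportError:
    from bs4 import BeautifulSoup  # stand-alone installation


def new_tag_with_attrs(soup, name, attributes):
    """Create a tag, then add its attributes one by one (hypothetical helper)."""
    tag = soup.new_tag(name)
    for key, value in attributes.items():
        # Dict-style assignment accepts any string key, including "xml:lang".
        tag[key] = value
    return tag


soup = BeautifulSoup("<p>Note 1.</p>", "html.parser")
ibid = new_tag_with_attrs(soup, "i", {"xml:lang": "la"})
ibid.string = "Ibid"
soup.p.insert(0, ibid)
print(soup.p)
```

Because no attribute is passed through `new_tag()` itself, the sketch does not depend on how a given bs4 version interprets an `attrs=` keyword.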
Old 03-28-2021, 12:18 PM   #7
KevinH
I took a peek at the latest BS4 source on Launchpad, and they have changed how they handle the attrs argument.

So doing it in two steps will be more compatible with other versions of both bs4 and Python 3.
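
The kind of signature change that would produce `attrs="{'xml:lang': 'la'}"` can be sketched with two plain Python stand-ins. These are not the real bs4 functions: both names and both signatures are assumptions for illustration. One collects every keyword argument as an attribute, the other also accepts an explicit `attrs` dict.

```python
def new_tag_kwargs_only(name, **attrs):
    """Stand-in: every keyword argument becomes an attribute of the tag."""
    return name, dict(attrs)


def new_tag_with_attrs_param(name, attrs=None, **kwattrs):
    """Stand-in: an explicit attrs dict is merged with keyword attributes."""
    merged = dict(attrs or {})
    merged.update(kwattrs)
    return name, merged


wanted = {"xml:lang": "la"}

# Here "attrs" is just another keyword, so it becomes an attribute name
# whose value is the whole dict.
old_style = new_tag_kwargs_only("i", attrs=wanted)

# Here the dict is unpacked into the tag's real attributes.
new_style = new_tag_with_attrs_param("i", attrs=wanted)

print(old_style)
print(new_style)
```

With the first stand-in, serializing the dict value as a string gives exactly the `{'xml:lang': 'la'}` text seen in the output from Sigil, which would be consistent with the bundled version treating `attrs` as an ordinary keyword.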
Old 03-28-2021, 12:21 PM   #8
ebray187
Quote:
Originally Posted by KevinH View Post
Here are alternative ways to add an attribute ...

So I would remove the attrs= parameter from the new_tag() call. Instead, create the tag first, then either use the new tag in its dict mode to add the needed attributes one by one, or assign them to the tag's .attrs if possible.
I'm having trouble with the ":" symbol in xml:lang.

Last edited by ebray187; 03-28-2021 at 12:25 PM.
Old 03-28-2021, 12:28 PM   #9
KevinH
There is a fully HTML5-compliant gumbo parser already available, as well as a very simple serial parser called quickparser. There is also an html5lib parser that is guaranteed to be there for use by Sigil plugins.

Surely one of those will do what you need. As for bs4, as long as you split the new_tag() creation from the attribute addition in that piece of code, it works on all versions of BS4 and back to Python 3.4.
Old 03-28-2021, 12:31 PM   #10
DiapDealer
Grand Sorcerer
The colon should just be part of the string when used in an attribute name:

tag["xml:lang"] = "la"

That is more compatible with all versions of BeautifulSoup.
Old 03-28-2021, 12:41 PM   #11
ebray187
Quote:
Originally Posted by DiapDealer View Post
The colon should just be part of the string when used in an attribute name:

tag["xml:lang"] = "la"

That is more compatible with all versions of BeautifulSoup.
Thanks! It was just a typo... sorry.

Thanks KevinH for your help.

Last edited by ebray187; 03-28-2021 at 12:43 PM.