Plugin Development - Page 27

slowsmile · 03-27-2018, 07:37 PM

@DiapDealer...Thanks for that. Since bs4 is causing this problem, how do I stop bs4 from changing the svg tag contents from camel case to lower case? Or will I have to code this fix myself in bs4?

slowsmile · 03-27-2018, 07:52 PM

@DiapDealer...The reason I want this svg/bs4 problem resolved is because most of my plugins use bs4 and, as a result, I have had to advise people in the release notes that they should not use svg images with these plugins. I would really like to resolve this issue so that all my plugins will allow svg code.

KevinH · 03-27-2018, 08:04 PM

Only an html5 parser will grok svg properly. bs4 allows different parsers to work including gumbo, html, lxml, and html5lib. So have you tried using either the gumbo or html5lib parsers with bs4 (both are html5 parsers)?

Gumbo uses a lookup table to fix svg attribute names and html5lib uses a simple dictionary to fix the strange case issues with svg attributes.

See here for the html5lib version (and search for viewbox)

https://github.com/html5lib/html5lib...b/constants.py

In the worst case you can post-process the source to fix the bad all-lowercase attributes.

KevinH · 03-27-2018, 08:11 PM

Also post processing the output in gumbo and serializing it should fix the issues.

DiapDealer · 03-27-2018, 08:18 PM

I've found that using bs4's "html5lib" parser will preserve case in attribute names. As will sigil_gumbo_bs4_adapter.

Provided the xhtml variable contains markup with svg:

Code:

from sigil_bs4 import BeautifulSoup as bs4

soup=bs4(xhtml, "html5lib")
print(soup.prettify_xhtml())

or

Code:

import sigil_gumbo_bs4_adapter as bs4

soup=bs4.parse(xhtml)
print(soup.prettify_xhtml())

slowsmile · 03-27-2018, 08:28 PM

@DiapDealer and KevinH...Wow!!...What a simple solution. I'll try using the html5lib parser, that seems the easiest. Using html5lib should speed things up as well. Thanks for your help.

Maui · 03-28-2018, 02:58 AM

Another workaround if the html5 parser is causing other problems: just a simple string replace.
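
For what it's worth, the string-replace approach could look roughly like the sketch below. It assumes the markup has already been serialized back to a string and only deals with attribute names, not element names. The mapping lists just a few SVG attributes as examples (the html5lib constants file linked earlier has a much fuller table), and the names SVG_CASE_FIXES and fix_svg_case are made up for this illustration.

```python
import re

# Illustrative subset only: lowercased SVG attribute names mapped back to camelCase.
SVG_CASE_FIXES = {
    "viewbox": "viewBox",
    "preserveaspectratio": "preserveAspectRatio",
    "gradientunits": "gradientUnits",
    "gradienttransform": "gradientTransform",
}

# Match a known lowercased name only where it looks like an attribute,
# i.e. preceded by whitespace and followed by "=".
_ATTR_RE = re.compile(r"(?<=\s)(" + "|".join(SVG_CASE_FIXES) + r")(?=\s*=)")


def fix_svg_case(markup):
    """Restore camelCase for the SVG attribute names listed above."""
    return _ATTR_RE.sub(lambda m: SVG_CASE_FIXES[m.group(1)], markup)


broken = '<svg viewbox="0 0 100 100" preserveaspectratio="xMidYMid meet"></svg>'
print(fix_svg_case(broken))
```

Being a plain text substitution, it would also rewrite body text that happens to look like one of these attributes, so it is probably best kept as a last resort.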

Maui · 03-28-2018, 10:25 AM

Hi,

I'm just playing around with the BeautifulSoup inside Sigil.

I'm interested in whether an element contains some text or not.

This one works fine:

Code:

<p>Test</p>

tag.string contains Test.

But

Code:

<p><a id="x1"></a>Test</p>

doesn't work: tag.string is None, but tag.contents now contains two items, the anchor and the text.

Is there an easy way to always find out whether I have text inside a block element or not?

Code:

    for (id, href) in bk.text_iter():
        html = bk.readfile(id)
        soup = BeautifulSoup(html)
        body = soup.body
        tags = body.find_all(True)
        for tag in tags:  
            print(tag.name, tag.attrs, tag.contents, tag.string)

regards
Maui

KevinH · 03-28-2018, 11:33 AM

In most dom implementations, not without walking the tree (either recursively or with a tree walker). The .string value is just the text that immediately follows the tag up to the next (possibly child) tag.

In bs4, I think there is a get_text() that will work on any element to get all of the text under that element. But I am not 100% on that name.

Doitsu · 03-28-2018, 01:36 PM

Quote:

Originally Posted by KevinH

In bs4, I think there is a get_text() that will work on any element to get all of the text under that element. But I am not 100% on that name.

get_text() is the correct name.
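
A quick way to see the difference between .string and get_text() is sketched below. It uses the stock bs4 package with Python's built-in "html.parser" as an assumption; inside a Sigil plugin the import would presumably come from sigil_bs4 instead, as in the earlier posts. The helper name has_text is just made up for the example.

```python
from bs4 import BeautifulSoup


def has_text(tag):
    """True if the tag or any of its descendants contains non-whitespace text."""
    return bool(tag.get_text(strip=True))


soup = BeautifulSoup('<p>Test</p><p><a id="x1"></a>Test</p>', 'html.parser')
plain, mixed = soup.find_all('p')

print(plain.string)       # the single text child of the first paragraph
print(mixed.string)       # None, because this paragraph has more than one child
print(mixed.get_text())   # text collected from all descendants
print(has_text(mixed.a))  # the empty anchor itself contains no text
```

So a check like has_text(tag) should answer the "does this block element contain text" question without walking the tree by hand.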

@Maui: If you're looking for a bs4 innerHTML equivalent, you could use decode_contents() to return the inner HTML code of an element:

Code:

soup = BeautifulSoup('<p><a id="x1"></a>Test</p>')
soup.p.decode_contents() 
# returns '<a id="x1"></a>Test'

Maui · 04-03-2018, 04:34 AM

Hi again,

after climbing the html tree up and down i've collected everything i need. In certain cases i now want to insert a new element right below the body element.

Code:

        soup = BeautifulSoup('<body></body>')
        body = soup.body
        new_tag = soup.new_tag('h1')
        body.append(new_tag)
        print(body)

So far so good. But the inserted tag needs a couple of attributes that I have to add right away. I've collected the attributes in a dictionary. Getting all attributes out of a tag is quite simple, but I fail to do it the other way round. Is there an easy way to insert the attributes from a dictionary to the tag? Unfortunately, a value can also be a list, as an attribute may have a tuple of entries:

Code:

<h1 class="class1 class2 class3">Header</h1>

I'm unable to find a function to insert the following dictionary

Code:

{'class': ['invis', 'big'], 'id': 'page_i'}

into the tag so the result is

Code:

<h1 class="invis big" id='page_i'>Header</h1>

Hopefully such a function exists and I don't need to write it myself.

|\/|aui

Doitsu · 04-03-2018, 06:00 AM

Quote:

Originally Posted by Maui

Is there an easy way to insert the attributes from a dictionary to the tag?

bs4 uses the default dictionary syntax for tag attributes.
E.g., you'd use the following code to add attributes:

Code:

new_tag = soup.new_tag('h1')
new_tag['class'] = 'invis big'
new_tag['id'] = 'page_i'

you could also manipulate the attrs property:

Code:

new_tag = soup.new_tag('h1')
new_tag.attrs = {'class': 'invis big', 'id': 'page_i'}

Maui · 04-03-2018, 06:24 AM

Quote:

Originally Posted by Doitsu

you could also manipulate the attrs property:

Code:

new_tag = soup.new_tag('h1')
new_tag.attrs = {'class': 'invis big', 'id': 'page_i'}

Wow! I didn't think it would be that easy... I read the manual so many times trying to find a function to do that.

DiapDealer · 04-05-2018, 08:14 AM

You could also use dictionary comprehension if a dictionary only needs a little massaging before assigning it to tag.attrs. I know bs4 returns a list when retrieving certain attributes (like class), but I wasn't aware it returned any tuples. Lists AND tuples would probably over-complicate a one-line dictionary comprehension (to the point where it would be easier to just iterate over the dictionary to handle all the exceptions).

I'm not near a development machine, but something like the following should work if list values are the ONLY exception (assuming attr_dict is your existing dictionary of attributes):

Code:

new_tag.attrs = {k:(' '.join(v) if isinstance(v, list) else v) for (k,v) in attr_dict.items()}

You could make it a two-step process if you thought there were any tuples in the dictionary values:

Code:

intermediate_dict = {k:(list(v) if isinstance(v, tuple) else v) for (k,v) in attr_dict.items()}
new_tag.attrs = {k:(' '.join(v) if isinstance(v, list) else v) for (k,v) in intermediate_dict.items()}

The above would really only be useful if you were looking to retrieve a dictionary of attributes from one tag (via bs4) and transfer it to a newly created tag. Otherwise... if you have to manually specify the new attributes (or build an attr dictionary from scratch), you may as well assign them as Doitsu suggested; or just iterate over the dictionary and assign them to the new tag--handling any specific list or tuple conditions along the way.
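
If iterating turns out to be easier, a small helper along these lines might do. It is only a sketch: the name flatten_attrs is made up, and it assumes that multi-valued attributes (whether they arrive as a list or a tuple) should simply be joined with spaces.

```python
def flatten_attrs(attr_dict):
    """Return a copy of attr_dict with list/tuple values joined by spaces."""
    flat = {}
    for name, value in attr_dict.items():
        if isinstance(value, (list, tuple)):
            # e.g. ['invis', 'big'] becomes 'invis big'
            flat[name] = ' '.join(value)
        else:
            flat[name] = value
    return flat


attrs = {'class': ['invis', 'big'], 'id': 'page_i', 'rel': ('next', 'nofollow')}
print(flatten_attrs(attrs))
```

The result could then be assigned in one go with new_tag.attrs = flatten_attrs(attr_dict).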

Doitsu · 05-08-2018, 07:51 AM

I'm working on a Windows-only plugin that'll crash Sigil if the Plugin Runner window close button (X) is clicked while the plugin is running. (On my Linux machine, Sigil displays: Error Parsing Result XML: Encountered incorrectly encoded content., but it doesn't crash.)

Steps to reproduce this issue:

1. Install and run the crash plugin.
2. Click the first option in the list box.
3. Click the Plugin Runner window close button (X).

The plugin uses the following code:

Spoiler:

How do I need to change the code so that the plugin exits gracefully if the Plugin Runner window is closed?

