#391 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
@DiapDealer... Thanks for that. Since bs4 is causing this problem, how do I stop bs4 from changing the contents of svg tags from camel case to lower case, or will I have to code this fix myself in bs4?
Last edited by slowsmile; 03-27-2018 at 07:47 PM. |
#392 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
@DiapDealer... The reason I want this svg/bs4 problem resolved is that most of my plugins use bs4 and, as a result, I have had to advise people in the release notes not to use svg images with these plugins. I would really like to resolve this issue so that all my plugins will allow svg code.
Last edited by slowsmile; 03-27-2018 at 07:54 PM. |
#393 |
Sigil Developer
Posts: 8,783
Karma: 6000000
Join Date: Nov 2009
Device: many
|
Only an html5 parser will grok svg properly. bs4 can work with different parsers, including gumbo, html, lxml, and html5lib. Have you tried using either the gumbo or the html5lib parser with bs4 (both are html5 parsers)?
Gumbo uses a lookup table to fix svg attribute names, and html5lib uses a simple dictionary to fix the strange case issues with svg attributes. See here for the html5lib version (and search for viewbox): https://github.com/html5lib/html5lib...b/constants.py
In the worst case, you can post-process the source to fix the all-lowercase attributes.
Last edited by KevinH; 03-27-2018 at 08:10 PM. |
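As a rough illustration of the post-processing fallback, here is a minimal sketch that restores the case of a few SVG attribute names with a dictionary and a regular expression. The mapping `SVG_ATTR_CASE` and the helper `fix_svg_attr_case` are hypothetical names made up for this example, the mapping is deliberately partial, and the sketch does not check that a match actually sits inside an svg element.

```python
import re

# Partial, illustrative mapping from lowercased SVG attribute names to their
# mixed-case spelling (assumption: extend it with whatever your files use).
SVG_ATTR_CASE = {
    "viewbox": "viewBox",
    "preserveaspectratio": "preserveAspectRatio",
    "gradientunits": "gradientUnits",
}

# Match a known name only when it looks like an attribute:
# preceded by whitespace and followed by "=".
_ATTR_RE = re.compile(
    r"(?<=\s)(" + "|".join(map(re.escape, SVG_ATTR_CASE)) + r")(?=\s*=)"
)


def fix_svg_attr_case(markup):
    """Restore the mixed-case spelling of known SVG attribute names."""
    return _ATTR_RE.sub(lambda m: SVG_ATTR_CASE[m.group(1)], markup)


broken = '<svg viewbox="0 0 100 100" preserveaspectratio="xMidYMid meet"></svg>'
fixed = fix_svg_attr_case(broken)
print(fixed)
```

In a plugin this would be applied to the serialized markup just before writing the file back. Because it works on plain text, it can also touch body text that happens to look like one of these attributes, so a parser-based fix is the safer choice when one is available.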
#394 |
Sigil Developer
Posts: 8,783
Karma: 6000000
Join Date: Nov 2009
Device: many
|
Also, post-processing the output in gumbo and serializing it should fix the issues.
|
#395 |
Grand Sorcerer
Posts: 28,585
Karma: 204624552
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I've found that using bs4's "html5lib" parser will preserve case in attribute names. As will sigil_gumbo_bs4_adapter.
Provided the xhtml variable contains markup with svg:
Code:
```python
from sigil_bs4 import BeautifulSoup as bs4
soup = bs4(xhtml, "html5lib")
print(soup.prettify_xhtml())
```
Code:
```python
import sigil_gumbo_bs4_adapter as bs4
soup = bs4.parse(xhtml)
print(soup.prettify_xhtml())
```
|
#396 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
@DiapDealer and KevinH... Wow!! What a simple solution. I'll try using the html5lib parser, since that seems the easiest. Using html5lib should speed things up as well. Thanks for your help.
|
#397 |
Connoisseur
Posts: 57
Karma: 600000
Join Date: Jan 2018
Device: Galaxy Tab S2
|
Another workaround if the html5 parser is causing other problems: just a simple string replace.
#398 |
Connoisseur
Posts: 57
Karma: 600000
Join Date: Jan 2018
Device: Galaxy Tab S2
|
Hi,
I'm just playing around with BeautifulSoup inside Sigil: I'm interested in whether an element contains some text or not. This one works fine:
Code:
```html
<p>Test</p>
```
But:
Code:
```html
<p><a id="x1"></a>Test</p>
```
Is there an easy way to always find out whether I have text inside a block element or not?
Code:
```python
for (id, href) in bk.text_iter():
    html = bk.readfile(id)
    soup = BeautifulSoup(html)
    body = soup.body
    tags = body.find_all(True)
    for tag in tags:
        print(tag.name, tag.attrs, tag.contents, tag.string)
```
Maui |
#399 |
Sigil Developer
Posts: 8,783
Karma: 6000000
Join Date: Nov 2009
Device: many
|
In most dom implementations, not without walking the tree (either recursively or with a tree walker). The .string value is just the text that immediately follows the tag up to the next (possibly child) tag.
In bs4, I think there is a get_text() that will work on any element to get all of the text under that element. But I am not 100% on that name.
Last edited by KevinH; 03-28-2018 at 11:52 AM. |
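For what it's worth, `get_text()` is the name bs4 documents for this. In bs4 specifically, `.string` is `None` whenever a tag has more than one child, which is why the second paragraph in Maui's example appears to have no text. Below is a minimal sketch of a text check built on `get_text()`. The helper name `has_text` is made up for illustration, and the example assumes a standalone bs4 install with the built-in "html.parser"; inside a Sigil plugin the import would come from sigil_bs4 instead.

```python
from bs4 import BeautifulSoup  # inside a Sigil plugin: from sigil_bs4 import BeautifulSoup


def has_text(tag):
    """Return True if the tag or any of its descendants holds non-whitespace text."""
    # get_text() gathers the text of all descendants; strip=True drops
    # whitespace-only pieces, so an element without real text yields ''.
    return bool(tag.get_text(strip=True))


markup = (
    '<body>'
    '<p>Test</p>'
    '<p><a id="x1"></a>Test</p>'
    '<p><a id="x2"></a></p>'
    '</body>'
)
soup = BeautifulSoup(markup, "html.parser")
paragraphs = soup.find_all("p")

for p in paragraphs:
    # .string is only set when there is a single, unambiguous text child
    print(repr(p.string), has_text(p))
```

The same helper can be dropped into the loop over `body.find_all(True)` from the question to flag which elements actually carry text.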
#400 | |
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
@Maui: If you're looking for a bs4 innerHTML equivalent, you could use decode_contents() to return the inner HTML code of an element:
Code:
```python
soup = BeautifulSoup('<p><a id="x1"></a>Test</p>')
soup.p.decode_contents()  # returns '<a id="x1"></a>Test'
```
|
|
#401 |
Connoisseur
Posts: 57
Karma: 600000
Join Date: Jan 2018
Device: Galaxy Tab S2
|
Hi again,
after climbing the html tree up and down I've collected everything I need. In certain cases I now want to insert a new element right below the body element.
Code:
```python
soup = BeautifulSoup('<body></body>')
body = soup.body
new_tag = soup.new_tag('h1')
body.append(new_tag)
print(body)
```
Code:
```html
<h1 class="class1 class2 class3">Header</h1>
```
Code:
```python
{'class': ['invis','big'], 'id': 'page_i'}
```
Code:
```html
<h1 class="invis big" id='page_i'>Header</h1>
```
|\/|aui
Last edited by Maui; 04-03-2018 at 04:37 AM. |
#402 | |
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
E.g., you'd use the following code to add attributes:
Code:
```python
new_tag = soup.new_tag('h1')
new_tag['class'] = 'invis big'
new_tag['id'] = 'page_i'
```
Code:
```python
new_tag = soup.new_tag('h1')
new_tag.attrs = {'class': 'invis big', 'id': 'page_i'}
```
Last edited by Doitsu; 04-03-2018 at 06:14 AM. |
|
#403 |
Connoisseur
Posts: 57
Karma: 600000
Join Date: Jan 2018
Device: Galaxy Tab S2
|
|
#404 |
Grand Sorcerer
Posts: 28,585
Karma: 204624552
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
You could also use a dictionary comprehension if a dictionary only needs a little massaging before assigning it to tag.attrs. I know bs4 returns a list when retrieving certain attributes (like class), but I wasn't aware it returned any tuples. Lists AND tuples would probably over-complicate a one-line dictionary comprehension (to the point where it would be easier to just iterate over the dictionary to handle all the exceptions).
I'm not near a development machine, but something like the following should work if list values are the ONLY exception (assuming attr_dict is your existing dictionary of attributes):
Code:
```python
new_tag.attrs = {k: (' '.join(v) if isinstance(v, list) else v) for (k, v) in attr_dict.items()}
```
Code:
```python
# First turn any tuple values into lists, then join the list values into strings.
intermediate_dict = {k: (list(v) if isinstance(v, tuple) else v) for (k, v) in attr_dict.items()}
new_tag.attrs = {k: (' '.join(v) if isinstance(v, list) else v) for (k, v) in intermediate_dict.items()}
```
|
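Since `isinstance()` accepts a tuple of types, the two passes could probably be collapsed into one. The following is only a sketch: `normalize_attrs` is a hypothetical helper name, the sample dictionary is invented for illustration, and it assumes every list or tuple value contains only strings.

```python
def normalize_attrs(attr_dict):
    """Return a copy of attr_dict with list/tuple values joined into one string."""
    return {
        key: " ".join(value) if isinstance(value, (list, tuple)) else value
        for key, value in attr_dict.items()
    }


# Invented sample data: one list value, one plain string, one tuple value.
attr_dict = {
    "class": ["invis", "big"],
    "id": "page_i",
    "rel": ("nofollow", "noopener"),
}
normalized = normalize_attrs(attr_dict)
print(normalized)
```

The result could then be assigned with `new_tag.attrs = normalize_attrs(attr_dict)`; the original dictionary is left untouched.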
#405 |
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Plugin Runner windows-only crash issue
I'm working on a Windows-only plugin that'll crash Sigil if the Plugin Runner window close button (X) is clicked while the plugin is running. (On my Linux machine, Sigil displays: Error Parsing Result XML: Encountered incorrectly encoded content., but it doesn't crash.)
Steps to reproduce this issue:
1. Install and run the crash plugin.
2. Click the first option in the list box.
3. Click the Plugin Runner window close button (X).
The plugin uses the following code: Spoiler:
How do I need to change the code to gracefully exit the plugin if the Plugin Runner window is closed? |