08-04-2019, 09:34 AM   #1
Vroni
Banned
Posts: 168
Karma: 10010
Join Date: Oct 2018
Device: Tolino/PRS 650/Tablet
Putting a soup back to the metadata

Hi,

I would like to modify the metadata, so I'm reading it with bk.getmetadataxml() and parsing it with Sigil's own BeautifulSoup:

from sigil_bs4 import BeautifulSoup

But the resulting soup has an html and a body element wrapped around the metadata.

Even using the lxml parser doesn't change this behaviour. Is there any way to prevent this, other than deleting <html>, <body> and the corresponding closing tags myself?

Once I've made my changes, I would like to write the metadata back. serialize_xhtml() inserts unwanted elements as well. Is there another way, or do I need to use prettify() to get the metadata as a string and write it back via bk.setmetadataxml(string)? I guess setmetadataxml() does not accept a soup...

vroni
08-04-2019, 12:32 PM   #2
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
bk.getmetadataxml() returns a utf-8 encoded xml fragment from <metadata> to </metadata>.

bk.setmetadataxml() expects a similar utf-8 encoded xml fragment in return.

All the processing that happens between those two calls is entirely up to you. But in my opinion, neither serialize_xhtml() nor sigil_bs4's xhtml parser in general would be a wise choice for processing the data, considering that the opf file, and the metadata fragment taken from it, is not in fact xhtml.
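
As a rough illustration of that round trip, the sketch below parses a fragment with a strict XML parser and serializes it again. It uses xml.dom.minidom from the Python standard library, which is a stand-in chosen here and not one of the Sigil tools mentioned in this thread. SAMPLE is a made-up fragment, and the real output of bk.getmetadataxml() may look different.

```python
from xml.dom import minidom

# Hypothetical stand-in for the fragment returned by bk.getmetadataxml().
SAMPLE = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>Example</dc:title>'
    '<dc:language>de</dc:language>'
    '</metadata>'
)


def round_trip(fragment):
    """Parse an XML fragment strictly and serialize its root element again."""
    doc = minidom.parseString(fragment)
    root = doc.documentElement
    # toxml() is called on the element, not the document,
    # so no XML declaration is emitted in front of the fragment.
    return root.tagName, root.toxml()


root_name, xml_out = round_trip(SAMPLE)
print(root_name)
print(xml_out)
```

Because an XML parser treats <metadata> as the document root, no html or body element gets wrapped around it.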

Last edited by DiapDealer; 08-04-2019 at 12:35 PM.
08-04-2019, 02:32 PM   #3
Vroni
Banned
You mean I have to parse it myself? Phew, that would be the end of the development. I just want to alter one meta entry and maybe add another one, while keeping all the others. I don't want to do it via regex.

I already have the correct soup, but with these nasty html and body elements around it.

Hmm, wasn't there a simple xml parser available?
08-04-2019, 02:57 PM   #4
KevinH
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
BS4 can use a pure xml parser such as lxml. There is also a built-in QuickParser (see the test plugin example and the epub3itizer plugin for examples of QuickParser use), which can happily parse fragments of xml or xhtml.
08-04-2019, 03:37 PM   #5
Vroni
Banned
Hi Kevin,

thanks for the QuickParser hint.

Regarding BS4: the lxml parser adds html and body elements as well (which I don't understand), as written in my first post.
08-04-2019, 03:40 PM   #6
KevinH
Sigil Developer
No, with bs4 you need to tell lxml to use an xml parser, and you need to use an xml serializer.

Check out Sigil/src/Resource_Files/python3lib/xmlprocessor.py for examples.

For example, performOPFUpdates in that file shows how to use an xmlbuilder to parse pure xml for bs4 and how to serialize it back using decodexml.

Last edited by KevinH; 08-04-2019 at 03:44 PM.
08-04-2019, 05:57 PM   #7
Doitsu
Grand Sorcerer
Posts: 5,583
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
@Vroni:

The following minimal code should get you started:

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
from sigil_bs4 import BeautifulSoup

def run(bk):
    # Parse the <metadata> fragment returned by Sigil.
    metadata_soup = BeautifulSoup(bk.getmetadataxml(), 'lxml')
    # Reuse the existing dc:language element, or create one if it is missing.
    dc_language = metadata_soup.find('dc:language')
    if dc_language is None:
        dc_language = metadata_soup.new_tag('dc:language')
        metadata_soup.metadata.append(dc_language)
    dc_language.string = 'en-US'
    # Serialize the soup and hand the result back to Sigil.
    new_metadata = str(metadata_soup.prettyprint_xhtml())
    bk.setmetadataxml(new_metadata)
    print('Done')
    return 0

def main():
    # Plugins are started by Sigil through run(); main() is only a guard.
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())
It'll change the language code to en-US or add a new en-US language metadata entry.
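
For comparison, the same change-or-add logic can be sketched with xml.etree.ElementTree from the Python standard library. This is a swapped-in parser, not sigil_bs4, and both the SAMPLE fragment and the set_language helper are made up for illustration. Inside a plugin, the input would come from bk.getmetadataxml() and the result would go to bk.setmetadataxml().

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by the dc: elements in EPUB metadata.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register the prefix so the serializer writes dc: instead of ns0:.
ET.register_namespace("dc", DC_NS)

# Hypothetical stand-in for the fragment returned by bk.getmetadataxml().
SAMPLE = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
    '  <dc:title>Example</dc:title>\n'
    '  <dc:language>de</dc:language>\n'
    '</metadata>'
)


def set_language(metadata_xml, code):
    """Set dc:language to `code`, adding the element if it is missing."""
    root = ET.fromstring(metadata_xml)
    tag = "{%s}language" % DC_NS
    lang = root.find(tag)
    if lang is None:
        # No language entry yet: append a new one to <metadata>.
        lang = ET.SubElement(root, tag)
    lang.text = code
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


print(set_language(SAMPLE, "en-US"))
```

One caveat with this approach: ElementTree only writes namespace declarations for prefixes that are actually used in the tree, so the xmlns attributes on <metadata> may differ from the input.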
08-04-2019, 07:36 PM   #8
KevinH
Sigil Developer
Technically, I think the builder should be set to lxml-xml, or even just xml, if you do not manually set the TreeBuilder as is done in xmlprocessor.py.

The key is to make sure you use etree.XMLParser via lxml.
08-08-2019, 08:32 AM   #9
Vroni
Banned
Quote:
Originally Posted by Doitsu
@Vroni:

The following minimal code should get you started:
Hi, thanks for the example. I'm pretty sure I tested my code with lxml as the parser and got the html and body elements as well. But I will try this again and see what I did wrong as soon as my Sigil installation is working again.
08-08-2019, 08:51 AM   #10
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Vroni
Hi, thanks for the example. I'm pretty sure I tested my code with lxml as the parser and got the html and body elements as well.
If so, then I'll once again point out Kevin's suggestion of using xmlprocessor.py as an example of parsing/serializing pure xml with the tools available to Sigil plugins. The file can be found in the Sigil/python3lib folder of a Windows installation of Sigil, or in the src/Resource_Files/python3lib folder of the Sigil source code.

Last edited by DiapDealer; 08-08-2019 at 09:34 AM.
08-08-2019, 09:16 AM   #11
KevinH
Sigil Developer
Note that lxml will parse both pure xml and html. You have to tell it which one to use by telling it which builder or parser to use, and you should also use an appropriate serializer. If you use an html parser and serializer on a pure xml fragment, you will end up with exactly the error you reported.

KevinH

Quote:
Originally Posted by Vroni
Hi, thanks for the example. I'm pretty sure I tested my code with lxml as the parser and got the html and body elements as well. But I will try this again and see what I did wrong as soon as my Sigil installation is working again.
08-08-2019, 09:37 AM   #12
Vroni
Banned
The problem is the parser (at this point). By putting debugging prints in the code, I can see that the html/body elements are already inserted into the soup by the parser.

Vroni
08-08-2019, 09:45 AM   #13
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Vroni
The problem is the parser (at this point). By putting debugging prints in the code, I can see that the html/body elements are already inserted into the soup by the parser.

Vroni
Then you're not configuring the parser correctly.
08-08-2019, 10:00 AM   #14
Vroni
Banned
I didn't get any error message, but let's see where the difference is between Doitsu's code and mine.
08-08-2019, 10:42 AM   #15
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Vroni
but let's see where the difference is between Doitsu's code and mine
And then maybe look at the code that both of Sigil's maintainers are trying really, really hard to steer you toward.

Last edited by DiapDealer; 08-08-2019 at 10:45 AM.