MobileRead Forums > E-Book Software > Sigil > Plugins

Old 08-08-2019, 11:25 AM   #16
DiapDealer
Grand Sorcerer
Posts: 28,591
Karma: 204624552
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
This plugin prints the xml snippet that bk.getmetadataxml() returns, prints the soup made from that snippet, adds the dc:language entry if not present, serializes the soup and prints the results, then ultimately writes the xml snippet back with bk.setmetadataxml().

You'll note that at no point in the process do any html or body tags get added.
Attached Files
File Type: zip xmlsouptest.zip (1.1 KB, 291 views)

Last edited by DiapDealer; 08-08-2019 at 11:58 AM.
Old 08-08-2019, 12:00 PM   #17
Doitsu
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by DiapDealer View Post
This plugin prints the xml snippet that bk.getmetadataxml() returns, prints the soup made from that snippet, adds the dc:language entry if not present, serializes the soup and prints the results, then ultimately writes the xml snippet back with bk.setmetadataxml().

You'll note that at no point in the process do any html or body tags get added.
Thanks for the code! You might want to add it to the Sigil API Framework documentation, because LXMLTreeBuilderForXML is somewhat "underdocumented."
Old 08-08-2019, 12:09 PM   #18
DiapDealer
Grand Sorcerer
xmlprocessor.py also has examples of passing optional lists of relevant void tags to LXMLTreeBuilderForXML. These lists are specific to individual xml file types and assist in processing entire opf, ncx, and other xml files.

And the LXMLTreeBuilderForXML approach is probably overkill for simple epub metadata work. You can accomplish the same thing with:
Code:
from sigil_bs4 import BeautifulSoup

metadata = bk.getmetadataxml()
metadata_soup = BeautifulSoup(metadata, "lxml-xml")

# ... stir the xml soup here ...

new_metadata = metadata_soup.decodexml(indent_level=0, formatter='minimal', indent_chars="  ")
# or new_metadata = metadata_soup.decodexml() if you don't care about prettying.
The point is to avoid html parsers and (x)html serializers.

Last edited by DiapDealer; 08-08-2019 at 03:34 PM.
Old 08-09-2019, 05:16 AM   #19
Vroni
Banned
Posts: 168
Karma: 10010
Join Date: Oct 2018
Device: Tolino/PRS 650/Tablet
Quote:
Originally Posted by DiapDealer View Post
This plugin prints the xml snippet that bk.getmetadataxml() returns, prints the soup made from that snippet, adds the dc:language entry if not present, serializes the soup and prints the results, then ultimately writes the xml snippet back with bk.setmetadataxml().
That's the expected result, but it adds a second (third, fourth, and so on) dc:language element every time. According to the documentation, find() returns None if it finds nothing, and in that case the code adds the element and sets the language to en-US.

So what's wrong with if not dc_language: ? It should not insert anything, just change the existing value.

Before:

Code:
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:7967fadc-d511-42ee-aad1-a472e662546a</dc:identifier>
    <dc:language>de</dc:language>
    <dc:title>[Title here]</dc:title>
  </metadata>
After:

Code:
<?xml version="1.0" encoding="utf-8" ?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:7967fadc-d511-42ee-aad1-a472e662546a</dc:identifier>
  <dc:language>de</dc:language>
  <dc:title>[Title here]</dc:title>
  <dc:language>en-US</dc:language>
</metadata>
By the way, the parser adds the xml declaration at the start. At least that doesn't mess up the content.opf file.

What now exceeds my Python abilities is why the if not dc_language branch is taken every time. Debugging the code with print() shows that the content of the variable is None.

Vroni
Old 08-09-2019, 06:09 AM   #20
Doitsu
Grand Sorcerer
Quote:
Originally Posted by Vroni View Post
So what's wrong with if not dc_language: ?
Both my bs4 + lxml suggestion and DiapDealer's bs4 + LXMLTreeBuilderForXML code snippets work as designed.

If they don't work on your machine, please post your code.

I haven't tested DiapDealer's latest bs4 + lxml-xml suggestion, but, IIRC, if you're using the lxml-xml parser, you'll have to omit the dc: namespace prefix when using bs4 find:

Code:
dc_language = metadata_soup.find('language')
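For comparison, here is a minimal sketch of a namespace-aware lookup that only creates dc:language when it is really missing. It deliberately uses the standard library's xml.etree.ElementTree rather than sigil_bs4, so it is not the approach from either snippet in this thread, and ensure_language is a hypothetical helper name. The namespace URIs and sample metadata are copied from the "Before" listing above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the metadata snippet quoted earlier in the thread.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "opf": "http://www.idpf.org/2007/opf",
}

# Keep the familiar dc:/opf: prefixes when serializing.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def ensure_language(metadata_xml, lang="en-US"):
    """Hypothetical helper: set dc:language, creating it only if missing."""
    root = ET.fromstring(metadata_xml)
    # The namespace map resolves the dc: prefix to its URI for the lookup.
    dc_language = root.find("dc:language", NS)
    if dc_language is None:
        dc_language = ET.SubElement(root, "{%s}language" % NS["dc"])
    dc_language.text = lang
    return ET.tostring(root, encoding="unicode")


sample = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:opf="http://www.idpf.org/2007/opf">'
    '<dc:identifier id="BookId" opf:scheme="UUID">'
    "urn:uuid:7967fadc-d511-42ee-aad1-a472e662546a</dc:identifier>"
    "<dc:language>de</dc:language>"
    "<dc:title>[Title here]</dc:title>"
    "</metadata>"
)

once = ensure_language(sample)
twice = ensure_language(once)  # running it again must not add a second element
print(twice)
```

Because the lookup resolves the prefix through an explicit namespace map, the existing element is found and updated in place, so repeated runs leave exactly one dc:language entry.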
Old 08-09-2019, 06:23 AM   #21
DiapDealer
Grand Sorcerer
Yes, the xml declaration hurts nothing and affects nothing. That's why I didn't mention it. It's not relevant to writing the metadata soup back to the epub. But once again, the xmlprocessor.py file that we keep trying to point people to for examples of how to parse/serialize pure xml with sigil_bs4 has an example of how to easily strip the xml header.
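As an illustration only (this is not the code from xmlprocessor.py, and strip_xml_declaration is a hypothetical helper name), stripping the declaration can be sketched with a small regex instead of a fixed-length slice, so it keeps working if the declaration's length ever changes:

```python
import re

# Optional BOM and whitespace, then one <?xml ... ?> declaration and the
# whitespace that follows it.
XML_DECL_RE = re.compile(r'^\ufeff?\s*<\?xml[^>]*\?>\s*')


def strip_xml_declaration(xml_text):
    """Hypothetical helper: drop a leading XML declaration, if there is one."""
    return XML_DECL_RE.sub("", xml_text, count=1)


serialized = '<?xml version="1.0" encoding="utf-8" ?>\n<metadata>...</metadata>'
print(strip_xml_declaration(serialized))
```

Text without a declaration passes through unchanged, so the helper is safe to call unconditionally before bk.setmetadataxml().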

As for the logic of adding the dc:language element or not, it was only ever intended as a simple example of diddling the metadata via bs4. If it doesn't work, then change the logic. My sample was addressing the proper way to parse/serialize pure xml fragments in a Sigil plugin. It's up to you to figure out how best to modify the metadata soup.
Old 08-09-2019, 06:59 AM   #22
Vroni
Banned
Well, it might be the colon in dc:language that confuses your version, Diap, as this is not a plain tag but a tag with a namespace prefix. It's just not found; that's why it adds a new one. Always.

This is my code now:

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
from sigil_bs4 import BeautifulSoup
from sigil_bs4.builder._lxml import LXMLTreeBuilderForXML


def run(bk):
    # Pure-xml builder from DiapDealer's sample, currently commented out.
#    xmlbuilder = LXMLTreeBuilderForXML(parser=None)
    metadata = bk.getmetadataxml()
    print('...')
    print(metadata)
#    metadata_soup = BeautifulSoup(metadata, features=None, from_encoding="utf-8", builder=xmlbuilder)
    # Note: 'lxml' selects lxml's HTML parser, not the XML one.
    metadata_soup = BeautifulSoup(metadata, 'lxml')
    print('...')
    print(metadata_soup)
    print('...')
    # Look up the existing language element by its prefixed name.
    dc_language = metadata_soup.find('dc:language')
    print(dc_language)

    if dc_language is None:
        print('...')
        print('Creating new element')
        dc_language = metadata_soup.new_tag('dc:language')
        metadata_soup.metadata.append(dc_language)
    dc_language.string = 'en-US'
    # [40:] slices off the leading xml declaration line.
    new_metadata = metadata_soup.decodexml(indent_level=0, formatter='minimal', indent_chars="  ")[40:]
    print('...')
    print(new_metadata)

    bk.setmetadataxml(new_metadata)
    print('Done')
    return 0


def main():
    print('I reached main when I should not have\n')
    return -1


if __name__ == "__main__":
    sys.exit(main())
If I use Doitsu's version, I get this:

Code:
<?xml version="1.0" encoding="utf-8" ?>
<html>
<body>
 <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf"><dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:7967fadc-d511-42ee-aad1-a472e662546a</dc:identifier>
 <dc:title>[Title here]</dc:title>
 <dc:language>en-US</dc:language></metadata>
</body>
</html>
Old 08-09-2019, 07:12 AM   #23
Vroni
Banned
So searching for language without the namespace prefix works fine in Diap's code, and in addition I'm slicing the xml declaration away.

For this, it's a good starting point!
Old 08-09-2019, 08:24 AM   #24
DiapDealer
Grand Sorcerer
Now after all that, is when I'll mention that I feel that using bs4/lxml parsing/serializing for simple changes/additions to an epub's metadata is considerable overkill. Like using a scalpel to peel an orange. Unless I'm planning on writing a plugin that grants a user considerable autonomy over making complex metadata edits, I'm using a quick regex to make the change I need and moving on. But to each their own.
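To make the "quick regex" idea concrete, here is one possible sketch; it is an assumption about what such a regex could look like, not code from any actual plugin. The helper name set_language is hypothetical, and the sketch assumes the metadata uses the dc: prefix, a non-self-closing dc:language element, and a literal closing metadata tag.

```python
import re

# Captures the opening and closing dc:language tags around the current value.
LANG_RE = re.compile(r"(<dc:language\b[^>]*>)[^<]*(</dc:language>)")


def set_language(metadata_xml, lang="en-US"):
    """Hypothetical helper: replace the first dc:language value, or add one."""
    new_xml, count = LANG_RE.subn(
        lambda m: m.group(1) + lang + m.group(2), metadata_xml, count=1
    )
    if count == 0:
        # No dc:language element yet: insert one just before the closing tag.
        new_xml = metadata_xml.replace(
            "</metadata>", "  <dc:language>%s</dc:language>\n</metadata>" % lang, 1
        )
    return new_xml


before = "<metadata>\n  <dc:language>de</dc:language>\n</metadata>"
print(set_language(before))
```

In a plugin, the string from bk.getmetadataxml() would go in and the result would go to bk.setmetadataxml(), with no parser or serializer involved.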
Old 08-09-2019, 08:42 AM   #25
Vroni
Banned
Well, I think I have a good knowledge of regex, but not of Python or BS, and this was a good chance to learn them.

And you never know how complex this plugin will be in two years.

I have some ideas, but the constraint is time.
Old 08-09-2019, 09:42 AM   #26
DiapDealer
Grand Sorcerer
Hey, I'm all for learning. I don't want to discourage anyone from broadening their knowledge.
Old 08-10-2019, 01:18 AM   #27
Vroni
Banned
In addition, I remember thinking about doing it with a regex when I started implementing the idea, but then I realized that attributes can appear in arbitrary order, such as:

Code:
<meta name="xyz" content="123">
Code:
<meta content="123" name="xyz">
This would have required more code: checking for the first variant and, if it is not found, trying the second variant to see if it's present.
Old 08-10-2019, 09:43 AM   #28
DiapDealer
Grand Sorcerer
You would only have to do two checks if you didn't know what you were looking for. Otherwise, you simply search for what matters ... regardless of position.

If a meta tag with the name "xyz" is what you need to find, then you use a regex that doesn't care in what order the name attribute appears:

Code:
<meta[^>]*(?=name=\"xyz\")[^>]*>
If it's the content attribute that you're looking to match, then it's:
Code:
<meta[^>]*(?=content=\"123\")[^>]*>
Not trying to discourage you from using bs4/lxml for pure xml, just trying to point out that unless you've got a lot of complicated metadata editing to do, a simple find and replace could turn out to be much simpler and use fewer lines of code.
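A quick way to check the order-independence claim is to run the name pattern against both attribute orders in Python. The pattern below is the one quoted above, written as a raw string so the quotes need no escaping; the test strings and variable names are made up for illustration.

```python
import re

# The lookahead pattern from the post, as a raw string.
META_XYZ_RE = re.compile(r'<meta[^>]*(?=name="xyz")[^>]*>')

variants = [
    '<meta name="xyz" content="123">',    # name first
    '<meta content="123" name="xyz">',    # content first
    '<meta name="other" content="123">',  # different name, should not match
]

matches = [bool(META_XYZ_RE.search(tag)) for tag in variants]
print(matches)  # prints [True, True, False]
```

The leading [^>]* backtracks until the lookahead finds name="xyz" somewhere inside the tag, which is why the attribute's position does not matter.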
Old 08-10-2019, 11:47 AM   #29
Vroni
Banned
I'm looking for calibre:series and need the content. Interested in your approach.
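One way such a lookup might be sketched, extending the lookahead idea so it also captures the content value: this assumes the entry is written as a meta element with name="calibre:series" and a double-quoted content attribute. The helper name get_series and the sample values are hypothetical, and XML entities in the value are not decoded.

```python
import re

# The lookahead confirms the tag carries name="calibre:series" anywhere
# before its closing bracket; the group then captures the content value.
SERIES_RE = re.compile(
    r'<meta\b(?=[^>]*\bname="calibre:series")[^>]*\bcontent="([^"]*)"[^>]*>'
)


def get_series(metadata_xml):
    """Hypothetical helper: return the calibre:series content, or None."""
    match = SERIES_RE.search(metadata_xml)
    return match.group(1) if match else None


print(get_series('<meta name="calibre:series" content="Example Series"/>'))
print(get_series('<meta content="Example Series" name="calibre:series"/>'))
```

Because the closing quote is part of the lookahead, a neighbouring entry such as calibre:series_index is not mistaken for the series name.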
Old 08-10-2019, 02:53 PM   #30
Doitsu
Grand Sorcerer
Quote:
Originally Posted by Vroni View Post
I'm looking for calibre:series and need the content. Interested in your approach.
You might find KevinH's ePub3-itizer plugin helpful. It contains code to convert custom Calibre metadata entries to EPUB3 metadata entries. (Have a look at _convertOpf() in opf_converter.py.)