MobileRead Forums

MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Plugins (https://www.mobileread.com/forums/forumdisplay.php?f=268)
-   -   Update of Cleaning content.opf Plugin (https://www.mobileread.com/forums/showthread.php?t=328174)

Thasaidon 03-18-2020 11:47 PM

Update of Cleaning content.opf Plugin
 
Would anyone be interested in recreating the Cleaning content.opf plugin?

I would do it myself but I do not have the requisite coding knowledge.

A number of people have expressed an interest in using the plugin, but the author is no longer active on MobileRead.

DiapDealer says in the CleanOPF thread:

Quote:

Support for Python2.7-only plugins was removed in Sigil v0.9.7.

Unfortunately, this plugin developer is no longer active to update the plugin to be compatible with Python 3.4+.

I'm going to close this thread, since the plugin won't work with the latest versions of Sigil, but the plugin is still downloadable for anyone who wants to try and upgrade it to work with Python 3 for their own personal use (or continue to use it with older versions of Sigil). With no licensing info to go on, I can't in good conscience allow another developer to "take over" the plugin.
So the plugin would have to be recreated or rewritten from scratch.

KevinH 03-19-2020 12:27 AM

Since I have never used the plugin, I have no idea what an opf cleaner plugin is used for.

So if you can describe exactly what it should do, I would be happy to take a shot at it from scratch so we are not violating any licensing here. But please be as specific as you can about exactly what you want removed from the opf and why.

KevinH

BeckyEbook 03-19-2020 10:53 AM

Functional description of the CleanOPF plugin (it can be easily fixed for Python 3, but it is useless for me):

1. At the beginning, it asks ("Insert series elements?") whether we want to add the calibre series to the metadata and, after the answer "Yes", adds two lines to the content.opf file:

Code:

<meta content="" name="calibre:series"/>
<meta content="" name="calibre:series_index"/>

2. Checks for ISBN and ASIN identifiers and remembers them.

3. Inserts a new UUID identifier.

4. Removes existing entries from the metadata (leaves others unchanged):
Code:

<dc:identifier.*>
<dc:contributor.*calibre.*>
<dc:type.*>
<dc:rights.*>
<dc:date.*>
<dc:publisher.*>
<dc:genre.*>
<dc:subject.*>

5. Inserts the stored ISBN and ASIN identifiers.

6. If it detects that there are calibre series, it also restores them.

7. Replaces the MIME types for fonts:
Code:

ttf --> application/x-font-ttf
otf --> application/vnd.ms-opentype
ttc --> application/x-font-truetype-collection
woff --> application/font-woff
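
Steps 2 to 5 above can be sketched roughly as follows. This is not the original plugin's code, only a minimal illustration under stated assumptions: the OPF has one metadata element per line (as Sigil pretty-prints it), the book uses EPUB 2 style `opf:scheme` attributes, and the names `clean_metadata` and `book_id` are hypothetical. Lines not listed in step 4 (including any calibre series `<meta>` lines) are left untouched.

```python
import re
import uuid

# Step 4: metadata lines starting with one of these elements are dropped.
# Assumption: one element per line, as in an OPF pretty-printed by Sigil.
DROP = re.compile(
    r"\s*<dc:(?:identifier|type|rights|date|publisher|genre|subject)\b"
    r"|\s*<dc:contributor\b.*calibre",
    re.IGNORECASE,
)
# Steps 2 and 5: identifier lines that mention ISBN or ASIN are kept.
KEEP_ID = re.compile(r"<dc:identifier\b.*\b(?:isbn|asin)\b", re.IGNORECASE)


def clean_metadata(opf: str, book_id: str = "BookId") -> str:
    """Return the OPF text with the listed metadata lines removed.

    book_id is a hypothetical default; it should match the package's
    unique-identifier attribute.
    """
    out = []
    in_metadata = False
    for line in opf.splitlines():
        if "</metadata>" in line:
            in_metadata = False
        if in_metadata and DROP.match(line) and not KEEP_ID.search(line):
            continue  # a listed element that is not an ISBN/ASIN identifier
        out.append(line)
        if re.match(r"\s*<metadata\b", line):
            in_metadata = True
            # Step 3: insert a fresh UUID identifier right after <metadata>.
            out.append(
                f'<dc:identifier id="{book_id}" opf:scheme="UUID">'
                f"urn:uuid:{uuid.uuid4()}</dc:identifier>"
            )
    return "\n".join(out)
```

Keeping the ISBN and ASIN lines in place gives the same result as remembering them and reinserting them later, which is why steps 2 and 5 collapse into a single check here.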

I have no idea which of these features is valuable. :chinscratch:

The calibre series can be inserted through the Metadata Editor or even through Clips.

It's possible that it's about generating a new UUID, but there are at least two other plugins that can do this.

Is it about deleting metadata? But deleting publisher data or rights information seems strange to me. :eek:

MIME type for fonts inserted by default by Sigil:
Code:

ttf --> font/ttf
otf --> font/otf
ttc --> font/collection
woff --> font/woff
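
Rewriting font media types in the manifest to the values Sigil inserts could look roughly like this. It is a sketch only: it works on the OPF as plain text with regular expressions, assumes self-closing `<item .../>` elements, and the function name `fix_font_media_types` is hypothetical.

```python
import posixpath
import re

# Media types Sigil inserts by default, per the list above.
FONT_MEDIA_TYPES = {
    ".ttf": "font/ttf",
    ".otf": "font/otf",
    ".ttc": "font/collection",
    ".woff": "font/woff",
}

ITEM = re.compile(r"<item\b[^>]*/>")
HREF = re.compile(r'href="([^"]*)"')
MEDIA = re.compile(r'media-type="[^"]*"')


def fix_font_media_types(opf: str) -> str:
    """Set the media-type of font manifest items from their file extension."""

    def fix(match):
        item = match.group(0)
        href = HREF.search(item)
        if href is None:
            return item
        ext = posixpath.splitext(href.group(1))[1].lower()
        media_type = FONT_MEDIA_TYPES.get(ext)
        if media_type is None:
            return item  # not a font: leave the item unchanged
        return MEDIA.sub(f'media-type="{media_type}"', item)

    return ITEM.sub(fix, opf)
```

Items whose extension is not in the table are returned unchanged, so the function can be run over a whole manifest.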


KevinH 03-19-2020 11:17 AM

I really see nothing truly useful here either. So exactly why do people want this plugin?

DiapDealer 03-19-2020 11:17 AM

Where/who are the "number of people" who expressed an interest in using this plugin?

BetterRed 03-19-2020 04:34 PM

FWIW - I never expressed an interest as such, but I did install it out of curiosity. Then I discovered it didn't seem to do anything I couldn't do with the Metadata edit tool, so I deleted it.

BR

JSWolf 03-19-2020 04:38 PM

What an odd plugin. I'd like to ask anyone that uses it, why do you use it?

Thasaidon 03-19-2020 10:49 PM

Quote:

Originally Posted by DiapDealer (Post 3965719)
Where/who are the "number of people" who expressed an interest in using this plugin?

Have a look at the end of the thread about this plugin which the PlugIn index points to.

I was interested in the plugin because I thought it may automatically fix at least some errors in the OPF. Unfortunately this does not appear to be the case, so it is no longer of any interest.

KevinH 03-19-2020 11:32 PM

The only actual fix it makes is correcting incorrect or outdated font mimetypes, but that can be done in other ways, since the plugin uses old mimetypes that have since been deprecated.

Perhaps a plugin that can audit or correct the opf mimetypes (or even update them to current values) might prove useful.

We can map all file extensions except for .xml to a specific mimetype, and we could try to verify that each mimetype roughly matches the file contents, but that depends in large part on magic byte strings being identifiable in each binary file type.

Is that what you are looking for?
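
A magic-byte check of the kind described above might be sketched like this. The signature table is a small, incomplete selection of well-known file signatures, and the helper names `sniff_media_type` and `matches_declared` are hypothetical. The check is deliberately rough: for example, an .otf file with TrueType outlines starts with the same bytes as a .ttf file, so a mismatch is a hint to look closer, not proof of an error.

```python
# A few well-known leading byte signatures and a media type each one suggests.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"OTTO", "font/otf"),
    (b"\x00\x01\x00\x00", "font/ttf"),
    (b"ttcf", "font/collection"),
    (b"wOFF", "font/woff"),
    (b"wOF2", "font/woff2"),
]


def sniff_media_type(data: bytes):
    """Return the media type suggested by the leading bytes, or None."""
    for magic, media_type in MAGIC:
        if data.startswith(magic):
            return media_type
    return None


def matches_declared(declared: str, data: bytes) -> bool:
    """True if the content does not contradict the declared media type.

    Files without a known signature (text formats, for instance) cannot be
    checked this way and are treated as matching.
    """
    sniffed = sniff_media_type(data)
    return sniffed is None or sniffed == declared
```

Only the first few bytes of each file are needed, so an audit can read a short prefix of every manifest item instead of loading whole files.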

HaPeSchu 03-20-2020 05:04 AM

Hmmm, I'm using it, and I adapted it to Python 3, as mostly the print functionality was affected. I don't want to have all this crap in there, especially what calibre is inserting, as I'm not using calibre.
The entries do not hurt, that's true, but they are useless from my point of view, and removing them makes the remaining entries easier to edit and maintain.

Copyright information is useless as well. In most cases it is present on the imprint page. I don't know of any reader that extracts more than title, author, cover and sometimes series from the OPF. So why should I keep it? Subject is in most cases highly individual and 100% useless for me.

And it does each of these things with a single click.

I guess I changed something else so that it only asks for the series if it is not already present; I don't know if and when I did that. It also adjusts the media types for fonts so that FlightCrew and epubcheck do not complain if they have the wrong settings. But well, that is a feature with only little value for me, as in most cases I delete all fonts, especially when the book comes with DejaVu or LinLibertine, as they do not have any value.

Admittedly, this plugin has the problem of doing something very specific and is not really customizable.

I had the idea of adding a GUI to select my favourite subjects, but in the end I didn't have the time, and I noticed that only Mantano makes use of them, not the tolino readers.

Ah, I just had a look: I'm also deleting all HTML entities from the description by regex, as neither Mantano nor tolino can handle those. Plain text is good enough for me.
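
Deleting entities from the description by regex could look something like the following. This is a guess at the general approach, not the modified plugin's code; the patterns and the function name `strip_description_entities` are assumptions for illustration.

```python
import re

# Named (&nbsp;), decimal (&#8230;) and hexadecimal (&#x2026;) entities.
ENTITY = re.compile(r"&(?:#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")
DESCRIPTION = re.compile(
    r"(<dc:description\b[^>]*>)(.*?)(</dc:description>)", re.DOTALL
)


def strip_description_entities(opf: str) -> str:
    """Delete all entities inside dc:description, leaving other elements alone."""

    def clean(match):
        return match.group(1) + ENTITY.sub("", match.group(2)) + match.group(3)

    return DESCRIPTION.sub(clean, opf)
```

Note that deleting `&nbsp;` outright joins the neighbouring words, so replacing entities with a space may be preferable in practice.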

Summing up: this is very specific and not really customizable. For me it's a one-click solution that does a rough cleanup the way I like to have my ebooks.

Doitsu 03-20-2020 04:51 PM

Quote:

Originally Posted by Thasaidon (Post 3966044)
I was interested in the plugin because I thought it may automatically fix at least some errors in the OPF.

What specific problems would you like to be automatically fixed?

The only problem that I occasionally encounter is:

Code:

<metadata>
instead of:

Code:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
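
Adding the missing namespace declarations to a bare `<metadata>` tag can be automated along these lines. This is a minimal text-based sketch; the function name `add_metadata_namespaces` is hypothetical, and it assumes the tag is not self-closing.

```python
import re

DC_NS = 'xmlns:dc="http://purl.org/dc/elements/1.1/"'
OPF_NS = 'xmlns:opf="http://www.idpf.org/2007/opf"'
METADATA_TAG = re.compile(r"<metadata\b([^>]*)>")


def add_metadata_namespaces(opf: str) -> str:
    """Add the dc and opf namespace declarations if <metadata> lacks them."""

    def fix(match):
        attrs = match.group(1)
        for decl in (DC_NS, OPF_NS):
            prefix = decl.split("=")[0]  # e.g. 'xmlns:dc'
            if prefix not in attrs:
                attrs += " " + decl
        return "<metadata" + attrs + ">"

    # Only the first (and normally only) opening tag is rewritten.
    return METADATA_TAG.sub(fix, opf, count=1)
```

A tag that already carries both declarations is returned unchanged, so the function is safe to run on every book.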

Thasaidon 03-21-2020 09:35 AM

Quote:

Originally Posted by Doitsu (Post 3966348)
What specific problems would you like to be automatically fixed?

The only problem that I occasionally encounter is:

Code:

<metadata>
instead of:

Code:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">

Over the years I have become familiar enough with HTML and CSS that I can handle most problems in the text and CSS files.

Thankfully I only occasionally get problems with the OPF when I run ePubCheck.

A couple of simple problems are when there are lines relating to an embedded font that is no longer embedded. There is also part of a line (usually first one) that relates to Calibre, which unfortunately ePubCheck does not like. These problems are easily sorted, as I can manually delete them.

Unfortunately, if I hit other ePubCheck errors in the OPF, I am not familiar enough with what should be in the OPF to identify what specifically is wrong.

I cannot remember what these other problems are specifically. My old computer died and I have been offline for a couple of weeks and have been taking a holiday from editing ePubs for a few weeks longer.

In these cases I cannot identify the actual problem and end up doing an ePub to ePub conversion in Calibre, which usually fixes things.

I would like to avoid this which is why I was interested in the plugin.

If it would help I will document any future problems.

Alternatively it would be acceptable if you added a magic button to Sigil which automatically fixes all errors in the ePub. :rofl:

Doitsu 03-21-2020 09:51 AM

Quote:

Originally Posted by Thasaidon (Post 3966556)
A couple of simple problems are when there are lines relating to an embedded font that is no longer embedded.

IMHO, it'd be relatively easy to remove manifest entries for fonts (and other files) that have been deleted. IIRC, Sigil will exclude unmanifested items, but won't complain about manifest items that are no longer in the book.
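
As a rough illustration of how such a cleanup might work, the sketch below drops manifest items whose target file no longer exists. The function name `drop_missing_items` and the `existing_hrefs` parameter (a set of paths relative to the OPF, gathered however the caller likes) are hypothetical, and self-closing `<item .../>` elements are assumed.

```python
import re
from urllib.parse import unquote

# One manifest item, including its indentation and trailing newline.
ITEM_LINE = re.compile(
    r'[ \t]*<item\b[^>]*\bhref="([^"]*)"[^>]*/>[ \t]*\n?'
)


def drop_missing_items(opf: str, existing_hrefs: set) -> str:
    """Remove manifest items whose href is not among the existing files."""

    def keep_or_drop(match):
        href = unquote(match.group(1))  # hrefs may be percent-encoded
        return match.group(0) if href in existing_hrefs else ""

    return ITEM_LINE.sub(keep_or_drop, opf)
```

For fonts this is enough. If a removed item were also referenced from the spine, its `<itemref>` would need to go as well, which this sketch does not handle.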

Quote:

Originally Posted by Thasaidon (Post 3966556)
There is also part of a line (usually first one) that relates to Calibre, which unfortunately ePubCheck does not like.

AFAIK, Calibre uses custom metadata entries, but they're usually properly encoded, if the book was last edited in Calibre. Can you give me an example?

Quote:

Originally Posted by Thasaidon (Post 3966556)
I would like to avoid this which is why I was interested in the plugin.

If you provide specific examples, it might be possible to fix them with a plugin.

Quote:

Originally Posted by Thasaidon (Post 3966556)
Alternatively it would be acceptable if you added a magic button to Sigil which automatically fixes all errors in the ePub. :rofl:

It doesn't exist in Sigil. However, the Calibre Editor has a "Try to correct all fixable errors automatically" button, but it only fixes certain types of errors.

Thasaidon 03-24-2020 06:25 AM

Quote:

Originally Posted by Doitsu (Post 3966558)
It doesn't exist in Sigil. However, the Calibre Editor has a "Try to correct all fixable errors automatically" button, but it only fixes certain types of errors.

Yes, I use it regularly, but that is not the kind of "magic button" I was joking about.

Anyway, I have been able to find more details about the errors I was talking about.

The string prefix="calibre: https://calibre-ebook.com" is sometimes included in row one of the OPF, and ePubCheck flags it as an error.

ePubCheck also reports errors if it finds the following in the OPF:

<manifest>
<item id="id4" href="Fonts/LiberationSerif-Bold.ttf" media-type="application/octet-stream"/>
<item id="id3" href="Fonts/LiberationSerif-BoldItalic.ttf" media-type="application/octet-stream"/>
<item id="id2" href="Fonts/LiberationSerif-Italic.ttf" media-type="application/octet-stream"/>
<item id="id1" href="Fonts/LiberationSerif-Regular.ttf" media-type="application/octet-stream"/>

</manifest>

As I said earlier these two problems can be easily solved with a simple manual deletion.

The other problems I have come across are rarer and I have not been able to work out a manual fix.

If you are interested, I will start working on my books again in a few days and can post details of these rarer errors here when I find them.

With Murphy ruling the universe, though, it will probably take some time to find some.

DiapDealer 03-24-2020 07:36 AM

In my experience, all of calibre's various attributes are properly formatted and/or namespaced and thus valid according to epub specs (and epubcheck compliant). If Epubcheck complains about any of them, I suspect it's because a piece that made them valid was manually removed.


All times are GMT -4.