Old 04-02-2014, 02:36 PM   #586
Rev. Bob
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
Quote:
Originally Posted by eschwartz
Final regex:
Code:
<span[^<>]*>((?:(?!<(?:span|/span)>).)*)</span>
Doesn't work if the inner span has any attributes.
Old 04-02-2014, 02:42 PM   #587
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by Rev. Bob
Doesn't work if the inner span has any attributes.
I just noticed -- and fixed -- that.

Just don't close the span tag (in the negative lookahead), we don't need that to match anyway.
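
To see what that change does, here is a small Python sketch comparing the two patterns from this thread on a nested span. The sample markup and the names `CLOSED_LOOKAHEAD` and `OPEN_LOOKAHEAD` are made up for illustration, and Python's `re` module is assumed to behave like the editor's regex engine for these patterns.

```python
import re

# Earlier version: the lookahead only blocks the literal tags "<span>" and "</span>".
CLOSED_LOOKAHEAD = re.compile(r'<span[^<>]*>((?:(?!<(?:span|/span)>).)*)</span>')
# Fixed version: the lookahead blocks anything starting with "<span" or "</span".
OPEN_LOOKAHEAD = re.compile(r'<span[^<>]*>((?:(?!<(?:span|/span)).)*)</span>')

# Hypothetical sample: an outer span containing an inner span with an attribute.
sample = '<span id="a">x <span class="b">y</span> z</span>'

# The earlier pattern runs from the outer opening tag to the inner closing tag,
# so the match is a mismatched pair.
old_match = CLOSED_LOOKAHEAD.search(sample).group(0)

# The fixed pattern refuses to cross the inner "<span class=...>", so the first
# match it finds is the innermost span.
new_match = OPEN_LOOKAHEAD.search(sample).group(0)

print(old_match)
print(new_match)
```

With the fixed pattern, only spans that contain no other span can match, which is why nested cases need more than one pass.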
Old 04-02-2014, 02:47 PM   #588
Perkin
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
One thing I did for a Sigil plugin (written in Python) that deletes empty spans was to cycle through the file, storing all the tags and text and giving each a number and a pair number (initialised at 0). Then I run through them and, on reaching an end span, work backward to find the previous unpaired span, continuing like that through to the end of the file.

Then I remove all the empty spans and their end tags, and write the file back out in order.

You could do something similar to remove *your* desired spans.
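
A rough Python sketch of that idea, not the actual plugin code: it uses a stack of unpaired start tags instead of numbering and pair numbers, and a caller-supplied test instead of a hard-coded empty-span check. The names `strip_spans` and `should_remove` are hypothetical, and the tokeniser assumes no `>` characters appear inside attribute values.

```python
import re

# Split into tags and text; the capturing group keeps the tags in the result.
TAG_SPLIT = re.compile(r'(<[^<>]+>)')

def strip_spans(html, should_remove):
    """Remove span start tags accepted by should_remove, plus their end tags."""
    tokens = TAG_SPLIT.split(html)
    unpaired = []   # indices of span start tags still waiting for an end tag
    drop = set()    # indices of tokens to leave out when rewriting
    for i, tok in enumerate(tokens):
        if re.match(r'<span\b', tok) and not tok.endswith('/>'):
            unpaired.append(i)
        elif tok == '</span>' and unpaired:
            start = unpaired.pop()          # most recent unpaired start tag
            if should_remove(tokens[start]):
                drop.update((start, i))
    # Write everything back in the original order, minus the dropped tags.
    return ''.join(t for i, t in enumerate(tokens) if i not in drop)

# Example: remove attribute-less spans only, keeping the nested classed span.
result = strip_spans('<p><span>a <span class="x">b</span></span></p>',
                     lambda tag: tag == '<span>')
print(result)
```

Because each end tag is paired with the nearest unpaired start tag, nested spans stay balanced whichever ones are removed.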

The plugin is in the post here if you want to see what I did. [And see it better than how it's worded here.]
Old 04-02-2014, 02:52 PM   #589
Rev. Bob
Quote:
Originally Posted by eschwartz
I just noticed -- and fixed -- that.

Just don't close the span tag (in the negative lookahead), we don't need that to match anyway.
The problem is, your regex still isn't doing what it would need to do. The challenge is to remove a specific set of SPANs, namely those matching the form
Code:
<span class="koboSpan" id="koboxxx">content</span>
...regardless of whether the content itself has SPANs, and allowing for the specific ID value to vary.
Old 04-02-2014, 02:56 PM   #590
eschwartz
Quote:
Originally Posted by Rev. Bob
The problem is, your regex still isn't doing what it would need to do. The challenge is to remove a specific set of SPANs, namely those matching the form
Code:
<span class="koboSpan" id="kobo.2.1">content</span>
...regardless of whether the content itself has SPANs.
Since I don't have a Kobo, I was merely suggesting general methods of hunting spans.

Quote:
Originally Posted by eschwartz
Final regex:
Code:
<span[^<>]*>((?:(?!<(?:span|/span)).)*)</span>
It can easily be edited to fill in any required classes/ids:

Code:
<span{ kobo-specific info goes here}[^<>]*>((?:(?!<(?:span|/span)).)*)</span>

Last edited by eschwartz; 04-02-2014 at 03:06 PM. Reason: (classes) or ids
Old 04-02-2014, 03:04 PM   #591
PeterT
Grand Sorcerer
Posts: 12,166
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
Actually, Rev. Bob, the Kobo spans (at least in the books I examined) were of the form
<span id="kobo.114.1">...</span>, so my approach to looking for those was
Code:
<span                ## look for <span
[^>]+?               ## then one or more characters that are not the closing >
                     ## (there will always be at least one; the space after
                     ## the <span )
id="kobo             ## the start of the Kobo id assigned to this span
[\d.]+               ## the numeric and dotted span number
[^>]+?               ## then everything up to the >
                     ## (always at least one; the " after the id)
>                    ## the closing symbol of the <span tag
which in theory should handle both the format I found and your example.
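
As a sketch, the annotated pattern above can be compiled more or less as written using Python's `re.VERBOSE` flag, which ignores whitespace and `#` comments inside the pattern. The name `KOBO_SPAN_OPEN` and the test strings are illustrative; whether calibre's search box accepts the commented form directly is not something this example assumes.

```python
import re

KOBO_SPAN_OPEN = re.compile(r'''
    <span        # look for <span
    [^>]+?       # one or more characters that are not > (at least the space)
    id="kobo     # the start of the Kobo id assigned to this span
    [\d.]+       # the numeric and dotted span number
    [^>]+?       # everything up to the > (at least the closing quote)
    >            # the closing symbol of the <span tag
''', re.VERBOSE)

id_first = KOBO_SPAN_OPEN.fullmatch('<span id="kobo.114.1">')
class_first = KOBO_SPAN_OPEN.fullmatch('<span class="koboSpan" id="kobo.2.1">')
other_span = KOBO_SPAN_OPEN.search('<span class="ent1">')

print(bool(id_first), bool(class_first), bool(other_span))
```

Note that this only identifies the opening tag; finding the matching closing tag is a separate problem.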

Last edited by PeterT; 04-02-2014 at 03:07 PM.
Old 04-02-2014, 03:14 PM   #592
eschwartz
@PeterT, how does my regex do?
Old 04-02-2014, 03:28 PM   #593
PeterT
When I tried it in calibre ebook-edit, it failed. On the example that has the nested spans, it fails the same way as all the other attempts I have tried. I'm going to have to bite the bullet, modify this plugin, and try it there.

-----

Retried with
Code:
<span[^<>]*>((?:(?!<(?:span|/span)).)*)</span>
This matches the 1st and 3rd spans I had above
Code:
<span id="kobo.114.1">I don’t go in for ‘lawn maintenance’, though — all that weeding and feeding.</span>
and
<span id="kobo.114.3"> As Vita Sackville-West said, ‘A weed is only a plant in the wrong place.’ To which we should add: ‘or one for which we haven’t yet discovered the use’.</span>
however on the 2nd span
Code:
<span id="kobo.114.2"> I prefer my ‘weeds’: the clover, which keeps the grass naturally green with its nitrogen-fixing nodules; the daisy, opening and closing each day (its name comes from the Old English <em>daeges <span class="ent1">ē</span>age</em>, meaning ‘the day’s eye’); the little blue-purple <em>Prunella</em>, known as ‘self-heal’, used to treat sore throats, mouth <a id="page_184"></a>ulcers and open wounds — and still used in modern herbal medicine as an astringent for external or internal wounds.</span>
just matches the interior span
Code:
<span class="ent1">ē</span>
Again; this testing is just being done within the confines of ebook-edit

Last edited by PeterT; 04-02-2014 at 03:36 PM.
Old 04-02-2014, 03:39 PM   #594
eschwartz
Quote:
Originally Posted by PeterT
When I tried it in calibre ebook-edit, it failed. On the example that has the nested spans, it fails the same way as all the other attempts I have tried. I'm going to have to bite the bullet, modify this plugin, and try it there.

-----

Retried with
Code:
<span[^<>]*>((?:(?!<(?:span|/span)).)*)</span>
This matches the 1st and 3rd spans I had above
Code:
<span id="kobo.114.1">I don’t go in for ‘lawn maintenance’, though — all that weeding and feeding.</span>
and
<span id="kobo.114.3"> As Vita Sackville-West said, ‘A weed is only a plant in the wrong place.’ To which we should add: ‘or one for which we haven’t yet discovered the use’.</span>
however on the 2nd span
Code:
<span id="kobo.114.2"> I prefer my ‘weeds’: the clover, which keeps the grass naturally green with its nitrogen-fixing nodules; the daisy, opening and closing each day (its name comes from the Old English <em>daeges <span class="ent1">ē</span>age</em>, meaning ‘the day’s eye’); the little blue-purple <em>Prunella</em>, known as ‘self-heal’, used to treat sore throats, mouth <a id="page_184"></a>ulcers and open wounds — and still used in modern herbal medicine as an astringent for external or internal wounds.</span>
just matches the interior span
Code:
<span class="ent1">ē</span>
Again; this testing is just being done within the confines of ebook-edit
Well, I didn't expect it to match the outer span, since we specifically excluded those (to prevent span mismatches). But since we delete the inner span, the next pass should pick it up, right?

Then we just keep passing over the book till no results are found.
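
The repeated-pass idea could look something like this in Python. `unwrap_innermost_spans` is a hypothetical helper, `re.DOTALL` is added on the assumption that spans may cross line breaks, and the replacement keeps each span's content while dropping its tags.

```python
import re

# Matches only spans that contain no other span tags (the innermost ones).
INNERMOST_SPAN = re.compile(
    r'<span[^<>]*>((?:(?!<(?:span|/span)).)*)</span>', re.DOTALL)

def unwrap_innermost_spans(html):
    """Unwrap innermost spans pass after pass until none are left."""
    while True:
        html, count = INNERMOST_SPAN.subn(r'\1', html)
        if count == 0:
            return html

sample = '<span id="kobo.114.2">a <em>b <span class="ent1">ē</span>c</em> d</span>'
cleaned = unwrap_innermost_spans(sample)
print(cleaned)
```

As written, this unwraps every span it finds, whether Kobo added it or not.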
Old 04-02-2014, 03:53 PM   #595
PeterT
ugh.. correct!

Of course, the issue now is that you actually don't want to remove those inner spans for the original requirements; only the Kobo-added ones that exist to provide bookmarking support within the ACCESS engine.
Old 04-02-2014, 03:55 PM   #596
Rev. Bob
Quote:
Originally Posted by Perkin
the plugin was in the post here if you want to see what I did. [And see better than how it's worded here ]
If I grok that script correctly, it looks like I can expand it to remove the Kobo spans just by changing this line (23):

Code:
if entity == '<span>':
to this:

Code:
if entity == '<span>' or entity[:17] == '<span class="kobo' or entity[:15] == '<span id="kobo.':
In other words, match no-attribute spans, spans where the first attribute is a class beginning with 'kobo', and spans where the first attribute is an ID beginning with 'kobo.' - yes?

If that'll do the trick, it should match the Kobo books I've seen (class first), those PeterT's got (id first), and it'll strip out empty spans in the bargain. Not bad for one changed line...
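
The same test can be written with `str.startswith`, which accepts a tuple of prefixes and avoids counting characters by hand. This is only a sketch: `is_removable_span` and `KOBO_PREFIXES` are made-up names, and `entity` is assumed to be the opening tag text the script is already comparing.

```python
# Prefixes that mark a span as Kobo-added: class first, or id first.
KOBO_PREFIXES = ('<span class="kobo', '<span id="kobo.')

def is_removable_span(entity):
    """True for bare <span> tags and spans whose first attribute looks Kobo-added."""
    return entity == '<span>' or entity.startswith(KOBO_PREFIXES)

print(is_removable_span('<span class="koboSpan" id="kobo.2.1">'))
print(is_removable_span('<span class="ent1">'))
```

Like the slice version, this only looks at the first attribute, so a Kobo id that comes after some other attribute would not be caught.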

ETA: Never mind; the further enhancement I had in mind wouldn't work. Ignore this ETA line.

Last edited by Rev. Bob; 04-02-2014 at 04:01 PM.
Old 04-02-2014, 04:20 PM   #597
eschwartz
Quote:
Originally Posted by PeterT
ugh.. correct!

Of course, the issue now is that you actually don't want to remove those inner spans for the original requirements; only the Kobo-added ones that exist to provide bookmarking support within the ACCESS engine.

How is this?

Code:
<span{ kobo-specific info goes here}[^<>]*>((?:(?!<(?:span|/span){ kobo-specific info goes here}).)*)</span>
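
One way to try this out is to fill in the placeholder and run it in Python. The attribute pattern `KOBO_ATTR` below is an assumption based on the id-first spans PeterT quoted, and the two sample strings are shortened or made up. In this sketch the pattern handles the nested example, but because the lookahead no longer stops at a plain `</span>`, the greedy match can run on to a later closing tag.

```python
import re

# Assumed Kobo-specific part: an id attribute such as id="kobo.114.2".
KOBO_ATTR = r' id="kobo\.[\d.]+"'
KOBO_SPAN = re.compile(
    '<span' + KOBO_ATTR + r'[^<>]*>((?:(?!<(?:span|/span)' + KOBO_ATTR + r').)*)</span>'
)

# Kobo span wrapping a non-Kobo span: the outer tags are removed correctly.
nested = '<span id="kobo.114.2">daeges <span class="ent1">ē</span>age</span>'
nested_result = KOBO_SPAN.sub(r'\1', nested)

# Kobo span followed by a non-Kobo span: the match extends to the last
# </span>, so the wrong closing tag is removed and the markup is unbalanced.
trailing = '<span id="kobo.1.1">a</span> <span class="x">b</span>'
trailing_result = KOBO_SPAN.sub(r'\1', trailing)

print(nested_result)
print(trailing_result)
```

So the result depends on what follows each Kobo span, which is worth checking against real books before relying on it.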
Old 04-02-2014, 07:17 PM   #598
Perkin
Quote:
Originally Posted by Rev. Bob
If I grok that script correctly, it looks like I can expand it to remove the Kobo spans just by changing this line (23):

Code:
if entity == '<span>':
to this:

Code:
if entity == '<span>' or entity[:17] == '<span class="kobo' or entity[:15] == '<span id="kobo.':
In other words, match no-attribute spans, spans where the first attribute is a class beginning with 'kobo', and spans where the first attribute is an ID beginning with 'kobo.' - yes?

If that'll do the trick, it should match the Kobo books I've seen (class first), those PeterT's got (id first), and it'll strip out empty spans in the bargain. Not bad for one changed line...

ETA: Never mind; the further enhancement I had in mind wouldn't work. Ignore this ETA line.
Yep, that should do it.

IIRC - When doing it originally (ages ago) I was going to add a few other cases that may have been useful, but left it as it was as it worked as needed at the time.
Old 04-02-2014, 07:30 PM   #599
Rev. Bob
Quote:
Originally Posted by Perkin
Yep, that should do it.

IIRC - When doing it originally (ages ago) I was going to add a few other cases that may have been useful, but left it as it was as it worked as needed at the time.
Now I just have to figure out how to make it work with Calibre instead of the now-discontinued Sigil...
Old 04-03-2014, 12:21 PM   #600
Perkin
All I can think of at present (not in a programming frame of mind at moment) is...

Either look through and see if you can 'piggy-back' on one of this (or another) plugin's functions, or add your own function, or do the same with one of the editor's functions, or wait for Kovid to add plugin functionality to the editor.

I suppose it will need to be a function that receives the raw html file(s) that need to be worked on, unless you can adapt it yourself.

I haven't done much programming over the last 6+ months; I switched over to a Mac and haven't got into coding on it yet, and am not currently 'with it'.

Hope this gives you an idea, and that it's workable/doable.