Old 07-18-2017, 02:48 AM   #151
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
I was working on a book with some HTML comments and ran across this with Smarten Punctuation:

Before:

<!--?xml version="1.0" encoding="UTF-8"?-->

After:

<!--?xml version=“1.0” encoding=“UTF-8”?-->

Perhaps smartening should completely ignore HTML Comments? There is the possibility that comments could be designed in very specific ways (requiring unsmartened quotes).
Old 07-18-2017, 12:00 PM   #152
Rev. Bob
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
Quote:
Originally Posted by Tex2002ans View Post
I was working on a book with some HTML comments and ran across this with Smarten Punctuation:

Before:

<!--?xml version="1.0" encoding="UTF-8"?-->

After:

<!--?xml version=“1.0” encoding=“UTF-8”?-->

Perhaps smartening should completely ignore HTML Comments? There is the possibility that comments could be designed in very specific ways (requiring unsmartened quotes).
While I agree with you in general, I feel obliged to point out that the case you cite is an oddball. A proper XML declaration - which would require dumb quotes - doesn't use the HTML comment markers. It looks like, in your case, someone manually commented out the declaration for some reason. In short, the change looks weird, but it doesn't actually affect anything.
Old 07-18-2017, 03:58 PM   #153
icallaci
Guru
Posts: 769
Karma: 6528026
Join Date: Sep 2012
Device: Kobo Elipsa
Quote:
Originally Posted by Rev. Bob View Post
While I agree with you in general, I feel obliged to point out that the case you cite is an oddball. A proper XML declaration - which would require dumb quotes - doesn't use the HTML comment markers. It looks like, in your case, someone manually commented out the declaration for some reason. In short, the change looks weird, but it doesn't actually affect anything.
I'm not sure what's causing this (Sigil, maybe?) but I am seeing this same commented-out declaration in a lot of ebooks lately.
Old 07-18-2017, 04:52 PM   #154
Rev. Bob
Quote:
Originally Posted by icallaci View Post
I'm not sure what's causing this (Sigil, maybe?) but I am seeing this same commented-out declaration in a lot of ebooks lately.
I wouldn't know. Have you noticed any common factor to those books - publisher, bookstore, metadata, anything like that?
Old 07-18-2017, 05:19 PM   #155
DiapDealer
Grand Sorcerer
Posts: 27,553
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by icallaci View Post
I'm not sure what's causing this (Sigil, maybe?) but I am seeing this same commented-out declaration in a lot of ebooks lately.
I've been meaning to track down where the commented-out declaration comes from. I'm pretty sure it's Sigil (or Sigil + Sigil plugins) and/or the Gumbo parser that makes it happen under certain conditions. I just haven't sat down and figured out the particulars.

As far as Smarten Punctuation goes, I assumed the SmartyPants algorithm could handle html comments. Perhaps I was wrong. But if it (or my plugin) can be easily tweaked to accommodate the issue, I'll certainly try to incorporate a fix into a new release when I get a chance.
Old 07-18-2017, 06:47 PM   #156
Tex2002ans
Quote:
Originally Posted by DiapDealer View Post
I've been meaning to track down where the commented-out declaration comes from. I'm pretty sure it's Sigil (or Sigil + Sigil plugins) and/or the Gumbo parser that makes it happen under certain conditions. I just haven't sat down and figured out the particulars.
I get the commented-out XML declaration in EPUBs output straight from FineReader. I hadn't really noticed it before yesterday... maybe because previous versions of Sigil auto-cleaned it upon opening or mending?

I'll definitely keep an eye out now that I noticed it.

Quote:
Originally Posted by DiapDealer View Post
As far as Smarten Punctuation goes, I assumed the SmartyPants algorithm could handle html comments. Perhaps I was wrong. But if it (or my plugin) can be easily tweaked to accommodate the issue, I'll certainly try to incorporate a fix into a new release when I get a chance.


Quote:
Originally Posted by Rev. Bob View Post
While I agree with you in general, I feel obliged to point out that the case you cite is an oddball. A proper XML declaration - which would require dumb quotes - doesn't use the HTML comment markers. It looks like, in your case, someone manually commented out the declaration for some reason. In short, the change looks weird, but it doesn't actually affect anything.
Yeah, I just posted the example that made me notice the issue. I probably could have concocted a similar issue where dumb quotes would have been important. I sometimes use HTML comments to future-proof Math Formulas by leaving the TeX code there:

Before:

Code:
<!-- f'(x) = x + y -->
<div class="formula"><img src="Formula1.png"/></div>
After:

Code:
<!--f’(x) = x + y-->
<div class="formula"><img src="Formula1.png"/></div>
In the future, that can be used to easily generate the formula again or be converted to MathML, etc. etc. But smartening the dumb quote would break the equation when fed back into TeX (TeX turns a dumb quote into a prime while in TeX's math mode).
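
To make the prime behaviour concrete, here is a minimal LaTeX sketch using the formula from the comment above. In math mode the straight apostrophe is shorthand for a prime superscript, so the two lines below typeset the same thing. A curly quote (U+2019) in the same position is not that shorthand, and what happens to it depends on the engine and input encoding.

```latex
\documentclass{article}
\begin{document}
% Straight apostrophe: in math mode, ' is shorthand for ^{\prime}.
$f'(x) = x + y$

% The same formula written out explicitly; it typesets identically.
$f^{\prime}(x) = x + y$
\end{document}
```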

Last edited by Tex2002ans; 07-18-2017 at 06:54 PM.
Old 07-18-2017, 07:57 PM   #157
icallaci
Quote:
Originally Posted by Rev. Bob View Post
I wouldn't know. Have you noticed any common factor to those books - publisher, bookstore, metadata, anything like that?
I haven't paid much attention to it, but my impression (for what it's worth) is that it may be related to epub3 books converted to epub2 and opened in Sigil. I usually search and replace it out of existence and it has never come back, so I haven't spent a lot of time trying to track it down. And my hunch about where it's coming from may be way off base. Just thought I'd mention I've been seeing it a lot lately.
Old 07-19-2017, 01:25 AM   #158
AlanHK
Guru
Posts: 668
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Quote:
Originally Posted by Tex2002ans View Post
I sometimes use HTML comments to future-proof Math Formulas by leaving the TeX code there
How about embedding it as img alt?
Code:
<div class="formula"><img alt="f'(x) = x + y" src="Formula1.png"/></div>
And it actually makes good alt text.
Old 07-19-2017, 08:46 AM   #159
Tex2002ans
Quote:
Originally Posted by AlanHK View Post
How about embedding it as img alt?
Code:
<div class="formula"><img alt="f'(x) = x + y" src="Formula1.png"/></div>
And it actually makes good alt text.
That was my solution for a while (see my Post back in 2014 talking about Word formulas -> EPUB)...

but then there were multiple problems that cropped up:
  • Formulas which used characters that cannot appear unescaped in an alt attribute
    • Like a dumb double quote: ".
  • It would be read aloud by Text-to-Speech (TTS).
    • For the vast majority of TTS, you would hear a bunch of gibberish being spouted out.
      • See one of my "Complex Examples" in my 2014 post:
      • alt="\frac{\displaystyle\frac{\text{widgets} / \text{elapsed-time}}{[\text{widgets} \times (\text{elapsed-time})^{(\alpha-1)}] \cdot [\text{labor-hours} / \text{elapsed-time}]^\phi}}{[\text{labor-hours}]^{\alpha\theta} \cdot [\text{labor-effort}]^{\alpha(1-\theta)}}"
    • It would be fantastic if TTS would recognize TeX and be able to read the Math properly... but I currently don't know of any that could (and this reader would be in the extreme minority).
    • This is one of the reasons for the push to MathML.
      • MathML is split into Presentational/Content Markup... since symbols can mean different things in different contexts. It would allow the TTS to read an equation like a human would in a classroom.
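
On the first problem in the list above: a straight double quote can usually be carried in an attribute value if it is escaped as `&quot;`. Below is a minimal Python sketch, assuming the TeX source is available as a plain string. The helper name `tex_to_alt_img` is hypothetical, and this does nothing for the TTS problem.

```python
from html import escape


def tex_to_alt_img(tex, src):
    """Build an <img> tag whose alt text carries the TeX source (hypothetical helper)."""
    # quote=True also escapes " and ' so the value is safe inside a quoted attribute.
    alt = escape(tex, quote=True)
    return '<img alt="{}" src="{}"/>'.format(alt, escape(src, quote=True))


print(tex_to_alt_img('x = "a" < b', "Formula1.png"))
```

Any reader that parses the attribute gets the original TeX string back, quotes included.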

Anyway, I don't want to clutter up this topic with formulas or MathML discussion. We can take this to PM, or discuss it in other topics (like my Tutorial: Formulas to PNG). I am more than happy to spitball ideas + discuss any of that stuff at any time.

All I know is that comments should be ignored by smarteners! Sometimes there is smart stuff going on that needs to remain dumb!

Last edited by Tex2002ans; 07-19-2017 at 08:48 AM.
Old 07-19-2017, 10:13 AM   #160
DiapDealer
Making SmartyPants ignore this particular comment is pretty trivial. I just basically have to "get out of its way" with my plugin code. There are, however, situations where there's HTML code contained within comments that SmartyPants will barf hard on regardless. Fixing that might not be trivial, but I'd rather work on a solution that ignores ALL HTML comments entirely rather than slapping something together that only fixes this specific comment.
Old 07-19-2017, 03:44 PM   #161
Rev. Bob
Quote:
Originally Posted by DiapDealer View Post
There are, however, situations where there's HTML code contained within comments that SmartyPants will barf hard on regardless. Fixing that might not be trivial, but I'd rather work on a solution that ignores ALL HTML comments entirely rather than slapping something together that only fixes this specific comment.
In theory, ignoring HTML comments should be simple. If "<!--" is found, scan ahead until "-->" or EOF is found, and ignore everything in between: HTML code, equations, scripts, whatever. Of course, depending on how the algorithm in this instance is designed, "scan ahead" may not be feasible.

Maybe a preprocessing pass that removes comments to store them in a list somewhere, then a post pass that puts them back? (I'm really just spitballing here, having not looked at the code.)
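
The remove-then-restore idea could look roughly like the Python sketch below. It is not the plugin's code: `smarten_outside_comments` and `toy_smarten` are hypothetical names, `toy_smarten` is a stand-in that only curls apostrophes, and the NUL-delimited placeholder assumes the real smartener passes such characters through unchanged.

```python
import re

# A comment runs from "<!--" to the first "-->", or to end of input if unterminated.
COMMENT_RE = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)
PLACEHOLDER_RE = re.compile(r"\x00(\d+)\x00")


def smarten_outside_comments(html, smarten):
    """Run `smarten` on `html` while leaving HTML comments byte-for-byte intact."""
    stash = []

    def hide(match):
        # Pre-pass: store the comment and leave an index marker in its place.
        stash.append(match.group(0))
        return "\x00{}\x00".format(len(stash) - 1)

    protected = COMMENT_RE.sub(hide, html)
    smartened = smarten(protected)
    # Post-pass: put each stored comment back where its marker sits.
    return PLACEHOLDER_RE.sub(lambda m: stash[int(m.group(1))], smartened)


def toy_smarten(text):
    # Stand-in for the real smartener (assumption): only curls apostrophes.
    return text.replace("'", "\u2019")


print(smarten_outside_comments("<!-- f'(x) = x + y --><p>it's</p>", toy_smarten))
```

The comment content never reaches the smartener, so HTML, equations, or scripts inside it cannot be altered or trip it up.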
Old 07-19-2017, 04:04 PM   #162
DiapDealer
Quote:
Originally Posted by Rev. Bob View Post
In theory, ignoring HTML comments should be simple. If "<!--" is found, scan ahead until "-->" or EOF is found, and ignore everything in between: HTML code, equations, scripts, whatever. Of course, depending on how the algorithm in this instance is designed, "scan ahead" may not be feasible.

Maybe a preprocessing pass that removes comments to store them in a list somewhere, then a post pass that puts them back? (I'm really just spitballing here, having not looked at the code.)
Yes, in theory, it's quite simple. I'm somewhat limited by the fact that I'm using the SmartyPants algorithm which has its own html tokenizer routine that doesn't lend itself well to the "scan ahead" technique.

Luckily, the fork of Python SmartyPants on PyPI has a robust solution for handling HTML comments that I'm going to incorporate. I should have an updated version of the plugin very soon.
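
For illustration only (this is not the PyPI fork's code, which is not reproduced here), one way a regex tokenizer can skip comments without scanning ahead is to match a whole comment as a single token before the generic tag pattern gets a chance. The names `tokenize` and `smarten_text_tokens` are hypothetical, and `smarten` is whatever callable does the actual punctuation work.

```python
import re

# The comment alternative comes first, so "<!-- ... -->" is consumed whole,
# including any tags inside it, before the generic tag pattern is tried.
TOKEN_RE = re.compile(r"(<!--.*?-->|<[^>]*>)", re.DOTALL)


def tokenize(html):
    """Return a list of (kind, text) pairs; kind is 'comment', 'tag', or 'text'."""
    tokens = []
    # With one capturing group, re.split puts matched tokens at odd indices
    # and the text between them at even indices.
    for i, piece in enumerate(TOKEN_RE.split(html)):
        if not piece:
            continue
        if i % 2 == 0:
            kind = "text"
        elif piece.startswith("<!--"):
            kind = "comment"
        else:
            kind = "tag"
        tokens.append((kind, piece))
    return tokens


def smarten_text_tokens(html, smarten):
    """Apply `smarten` to text tokens only; comments and tags pass through."""
    return "".join(smarten(t) if kind == "text" else t for kind, t in tokenize(html))


print(tokenize('<!-- say "x" <i>y</i> --><p>"hi"</p>'))
```

Only the `text` tokens are handed to the smartener, so a comment containing markup stays one untouched unit.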

Last edited by DiapDealer; 07-19-2017 at 04:06 PM.
Old 07-19-2017, 04:11 PM   #163
DiapDealer
In fact, if someone wants to give this test version a whirl and verify that it completely ignores html comments (without breaking something else, hopefully), I'd appreciate it!

EDIT: test attachment removed. New version is available on the first page.

Last edited by DiapDealer; 07-19-2017 at 09:43 PM.
Old 07-19-2017, 04:53 PM   #164
JSWolf
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Is there a way to get the div and span processing to work with multiple HTML/XHTML files? If not, can such a function be written in?
Old 07-19-2017, 05:17 PM   #165
DiapDealer
Quote:
Originally Posted by JSWolf View Post
Is there a way to get the div and span processing to work with multiple HTML/XHTML files? If not, can such a function be written in?
The default is to only modify the current (x)html file. If you want to process all of the (x)html files in the ebook, you need to uncheck the "Smarten current file only" check box that's available by clicking the down arrow on the right side of the plugin's icon in the toolbar. It will remember this setting from session to session. "All" or "Current" are the only choices.

You may have to add the plugin to the toolbar (editor preferences) if it's not already there.

It's the same for all three tools provided by this Editor plugin.