Old 11-08-2018, 02:10 PM   #181
carmenchu
Groupie
Posts: 183
Karma: 266070
Join Date: Dec 2010
Device: Win7,Win10,Lubuntu,smartphone
Thousands of thanks!
Proceeding to download and try--I am grateful for the sheer notion of this plugin.
Old 11-08-2018, 05:01 PM   #182
carmenchu
Groupie
Posts: 183
Karma: 266070
Join Date: Dec 2010
Device: Win7,Win10,Lubuntu,smartphone
A couple snags

Installed and tested, with a couple of snags:
1. First run produced the following error message:

Code:
Incorrect XHTML: OEBPS/Text/Chap_09.htm Line/Col 17,30 @16:30: Tokenizer error with an unimplemented error message.
Incorrect XHTML: OEBPS/Text/Chap_21.htm Line/Col 19,30 @18:30: Tokenizer error with an unimplemented error message.
As both those lines contained images with src="../Images/bar20days1-ill1.jpg" and src="../Images/drive-c.jpg", at first I thought the problem was the hyphens, which I deleted or changed to '_'. But as the images bar-20-daysIll.jpeg and bar20days0-ill0b.jpg had given no trouble, I looked further and found this in common on the troublesome lines:
Quote:
alt=""
--also corrected, and personally strongly suspected as the villain (Soup of the evening...)

2. The text contains
Quote:
s-p-a-t!
, and, not knowing better, I added to my KeepHyphen.txt just
Quote:
s-p-a-t!
: result, a report of
Quote:
Hyphen removed at
(restored by hand).

Thus, two questions:
1. How to enter in KeepHyphen.txt multi-expressions like s-p-a-t? Like this
s-p
p-a
a-t
or is there some shortcut?
2. What about plurals? To be on the safe side, I entered both
cow-puncher
cow-punchers
but I have a hazy notion that it may be redundant--i.e., that a search for the first may sometimes include the second...?
Thanks!
Old 11-09-2018, 01:38 PM   #183
CalibUser
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
@carmenchu: Thank you for alerting me to the error in my plugin.

I have made a correction and placed the updated plugin in the first post in this thread.
Old 11-10-2018, 06:37 AM   #184
carmenchu
Groupie
Posts: 183
Karma: 266070
Join Date: Dec 2010
Device: Win7,Win10,Lubuntu,smartphone
More about hyphens

Thanks! No more issues with alt=""...
Now, I have another suggestion about hyphens: in some books one finds several instances of words spelled out (maybe for added emphasis) like "r-a-t-s", or stuttered, like "p-p-please".
The Plugin will check "r-a-t-s"
Code:
HyphenRemoved=m.group(1)+m.group(2)
find "at" in the Hunspell English dictionary, and return "r-at-s", unless "a-t" is in KeepHyphens.txt.
Now, I have never found a publication with a hyphen directly after or before a single letter, like w-ord or wor-d. Even where it is grammatically sound, which I doubt, there seems to be some styling rule against it.
Thus, my suggestion: is it possible to check the length of each m.group() and keep the hyphen if either is a single character?
I don't know python, is it difficult to do?
However, it makes sense to me--and, by the way, it would take care of the "I-I" special case...
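
To make the suggestion concrete, here is a minimal sketch of the length check. It is not the plugin's actual code: the regular expression, the function names and the `dictionary` set (standing in for the Hunspell lookup) are all assumptions made for illustration.

```python
import re

# Assumed pattern: two runs of ASCII letters joined by a single hyphen.
HYPHENATED = re.compile(r"\b([A-Za-z]+)-([A-Za-z]+)\b")

def remove_hyphen(match, dictionary):
    """Join the two halves only if neither is a single character."""
    left, right = match.group(1), match.group(2)
    if len(left) == 1 or len(right) == 1:
        # Spelled-out or stuttered text such as r-a-t-s or I-I: keep as is.
        return match.group(0)
    joined = left + right
    # `dictionary` is a plain set here, standing in for a spell-checker lookup.
    return joined if joined.lower() in dictionary else match.group(0)

def dehyphenate(text, dictionary):
    return HYPHENATED.sub(lambda m: remove_hyphen(m, dictionary), text)
```

With a dictionary containing "at" and "reading", `dehyphenate("r-a-t-s and read-ing, I-I", ...)` leaves "r-a-t-s" and "I-I" alone and joins only "read-ing".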
Thanks again!
Old 11-10-2018, 02:08 PM   #185
CalibUser
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
@carmenchu: Following your suggestion, the plugin will not remove hyphens between words where one of the words is a single character.

The updated plugin is in the first post in this thread.
Old 11-10-2018, 03:57 PM   #186
carmenchu
Groupie
Posts: 183
Karma: 266070
Join Date: Dec 2010
Device: Win7,Win10,Lubuntu,smartphone
Thanks again!

I only hope I'm not being a bore...
Old 11-11-2018, 05:14 AM   #187
carmenchu
Groupie
Posts: 183
Karma: 266070
Join Date: Dec 2010
Device: Win7,Win10,Lubuntu,smartphone
Quote:
Originally Posted by carmenchu View Post
I only hope I'm not being a bore...
The attached image is the icon I have made for showing this plugin in Sigil's toolbar.
It's not a work of art, but anything more sophisticated didn't show well in the little button--this at least can be identified at a glance among my other installed plugins.
For what it's worth...
Attached Images
 

Last edited by carmenchu; 11-11-2018 at 05:15 AM. Reason: correct word
Old 11-11-2018, 05:17 AM   #188
CalibUser
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
Quote:
Originally Posted by carmenchu View Post
I only hope I'm not being a bore...
Not at all.
Old 11-11-2018, 05:22 AM   #189
CalibUser
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
Quote:
Originally Posted by carmenchu View Post
The attached image is the icon I have made for showing this plugin in Sigil's toolbar.
Thanks.
Old 01-09-2020, 03:34 PM   #190
xli199
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2020
Device: samsung s2
ePub Tidy Tool v3

Hi, sorry to ask about ePub Tidy Tool plugin here - the original thread is closed.

Today I used the "chapter titles" button to change the format of chap titles - but somehow all the occurrences of "chapter" were altered - not only the chapter titles. And there are quite a few "chapter" occurrences inside a book I worked on, therefore quite a few paragraphs containing "chapter" were changed.

Is there a way to add more restrictions to this function? e.g., change only occurrences with all-cap "CHAPTER" or only for the first few lines of a file, not to a long paragraph with chapter tucked inside.
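
To illustrate the kind of restriction I mean, here is a rough sketch. It is not taken from ePub Tidy Tool: the regular expression, the `max_lines` cut-off and the choice of `<h2>` are my own assumptions. It only touches a paragraph that consists of nothing but all-caps "CHAPTER" plus a number, and only near the top of the file.

```python
import re

# Assumed shape of a title: a paragraph holding only "CHAPTER" and a number
# (Arabic digits or Roman numerals), with nothing else in it.
TITLE = re.compile(r"<p\b([^>]*)>\s*(CHAPTER\s+[0-9IVXLC]+)\s*</p>")

def format_chapter_titles(xhtml, max_lines=40):
    """Convert title paragraphs to headings in the first `max_lines` lines only."""
    lines = xhtml.split("\n")
    head = "\n".join(lines[:max_lines])
    tail = lines[max_lines:]
    head = TITLE.sub(r"<h2\1>\2</h2>", head)
    return "\n".join([head] + tail)
```

A long paragraph with "chapter" tucked inside does not match, because the pattern requires the whole paragraph to be the title.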

Thanks, Sean

Last edited by DiapDealer; 01-12-2020 at 06:50 AM. Reason: The correct thread is not "closed"
Old 01-12-2020, 01:06 PM   #191
CalibUser
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
Quote:
Originally Posted by xli199 View Post
Hi, sorry to ask about ePub Tidy Tool plugin here - the original thread is closed.

Today I used the "chapter titles" button to change the format of chap titles - but somehow all the occurrences of "chapter" were altered - not only the chapter titles. And there are quite a few "chapter" occurrences inside a book I worked on, therefore quite a few paragraphs containing "chapter" were changed.

Is there a way to add more restrictions to this function? e.g., change only occurrences with all-cap "CHAPTER" or only for the first few lines of a file, not to a long paragraph with chapter tucked inside.
Thanks for your comment. I am very busy at the moment with several other projects. If there are other users who find this an issue then please let me know via this thread. If there are enough users to make it worth spending my time on this, then I will look at updating the plugin.
Old 07-15-2020, 03:55 AM   #192
democrite
Evangelist
Posts: 425
Karma: 77256
Join Date: Sep 2011
Device: none
Hello,

Many thanks for this terrific plugin. Incredibly handy.

Would it be possible to add some options such that anything the plugin does, such as fix common OCR, PDF export, or HTML errors, can be selectively enabled, that is one has precise control over the enabling of all things? For vector-quality commercial PDF exports, and for OCR as well, I prefer to use my own regexes and find errors as they occur. Some may be OCR errors, some may be common PDF export errors, some may be typos in the original source, etc.; in each case, I'd prefer to find them myself on the off chance the common fix isn't correct.

In the meantime, I removed the lines of code for my use and I think I got them all as the log didn't report any changes except the ones I wanted.

I recently found and used this plugin solely for hyphenation. On that note, calibre uses the eBook itself, scanning for words and compiling a dictionary. Would you someday consider such a feature? Many works (academic, scientific, and so forth) may have unique terms, either from the field itself, transliterated from another language, Latin terms, etc. that are not in any dictionary. Would be nice to have. I had first tried calibre but prefer not to convert. In the meantime, I converted the EPUB to text, created a word list, and used that.
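
Roughly what I did by hand, as a sketch: strip the tags, collect every word that occurs in the book, and treat that set as the dictionary. This is neither calibre's code nor the plugin's; the function names and the crude tag stripping are assumptions for illustration only.

```python
import re

TAG = re.compile(r"<[^>]+>")      # crude tag stripper, enough for a sketch
WORD = re.compile(r"[^\W\d_]+")   # runs of letters, including non-ASCII ones

def build_word_list(xhtml_files):
    """Collect every word that occurs anywhere in the book, lower-cased."""
    words = set()
    for xhtml in xhtml_files:
        text = TAG.sub(" ", xhtml)
        words.update(w.lower() for w in WORD.findall(text))
    return words

def join_if_known(left, right, words):
    """Drop the hyphen only when the joined form occurs elsewhere in the book."""
    joined = left + right
    return joined if joined.lower() in words else left + "-" + right
```

The point of the design is that field-specific or Latin terms count as known as soon as the book itself uses them unbroken somewhere.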

Last edited by democrite; 07-15-2020 at 03:59 AM.
Old 07-15-2020, 06:31 AM   #193
CalibUser
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
Quote:
Originally Posted by democrite View Post
Hello,

Many thanks for this terrific plugin. Incredibly handy.
You are most welcome - I'm glad you find the plugin handy.

Quote:
Originally Posted by democrite View Post
Would it be possible to add some options such that anything the plugin does, such as fix common OCR, PDF export, or HTML errors, can be selectively enabled, that is one has precise control over the enabling of all things?

The code for this plugin has many different search/replace terms for correcting errors. It would require a really large number of checkboxes to implement your suggestion, and then the code would need to examine each checkbox to determine which corrections to implement. Regretfully, this would take me too long to implement and test, so I cannot add this feature to the plugin.

Quote:
Originally Posted by democrite View Post
I recently found and used this plugin solely for hyphenation. On that note, calibre uses the eBook itself, scanning for words and compiling a dictionary. Would you someday consider such a feature? Many works (academic, scientific, and so forth) may have unique terms, either from the field itself, transliterated from another language, Latin terms, etc. that are not in any dictionary. Would be nice to have. I had first tried calibre but prefer not to convert. In the meantime, I converted the EPUB to text, created a word list, and used that.

What does Calibre do with the dictionary it has compiled? Does this dictionary consist only of hyphenated words?

On several occasions I have thought that it would be useful to have a dictionary of hyphenated words that need to be kept in the ePub as sometimes the hyphen is removed from some words where I want to keep the hyphen. This is why the plugin gives a list of all the words that have had the hyphen removed - I copy these words to notepad and then do a search/replace to put the hyphen back in to the one or two words where I want to keep the hyphen. Fortunately this does not happen for too many words in a given epub.
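
The manual search/replace step can be sketched as follows. This is only an illustration of the idea, not code from the plugin; `restore_hyphens` and the `keep` list are hypothetical names.

```python
import re

def restore_hyphens(text, keep):
    """Put the hyphen back into each word listed in `keep`.

    Matching is whole-word and case-sensitive, so a plural such as
    "cow-punchers" needs its own entry in the list.
    """
    for word in keep:
        joined = word.replace("-", "")
        text = re.sub(r"\b" + re.escape(joined) + r"\b", word, text)
    return text
```

Because the match is on whole words, listing only "cow-puncher" would leave "cowpunchers" untouched; whether the plugin's own list behaves this way is not something this sketch can confirm.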

I did consider the possibility of producing a dictionary of hyphenated words that were not to be replaced, but then I found that I had an ever-growing list of words to put in the dictionary and decided that it would be too time consuming to finalise a dictionary for this purpose.
Old 07-15-2020, 04:42 PM   #194
democrite
Evangelist
Posts: 425
Karma: 77256
Join Date: Sep 2011
Device: none
Quote:
Originally Posted by CalibUser View Post
The code for this plugin has many different search/replace terms for correcting errors. It would require a really large number of checkboxes to implement your suggestion, and then the code would need to examine each checkbox to determine which corrections to implement.
What I mean is perhaps all the changes can be grouped into a minimum number of categories such as Common OCR errors, and so forth. That seems possible?
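
Something along these lines, as a sketch only: I have not seen how the plugin stores its search/replace terms, so the category names, the sample rules and the `run_corrections` function are all made up for illustration.

```python
import re

# Hypothetical grouping: category name -> list of (pattern, replacement) rules.
CORRECTIONS = {
    "ocr": [(r"\btbe\b", "the")],
    "html": [(r"<p>\s*</p>", "")],
}

def run_corrections(text, enabled):
    """Apply only the rule groups whose category name is in `enabled`."""
    for category, rules in CORRECTIONS.items():
        if category not in enabled:
            continue
        for pattern, replacement in rules:
            text = re.sub(pattern, replacement, text)
    return text
```

With this shape, one checkbox per category would be enough, rather than one per search/replace term.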

Quote:
Originally Posted by CalibUser View Post
What does Calibre do with the dictionary it has compiled? Does this dictionary consist only of hyphenated words?
As I haven't looked at the code I'm not sure exactly what calibre does. It certainly compiles a word list to fix words, as your plugin does, that are line-break hyphenated in the PDF, e.g. "read- ing". This is invaluable for certain works, such as the one I made recently of a scientific work containing countless Latin terms and specialized vocabulary. Perhaps it also fixes hyphenated words such as "yellow- green". I would guess this could be a fair amount of work but simpler than what you suggest. I would guess it might also be useful to keep track of the number of occurrences of each word in case of possible source typos, picking the more common one.
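
As a sketch of the counting idea (my own guess at an approach, not what calibre actually does): count how often the joined and the hyphenated forms occur elsewhere in the text, and repair each "word- word" break with whichever form is more common. All names below are hypothetical.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")   # words, hyphenated or not
BROKEN = re.compile(r"([^\W\d_]+)- ([^\W\d_]+)") # line-break residue: "read- ing"

def repair_line_breaks(text):
    counts = Counter(w.lower() for w in WORD.findall(text))

    def choose(m):
        joined = m.group(1) + m.group(2)
        hyphenated = m.group(1) + "-" + m.group(2)
        n_joined = counts[joined.lower()]
        n_hyphenated = counts[hyphenated.lower()]
        if n_joined == 0 and n_hyphenated == 0:
            return m.group(0)   # no evidence either way: leave for manual review
        return joined if n_joined >= n_hyphenated else hyphenated

    return BROKEN.sub(choose, text)
```

Breaks with no evidence in the book are left alone, so they can still be found and fixed by hand.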
Old 07-16-2020, 05:22 AM   #195
CalibUser
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
Quote:
Originally Posted by democrite View Post
What I mean is perhaps all the changes can be grouped into a minimum number of categories such as Common OCR errors, and so forth. That seems possible?
Hmm...apart from the amount of coding involved, I can see numerous different ways of grouping the corrections that the plugin makes. Some of these groupings may be OK for some users, but then other users may prefer a different set of groupings. What do other people think?

Quote:
Originally Posted by democrite View Post
As I haven't looked at the code I'm not sure exactly what calibre does. It certainly compiles a word list to fix words, as your plugin does, that are line-break hyphenated in the PDF, e.g. "read- ing". This is invaluable for certain works, such as the one I made recently of a scientific work containing countless Latin terms and specialized vocabulary. Perhaps it also fixes hyphenated words such as "yellow- green". I would guess this could be a fair amount of work but simpler than what you suggest. I would guess it might also be useful to keep track of the number of occurrences of each word in case of possible source typos, picking the more common one.
PDF readers frequently produce the same typos when PDFing different documents, including specialised words. My plugin enables you to set up a customised list of words that contain these typos, together with the correct word. These words with typos are then corrected automatically when the plugin runs. Although the plugin does not scan the ePub to find misspelt words, you can add these manually to the plugin's list. Please see Using a customised list of words that are corrected automatically in the manual for the plugin.
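
For readers who want a picture of the idea, here is a minimal sketch of a typo-to-correction list. It does not show the plugin's real file format or code (the manual is the reference for that); the "wrong = right" line format and both function names are assumptions made for illustration.

```python
import re

def load_corrections(lines):
    """Parse lines of the assumed form 'wrong = right' into a dict."""
    corrections = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue
        wrong, right = line.split("=", 1)
        corrections[wrong.strip()] = right.strip()
    return corrections

def apply_corrections(text, corrections):
    """Replace each listed typo, matching whole words only."""
    if not corrections:
        return text
    # Longest entries first so a longer typo is not shadowed by a shorter one.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: corrections[m.group(1)], text)
```

Whole-word matching means an entry for "tbe" corrects "tbe" but leaves a word like "tbeory" for a separate entry.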