#1 |
Member
Posts: 11
Karma: 10
Join Date: Dec 2013
Device: none
|
Sigil plug-in idea
Many of us do epub conversions from old pulp magazines -- mysteries from the 20s and 30s, SF from the 30s and 40s -- tens of thousands of stories that have never been republished and don't deserve to die. Even with the best software, the OCR generates many errors that need to be corrected manually. (Yellowed pages, ink bleeding, and old typefaces are the main causes.)
This can be done (laboriously) in Sigil with spellcheck...but it could be streamlined to a few seconds with a simple Sigil plug-in. Most of the errors recur with frightening regularity -- things like weU (well) presendy (presently) '/ (,") iie (he) Td ("I'd) bom (born) bum (burn) hps (lips) gendy (gently) and so on. I can, literally, supply a list of many hundreds of these non-words that recur in nearly every pulp conversion. If we could run a plug-in that would automatically correct *all* of these errors *before* we spellcheck, we could cut proofing time by a huge margin.
The plug-in would access a database that provides a list of error-words and the corresponding fixes. I'm sure that we could come up with an initial list of many hundreds of errors...and if the plug-in could access a text file that the user can modify, they could add words for specialized conversions (medical, scientific, etc). I hope someone thinks this is a good idea -- it sure as heck would help me. Thanks.
|
#2 |
A Hairy Wizard
Posts: 3,318
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
That is a good idea. It seems there are similarly functioning plugins out there that check words against a pre-made list, like the spell check function. I would recommend having an option to confirm with the user before automatically changing words that actually are real words ("bum").
|
#3 |
Sigil Developer
Posts: 8,493
Karma: 5703586
Join Date: Nov 2009
Device: many
|
If you can supply a list of words in a text file, one pair per line separated by a vertical pipe character:
Td|I'd
I would be happy to write a small program to sort and index the list and then walk the text of every xhtml file parsing the text word by word, and looking in the list to see if the word needs to be replaced and if so doing the replacement. Please make the list case sensitive. The hardest part will actually be where to split the text of a sentence into words and dealing with all the punctuation pieces stuck to the end.
KevinH
|
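For readers who want to picture the approach described above, here is a minimal Python sketch. It is not KevinH's program and does not use the Sigil plugin API: it assumes the text has already been pulled out of the XHTML as a plain string, and the names `load_pairs`, `fix_text` and `WORD_RE` are made up for illustration. A "word" is assumed to be a run of ASCII letters and apostrophes, which is only one possible answer to the word-splitting problem mentioned in the post.

```python
import re

def load_pairs(lines):
    """Parse 'wrong|right' lines into a dict (hypothetical helper).

    Lines without a pipe, or with nothing before it, are skipped.
    """
    pairs = {}
    for line in lines:
        line = line.rstrip("\n")
        if "|" not in line:
            continue
        wrong, right = line.split("|", 1)
        if wrong:
            pairs[wrong] = right
    return pairs

# Assumption: a word is a run of letters and apostrophes. Spaces,
# punctuation and digits between words are left exactly as they were.
WORD_RE = re.compile(r"[A-Za-z']+")

def fix_text(text, pairs):
    """Replace every whole word that appears in pairs (case sensitive)."""
    return WORD_RE.sub(lambda m: pairs.get(m.group(0), m.group(0)), text)

pairs = load_pairs(["weU|well", "presendy|presently", "Td|I'd", "gendy|gently"])
print(fix_text("Td go presendy, as weU.", pairs))
# prints: I'd go presently, as well.
```

Because the apostrophe counts as part of a word here, a word wrapped in straight single quotes will not match its list entry. A real plugin would also need to skip markup rather than treat tag names as words.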
#4 |
Well trained by Cats
Posts: 30,913
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
This sounds wonderful.
But I would like to see it handled as 2 cases:
1) Sure-thing fixes (red-line words)
2) Context check required (step-through only) fixes, e.g. is it bum or burn?
Replace options: curly/straight quotes
|
#5 |
Sigil Developer
Posts: 8,493
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Hi,
Easiest would be two plugins. The first handles non-word to word corrections fully automatically. The second searches a list of word to word corrections, where it presents the word and its sentence to you and you say replace or not.
Alternatively, for the word to word conversion, you could add a condition such as only replace if any of a short list of other keywords is within, say, 5 words of the target. Something along the lines of:
bum|burn:hot,fire,ignite,scald,inferno,blaze,flame,heat
Effectively you are generating an automatic but context-sensitive replacement.
A final proofread would always be needed, but you could have the plugin wrap the replaced word in span tags that turn it red. Then, before writing out the epub, remove those created tags.
Creating such word lists could in fact be crowdsourced.
KevinH
ps. Things like this are why I designed and added the plugin interface to begin with. It is perfect for automating cleanups.
Last edited by KevinH; 09-29-2015 at 10:27 PM.
|
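The context-sensitive rule and the temporary red markers could be sketched roughly as follows. This is only an illustration under stated assumptions: the rule syntax is the `wrong|right:keyword,keyword` form from the post, the window is 5 words on either side, and the function names and the `ocr-fix` class name are hypothetical. A stylesheet rule would still be needed to actually colour that class red. It works on plain text, not on parsed XHTML.

```python
import re

def parse_rule(line):
    """Split 'bum|burn:hot,fire' into (wrong, right, set of keywords)."""
    wrong, rest = line.split("|", 1)
    right, _, keywords = rest.partition(":")
    return wrong, right, {k.strip() for k in keywords.split(",") if k.strip()}

def apply_rule(text, rule, window=5):
    """Replace `wrong` only if a keyword lies within `window` words of it.

    Each replacement is wrapped in a marker span (hypothetical class name)
    so that it can be reviewed and later removed.
    """
    wrong, right, keywords = rule
    # With a capturing group, re.split keeps the words: odd indices are
    # words, even indices are the separators between them.
    tokens = re.split(r"([A-Za-z']+)", text)
    word_idx = list(range(1, len(tokens), 2))
    words = [tokens[i] for i in word_idx]
    for pos, word in enumerate(words):
        if word != wrong:
            continue
        nearby = words[max(0, pos - window):pos] + words[pos + 1:pos + 1 + window]
        if any(n.lower() in keywords for n in nearby):
            tokens[word_idx[pos]] = f'<span class="ocr-fix">{right}</span>'
    return "".join(tokens)

def strip_markers(text):
    """Remove the marker spans again, keeping the corrected words."""
    return re.sub(r'<span class="ocr-fix">(.*?)</span>', r"\g<1>", text)

rule = parse_rule("bum|burn:hot,fire,ignite,scald,inferno,blaze,flame,heat")
marked = apply_rule("The fire would bum all night. The old bum slept on the bench.", rule)
print(marked)
print(strip_markers(marked))
```

In this sample only the first "bum" is changed, because "fire" is within five words of it and no keyword is near the second one. If several rules were applied one after another, the words inside the inserted tags would count toward the window, so a real plugin would want to tokenise once and apply all rules to the same word list.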
#6 |
null operator (he/him)
Posts: 21,635
Karma: 29710510
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
IIRC, Toxaris said he might think about 'porting' some of the features of his EPUB Tools Word Addin to a Sigil plugin. Its Search and Replace and Dialogue Checker features are obvious candidates.
|
#7 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
The Plugin for tidying ePub files fixes many of those errors.
We can simply see what else we can fix without problems. For example, in my last test with the Π fixes with a dictionary, I must find a way to bypass the fix for some words. My code finds the word "ΓΙΟΥ" and changes it to "ΠΟΥ". But they are both correct.
Last edited by gipsy; 09-30-2015 at 10:24 AM.
|
#8 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I wrote the "Plugin for tidying ePub files" for this reason. I have magazines and old books that I read in as PDF files and then need to correct a set of common misspellings. This plugin includes the ability to correct some misspelt words such as Tve, Fd, Til, Fve, Fm, Vm, Tm, tlieir, lli, words that should not be hyphenated, apostrophes that are the wrong way round and other fixes. It should be possible to extend this to work with customised lists of words.
When I have time, I will look at extending it so that it reads a list of common misspellings from a file and corrects them. I am working on a different project at the moment.
|
#9 |
Member
Posts: 11
Karma: 10
Join Date: Dec 2013
Device: none
|
I have been keeping a running list, but I have asked some friends for additions.
Also, I think most changes should be automatic, while others should offer a "spellcheck-like" set of options. For example, Td might be "I'd or I'd. Also, straight and curly quotes would have to be taken into account. The list will have to have many case-specific fixes such as weU and WeU (but I can build that into the master list). I'm sure others will think of other things as well.
Also, a frequent error is a word ending with a capital L -- alL -- this is always all: [a-z]L to l. Search and replace would be nice too.
Last edited by martyger; 09-30-2015 at 08:01 PM. Reason: potential added feature to the plug-in
|
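The trailing capital-L fix can be tried outside Sigil with a few lines of Python. This is a sketch, not part of any existing plugin. The `\b` word-boundary anchor is an added assumption that limits the match to the end of a word, as the post describes, and `fix_trailing_L` is a made-up name.

```python
import re

# A lowercase letter followed by a capital L at the very end of a word.
# The \b anchor is an assumption added here; the post only gives [a-z]L.
TRAILING_L = re.compile(r"([a-z])L\b")

def fix_trailing_L(text):
    """Turn a word-final capital L after a lowercase letter into 'l'."""
    return TRAILING_L.sub(r"\g<1>l", text)

print(fix_trailing_L("We are alL here, Lord, and alL is welL."))
# prints: We are all here, Lord, and all is well.
```

Words that start with a capital L, or that are entirely uppercase, are left alone because the pattern requires a lowercase letter immediately before the L.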
#10 |
Banned
Posts: 272
Karma: 1224588
Join Date: Sep 2014
Device: Sony PRS 650
|
Simple word replacements can be stored as saved searches, added to a group, and then run all at once by executing the whole group. I can't see the need for a plug-in, as this functionality is already present.
|
#11 | |
Member
Posts: 11
Karma: 10
Join Date: Dec 2013
Device: none
|
Quote:
2\Name="pulp errors/bom"
2\Find=" bom "
2\Replace=" born "
3\Name=pulp errors/L
3\Find=([a-z])L
3\Replace=\\1l
If no one sees any flaws in this, I'll create the list and post the Pulp Errors Group text here so folks can just pop it into their sigil_searches.ini file.
Last edited by martyger; 10-01-2015 at 08:30 AM.
|
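If the word list already exists as wrong/right pairs, entries in the shape shown in this post could be generated rather than typed by hand. The sketch below only mirrors the three keys visible above (Name, Find, Replace) with a space on each side of the word, as in the bom example. It does not write any section header or entry count that the real sigil_searches.ini may need, and it does not escape quotes or backslashes inside the words. The function name is hypothetical.

```python
def saved_search_lines(pairs, group="pulp errors", start=1):
    """Build Name/Find/Replace lines for each (wrong, right) pair.

    Assumption: entries are numbered consecutively from `start` and use
    the same quoting as the 'bom' example in the post.
    """
    lines = []
    for n, (wrong, right) in enumerate(pairs, start=start):
        lines.append(f'{n}\\Name="{group}/{wrong}"')
        lines.append(f'{n}\\Find=" {wrong} "')
        lines.append(f'{n}\\Replace=" {right} "')
    return lines

for line in saved_search_lines([("bom", "born"), ("gendy", "gently")], start=2):
    print(line)
```

Surrounding each word with spaces, as the post does, means a word next to punctuation or at the start of a paragraph would be missed. A regex search with word boundaries would be one way around that.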
|
#12 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I have updated my plugin "Plugin for tidying ePub files" at https://www.mobileread.com/forums/sho...d.php?t=264378 to enable a list of commonly misspelt words and their corrections in a separate file to be processed. The plugin uses the convention suggested by KevinH. Currently this plugin will change words that have been misspelt automatically, but not words where a context check is needed.
@theducks: DiapDealer has developed a plugin that will turn straight quotes to curly quotes at https://www.mobileread.com/forums/sho...d.php?t=247088
While I appreciate that there is little point in using a plugin solely for correcting words when Sigil has a built-in function for executing a group of saved searches, my plugin can do more than this. For example, it can process chapter headings (making them uppercase, mixed case, etc.) and strip out unwanted tags at the same time, allowing the user to apply different options to different ePubs. It also has a "bolt-in" image resizer to change the size of an image if it is too small for the cover page.
|
#13 | |
Member
Posts: 11
Karma: 10
Join Date: Dec 2013
Device: none
|
Quote:
|
|
#14 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
@martyger: Another user has reported the same problem. It worked on my Windows 7 system. I will have another look at the code to find out what is happening.
|
#15 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
The plugin should work now. There was an error in the filename, which did not match the XML file in the plugin.
|