#1 |
Nameless Being
|
This regex function replaces numbers written as words with digits. But it can't deal with hyphenated numbers like twenty-one. They must be in the format twenty one.
Can anyone suggest a solution? Code:
    def text2int(textnum, numwords={}):
        if not numwords:
            units = [
                "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                "sixteen", "seventeen", "eighteen", "nineteen",
            ]
            tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
            scales = ["hundred", "thousand", "million", "billion", "trillion"]
            numwords["and"] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                numwords[word] = (10 ** (idx * 3 or 2), 0)
        current = result = 0
        for word in textnum.split():
            word = word.lower()
            if word not in numwords:
                raise Exception("Illegal word: " + word)
            scale, increment = numwords[word]
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
        return result + current

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        return match.group(1)+str(text2int(match.group(2)))+match.group(3)

Last edited by Ted Friesen; 09-23-2021 at 06:54 PM.
#2 |
Running with scissors
Posts: 1,583
Karma: 14328510
Join Date: Nov 2019
Device: none
|
Can't you simply replace any dash with a space and then call text2int on the result of that?
#3 |
Nameless Being
|
Thanks for the reply Hobnail, but I was hoping for a single solution and I have found it.
By adding this line the hyphen is exchanged for a space before textnum is split into words. Code:
    current = result = 0
==> textnum = textnum.replace('-', ' ')  # replace hyphen with space
    for word in textnum.split():
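For reference, a minimal self-contained sketch of the hyphen fix follows. It restates the lookup table from the first post in condensed form; the names `build_numwords`, `NUMWORDS` and `text2int_hyphen` are made up for this illustration and are not part of calibre, and the calibre-specific `replace` wrapper is left out.

```python
def build_numwords():
    """Build the word -> (scale, increment) table used by the thread's text2int."""
    units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
    ]
    tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
    scales = ["hundred", "thousand", "million", "billion", "trillion"]

    numwords = {"and": (1, 0)}
    for idx, word in enumerate(units):
        numwords[word] = (1, idx)
    for idx, word in enumerate(tens):
        if word:  # skip the two empty placeholders at index 0 and 1
            numwords[word] = (1, idx * 10)
    for idx, word in enumerate(scales):
        numwords[word] = (10 ** (idx * 3 or 2), 0)
    return numwords


# Hypothetical module-level table, built once.
NUMWORDS = build_numwords()


def text2int_hyphen(textnum):
    """Convert a spelled-out number to an int, accepting hyphens or spaces."""
    # The one-line fix from this post: treat "twenty-one" like "twenty one".
    textnum = textnum.replace("-", " ")
    current = result = 0
    for word in textnum.lower().split():
        if word not in NUMWORDS:
            raise ValueError("Illegal word: " + word)
        scale, increment = NUMWORDS[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current


print(text2int_hyphen("twenty-one"))                    # 21
print(text2int_hyphen("one hundred and twenty-three"))  # 123
```

Doing the replacement inside the function keeps the search expression and the calibre wrapper unchanged, which seems to be what "a single solution" was after.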
#4 |
Still reading
Posts: 13,751
Karma: 103847703
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
House numbers, credit cards, years, measurements (mostly) should be digits. But centuries, ages, and counted things should be spelled out. However, it depends on context and audience.
Ages are usually hyphenated, but not centuries etc. "He was one and twenty years old" is the original form of "He was twenty-one"; the "-" replaces the "and". I'd review each find before replacing.
#5 |
Nameless Being
|
Thanks for that interesting list. I agree mostly, but was taught that numbers up to ten should be in words and larger numbers should be digits and that double word numbers should be hyphenated. Blame my strict grammar teachers.
#6 | |
Well trained by Cats
Posts: 30,916
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
27 27th Street, Twenty-Seventh birthday, Tel: 272-2727

One size does not fit all.

I would love a 'replace picker' tool, where you search, BUT you are presented with a LIST (instead of a single Replace button) and you click on the pattern you want (you set those up) as you step through the founds. This, instead of doing multiple passes, each with a new term. More like spell check, where you pick from the suggestions.
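The 'replace picker' idea can be roughed out in a few lines. This is only a sketch of the concept, not an existing calibre or Sigil feature: `replace_with_picker`, its `templates` list and the `choose` callback are hypothetical names, and a scripted list of answers stands in for the user clicking an entry.

```python
import re


def replace_with_picker(text, find, templates, choose):
    """Replace each match of `find` with whichever template `choose` picks.

    templates: replacement strings in re.sub syntax (backreferences allowed).
    choose:    called as choose(match, options); returns an index into
               options, or None to leave that match untouched.
    """
    def substitute(match):
        options = [match.expand(template) for template in templates]
        picked = choose(match, options)
        return match.group(0) if picked is None else options[picked]

    return re.sub(find, substitute, text)


# Scripted stand-in for the user: take option 0 for the first hit,
# skip the second hit.
answers = iter([0, None])


def scripted_choice(match, options):
    return next(answers)


sample = "He turned 27 at 27 Elm Street."
result = replace_with_picker(
    sample,
    find=r"\b27\b",
    templates=["twenty-seven", "Twenty-Seven"],
    choose=scripted_choice,
)
print(result)  # He turned twenty-seven at 27 Elm Street.
```

In a real tool the `choose` step would be the clickable list; everything else is an ordinary single pass over the matches.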
#7 | |
Still reading
Posts: 13,751
Karma: 103847703
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Quote:
I have an exact list somewhere, but it may only fully apply in places that use British English. Some aspects of British English punctuation, grammar and spelling are more flexible than some online resources, including Wikipedia, suggest*. The USA, especially in Arts courses, is generally more prescriptive. This difference dates to Webster in the USA (prescriptive) and the Oxford Dictionaries (GB, then UK; documents usage).

Most of the English grammar I was taught at school from about 11 to 16 was flawed, and none of it covered dialogue in fiction. Totally erroneous information about commas that might only apply to a script!

* Outer dialogue quotes are “and” in the USA and in the Irish language. UK/Irish publishers in English have a house style of ‘and’ or “and”, which may change per era. British English allows grey or gray unless the name is one of them. Dashes as parenthesis in the USA are usually em—and no space, but UK/Ireland can use – en dashes with spaces either side. Hyphens don't have spaces unless they are used as a minus (there is a separate minus character) or for a range, but 1914 to 1918 is preferred by some to 1914 - 1918, especially in mathematical and science works. The USA might have #6 instead of British number 6 or No. 6; the # is never used that way in British English. You can have the sixth house on the left or 6 Duke Street, but not the 6th house on the left or six Duke Street. Many abbreviations common in casual writing are to be avoided in formal writing or novels. Journalists have to use the house style.

I've used spelling and grammar checkers since the early 1980s and to me it's one proof AI is SF. I turn auto-replace off on everything, turn off check grammar while you type and often have to click ignore. Grammar checkers don't know if "that that" is correct at times. They should suggest words that can be doubled differently. Word processors often get a leading omission wrong, like “back in the ’80s” or ’tis. Also they put ’ and ” for feet and inches or minutes and seconds. Those use a prime and double prime. Straight quotes " and ' with italics are a simulation.

Undo or a saved copy is your friend with automated correction tools!

Last edited by Quoth; 09-25-2021 at 04:40 PM. Reason: Examples
#8 | |
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
#9 | ||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
I called it the "'Spellcheck List' For Search": What Features or Tools does Sigil Still Need Yet?
Quote:

* * *

It would also be nice to have a viewable list, then you could mass accept/reject.

Currently, I see pieces of this functionality in various programs:

1. Notepad++ has a "Find All", which gives you a list + shows you the locations in context:

Attachment 185761

(See very bottom part of the image where I searched for all italics: <i>.+?</i>.)

But sadly, you can only mass SEARCH and get a list... you can't replace. But you can click on each item, then jump to that exact location within the file + see the match with its surrounding context.

2. A tool like Bulk Rename Utility allows you to mass search/replace filenames:

You fill out your parameters below. Then you select which files you want to apply it to (Ctrl+Click/Shift+Click). It puts green highlight on the files that'll actually change, and shows you the before/after in 2 columns.

3. Currently, I use Beyond Compare in order to mass apply/reject changes:

I use this when I have two different versions of an ebook. I extract the HTML, then compare both sets of files against each other. You can then push changes from:

Note: One disadvantage though... Beyond Compare was meant more for actual code, where lines are small. So in ebooks, you can only replace "whole paragraphs" at a time. Fine if there's 1 simple change in a paragraph. Not fine if there's dozens, where I might want 9/12 changes.

And everything is compared at the file-level... it would be nice to do something like:

Then you could go through in passes of accepts/rejects/ignores.

Complete Side Note: I made a similar argument in 2018: "Does Tool Exist to Spellcheck/Grammarcheck by Category?"

I explained the replace one-by-one workflow compared to the category/list workflows. Spellcheck Lists in Sigil/Calibre completely dwarf the crappy one-by-one spellchecking in Word/LibreOffice. I'll never be able to go back!

(Now, we just need this expanded to grammarchecking + Find/Replace!)

* * *

Having a Calibre/Sigil "Spellcheck List"-type thing, where you can run complicated Find/Replace actions, then visually see them in a searchable list form would be extremely powerful. And then adding a way to mass accept/reject would make it into a super tool.

Last edited by Tex2002ans; 09-26-2021 at 01:04 AM.
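A rough sketch of the "Find All" list with mass accept/reject, using only Python's `re` module. The function names `find_all` and `apply_accepted`, the sample HTML and the accepted ids are assumptions made for illustration; nothing here is an existing Notepad++, Sigil or calibre API.

```python
import re


def find_all(text, pattern, context=15):
    """Return every hit with an id, its match object and surrounding context."""
    hits = []
    for number, match in enumerate(re.finditer(pattern, text), start=1):
        start, end = match.span()
        hits.append({
            "id": number,
            "match": match,
            "context": text[max(0, start - context):end + context],
        })
    return hits


def apply_accepted(text, hits, replacement, accepted_ids):
    """Replace only the hits whose id was accepted; leave the others as found."""
    pieces = []
    cursor = 0
    for hit in hits:
        start, end = hit["match"].span()
        pieces.append(text[cursor:start])
        if hit["id"] in accepted_ids:
            pieces.append(hit["match"].expand(replacement))
        else:
            pieces.append(text[start:end])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = "<p>An <i>old</i> book and a <i>new</i> one, <i>etc.</i></p>"
hits = find_all(sample, r"<i>(.+?)</i>")

# The reviewable list: one line per hit, shown in context.
for hit in hits:
    print(hit["id"], "|", hit["context"])

# Mass accept hits 1 and 3, reject hit 2.
result = apply_accepted(sample, hits, r"<em>\1</em>", accepted_ids={1, 3})
print(result)
```

Because the hits are collected before anything is changed, the accept/reject decisions can be made in any order and applied in one pass.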
#10 |
creator of calibre
Posts: 45,221
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I have to say I don't really understand this use case. If you are reviewing each replacement individually you might as well do it by clicking "Replace and find" repeatedly and correcting any bad replaces with an undo when needed.
If you want an overview of all replaces, and the ability to revert only a few of them, the "See what changed" tool works for that. You can review all the changes in one window; if you find one that you don't like, simply double click on it in the right panel and it will be displayed in the main edit book window, where you can edit it. Admittedly, this is not quite as convenient as clicking a button to revert an individual change. However, since reversion is a relatively rare operation (if it isn't, maybe reject all changes and adjust your regex), it's good enough.
#11 | |
Not Quite Dead
Posts: 195
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
|
Quote:
I hope I was the only one who did not suspect the window had a finer grain of functionality. |
#12 | ||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
I'll try to break it down into more manageable pieces/use-cases.

* * *

The Line-by-Line Merge

Similar to when you Compare entire documents in LibreOffice/Word. You have:

File A: v1 of an ebook from a year ago
File B: v7 of an ebook from today

(Happens quite often in ebooks.)

Now, you wanted to create a File C, because not all changes are correct. This is where the line-by-line, Left->Right override would be great.

1. You open up File B in Calibre Editor.
2. You File > Compare against File A.
3. You scroll through the comparison, and are able to override individual lines with File A's version.
4. This will give you "File C", which you can then save as a new EPUB.

Currently, you can only VIEW all the differences.

- - -

Note: Calibre's current Compare is extremely helpful though! And the way it's displayed in Calibre is absolutely fantastic/beautiful! Its visuals beat the pants off of Beyond Compare's (and any other code comparison tool I've tested so far).

- - -

I also believe this would be helpful in the normal large Find/Replaces (with a handful of edge cases).

Like this thread. A giant Find/Replace to switch all "123" -> "spelled-out numbers" form.

100 replaces were fine:

But then there were a few exceptions. Like the ones theducks gave:

So, you'd see the fantastic Calibre diff, and you can scroll through and override certain lines with the "Left"/before version.

* * *

And then the other methods I rambled about are more extreme.

A Sortable/Searchable (List-Based?) Differ (Advanced Find/Replace?)

When the amount of changes are overwhelming (in the hundreds/thousands).

Similar to the Spellcheck List, you'd be able to type in a:

- Find
- Replace

Run this on a book (like pressing "Count All") and generate a list:

- Find: Chapter \d+

You'd get a list of all hits:

Code:
    Found       | Replace   | Hits
    Chapter 1   |           | 1
    Chapter 2   |           | 1
    Chapter 3   |           | 1
    Chapter 4   |           | 1
    [...]
    Chapter 100 |           | 1

And, similar to the Spellcheck List, you can search/sort through this:

- Search: 1

Code:
    Found       | Replace   | Hits
    Chapter 1   |           | 1
    Chapter 10  |           | 1
    Chapter 11  |           | 1
    Chapter 12  |           | 1
    [...]
    Chapter 100 |           | 1

Code:
    Found       | Replace   | Hits
    Chapter 10  |           | 1
    Chapter 100 |           | 1

- Find: Chapter (\d+)
- Replace: Chap. \1

Code:
    Found       | Replace   | Hits
    Chapter 1   | Chap. 1   | 1
    Chapter 2   | Chap. 2   | 1
    Chapter 3   | Chap. 3   | 1
    Chapter 4   | Chap. 4   | 1
    [...]
    Chapter 100 | Chap. 100 | 1

Maybe, sorting by Hits, there would be a:

Code:
    Chapter 5   | Chap. 5   | 5

You may want to treat that differently than:

so you'd apply the change to all 99 other replaces first, then you can dig in to that oddity in more detail.

* * *

I think those 2 would take a large bite out of this "mass diff" use-case.

The other stuff was just thinking about categorizing/sorting through diffs specifically. Like if you Compare two documents in LibreOffice/Word, there may be hundreds of changes where a comma/punctuation was added/removed. Going through these one-by-one is extremely slow.

Would be nice to sort through JUST THE COMMA diffs, then mass accept/reject those lines. Then sort through JUST THE <i> diffs, then mass accept/reject those lines.

You just Smartened Punctuation the entire EPUB, now you only want to double-check that quotes around EM DASHES were done properly... so you can: "Search: —" and only focus on those diffs. (Quite often, the Smarten Punctuation puts the wrongly flipped quotation mark: —“ instead of —” .)

Anyway, that would be a far-in-the-future type idea. I think for now, those 2 listed above would be more helpful to the broader community.

Quote:

Just this week, I wrote about how I stumbled upon the Calibre Look&Feel > "Transform Styles" tab. Thousands of conversions in Calibre, years and years... and this thing was sitting right under my nose.

Thanks for all the fantastic work, Kovid!

Last edited by Tex2002ans; 09-26-2021 at 02:35 PM.
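The Found | Replace | Hits list from this post can be approximated with `re.finditer` and a `Counter`. This is a sketch under stated assumptions, not a proposal for how calibre would implement it: `hit_table`, `search_rows`, `print_table` and the sample text are hypothetical, and the column widths are arbitrary.

```python
import re
from collections import Counter


def hit_table(text, find, replace=None):
    """Return (found, replacement, hits) rows, one per distinct matched text."""
    counts = Counter()
    replacements = {}
    for match in re.finditer(find, text):
        found = match.group(0)
        counts[found] += 1
        # With no Replace given, the column stays empty, as in a find-only list.
        replacements[found] = "" if replace is None else match.expand(replace)
    return [(found, replacements[found], hits) for found, hits in counts.items()]


def search_rows(rows, term):
    """Narrow the list to rows whose Found column contains `term`."""
    return [row for row in rows if term in row[0]]


def print_table(rows):
    print(f"{'Found':<12}| {'Replace':<10}| Hits")
    for found, replacement, hits in rows:
        print(f"{found:<12}| {replacement:<10}| {hits}")


book = "Chapter 1 ... Chapter 5 ... (see Chapter 5) ... Chapter 10 ..."
rows = hit_table(book, r"Chapter (\d+)", r"Chap. \1")

# Sort by Hits so the oddities (anything found more than once) float to the top.
by_hits = sorted(rows, key=lambda row: row[2], reverse=True)
print_table(by_hits)

# "Search: 1" keeps only Chapter 1 and Chapter 10 from this sample.
narrowed = search_rows(rows, "1")
```

Grouping by the matched text is what makes the repeated "Chapter 5" stand out; a per-occurrence list with context would be a separate view on the same matches.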
#13 |
creator of calibre
Posts: 45,221
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The same double-click-to-edit trick works for the file compare case as well. Not as convenient as a revert button, obviously, but still pretty good, and it lets you do actual *edits*, not just reverts.
The second use case I'm afraid doesn't really resonate with me, sorry. Last edited by kovidgoyal; 09-26-2021 at 11:11 PM.