01-31-2022, 01:29 AM   #1
slantybard
my parent's oops...
Posts: 492
Karma: 1477572
Join Date: Feb 2009
Device: Vx->Handera->Clie-> Axim->505->650->KPW/Aura ->L2->iOS/CBW
Regex term for batch identifying word count in comments

I am trying to clean up my comments and want to remove word counts from some of them. For example, I have some books where the end of the comments has:

"x,xxx Words" or "xx,xxx Words" or "xxx,xxx Words"

I know I can use (\d+) to select all numbers, but this also selects any other random numbers, dates, or chapter numbers that might be present.

Is there a specific regex that would only select the phrase "x,xxx Words" or "xx,xxx Words" or "xxx,xxx Words" where the x=any digit?

A specific example of what I'm looking for would be seen in this book comment:

Quote:
This is an extremely AU crossover fic that asks the question what might have happened if Petunia Dursley hadn't found a young Harry Potter sleeping on her doorstep on the morning of the 2nd of November 1981. After all, Dumbledore was a bit careless with him 109,446 Words
Here, I want to use a regex to delete the "109,446 Words" but leave the "2nd" and "1981" alone.
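
To make the problem concrete, here is a minimal Python sketch (calibre's search and replace uses Python-style regular expressions, as far as I know). The sample string is a shortened copy of the comment quoted above, and the variable names are only for illustration.

```python
import re

# Shortened copy of the sample comment quoted above.
comment = ("... on the morning of the 2nd of November 1981. "
           "After all, Dumbledore was a bit careless with him 109,446 Words")

# A bare \d+ matches every run of digits, not just the word count.
all_numbers = re.findall(r"\d+", comment)
print(all_numbers)
```

The list contains the "2" from "2nd" and "1981" as well as the two halves of the word count, which is why the pattern needs to be tied to the trailing "Words".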

I don't mind splitting up the regex into several searches depending on number of words. Any suggestions or links to where I can find an answer would be appreciated.
01-31-2022, 02:12 AM   #2
ownedbycats
Custom User Title
Posts: 10,952
Karma: 74999999
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
While there's probably a better way to handle it, this will match xxx,xxx words (or Words):

https://regex101.com/r/YhHpr3/1

For other digits, just edit the two {3} bits to reflect the number of digits.

EDIT: Here's an improved version that can handle other numbers of digits:

https://regex101.com/r/YhHpr3/2

Last edited by ownedbycats; 01-31-2022 at 03:44 AM.
01-31-2022, 03:42 AM   #3
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Slightly better would be:

Code:
(\d{0,3}[,]?\d{1,3} [Ww]ords)
That should match any number of up to 6 digits, with or without the comma. But it would miss the first digit in 1000000.

If I hadn't seen your solution, I think I would have suggested:

Code:
([\d,]+ [Ww]ords)
Which would also pick up ", words". So, slightly safer is:

Code:
([\d,]{2,} [Ww]ords\b)
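
As a rough way to compare the three patterns above, the following Python sketch runs each one against a few test strings. The dictionary keys, the sample list, and the helper `first_match` are made up for this illustration; only the patterns themselves come from the post.

```python
import re

# The three patterns from the post; the labels are illustrative only.
patterns = {
    "bounded": r"(\d{0,3}[,]?\d{1,3} [Ww]ords)",
    "loose": r"([\d,]+ [Ww]ords)",
    "safer": r"([\d,]{2,} [Ww]ords\b)",
}
samples = ["109,446 Words", "1000000 words", ", words", "2nd of November 1981"]

def first_match(pattern, text):
    """Return the first captured match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

results = {name: [first_match(p, s) for s in samples]
           for name, p in patterns.items()}

for name, found in results.items():
    print(name, found)
```

One thing this kind of test makes visible: because of the {2,}, the third pattern needs at least two characters before the space, so a single-digit count such as "7 words" would be left alone.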
01-31-2022, 11:19 AM   #4
slantybard
Thank you both very much for the suggestions. Regex is wonderfully wacky! I ended up using the original suggestion prior to seeing the improved suggestions. In the end, I used:
Quote:
(\d{1}[,]\d{3}[,]\d{3} (W|w)ords)
To remove the word count from the comments of books with more than 1,000,000 words

Quote:
(\d{3}[,]\d{3} (W|w)ords)
To remove the word count from the comments of books between 100,000 and 1,000,000 words

Quote:
(\d{2}[,]\d{3} (W|w)ords)
To remove the word count from the comments of books between 10,000 and 100,000 words

Quote:
(\d{1}[,]\d{3} (W|w)ords)
To remove the word count from the comments of books between 1,000 and 10,000 words

Quote:
(\d{3} (W|w)ords)
To finally remove the word count from the comments of books with fewer than 1,000 words
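
The order of these passes matters, which the following Python sketch tries to show. It is only an approximation of what calibre's search and replace does with each pattern; the function `strip_word_count` and the sample strings are hypothetical.

```python
import re

# The five patterns from the post, largest counts first.
passes = [
    r"(\d{1}[,]\d{3}[,]\d{3} (W|w)ords)",
    r"(\d{3}[,]\d{3} (W|w)ords)",
    r"(\d{2}[,]\d{3} (W|w)ords)",
    r"(\d{1}[,]\d{3} (W|w)ords)",
    r"(\d{3} (W|w)ords)",
]

def strip_word_count(comment, patterns=passes):
    """Apply each pattern in turn, deleting whatever it matches."""
    for pattern in patterns:
        comment = re.sub(pattern, "", comment)
    return comment.rstrip()

sample = "careless with him 109,446 Words"
print(strip_word_count(sample))                # largest pattern first
print(strip_word_count(sample, passes[::-1]))  # smallest pattern first
```

Run smallest first, the three-digit pattern removes only "446 Words" and leaves "109," behind, so the descending order used above is the safe one. Note also that the last pattern needs exactly three digits, so a count such as "42 words" is not touched by any of the passes.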
01-31-2022, 07:41 PM   #5
davidfor
A single one to do this is:

Code:
(\d{0,1}[,]?\d{0,3}[,]?\d{1,3} [Ww]ords)
It isn't perfect because it would pick up something like "1,,1 Words". But I suspect that won't appear.
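
A quick Python check of the single pattern, for anyone who wants to try it outside calibre. The `strict` pattern is not from this thread; it is a hypothetical stricter variant that assumes the counts always use comma-separated groups of three digits, so it would not match a count written without commas such as "1000000 words".

```python
import re

# The single pattern from the post.
single = r"(\d{0,1}[,]?\d{0,3}[,]?\d{1,3} [Ww]ords)"

# Hypothetical stricter variant (an assumption, not from the thread):
# 1-3 digits, then zero or more ",ddd" groups, not preceded by a digit or comma.
strict = r"(?<![\d,])\d{1,3}(?:,\d{3})* [Ww]ords\b"

comment = "the 2nd of November 1981. After all, he was careless 109,446 Words"

# Both patterns remove only the trailing word count from the sample.
cleaned = re.sub(single, "", comment).rstrip()
cleaned_strict = re.sub(strict, "", comment).rstrip()
print(cleaned)

# The odd input mentioned above: matched by the single pattern, not by the strict one.
odd_single = re.search(single, "1,,1 Words")
odd_strict = re.search(strict, "1,,1 Words")
print(odd_single is not None, odd_strict is not None)
```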

And semantically, "[Ww]" is a bit better than "(W|w)". The former is "one of this list of characters" and the latter is "W or w, and put it into a group for the replacement". It is a little pedantic and doesn't make a difference in your situation. But if you need to extend this because you found another case, or need to replace the text rather than delete it, the latter might not work as well.
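
The difference shows up in the captured groups. This is a small Python sketch with made-up variable names; the non-capturing form `(?:W|w)` is included only for comparison and is not something suggested in the thread.

```python
import re

text = "109,446 Words"

# Character class: the only captured group is the number.
class_match = re.search(r"(\d{3},\d{3}) [Ww]ords", text)

# Alternation in parentheses: the W/w becomes an extra captured group.
group_match = re.search(r"(\d{3},\d{3}) (W|w)ords", text)

# Non-capturing alternation behaves like the character class here.
noncapturing_match = re.search(r"(\d{3},\d{3}) (?:W|w)ords", text)

print(class_match.groups())
print(group_match.groups())
print(noncapturing_match.groups())
```

With "(W|w)" every later group is shifted by one, so a replacement that refers to groups by number has to account for it.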
01-31-2022, 08:02 PM   #6
ownedbycats
Yes, I'd forgotten that square brackets worked better for that.