12-24-2019, 12:05 PM #947
kboogie222
Junior Member
kboogie222 began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Aug 2019
Location: New Jersey
Device: Kindle Oasis 2
Bad Breaks

Quote:
Originally Posted by snarkophilus
Indeed, Stephen King's Christine has 145 matches for just [a-z]</p> and 245 matches for [a-z,]</p>. Almost all of these were in song verses at the start of each chapter, but there were three missing periods at the end of sentences, one comma that should have been a period and one actual occurrence of a break mid-sentence.
Wow, it sounds like we could be pretty close here. Very cool!
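For anyone who wants to reproduce counts like those, here's a rough Python sketch that simply counts matches of the two quoted patterns in one chapter's XHTML. The function name and sample text are made up for illustration. Note that every match of the first pattern is also a match of the second, so the second count includes the first.

```python
import re
from typing import Tuple

# The two patterns quoted above: a lowercase letter (or, in the second
# pattern, a lowercase letter or comma) right before a closing </p> tag.
BARE_LOWER = re.compile(r"[a-z]</p>")
LOWER_OR_COMMA = re.compile(r"[a-z,]</p>")


def count_suspect_breaks(xhtml: str) -> Tuple[int, int]:
    """Return (matches for [a-z]</p>, matches for [a-z,]</p>)."""
    return len(BARE_LOWER.findall(xhtml)), len(LOWER_OR_COMMA.findall(xhtml))


sample = (
    "<p>He drove the car home and</p>\n"
    "<p>parked it in the garage.</p>\n"
    "<p>She paused,</p>\n"
    "<p>then spoke.</p>\n"
)
print(count_suspect_breaks(sample))  # (1, 2)
```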

Thinking through a few rules that might help us avoid the false positives, here are the first ones that come to mind. There are probably two or three better solutions, hah.

1) Ignore matches that repeat.
2) Require the first half of the broken sentence to start with a capital letter.
3) Require a minimum length for the first half of the broken sentence.
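Here's a rough Python sketch of how those three rules might be layered on top of the regex. Everything in it is hypothetical: the CANDIDATE pattern (which only handles paragraphs with no inline tags such as <i>), the function name, and the MIN_LENGTH value of 20 characters are my own assumptions, and none of it has been tested against a real book.

```python
import re
from collections import Counter

# Hypothetical pattern: capture the text of a paragraph whose last character
# before </p> is a lowercase letter or comma (a candidate "first half").
# Assumes plain-text paragraphs; inline tags like <i> are not handled.
CANDIDATE = re.compile(r"<p[^>]*>([^<]*[a-z,])</p>")

MIN_LENGTH = 20  # assumed value for rule 3; would need tuning per book


def filtered_breaks(xhtml, min_length=MIN_LENGTH):
    """Return candidate first halves that survive all three rules."""
    fragments = [m.group(1).strip() for m in CANDIDATE.finditer(xhtml)]
    counts = Counter(fragments)
    kept = []
    for frag in fragments:
        if counts[frag] > 1:        # rule 1: skip repeating lines (e.g. verses)
            continue
        if not frag[:1].isupper():  # rule 2: must start with a capital letter
            continue
        if len(frag) < min_length:  # rule 3: must be long enough
            continue
        kept.append(frag)
    return kept


sample = (
    "<p>Sing it again, sing it now,</p>\n"
    "<p>Sing it again, sing it now,</p>\n"
    "<p>The engine turned over once and</p>\n"
    "<p>died in the cold.</p>\n"
    "<p>and then</p>\n"
)
print(filtered_breaks(sample))  # ['The engine turned over once and']
```

Rule 1 here treats "repeating" as an exact duplicate line, which is the simplest reading; a looser match would catch more verse but risks hiding real breaks.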

Quote:
Originally Posted by snarkophilus
Instead of just looking for at least one match for the regex, you could count the number of times the broken-sentence regex matches and return "true" if the count exceeds a certain (configurable?) threshold.
Exactly. Set a threshold for the number of breaks, and stop the search as soon as it's exceeded. That would also cut down on unnecessary searching beyond the threshold.
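Something like this Python sketch is what I have in mind: count matches across a book's chapters and bail out as soon as the count goes over the threshold. The function name and the default threshold of 5 are placeholders, not tuned values.

```python
import re

# Broken-sentence pattern from earlier in the thread.
BROKEN = re.compile(r"[a-z,]</p>")


def book_has_bad_breaks(chapters, threshold=5):
    """Return True once more than `threshold` matches are found.

    `chapters` is any iterable of XHTML strings. finditer is lazy, so
    the scan stops at the first match past the threshold instead of
    reading the rest of the book.
    """
    total = 0
    for chapter in chapters:
        for _ in BROKEN.finditer(chapter):
            total += 1
            if total > threshold:
                return True  # threshold exceeded: stop searching
    return False


chapters = ["<p>one and</p><p>two,</p>", "<p>three and</p>"]
print(book_has_bad_breaks(chapters, threshold=2))  # True
```

Passing a generator of chapters (rather than a list) means later chapters are never even loaded once the threshold is hit.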