Quote:
Originally Posted by Tex2002ans
Just wondering what the use-case is?
Are you trying to pull out all n-grams?
It's just an idea that's on my mind.
Code:
I bought a new smart phone.
I have a very smart phone.
I check every pair of neighboring words, and if the two words joined together exist in the dictionary as a single word, I display the result.
In this case:
Code:
smart + phone = smartphone
(The first sentence should use "smartphone"; in the second, writing the words separately is correct.)
Of course, EVERYTHING depends on the context, and I can't manage to "catch" that context correctly.
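For illustration, the neighboring-word check could be sketched in Python roughly like this. The `DICTIONARY` set and the `find_joinable_pairs` name are made up for the example; a real run would load an actual word list instead.

```python
import re

# Hypothetical stand-in for a real dictionary (an assumption for this sketch).
DICTIONARY = {"smartphone"}

def find_joinable_pairs(text, dictionary=DICTIONARY):
    """Return (first, second, joined) for neighboring words whose
    concatenation is itself a dictionary word."""
    words = re.findall(r"\w+", text)
    hits = []
    # zip(words, words[1:]) walks over every pair of adjacent words.
    for first, second in zip(words, words[1:]):
        joined = (first + second).lower()
        if joined in dictionary:
            hits.append((first, second, joined))
    return hits
```

Note that this flags both example sentences: a dictionary lookup alone cannot tell that the second one is fine, so the final decision stays manual.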
My dream is to get a result close to:
Code:
(.{0,10})(?=(\b\w+\b[,;.\s]*\b\w+\b))(.{0,10})
The expected result:
Code:
ght a new smart phone.
ve a very smart phone.
(0-10 characters of "context" around the words.)
I can then jump to the first sentence and manually join the words.
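A regex on its own cannot do the dictionary lookup, so one possible approach is to do the lookup in code and cut the context out by position. This is only a sketch under my own assumptions: the `pairs_with_context` name and the `dictionary` set are hypothetical, and only pairs separated by plain whitespace are considered.

```python
import re

def pairs_with_context(text, dictionary, width=10):
    """Return each joinable word pair together with up to `width`
    characters of context on either side."""
    results = []
    words = list(re.finditer(r"\w+", text))
    for a, b in zip(words, words[1:]):
        gap = text[a.end():b.start()]
        # Assumption: only whitespace may separate the two words.
        if not gap.isspace():
            continue
        if (a.group() + b.group()).lower() in dictionary:
            start = max(0, a.start() - width)
            end = min(len(text), b.end() + width)
            results.append(text[start:end])
    return results

# Hypothetical one-word dictionary, applied to the two example sentences.
for line in ["I bought a new smart phone.", "I have a very smart phone."]:
    print(pairs_with_context(line, {"smartphone"}))
```

Run on the two example sentences, this gives the two context lines shown above.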