I'm finding that a lot of files that were converted from PDF have line-wrap issues: tons of line breaks in the middle of sentences.
The number of paragraphs that start with a lowercase letter would be a great indicator of PDF-conversion line-wrap issues.
Is it possible to create a regex that counts those occurrences and saves the count in a column?
This would be a great measure of quality, perhaps along with the ratio of lowercase to uppercase paragraph starts.
Please help
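
For reference, here is a rough sketch of the kind of thing I have in mind, in plain Python. It assumes that a "paragraph" is simply any non-empty line and that only ASCII letters matter; the function name `paragraph_start_stats` and the dictionary keys are made up for illustration, not part of any existing tool.

```python
import re

# Assumption: every non-empty line counts as a "paragraph".
# [ \t]* skips indentation without crossing into the next line
# (\s* would also swallow newlines and miscount after blank lines).
LOWER_START = re.compile(r"^[ \t]*[a-z]", re.MULTILINE)
UPPER_START = re.compile(r"^[ \t]*[A-Z]", re.MULTILINE)


def paragraph_start_stats(text):
    """Count lines starting with a lowercase vs. an uppercase ASCII letter."""
    lower = len(LOWER_START.findall(text))
    upper = len(UPPER_START.findall(text))
    # The ratio is undefined when no line starts with an uppercase letter.
    ratio = lower / upper if upper else None
    return {
        "lower_starts": lower,
        "upper_starts": upper,
        "lower_upper_ratio": ratio,
    }


sample = (
    "Intro line that was\n"
    "wrapped in the middle\n"
    "of a sentence.\n"
    "\n"
    "Second paragraph\n"
    "continues here.\n"
)
stats = paragraph_start_stats(sample)
print(stats)
```

If the documents live in a pandas DataFrame with a text column (hypothetically named `text`), something like `df["text"].str.count(r"^[ \t]*[a-z]", flags=re.MULTILINE)` should give the same lowercase count per row, which could then be saved as a new column.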