08-16-2010, 06:04 AM #3
kacir
Wizard
kacir ought to be getting tired of karma fortunes by now.
Posts: 3,463
Karma: 10684861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, Cassiopeia A-20
Quote:
Originally Posted by purcelljf
After scanning a book and exporting it to html, I frequently have separate paragraphs where the pages break in the document. ... I thought maybe there is a trick to this, so it doesn't take so much time?
There are lots of tricks.
We just need to know what software you are using and what your skills are.
Do you use OpenOffice.org Writer, MS Office, or something else?
Do you know what a regular expression is?

As the previous poster said, looking for paragraphs that begin with a lowercase letter would find the vast majority of such paragraphs.
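If you would rather script it than search in a word processor, here is a minimal sketch of the "paragraph starts with a lowercase letter" trick as a regular expression. It assumes the exported HTML uses plain `<p>...</p>` tags with no attributes, and it only checks ASCII a-z; `join_broken_paragraphs` is a hypothetical helper name, not part of any existing tool.

```python
import re

# A paragraph close followed by a new paragraph whose first character is a
# lowercase ASCII letter. Assumes plain <p>...</p> markup without attributes.
BROKEN_BREAK = re.compile(r"</p>\s*<p>(?=[a-z])")


def join_broken_paragraphs(html: str) -> str:
    """Merge paragraphs that were split at a page break (hypothetical helper)."""
    # Replace the </p> ... <p> pair with a single space so the sentence continues.
    return BROKEN_BREAK.sub(" ", html)


if __name__ == "__main__":
    sample = "<p>The storm had</p>\n<p>passed by morning.</p>\n<p>Next day.</p>"
    print(join_broken_paragraphs(sample))
```

Paragraphs that start with an uppercase letter are left alone, so a real paragraph break stays put. It is still worth reviewing the result, since a break before a lowercase word is almost always, but not always, a page-break artifact.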
You can also look for paragraphs that do not end with . ? ! ." ?" !" .' ?' !' ... you get the idea.
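The "does not end with sentence punctuation" check can be sketched the same way. This version just lists the suspicious paragraphs so you can inspect them by hand rather than merging blindly. As before, it assumes plain `<p>...</p>` markup, and `suspicious_paragraphs` is a hypothetical name. The accepted endings are `.`, `?` or `!`, optionally followed by one straight or curly closing quote.

```python
import re

# A paragraph counts as "finished" if it ends with . ? or !, optionally
# followed by one closing quote: " ' or the curly forms U+201D / U+2019.
TERMINAL = re.compile(r"""[.?!]["'\u201d\u2019]?$""")
# Assumes plain <p>...</p> tags without attributes.
PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def suspicious_paragraphs(html: str) -> list[str]:
    """Return paragraph texts that do not end with sentence punctuation."""
    found = []
    for match in PARAGRAPH.finditer(html):
        text = match.group(1).strip()
        if text and not TERMINAL.search(text):
            found.append(text)
    return found


if __name__ == "__main__":
    html = '<p>It was late.</p><p>She said "no."</p><p>The road went on and</p>'
    for para in suspicious_paragraphs(html):
        print(para)
```

Headings, chapter titles, and poetry will also show up in this list, so treat it as a list of candidates rather than something to fix automatically.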