08-20-2009, 03:25 PM   #5
ahi
Wizard
Quote:
Originally Posted by siulayhumga
I am trying to convert a lot of old text files to EPUB so I can read them on my Sony 505.

Most of them were scanned in the mid-90s and have a lot of formatting errors, like extra line breaks in the middle of a line. I guess OCR technology for the PC was weak back then.

I don't want to use a text editor to remove all the CRLFs and line-wrap everything, because this will give me a "wall of text".

Does anyone know a program which will do this kind of text paragraph "reflow"?

Thanks
If the file is reasonably regular apart from the erroneous line breaks in the middle of paragraphs, my very much under-construction Python script might be able to get it into slightly better shape:

Try running it with:

pacify.py -i filename.txt -p

or with:

pacify.py -i filename.txt -rp

It writes its results to output.txt in the directory you run it from.
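
For anyone curious about the general idea, here is a minimal sketch of paragraph reflow in Python. To be clear, this is not the code in pacify.py, just an illustration of one possible heuristic. The function name reflow, the min_fill parameter, and the rules themselves are assumptions made up for this example: a blank line always ends a paragraph, a line that ends a sentence and is noticeably shorter than the longest line probably ends one too, and an indented next line probably starts a new one.

```python
def reflow(text, min_fill=0.75):
    """Join hard-wrapped lines into paragraphs (heuristic sketch only)."""
    # Keep leading whitespace so indentation can still be detected.
    lines = [ln.rstrip() for ln in text.splitlines()]
    # Assume the longest line approximates the wrap width of the scan.
    width = max((len(ln) for ln in lines), default=0)

    paragraphs, current = [], []
    for i, ln in enumerate(lines):
        stripped = ln.strip()
        if not stripped:
            # A blank line is always a paragraph boundary.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue

        current.append(stripped)
        nxt = lines[i + 1] if i + 1 < len(lines) else ""

        # Hypothetical rules; tune or replace them for your own files.
        ends_sentence = stripped.endswith((".", "!", "?", '"', ":"))
        is_short = len(ln) < min_fill * width
        next_indented = nxt[:1] in (" ", "\t")

        if (ends_sentence and is_short) or next_indented:
            paragraphs.append(" ".join(current))
            current = []

    if current:
        paragraphs.append(" ".join(current))
    # Separate paragraphs with one blank line to avoid a wall of text.
    return "\n\n".join(paragraphs)
```

To try it, read the file into a string, pass it to reflow(), and write the result to a new file. Files with ragged or inconsistent line lengths will fool the short-line test, so expect to adjust min_fill or clean up by hand afterwards.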

- Ahi
Attached Files
pacify.zip (3.0 KB)