MobileRead Forums - View Single Post - pacify.py (Text reformatter / RTF extractor)

ahi · 09-21-2009, 01:13 PM

Hi, Frabjous!

I would suggest holding off until I put up the next version. The haphazard Unicode errors are basically gone in the development version I am currently working on.

The file-size thing is weird... the 600 KB file certainly did not cause a memory issue; whatever the actual problem was, it got misreported as one.

I have successfully processed 700+ MB (nearly 1 GB) files with pacify.py before... and once I implement spooling (which I actually think I will do sooner rather than later after all), file size will be a non-issue so long as you have both sufficient memory and disk space.
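For anyone unfamiliar with the term, here is a minimal sketch of what spooling can look like in Python, assuming it means buffering intermediate output in memory up to a size threshold and then transparently rolling over to a temporary file on disk. It uses the standard library's `tempfile.SpooledTemporaryFile`; the names `reformat_line` and `spool_reformat` are hypothetical stand-ins for illustration, not pacify.py's actual code.

```python
import shutil
import tempfile


def reformat_line(line: str) -> str:
    # Hypothetical per-line transform standing in for the real reformatting:
    # collapse runs of whitespace into single spaces and keep one newline.
    return " ".join(line.split()) + "\n"


def spool_reformat(src_path: str, dst_path: str,
                   max_in_memory: int = 16 * 1024 * 1024) -> None:
    """Reformat src_path into dst_path, spooling the intermediate output.

    Output stays in memory until it exceeds max_in_memory, after which
    SpooledTemporaryFile rolls it over to a real temporary file on disk,
    so peak memory is bounded by the threshold rather than the file size.
    """
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
         tempfile.SpooledTemporaryFile(max_size=max_in_memory, mode="w+",
                                       encoding="utf-8") as spool:
        # Iterate lazily over the input; the whole file is never loaded at once.
        for line in src:
            spool.write(reformat_line(line))
        # Rewind the spool and copy it out in fixed-size chunks.
        spool.seek(0)
        with open(dst_path, "w", encoding="utf-8") as dst:
            shutil.copyfileobj(spool, dst)
```

The design trade-off is the threshold: small inputs never touch the disk, while very large ones cost disk space instead of memory, which matches the "sufficient memory and disk space" caveat.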

And yes, the HTML parsing needs to take tables and such properly into account... along with a few other things.

I'll keep everyone updated via this thread...

- Ahi
