Old 09-21-2009, 12:13 PM   #65
ahi
Hi, Frabjous!

I would suggest holding off until I put up the next version. The haphazard Unicode errors are basically gone as of the development version I am currently working on.

The filesize thing is weird... the 600 KB file certainly did not cause a memory issue, but whatever the issue was got misreported as such.

I have successfully processed 700+ MB (nearly 1 GB) files with pacify.py before... and once I implement spooling (which I actually think I will do sooner rather than later after all), file size will be a non-issue so long as you have both sufficient memory and disk space.
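For anyone wondering what spooling means here, below is a rough sketch of the general idea in Python. It is not pacify.py's actual code, just an illustration using the standard library's tempfile.SpooledTemporaryFile; the function name spool_chunks and the max_in_memory parameter are made up for the example.

```python
import tempfile


def spool_chunks(chunks, max_in_memory=1024 * 1024):
    """Collect byte chunks in a buffer that moves to disk when it grows large.

    Hypothetical helper for illustration only. The buffer lives in memory
    until more than max_in_memory bytes have been written, at which point
    SpooledTemporaryFile transparently rolls it over to a temporary file.
    """
    buf = tempfile.SpooledTemporaryFile(max_size=max_in_memory, mode="w+b")
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)  # rewind so the caller can read from the start
    return buf


if __name__ == "__main__":
    # Small input: stays in memory. Larger input: spills to disk.
    with spool_chunks([b"hello ", b"world"], max_in_memory=100) as small:
        print(small.read())
    with spool_chunks([b"x" * 60, b"y" * 60], max_in_memory=100) as big:
        print(len(big.read()))
```

The point of this design is that the calling code reads and writes the same file-like object either way, so large inputs cost disk space rather than memory.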

And yes, the HTML parsing needs to take tables and such properly into account... along with a few other things.

I'll keep everyone updated via this thread...

- Ahi