11-23-2010, 07:31 PM   #5
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
How do you want to do the word count? Funnily enough, I'm adding that this week for other reasons, but I wasn't planning to expose any of it to the end user.

My implementation is fairly simplistic for HTML: I delete everything in the <head> section and then strip all the remaining tags with a regex. It's probably not always perfect, but it's fast. Once that's done, I use this code to do the actual count:
http://ginstrom.com/scribbles/2007/1...s-with-python/
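To give a rough idea of the strip-then-count approach described above, here is a minimal Python 3 sketch. It is not the linked code and not my actual implementation; the function names (strip_markup, count_words) and the exact regexes are made up for illustration, and the definition of a "word" is an assumption.

```python
import re

# Hypothetical patterns, chosen for illustration only.
HEAD_RE = re.compile(r"<head\b.*?</head>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
# Assumption: a word is a run of word characters, optionally joined
# by an apostrophe or hyphen (so "don't" counts once).
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")


def strip_markup(html):
    """Drop the whole <head> section, then replace every tag with a space."""
    text = HEAD_RE.sub(" ", html)
    return TAG_RE.sub(" ", text)


def count_words(html):
    """Count words in what is left after the markup is stripped."""
    return len(WORD_RE.findall(strip_markup(html)))


if __name__ == "__main__":
    sample = ("<html><head><title>Ignore me</title></head>"
              "<body><p>Hello <b>brave</b> new world</p></body></html>")
    print(count_words(sample))
```

Note the limits of this kind of shortcut: anything inside <script> or <style> in the body, and entities such as &nbsp;, would still be counted as text, which is the "not always perfect" part.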

A potentially more accurate approach is to use this extra code, which uses a proper parser to extract all translatable words (the original goal of that author):
http://ginstrom.com/scribbles/2008/0...e-with-python/
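As a stand-in for the parser-based idea (again, not the linked code), a sketch using html.parser from the Python 3 standard library could look like the following. The class name TextExtractor, the function parsed_word_count, and the choice to skip head, script and style are all assumptions for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside head/script/style."""

    SKIP = {"head", "script", "style"}  # assumed set of non-text containers

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &eacute; etc.
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def parsed_word_count(html):
    """Count runs of word characters in the visible text of the document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return len(re.findall(r"\w+", " ".join(parser.chunks)))
```

Compared with the pure regex route, this decodes character references and leaves out script and style content, at the cost of being slower on large files.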

Anyway, I could put the word count into the debug log so you could see it in the job details.

Last edited by ldolse; 11-23-2010 at 07:34 PM.