04-28-2009, 05:43 AM   #5
ldolse
That logic is basically what I'm referring to; it's just done with a single regex replacement in Python. That is then combined with a median line length calculation, so that the regex is applied only to lines approaching the document median (an extra safety to prevent short lines without punctuation from being wrapped into the next line). This is what we're already doing for PDF post-processing.

There will always be docs that this won't work for, but I think we can handle the majority of the cases where a user currently has to hand-edit to fix this sort of thing.