05-24-2012, 02:36 AM #35
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
A fix was checked in for this in the past day or two: basically, the two problem regexes have been commented out. The issue is that Smarten Punctuation doesn't have any concept of a 'sentence'. It 'tokenizes' the HTML by breaking everything into either an HTML tag or a piece of text, and the pieces of text get passed to the regexes.
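For anyone curious what that tokenizing step looks like, here is a minimal Python sketch. The `tokenize_html` function and `TAG_RE` pattern are hypothetical and deliberately simplified (no handling of comments, CDATA, or `>` inside attributes); this is not calibre's actual code, just an illustration of splitting HTML into tag and text pieces.

```python
import re

# Hypothetical, simplified tag pattern (not calibre's): anything from '<' to the next '>'.
# The capturing group makes re.split keep the tags in its output.
TAG_RE = re.compile(r"(<[^>]*>)")

def tokenize_html(html):
    """Split HTML into ("tag", ...) and ("text", ...) pieces, in document order."""
    tokens = []
    for piece in TAG_RE.split(html):
        if not piece:
            continue  # re.split yields empty strings at the edges / between adjacent tags
        kind = "tag" if TAG_RE.fullmatch(piece) else "text"
        tokens.append((kind, piece))
    return tokens

if __name__ == "__main__":
    for token in tokenize_html('<p>"Holmes," said <i>he</i>.</p>'):
        print(token)
```

Note that the single sentence comes out as three separate text pieces (`'"Holmes," said '`, `'he'`, `'.'`); only those pieces would ever be seen by the punctuation regexes.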

The two regexes that are causing the problem use the line-start and line-end regex metacharacters (^ and $). Since there is no guarantee that any piece of text the regex sees is actually the start or end of a line, we had to disable those patterns. And with inline style tags like <i>, <b>, <em>, <span>, etc., a sentence would indeed be broken into multiple pieces.
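To see why ^ and $ are unreliable here, consider this small Python sketch. The rule `OPENING_AT_LINE_START` and the function `smarten_piece` are hypothetical, not the two regexes from the bug report; the point is only that, applied per text piece, ^ matches at the start of every piece rather than at the start of a real line.

```python
import re

# Hypothetical line-anchored rule, for illustration only (not calibre's actual
# pattern): a straight double quote at the start of a line opens a quotation.
OPENING_AT_LINE_START = re.compile(r'^"')

def smarten_piece(text):
    """Apply the anchored rule to one text piece, as a tokenizing smartener would."""
    return OPENING_AT_LINE_START.sub("\u201c", text)

# Text pieces a tokenizer would produce from:
#   <p>"It was <i>you</i>" said Holmes.</p>
pieces = ['"It was ', "you", '" said Holmes.']
smartened = [smarten_piece(p) for p in pieces]

for before, after in zip(pieces, smartened):
    print(repr(before), "->", repr(after))
```

The third piece begins right after `</i>`, so ^ matches there and what is really a closing quote is turned into an opening one. The first piece happens to come out right only because it genuinely starts the paragraph.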

Basically, the take-away is that there won't be any way to smarten some 19th- and early-20th-century books that use odd quoting patterns, like some of the Holmes books, at least not within the context of how Smarten Punctuation works.

Bug for reference:
https://bugs.launchpad.net/bugs/998900

Last edited by ldolse; 05-24-2012 at 02:44 AM.