MobileRead Forums - View Single Post - A situation that breaks "smarten punctuation"

ldolse · 05-24-2012, 03:36 AM

A fix was checked in for this in the past day or two - basically the two problem regexes have been commented out. The issue is that smarten punctuation doesn't have any concept of a 'sentence'. It 'tokenizes' the html by breaking everything into either an html tag or a piece of text - the pieces of text get passed to the regexes.
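To make the tokenizing step concrete, here is a rough Python sketch. It is not the actual smarten punctuation code; the `TAG_RE` pattern and the `tokenize` helper are made up for illustration, and they assume anything between < and > is a tag and everything else is text.

```python
import re

# Hypothetical tag matcher: anything from '<' to the next '>' counts as a tag.
TAG_RE = re.compile(r'(<[^>]*>)')

def tokenize(html):
    """Split html into (kind, value) pairs, where kind is 'tag' or 'text'."""
    tokens = []
    # The capturing group makes re.split keep the tags in the result.
    for piece in TAG_RE.split(html):
        if not piece:
            continue  # skip empty strings produced between adjacent tags
        kind = 'tag' if TAG_RE.fullmatch(piece) else 'text'
        tokens.append((kind, piece))
    return tokens

tokens = tokenize('<p>He said <i>no</i> twice.</p>')
# Only the text pieces would be handed to the punctuation regexes.
text_pieces = [value for kind, value in tokens if kind == 'text']
```

Note how the one sentence ends up as three separate text pieces, none of which knows about the others.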

The two regexes causing the problem use the line start/line end regex meta-characters, ^ and $. Since there is no guarantee that any piece of text the regex sees will actually be the start or end of a line, we had to disable those patterns. And in the case of style tags like <i>, <b>, <em>, <span>, etc., a sentence would indeed be broken into multiple pieces.
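A small Python sketch of why the anchors misfire. The pattern here is invented for the example and is not one of the two regexes that were disabled; it only assumes a rule along the lines of "a straight quote at the start of a line is an opening quote".

```python
import re

# Hypothetical rule, for illustration only: a straight single quote at the
# very start of the input is treated as an opening curly quote.
OPENING_AT_START = re.compile(r"^'")

def smarten_piece(text):
    """Apply the anchored rule to a single piece of text."""
    return OPENING_AT_START.sub('\u2018', text)

# The sentence <p>It was John<i>'s</i> hat.</p> reaches the regex as three
# separate text pieces once the tags are split out.
pieces = ["It was John", "'s", " hat."]
result = ''.join(smarten_piece(p) for p in pieces)

# The apostrophe sits mid-sentence, but it is the first character of its
# piece, so ^ matches and it becomes an opening quote (U+2018).
whole = smarten_piece("It was John's hat.")
# Run on the unbroken sentence, the same rule leaves the apostrophe alone.
```

The regex is behaving correctly for the input it is given; the trouble is that the start of a text piece is not the start of a line or sentence.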

Basically the take-away is that there won't be any way to smarten some 19th/early 20th century books that use odd quoting patterns, like some of the Holmes books, at least not within the context of how smarten works.

Bug for reference:
https://bugs.launchpad.net/bugs/998900

Last edited by ldolse; 05-24-2012 at 03:44 AM.