Old 01-02-2018, 12:03 PM   #259
KevinH
Sigil Developer
Yes, the offset that gumbo records is a byte offset from the start of a UTF-8 encoded file or string. The column number is "proper" in that it is measured in Unicode code points, not in bytes. If a code point offset is what you want, the solution is to use the routine previously posted by Doitsu, which converts line and column numbers to an offset in Unicode code points inside Python. Offsets are hard to work with because they are encoding dependent, whereas a line and column given in code points should be easier to work with and to convert to an offset in any encoding you like.
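For anyone who wants to see the idea in code, here is a minimal sketch of that kind of conversion. It is not Doitsu's routine, just an illustration. The function names are made up for this example, and it assumes that line and column are both 1-based, that the column counts code points, and that lines are separated by "\n". Check those assumptions against what your parser actually reports before relying on it.

```python
def line_col_to_offset(text, line, col):
    """Return the 0-based code point offset of (line, col) in text.

    Assumption: line and col are 1-based, col counts code points,
    and lines end with "\n".
    """
    lines = text.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError("line number out of range")
    # Each earlier line contributes its length plus one for the "\n".
    return sum(len(l) + 1 for l in lines[:line - 1]) + (col - 1)


def byte_offset_to_codepoint_offset(data, byte_offset):
    """Convert a byte offset into UTF-8 bytes to a code point offset.

    Raises UnicodeDecodeError if byte_offset falls inside a
    multi-byte sequence.
    """
    return len(data[:byte_offset].decode("utf-8"))


def codepoint_offset_to_byte_offset(text, offset, encoding="utf-8"):
    """Convert a code point offset to a byte offset in the given encoding."""
    return len(text[:offset].encode(encoding))


text = "héllo\nwörld\n"
offset = line_col_to_offset(text, 2, 3)   # points at the "r" in "wörld"
print(offset, repr(text[offset]))
print(codepoint_offset_to_byte_offset(text, offset))
```

Once you have the code point offset, going to a byte offset in whatever encoding you need is just a matter of encoding the text up to that point and counting the bytes, which is why line and column in code points is the easier thing to carry around.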

KevinH