Wednesday, January 29, 2014

T-SQL: Unicode-escaping characters in a string

I am in no way a T-SQL pro, but today I had a need of escaping a varchar field value to create a valid JSON string, while being limited only to Microsoft T-SQL features.
The JSON RFC states that:
All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
As it turns out, the way of iterating through a string in T-SQL is as such:
set @wcount = 0
set @index = 1
set @len = len(@string)

while @index <= @len
begin
  set @char = substring(@string, @index, 1)
  /* do something with @char */
  set @index += 1
end
To escape a quote or a backslash, we just prefix it with a backslash. As for the control characters, this gets a bit trickier, as we need to convert them to \u-notation that is used in JSON. We can use the built-in unicode function to get the ordinal value of a char and determine that it needs to be escaped.
when unicode(@char) < 32
Then we take advantage of the fn_varbintohexstr system function to convert a char value through varbinary type to a hex string.
sys.fn_varbintohexstr(cast(@char as varbinary))
Finally, after some string chopping and concatenating, we get what we want:
'\u00' + right(sys.fn_varbintohexstr(cast(@char as varbinary)), 2)
Here's the code of the function json_escape in its entirety.
if object_id(N'dbo.json_escape', N'FN') is not null
    drop function dbo.json_escape
go

create function dbo.json_escape (@string varchar(max)) returns varchar(max)
as
begin
    declare @wcount int, @index int, @len int, @char char, @escaped_string varchar(max)

    set @escaped_string = ''
    set @wcount = 0
    set @index = 1
    set @len = len(@string)

    while @index <= @len
    begin
        set @char = substring(@string, @index, 1)
        set @escaped_string += 
        case
            when @char = '"' then '\"'
            when @char = '\' then '\\'
            when unicode(@char) < 32 then '\u00' + right(sys.fn_varbintohexstr(cast(@char as varbinary)), 2)
            else @char
        end
        set @index += 1
    end
    return(@escaped_string)
end

go

No comments:

Post a Comment