How to parse special characters in PARSE_JSON function

Sometimes it can be confusing when trying to parse JSON string, this is no different when using PARSE_JSON function in Snowflake, especially when there are special characters involved. Look at below examples when I need to have backslash inside the key string:

SELECT PARSE_JSON('{"my\\id": "value"}'); 
-- Error parsing JSON: missing colon, pos 10

SELECT PARSE_JSON('{"my\\\id": "value"}');
-- Error parsing JSON: missing colon, pos 10

SELECT PARSE_JSON('{"my\\\\id": "value"}');
-- returns { "my\\id": "value" }

We can see that both of 2 and 3 backslashes produced errors, while 4 backslash returned value as “my\\id“, which contains 2 backslashes. But our key value should only contain one, which is “my\id“. So what’s happening here?

In order to understand what you need to parse into the PARSE_JSON function, you need to understand what value is stored inside JSON object. In our case, what value will be stored in JSON for our key string “my\id“, which has a single backslash?

To demonstrate this, it is easier to use an online JSON parser tool. The one I use often is the Json Parse Online. See my test in screenshots below:

You can see that when we only have one backslash “my\id“, JSON parser won’t be able to evaluate it and reported error. It is only when we have two backslashes, the final result is our correct value of “my\id“.

This is because backslash is a special character, when JSON parser evaluates data and encounters a backslash, it will try to interpret the immediate character after backslash, in our above case is character “i”. And “\i” is not a valid character in JSON, hence it fails. Now in the second case, we have double backslash “\\”, after JSON’s interpretation, it will become single backslash, as first one is just the escape character.

So, we can see that in order to return single backslash from JSON parser, we need to store double backslashes. Once we know this, let’s get back to the PARSE_JSON function. Remember that PARSE_JSON will return a JSON object (a VARIANT type in Snowflake world) that will need to be evaluated later on, so we need to have 2 backslashes. However, since we are passing JSON data as a string to PARSE_JSON function, the backslash again is treated as a special character inside quotes, if we only have 2 backslashes, the first one will be treated as escape character, hence we will end up with single backslash in JSON, and in term the data will be invalid when JSON parser tries to evaluate it. So, we need to escape BOTH of the backslashes, hence 4 in total is required.

According to above analysis, our last query mentioned above is the correct one:

SELECT PARSE_JSON('{"my\\\\id": "value"}');
-- returns { "my\\id": "value" }

To demonstrate this further, please see below examples:

-- We need four backslashes to pass into PARSE_JSON function,
-- which will result in 2 backslashes to be stored in resulted VARIANT type,
-- which in term will be evaluated as single backslash after we convert it to STRING type

SELECT PARSE_JSON('{"id": "my\\\\value"}'):id::STRING AS ID;
+----------+
| ID       |
|----------|
| my\value |
+----------+

SELECT PARSE_JSON('{"my\\\\id": "my\\\\value"}'):"my\id"::STRING AS ID;
+----------+
| ID       |
|----------|
| my\value |
+----------+

The same rule also applies to double quote character. Double quote is used extensively in JSON data to enclose keys and values. In order to have double quote as part of JSON’s key or value, we need to escape it with single backslash inside JSON data. However, one difference here is that because we used single quotes around the JSON data string, double quote itself is treated as literal, hence we do NOT need to escape it, and all we need is to escape the single backslash, see example below:

SELECT PARSE_JSON('{"id": "my\\"value"}'):id::STRING AS ID;
+----------+
| ID       |
|----------|
| my"value |
+----------+

I hope above explanation can help anyone with question on why we need so many backslashes to pass into PARSE_JSON function.

Any questions or comments, please post below and let me know.

Leave a Reply

Your email address will not be published. Required fields are marked *