Create View Automatically From VARIANT Columns

Recently I was helping one of the StackOverflow users to resolve one of his/her problems. The question was whether Snowflake has any way of automatically creating views on top of a table that has a variant data type, so that users can access data inside the variant just like a normal column.

I do not think any SQL engines that have such a feature, but I think using a Stored Procedure can easily achieve this goal.

I spent some time going through the logic and came out with the below Stored Procedure. It is not perfect, needs to add some validation logic etc, but it does the trick.

The code is as below:

1. Firstly, create the table and have some sample data for testing:

create or replace table test (
  col1 int, 
  col2 string, 
  data1 variant, 
  data2 variant
);

insert into test select 1,2, parse_json(
   '{"URL": "test", "Icon": "test1", "Facebook": "http://www.facebook.com"}'
), parse_json(
   '{"k1": "test", "k2": "test1", "k3": "http://www.facebook.com"}'
);

insert into test select 3,4,parse_json(
   '{"URL": "test", "Icon": "test1", "Twitter": "http://www.twitter.com"}'
), parse_json(
   '{"k4": "v4", "k3": "http://www.ericlin.me"}'
);

The main Stored Procedure logic is as below:

create or replace procedure create_view(
    table_name varchar
)
returns string
language javascript
as
$$
  var final_columns = [];
  
  // first, find out the columns
  var query = `SHOW COLUMNS IN TABLE ${TABLE_NAME}`;
  var stmt = snowflake.createStatement({sqlText: query});
  var result = stmt.execute();
  
  var variant_columns = [];
  
  while (result.next()) {
    var col_name = result.getColumnValue(3);
    var data_type = JSON.parse(result.getColumnValue(4));

    // just use it if it is not a VARIANT type
    // if it is variant type, we need to remember this column
    // and then run query against it later
    if (data_type["type"] != "VARIANT") {
      final_columns.push(col_name);
    } else {
      variant_columns.push(col_name);
    }
  }

  var columns = {};
  query = `SELECT ` + variant_columns.join(', ') + ` FROM ${TABLE_NAME}`;
  stmt = snowflake.createStatement({sqlText: query});
  result = stmt.execute();

  while (result.next()) {
      for(i=1; i<=variant_columns.length; i++) {
        var sub_result = result.getColumnValue(i);
        if(!sub_result) {
          continue;
        }

        var keys = Object.keys(sub_result);

        for(j=0; j<keys.length; j++) {
          columns[variant_columns[i-1] + ":" + keys[j]] = keys[j];
        }
      }
  }

  for(path in columns) {
    final_columns.push(path + "::STRING AS " + columns[path]);
  }

  var create_view_sql = "CREATE OR REPLACE VIEW " + 
    TABLE_NAME + "_VIEW\n" +   
    "AS SELECT " + "\n" +
    "  " + final_columns.join(",\n  ") + "\n" +
    "FROM " + TABLE_NAME + ";";
    
  snowflake.execute({sqlText: create_view_sql});
  return create_view_sql + "\n\nVIEW created successfully.";
$$;

Then execute the Stored Procedure will produce the below output, at the same time, the TEST_VIEW will be created:

call create_view('TEST');

+---------------------------------------+
| CREATE_VIEW                           |
|---------------------------------------|
| CREATE OR REPLACE VIEW TEST_VIEW      |
| AS SELECT                             |
|   COL1,                               |
|   COL2,                               |
|   DATA1:Facebook::STRING AS Facebook, |
|   DATA1:Icon::STRING AS Icon,         |
|   DATA1:URL::STRING AS URL,           |
|   DATA2:k1::STRING AS k1,             |
|   DATA2:k2::STRING AS k2,             |
|   DATA2:k3::STRING AS k3,             |
|   DATA1:Twitter::STRING AS Twitter,   |
|   DATA2:k4::STRING AS k4              |
| FROM TEST;                            |
|                                       |
| VIEW created successfully.            |
+---------------------------------------+

Finally, verify the TEST_VIEW:

SELECT * FROM TEST_VIEW;
+------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------+
| COL1 | COL2 | FACEBOOK                | ICON  | URL  | K1   | K2    | K3                      | TWITTER                | K4   |
|------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------|
|    1 | 2    | http://www.facebook.com | test1 | test | test | test1 | http://www.facebook.com | NULL                   | NULL |
|    3 | 4    | NULL                    | test1 | test | NULL | NULL  | http://www.ericlin.me   | http://www.twitter.com | v4   |
+------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------+

I am sure there are improvements we can have in the SP’s logic, like adding error handling, data validation etc, but it does the trick and creates a view for you.

Of course, if the dataset is large, it can take a while as we are going through row by row to extract JSON data keys. If you have better idea, please share in the comment below.

Loading

Leave a Reply

Your email address will not be published.