How to Track Table Ownership Changes

It is important to keep track of table ownership so you can understand who created and owns the tables in your system. Snowflake keeps such records in multiple places, so the right method depends on your needs. In this post, I will show you two ways to get this information, along with the pros and cons of each.

Method 1: SNOWFLAKE.ACCOUNT_USAGE.TABLES

The TABLES view under the ACCOUNT_USAGE schema of the SNOWFLAKE database contains information about each table, including its database, schema, table type, row count, bytes, and more. See the details below:

DESC TABLE SNOWFLAKE.ACCOUNT_USAGE.TABLES;

+------------------------------+-------------------+-...-+
| name                         | type              | ... |
|------------------------------+-------------------+-...-|
| TABLE_ID                     | NUMBER(38,0)      | ... |
| TABLE_NAME                   | VARCHAR(16777216) | ... |
| TABLE_SCHEMA_ID              | NUMBER(38,0)      | ... |
| TABLE_SCHEMA                 | VARCHAR(16777216) | ... |
| TABLE_CATALOG_ID             | NUMBER(38,0)      | ... |
| TABLE_CATALOG                | VARCHAR(16777216) | ... |
| TABLE_OWNER                  | VARCHAR(16777216) | ... |
| TABLE_TYPE                   | VARCHAR(16777216) | ... |
| IS_TRANSIENT                 | VARCHAR(3)        | ... |
| CLUSTERING_KEY               | VARCHAR(16777216) | ... |
| ROW_COUNT                    | NUMBER(38,0)      | ... |
| BYTES                        | NUMBER(38,0)      | ... |
| RETENTION_TIME               | NUMBER(38,0)      | ... |
| SELF_REFERENCING_COLUMN_NAME | VARCHAR(16777216) | ... |
| REFERENCE_GENERATION         | VARCHAR(16777216) | ... |
| USER_DEFINED_TYPE_CATALOG    | VARCHAR(16777216) | ... |
| USER_DEFINED_TYPE_SCHEMA     | VARCHAR(16777216) | ... |
| USER_DEFINED_TYPE_NAME       | VARCHAR(16777216) | ... |
| IS_INSERTABLE_INTO           | VARCHAR(3)        | ... |
| IS_TYPED                     | VARCHAR(3)        | ... |
| COMMIT_ACTION                | VARCHAR(16777216) | ... |
| CREATED                      | TIMESTAMP_LTZ(6)  | ... |
| LAST_ALTERED                 | TIMESTAMP_LTZ(6)  | ... |
| DELETED                      | TIMESTAMP_LTZ(6)  | ... |
| AUTO_CLUSTERING_ON           | VARCHAR(3)        | ... |
| COMMENT                      | VARCHAR(16777216) | ... |
+------------------------------+-------------------+-...-+

You can see that there is a TABLE_OWNER column, which tells us who owns each table. However, keep in mind that the owner recorded here is the ROLE, not the actual end USER.

I performed a quick test; you can see the result below:

CREATE OR REPLACE TABLE OWNER_TEST (A INT);

SELECT 
	TABLE_NAME,
	TABLE_SCHEMA AS SCHEMA_NAME,
	TABLE_CATALOG AS DATABASE_NAME,
	TABLE_OWNER,
	DELETED
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE 
-- both the database and table names need to be in upper case
	DATABASE_NAME = 'ERICLIN' 
	AND TABLE_NAME = 'OWNER_TEST'
;

+------------+-------------+---------------+--------------+---------+
| TABLE_NAME | SCHEMA_NAME | DATABASE_NAME | TABLE_OWNER  | DELETED |
|------------+-------------+---------------+--------------+---------|
| OWNER_TEST | TEST        | ERICLIN       | ACCOUNTADMIN | NULL    |
+------------+-------------+---------------+--------------+---------+

The DELETED column records the time at which the table was dropped, so if it is NOT NULL, the table has been dropped. In my case it is NULL, as I have just created the table. You can also see that TABLE_OWNER is ACCOUNTADMIN, the role, not my username ERICADMIN.
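
The DELETED column combines nicely with TABLE_OWNER for an ownership audit. Below is a sketch that lists only tables that still exist and are owned by a given role; ERICLIN and ACCOUNTADMIN are just the names from my test, so replace them with your own:

SELECT
	TABLE_CATALOG AS DATABASE_NAME,
	TABLE_SCHEMA AS SCHEMA_NAME,
	TABLE_NAME,
	TABLE_OWNER,
	CREATED
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE
-- NULL in DELETED means the table has not been dropped
	DELETED IS NULL
	AND TABLE_CATALOG = 'ERICLIN'
	AND TABLE_OWNER = 'ACCOUNTADMIN'
ORDER BY CREATED DESC
;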

The downsides of this approach:

  • SNOWFLAKE.ACCOUNT_USAGE.TABLES has a latency of 45 minutes to 3 hours, so you might not see results straight away
  • As mentioned, the owner recorded is the ROLE, not the USERNAME

The benefits of this approach:

  • Queries should run faster, as the view only records table-related information

Method 2: SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY

The QUERY_HISTORY view records every query each user runs, including the query ID, query text, the database and schema context at the time the query was run, warehouse information, how long it took, and so on. It contains a lot of useful query-centric information. The full column list is below:

DESC TABLE SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY;

+----------------------------------------+-------------------+....+
| name                                   | type              |....|
|----------------------------------------+-------------------+....|
| QUERY_ID                               | VARCHAR(16777216) |....|
| QUERY_TEXT                             | VARCHAR(16777216) |....|
| DATABASE_ID                            | NUMBER(38,0)      |....|
| DATABASE_NAME                          | VARCHAR(16777216) |....|
| SCHEMA_ID                              | NUMBER(38,0)      |....|
| SCHEMA_NAME                            | VARCHAR(16777216) |....|
| QUERY_TYPE                             | VARCHAR(16777216) |....|
| SESSION_ID                             | NUMBER(38,0)      |....|
| USER_NAME                              | VARCHAR(16777216) |....|
| ROLE_NAME                              | VARCHAR(16777216) |....|
| WAREHOUSE_ID                           | NUMBER(38,0)      |....|
| WAREHOUSE_NAME                         | VARCHAR(16777216) |....|
| WAREHOUSE_SIZE                         | VARCHAR(16777216) |....|
| WAREHOUSE_TYPE                         | VARCHAR(16777216) |....|
| CLUSTER_NUMBER                         | NUMBER(38,0)      |....|
| QUERY_TAG                              | VARCHAR(16777216) |....|
| EXECUTION_STATUS                       | VARCHAR(16777216) |....|
| ERROR_CODE                             | VARCHAR(16777216) |....|
| ERROR_MESSAGE                          | VARCHAR(16777216) |....|
| START_TIME                             | TIMESTAMP_LTZ(6)  |....|
| END_TIME                               | TIMESTAMP_LTZ(6)  |....|
| TOTAL_ELAPSED_TIME                     | NUMBER(38,0)      |....|
| BYTES_SCANNED                          | NUMBER(38,0)      |....|
| PERCENTAGE_SCANNED_FROM_CACHE          | FLOAT             |....|
| BYTES_WRITTEN                          | NUMBER(38,0)      |....|
| BYTES_WRITTEN_TO_RESULT                | NUMBER(38,0)      |....|
| BYTES_READ_FROM_RESULT                 | NUMBER(38,0)      |....|
| ROWS_PRODUCED                          | NUMBER(38,0)      |....|
| ROWS_INSERTED                          | NUMBER(38,0)      |....|
| ROWS_UPDATED                           | NUMBER(38,0)      |....|
| ROWS_DELETED                           | NUMBER(38,0)      |....|
| ROWS_UNLOADED                          | NUMBER(38,0)      |....|
| BYTES_DELETED                          | NUMBER(38,0)      |....|
| PARTITIONS_SCANNED                     | NUMBER(38,0)      |....|
| PARTITIONS_TOTAL                       | NUMBER(38,0)      |....|
| BYTES_SPILLED_TO_LOCAL_STORAGE         | NUMBER(38,0)      |....|
| BYTES_SPILLED_TO_REMOTE_STORAGE        | NUMBER(38,0)      |....|
| BYTES_SENT_OVER_THE_NETWORK            | NUMBER(38,0)      |....|
| COMPILATION_TIME                       | NUMBER(38,0)      |....|
| EXECUTION_TIME                         | NUMBER(38,0)      |....|
| QUEUED_PROVISIONING_TIME               | NUMBER(38,0)      |....|
| QUEUED_REPAIR_TIME                     | NUMBER(38,0)      |....|
| QUEUED_OVERLOAD_TIME                   | NUMBER(38,0)      |....|
| TRANSACTION_BLOCKED_TIME               | NUMBER(38,0)      |....|
| OUTBOUND_DATA_TRANSFER_CLOUD           | VARCHAR(16777216) |....|
| OUTBOUND_DATA_TRANSFER_REGION          | VARCHAR(16777216) |....|
| OUTBOUND_DATA_TRANSFER_BYTES           | NUMBER(38,0)      |....|
| INBOUND_DATA_TRANSFER_CLOUD            | VARCHAR(16777216) |....|
| INBOUND_DATA_TRANSFER_REGION           | VARCHAR(16777216) |....|
| INBOUND_DATA_TRANSFER_BYTES            | NUMBER(38,0)      |....|
| LIST_EXTERNAL_FILES_TIME               | NUMBER(38,0)      |....|
| CREDITS_USED_CLOUD_SERVICES            | FLOAT             |....|
| RELEASE_VERSION                        | VARCHAR(16777216) |....|
| EXTERNAL_FUNCTION_TOTAL_INVOCATIONS    | NUMBER(38,0)      |....|
| EXTERNAL_FUNCTION_TOTAL_SENT_ROWS      | NUMBER(38,0)      |....|
| EXTERNAL_FUNCTION_TOTAL_RECEIVED_ROWS  | NUMBER(38,0)      |....|
| EXTERNAL_FUNCTION_TOTAL_SENT_BYTES     | NUMBER(38,0)      |....|
| EXTERNAL_FUNCTION_TOTAL_RECEIVED_BYTES | NUMBER(38,0)      |....|
| QUERY_LOAD_PERCENT                     | NUMBER(38,0)      |....|
| IS_CLIENT_GENERATED_STATEMENT          | BOOLEAN           |....|
+----------------------------------------+-------------------+....+

So we can query this view to find out which user ran the “CREATE TABLE” statement for a given table, and when. The query is below:

SELECT
  USER_NAME, ROLE_NAME, START_TIME, QUERY_TEXT
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE
  START_TIME > '2020-11-20 00:00:00'
  AND QUERY_TEXT ILIKE '%CREATE%TABLE%OWNER_TEST%'
;

+-----------+--------------+-------------------------------+----------------------------------------------------+
| USER_NAME | ROLE_NAME    | START_TIME                    | QUERY_TEXT                                         |
|-----------+--------------+-------------------------------+----------------------------------------------------|
| ERICADMIN | ACCOUNTADMIN | 2020-11-25 22:52:10.110 +0000 | create or replace table owner_test (a int);        |
+-----------+--------------+-------------------------------+----------------------------------------------------+

We should add a time range to limit the search; otherwise the query will perform a full table scan, which can take some time and consume your credits. Also, replace the table name with your own; in my case it is OWNER_TEST. The query searches for users who ran “CREATE TABLE” or “CREATE OR REPLACE TABLE” statements against the table in question.
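
If you do not want to hard-code a date, a relative time window works too. The sketch below uses Snowflake's DATEADD function with an arbitrary 7-day window; widen it if the table is older than that:

SELECT
	USER_NAME, ROLE_NAME, START_TIME, QUERY_TEXT
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE
-- only look at the last 7 days; adjust the window to suit your case
	START_TIME > DATEADD(day, -7, CURRENT_TIMESTAMP())
	AND QUERY_TEXT ILIKE '%CREATE%TABLE%OWNER_TEST%'
;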

The downsides of this approach:

  • Since it is also under ACCOUNT_USAGE, it has the same 45 minutes to 3 hours latency
  • It will take longer than Method 1, as this view contains all queries run against the account, which can be a lot
  • You need to add a time condition to limit the search, otherwise it can take a long time

However, this approach does have benefits:

  • You can get the USER_NAME as well as the ROLE_NAME
  • You can see more detailed information about a query, such as when it was run
  • With a small change to the query, you can also find out who DROPPED the table
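
For example, the last point above can be sketched by changing the ILIKE pattern to match DROP statements instead; again, OWNER_TEST and the date are just from my test, so substitute your own:

SELECT
	USER_NAME, ROLE_NAME, START_TIME, QUERY_TEXT
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE
	START_TIME > '2020-11-20 00:00:00'
	AND QUERY_TEXT ILIKE '%DROP%TABLE%OWNER_TEST%'
;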

I hope the above information is useful.
