13 November 2020

ALTER TABLE … RECLUSTER is Classified as DML, not DDL

By Eric Lin

We all know that ALTER TABLE is a DDL (Data Definition Language), not a DML (Data Manipulation Language), because ALTER TABLE only changes metadata information about a table, like adding a new column, or changing existing column names..

If you are unsure about the differences between DDL and DML, please refer to this page on StackOverflow for details:

What are DDL and DML?

I will copy the chart below for your reference:

However, ALTER TABLE …. RECLUSTER is a bit different from the traditional ALTER TABLE command. What it does is to recluster the table in question based on defined key/keys, so that the underlining data for the table is balanced in each micro-partition. This can greatly help to improve query performance.

This post is not about how CLUSTERing works in Snowflake. If CLUSTERing concept is new to you, I suggest that you refer to below doc first:

Clustering Keys & Clustered Tables

So, what happens behind the scene when user runs ALTER TABLE …. RECLUSTER query in Snowflake? Snowflake will turn the query into INSERT statement, because if you know how micro-partition works in Snowflake, they are immutable, so in order to recluster all the data for a given table, new micro-partitions need to be created. In this case, since we have underlining data changes, it makes sense that this RECLUSTER query should be classified as DML, not DDL.

This is also mentioned in Snowflake’s official doc regarding Manual Reclustering, which is Deprecated feature and replaced by Automatic Reclustering.

Why it is important? Well, this will affect how it works with Transactions in Snowflake.

Consider below two examples:

BEGIN;
ALTER TABLE CLUSTER_TEST ADD COLUMN c2 VARCHAR;
BEGIN;
ALTER TABLE CLUSTER_TEST RECLUSTER;

Notice that I do not have COMMIT statement in the end. Do you know what will happen after we execute above two statement blocks, in separate sessions of course?

Let’s refer to Snowflake’s official documentation on how Transaction works for DDL.

Transactions > DDL

You can see that for DDLs, implicit transactions will be wrapped around them, while DML will not. So in our first example above, since it is a DDL, as it only tries to add a new column to the table, the change will be COMMITTED straightaway. If you open a new session and DESCRIBE the table, you should see the new column, even though we have not manually COMMITTED the transaction that we started.

Let’s check the second example where we tried to RECLUTER the table. After you run the query, open another session and run “SHOW LOCKS”, you will see that the table is locked by the RECLUSTER query that I just ran (01983abc-06ce-2f06-0000-01d106494b26):

+---------------------------+------------+---------------+-------------------------------+---------+-------------------------------+--------------------------------------+
| resource                  | type       |   transaction | transaction_started_on        | status  | acquired_on                   | query_id                             |
|---------------------------+------------+---------------+-------------------------------+---------+-------------------------------+--------------------------------------|
| ERICLIN.TEST.CLUSTER_TEST | PARTITIONS | 1605223450357 | 2020-11-12 23:24:10.357 +0000 | HOLDING | 2020-11-12 23:24:23.830 +0000 | 01983abc-06ce-2f06-0000-01d106494b26 |
+---------------------------+------------+---------------+-------------------------------+---------+-------------------------------+--------------------------------------+

The lock won’t be released unless COMMIT is executed.

I think this might cause some confusion to Snowflake users so I posted this blog and hope this can help to some users.