27 November 2020

How to Paginate SHOW TABLES Result

By Eric Lin

Pagination on query results is straightforward enough, and it is so common that most of SQL developers would need to do it on weekly, if not on daily basis. Paginate the resultset of SHOW TABLES query however, is not as common and might not be known to most developers, because of the different syntax, plus its different behaviour than other query resultsets. Let me demonstrate on what I mean below.

To start with, let’s see the syntax that Snowflake supports pagination for SHOW TABLES. Please refer to Snowflake’s official doc:

Show Tables Parameters

Basically we need to use the syntax of “LIMIT rows [ FROM ‘name_string‘ ]“. Please see example below:

1. I have 21 tables in my environment:

SHOW TERSE TABLES IN SCHEMA TEST;

+-------------------------------+------------------+-------+---------------+-------------+
| created_on                    | name             | kind  | database_name | schema_name |
|-------------------------------+------------------+-------+---------------+-------------|
| 2020-09-30 02:58:48.868 +0000 | CLUSTERED_TABLE  | TABLE | ERICLIN       | TEST        |
| 2020-11-12 23:24:02.177 +0000 | CLUSTER_TEST     | TABLE | ERICLIN       | TEST        |
| 2020-10-28 23:40:50.751 +0000 | CONTEXT_FIELDS   | TABLE | ERICLIN       | TEST        |
| 2020-11-18 05:14:32.910 +0000 | DEMONSTRATION1   | TABLE | ERICLIN       | TEST        |
| 2020-09-18 01:15:22.424 +0000 | DEPARTMENTS      | TABLE | ERICLIN       | TEST        |
| 2020-11-14 23:14:02.059 +0000 | EMPLOYEES        | TABLE | ERICLIN       | TEST        |
| 2020-11-12 09:37:55.084 +0000 | ERIC_EMPLOYEES   | TABLE | ERICLIN       | TEST        |
| 2020-11-12 21:21:29.722 +0000 | GET_DDL_TEST     | TABLE | ERICLIN       | TEST        |
| 2020-10-28 23:37:02.362 +0000 | JOBS             | TABLE | ERICLIN       | TEST        |
| 2020-10-28 23:39:00.601 +0000 | JOB_OUTPUTS      | TABLE | ERICLIN       | TEST        |
| 2020-11-11 23:00:19.135 +0000 | MANAGERS         | TABLE | ERICLIN       | TEST        |
| 2020-09-21 01:43:22.606 +0000 | PRODUCTS         | TABLE | ERICLIN       | TEST        |
| 2020-11-15 23:00:26.996 +0000 | RECURSIVE_TEST   | TABLE | ERICLIN       | TEST        |
| 2020-09-21 01:43:24.049 +0000 | SALES            | TABLE | ERICLIN       | TEST        |
| 2020-10-05 01:32:14.474 +0000 | SRC              | TABLE | ERICLIN       | TEST        |
| 2020-11-18 23:47:33.471 +0000 | T2               | TABLE | ERICLIN       | TEST        |
| 2020-10-05 01:32:23.003 +0000 | TARGET           | TABLE | ERICLIN       | TEST        |
| 2020-11-13 01:45:13.306 +0000 | TEST             | TABLE | ERICLIN       | TEST        |
| 2020-11-12 04:49:17.944 +0000 | TEST1            | TABLE | ERICLIN       | TEST        |
| 2020-10-30 02:16:21.148 +0000 | TEST2            | TABLE | ERICLIN       | TEST        |
| 2020-11-16 00:10:05.995 +0000 | TRANSACTION_TEST | TABLE | ERICLIN       | TEST        |
+-------------------------------+------------------+-------+---------------+-------------+

2. I can get the first 5 rows first:

SHOW TERSE TABLES LIMIT 5;

+-------------------------------+-----------------+-------+---------------+-------------+
| created_on                    | name            | kind  | database_name | schema_name |
|-------------------------------+-----------------+-------+---------------+-------------|
| 2020-09-30 02:58:48.868 +0000 | CLUSTERED_TABLE | TABLE | ERICLIN       | TEST        |
| 2020-11-12 23:24:02.177 +0000 | CLUSTER_TEST    | TABLE | ERICLIN       | TEST        |
| 2020-10-28 23:40:50.751 +0000 | CONTEXT_FIELDS  | TABLE | ERICLIN       | TEST        |
| 2020-11-18 05:14:32.910 +0000 | DEMONSTRATION1  | TABLE | ERICLIN       | TEST        |
| 2020-09-18 01:15:22.424 +0000 | DEPARTMENTS     | TABLE | ERICLIN       | TEST        |
+-------------------------------+-----------------+-------+---------------+-------------+

3. I then need to record the last table returned, which was “DEPARTMENTS”, then I can fetch page 2 data:

SHOW TERSE TABLES LIMIT 5 FROM 'DEPARTMENTS';

+-------------------------------+----------------+-------+---------------+-------------+
| created_on                    | name           | kind  | database_name | schema_name |
|-------------------------------+----------------+-------+---------------+-------------|
| 2020-11-14 23:14:02.059 +0000 | EMPLOYEES      | TABLE | ERICLIN       | TEST        |
| 2020-11-12 09:37:55.084 +0000 | ERIC_EMPLOYEES | TABLE | ERICLIN       | TEST        |
| 2020-11-12 21:21:29.722 +0000 | GET_DDL_TEST   | TABLE | ERICLIN       | TEST        |
| 2020-10-28 23:37:02.362 +0000 | JOBS           | TABLE | ERICLIN       | TEST        |
| 2020-10-28 23:39:00.601 +0000 | JOB_OUTPUTS    | TABLE | ERICLIN       | TEST        |
+-------------------------------+----------------+-------+---------------+-------------+

4. then I can use “JOB_OUTPUTS” from last row of the previous result and get page 3 data:

SHOW TERSE TABLES LIMIT 5 FROM 'JOB_OUTPUTS';

+-------------------------------+----------------+-------+---------------+-------------+
| created_on                    | name           | kind  | database_name | schema_name |
|-------------------------------+----------------+-------+---------------+-------------|
| 2020-11-11 23:00:19.135 +0000 | MANAGERS       | TABLE | ERICLIN       | TEST        |
| 2020-09-21 01:43:22.606 +0000 | PRODUCTS       | TABLE | ERICLIN       | TEST        |
| 2020-11-15 23:00:26.996 +0000 | RECURSIVE_TEST | TABLE | ERICLIN       | TEST        |
| 2020-09-21 01:43:24.049 +0000 | SALES          | TABLE | ERICLIN       | TEST        |
| 2020-10-05 01:32:14.474 +0000 | SRC            | TABLE | ERICLIN       | TEST        |
+-------------------------------+----------------+-------+---------------+-------------+

5. I just keep going until reached the last page:

SHOW TERSE TABLES LIMIT 5 FROM 'TEST2';

+-------------------------------+------------------+-------+---------------+-------------+
| created_on                    | name             | kind  | database_name | schema_name |
|-------------------------------+------------------+-------+---------------+-------------|
| 2020-11-16 00:10:05.995 +0000 | TRANSACTION_TEST | TABLE | ERICLIN       | TEST        |
+-------------------------------+------------------+-------+---------------+-------------+

That’s pretty simple, right? But there are catches. If you search under certain database, or under an account, where the result will contain tables from multiple schemas, the pagination result will break. Let me show you an example below.

Firstly I will setup an environment for this purpose:

CREATE DATABASE SHOW_TABLES_TEST;
USE DATABASE SHOW_TABLES_TEST;

CREATE SCHEMA SHOW_TABLES_TEST1;
USE SCHEMA SHOW_TABLES_TEST1;

CREATE TABLE a (a int);
CREATE TABLE b (a int);
CREATE TABLE c (a int);
CREATE TABLE d (a int);
CREATE TABLE e (a int);
CREATE TABLE f (a int);

CREATE SCHEMA SHOW_TABLES_TEST2;
USE SCHEMA SHOW_TABLES_TEST2;

CREATE TABLE a (a int);
CREATE TABLE b (a int);
CREATE TABLE c (a int);
CREATE TABLE d (a int);
CREATE TABLE e (a int);
CREATE TABLE f (a int);

Above queries will create a database named SHOW_TABLES_TEST, with two schemas and multiple tables underneath each. Below is the SHOW TABLES result in this newly created database:

SHOW TERSE TABLES IN DATABASE SHOW_TABLES_TEST;

+-------------------------------+------+-------+------------------+-------------------+
| created_on                    | name | kind  | database_name    | schema_name       |
|-------------------------------+------+-------+------------------+-------------------|
| 2020-11-26 07:38:07.664 +0000 | A    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:08.210 +0000 | B    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:08.717 +0000 | C    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:09.245 +0000 | D    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:09.771 +0000 | E    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:10.755 +0000 | F    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:12.063 +0000 | A    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
| 2020-11-26 07:38:12.593 +0000 | B    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
| 2020-11-26 07:38:13.095 +0000 | C    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
| 2020-11-26 07:38:13.654 +0000 | D    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
| 2020-11-26 07:38:14.202 +0000 | E    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
| 2020-11-26 07:38:14.797 +0000 | F    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
+-------------------------------+------+-------+------------------+-------------------+

Now, let’s get the first page with 4 rows:

SHOW TERSE TABLES IN DATABASE SHOW_TABLES_TEST LIMIT 4;

+-------------------------------+------+-------+------------------+-------------------+
| created_on                    | name | kind  | database_name    | schema_name       |
|-------------------------------+------+-------+------------------+-------------------|
| 2020-11-26 07:38:07.664 +0000 | A    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:08.210 +0000 | B    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:08.717 +0000 | C    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:09.245 +0000 | D    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
+-------------------------------+------+-------+------------------+-------------------+

So far so good, it returns correct 4 rows for page 1. To get the rows for page 2, we will start with table “D”:

SHOW TERSE TABLES IN DATABASE SHOW_TABLES_TEST LIMIT 4 FROM 'D';

+-------------------------------+------+-------+------------------+-------------------+
| created_on                    | name | kind  | database_name    | schema_name       |
|-------------------------------+------+-------+------------------+-------------------|
| 2020-11-26 07:38:09.771 +0000 | E    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:10.755 +0000 | F    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:14.202 +0000 | E    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
| 2020-11-26 07:38:14.797 +0000 | F    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
+-------------------------------+------+-------+------------------+-------------------+

Can you spot the issue here? If you look at the full resultset above when we have no LIMIT and FROM clause, the tables were ordered by schema names and then table names. So in theory, we should get below rows as the second page:

+-------------------------------+------+-------+------------------+-------------------+
| created_on                    | name | kind  | database_name    | schema_name       |
|-------------------------------+------+-------+------------------+-------------------|
| 2020-11-26 07:38:09.771 +0000 | E    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:10.755 +0000 | F    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST1 |
| 2020-11-26 07:38:12.063 +0000 | A    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
| 2020-11-26 07:38:12.593 +0000 | B    | TABLE | SHOW_TABLES_TEST | SHOW_TABLES_TEST2 |
+-------------------------------+------+-------+------------------+-------------------+

But instead, we only got tables “E” and “F” from schemas SHOW_TABLES_TEST1 and SHOW_TABLES_TEST2. Why? It is because of FROM “D”, this clause also filters out all tables in ALL schemas that are before (including) table “D”. This is clearly not we wanted.

Since currently there is no way to only ORDER the result of SHOW TABLES by table name, rather than the current default order of schema name and table name. The ability of performing pagination in such a case won’t work.

To workaround the issue, the better approach is to query the SNOWFLAKE.ACCOUNT_USAGE.TABLES VIEW provided by Snowflake, where you can order by any way you want and perform paginations freely. See my example below:

-- this can take a while
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES ; 

-- remember the query ID
SET show_table_result = last_query_id(); 

-- Page 1
SELECT
  $2 AS TABLE_NAME,
  $4 AS SCHEMA_NAME,
  $6 AS DATABASE_NAME,
  $7 AS OWNER_ROLE,
  $8 AS TYPE,
  $9 AS IS_TRANSIENT,
  $24 AS DELETED_ON
FROM 
TABLE(RESULT_SCAN($show_table_result))
WHERE  
    1=1
    AND DELETED_ON IS NULL
    AND TYPE = 'BASE TABLE'
ORDER BY TABLE_NAME
LIMIT 100;

-- Page 2
SELECT
  $2 AS TABLE_NAME,
  $4 AS SCHEMA_NAME,
  $6 AS DATABASE_NAME,
  $7 AS OWNER_ROLE,
  $8 AS TYPE,
  $9 AS IS_TRANSIENT,
  $24 AS DELETED_ON
FROM 
TABLE(RESULT_SCAN($show_table_result))
WHERE  
    1=1
    AND DELETED_ON IS NULL
    AND TYPE = 'BASE TABLE'
ORDER BY TABLE_NAME
LIMIT 100 OFFSET 100;

-- Page 3
SELECT
  $2 AS TABLE_NAME,
  $4 AS SCHEMA_NAME,
  $6 AS DATABASE_NAME,
  $7 AS OWNER_ROLE,
  $8 AS TYPE,
  $9 AS IS_TRANSIENT,
  $24 AS DELETED_ON
FROM 
TABLE(RESULT_SCAN($show_table_result))
WHERE  
    1=1
    AND DELETED_ON IS NULL
    AND TYPE = 'BASE TABLE'
ORDER BY TABLE_NAME
LIMIT 100 OFFSET 200;

This way the result is much cleaner. Couple of notes here:

  • The query above will scan ALL tables under an account, but you can add conditions as needed to narrow down resultset to certain database or schema.
  • SNOWFLAKE.ACCOUNT_USAGE.TABLES is a Snowflake internal VIEW, which queries backend table that stores all tables that an account has. So if there are many tables, this query might take a while. Unlike SHOW TABLES, which only operate at METADATA level.
  • Since the query can take some time to run, it will cost your credit when you paginate through each page. This is the reason why I stored query ID in a variable and use RESULT_SCAN function to get the results from cache instead. This avoids the unnecessary usage of Warehouse.
  • The VIEW will also return DELETED tables, whose deletion time are marked on column DELETED_ON. That’s why I have condition DELETED_ON IS NULL in the query to filter out all deleted tables.

If you know other better ways to paginate the full list of tables in Snowflake, please add comments below and let us know.