File Pruning for External Tables
Currently Snowflake does not support pruning External Table files. To prove it, I have performed below test.
Assume that I have an external S3 stage setup correctly, which points to an external S3 location with a few CSV files, I create below external table:
CREATE OR REPLACE EXTERNAL TABLE EXT_TABLE_TEST ( V VARCHAR AS (VALUE:C1::VARCHAR) ) WITH LOCATION = @ERICLIN_S3_STAGE/test/ AUTO_REFRESH = TRUE REFRESH_ON_CREATE = TRUE FILE_FORMAT = ( TYPE=CSV COMPRESSION=NONE REPLACE_INVALID_CHARACTERS = TRUE FIELD_DELIMITER = '|' ESCAPE_UNENCLOSED_FIELD = '\\' ;
The stage @ERICLIN_S3_STAGE/test contains a list of CSV files
And each file contains a few lines of data, whose content is not relevant here, so I will ignore them for now.
Now, if I run below query
SELECT METADATA$FILENAME, * FROM EXT_TABLE_TEST WHERE METADATA$FILENAME ILIKE '%TEST5%';
You can see that I only tried to get data from CSV file test5.csv, but if you examine the query profile, you can see that all files were scanned.
In theory I would expect only one file (named partition in Snowflake) to be scanned.
This is because currently (at the time of writing, Snowflake is at version 5.3.1) the metadata$filename information is stored at Snowflake’s processing layer (XP), so while compiler performs the compilation at Global Services (GS) layer, it has no knowledge of the files the external table has, hence unable to prune the files/partitions.
For now, if files/partitions pruning is required for external tables, the tables need to be partitioned correctly, and then partition pruning will happen as normal, just like other tables, as the column information is stored in the GS level.
For details on how to partition Snowflake’s external tables, please refer to below official doc: