Tables
We are all aware of permanent database tables, a typical object of any database. We aren’t discussing them here.
Let’s have some lights thrown on other types of tables — Transient and Temporary.
Temporary Table
A temporary table lives only for a single session. They are not visible to other users and sessions. Once the session ends, the table is purged.
Cost
Snowflake charges for the storage of Temporary tables for the duration of their existence.
Naming
A temporary table can have the same name as a permanent table in a session and it will take precedence over the permanent table.
All the queries in a session only affect the temporary table if a permanent table exists of the same name.
Conversion
A temporary table cannot be converted to another type once created.
Transient Table
Almost a permanent table with a difference of a zero fail-safe period. This type of table is also not purged after post session is ended. This kind of table is preferred when data protection or recovery is not required of the same standards as that of a permanent table.
Cost
Snowflake charges for the storage of Transient tables for the duration of their existence.
Comparison of Table Type (Snowflake, n.d.)
The main difference lies in the Time Travel property and Fail-Safe Period.
External Table
Data in the external table is stored in external stages (on cloud storage). Metadata is stored for files in the external table which enables the querying of the data stored in files.
External Table contains the following columns,
Partitions in External Table
Partitioning is highly recommended, as it improves query performance significantly. Partitioning the source files into logical paths such as date, country, etc.
Example: In S3 Bucket, folders are partitioned into ‘year_month’ buckets.
This path then can be used in snowflake external table creation.
External Table Can be Partitioned in two ways,
1. Automatically
2. Manually
Automatic Partition
Defining External Table using ‘...PARTITION BY(<part_column_name>)..’ Clause. The partition will have all the files segregated using the path defined in the clause using METADATA$FILENAME.
The object owner can configure the metadata automatically or manually using the ALTER … REFRESH command.
To better understand the automatic partition, remember the following points,
1. Partition by clause will have a partition column name defined in the table definition which might be an expression that extracts a certain part of the path in the cloud storage.
2. It can be refreshed manually using ALTER Command.
Example: if we use automatic partitioning in the above image (of S3 bucket), the query should look like,
Manual Partition
Use this option when you are unsure of the partitions in the source files. You expect that Path (or partitions) can change in the future or need to be dynamic. Additional action will be needed in the creation of this type of external table — ‘PARTITION_TYPE = USER_SPECIFIED’.
The object owner needs to add partitions as per need using the ALTER … ADD PARTITION () LOCATION ‘<path>’.
Automatically refreshing this type of table is NOT supported. Manually as well it will throw an error.
When to use this type (Manual) of partitioning,
1. You are aware of the exact path (or the value to have the partition upon) on the cloud.
2. You want to add partitions dynamically as per need.
Example: For the same S3 Bucket, another folder is defined (test_folder).
As it is evident from the results, the files in the folder — ‘test_folder/2022_December/’ are the only ones that got registered. Had there been another folder ‘2023_January’, it wouldn’t be scanned, enhancing the performance as less number of files were scrutinized.
Note: This is the Sixth one in a series of articles. Click here for the previous one. Click here for the next one.
References
Ref 1: https://pixabay.com/vectors/database-schema-data-tables-schema-1895779/
Ref 2: Snowflake. (n.d.). Retrieved from https://docs.snowflake.com/en/user-guide/tables-temp-transient.html#comparison-of-table-types