Introduction
Snowflake can be described simply as a ‘Data Platform as a Cloud Service’. The data platform is provided as a Software-as-a-Service (SaaS) solution.
In simple words, if you have data with any of the three supported cloud storage providers (AWS, Azure, GCP) or stored locally, all you have to do is establish a connection with Snowflake and define the database, tables, etc., and you are ready to use Snowflake’s analytical capabilities.
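To make the "connect, then define the database and tables" step concrete, here is a minimal Python sketch. It assumes the snowflake-connector-python package for the connection; the database name, table name, and columns are hypothetical placeholders, and the helper functions are illustrative rather than part of Snowflake itself.

```python
def build_setup_statements(database, table, columns):
    """Return the SQL needed to define a database and one table.

    `columns` maps column names to SQL types, e.g. {"id": "INTEGER"}.
    Names are interpolated as-is, so only pass trusted identifiers.
    """
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"USE DATABASE {database}",
        f"CREATE TABLE IF NOT EXISTS {table} ({column_sql})",
    ]


def run_statements(statements, **connect_kwargs):
    """Execute the statements over a Snowflake connection.

    Assumes `pip install snowflake-connector-python`. The import is local so
    the statement builder above works without the package installed.
    """
    import snowflake.connector

    conn = snowflake.connector.connect(**connect_kwargs)
    try:
        cursor = conn.cursor()
        for statement in statements:
            cursor.execute(statement)
    finally:
        conn.close()


# Hypothetical names, for illustration only.
statements = build_setup_statements(
    "demo_db", "sales", {"id": "INTEGER", "amount": "NUMBER(10, 2)"}
)
```

With real credentials, the statements could then be run with something like run_statements(statements, user=..., password=..., account=...), where the values come from your own Snowflake account.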
Architecture
Traditionally, databases have been divided into several types of architecture. Here we will discuss two: Shared-Disk and Shared-Nothing.
What is Shared Nothing Architecture?
As the name suggests, each node has its own storage and its own brain (processing). If you put three laptops together on your table, the laptops share no resources and each works independently (in a simple scenario). Each has its own CPU, storage, input/output ports, etc.
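The laptop analogy can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the Node class, the modulo routing rule, and the sample rows are all made up for the example, and real systems use more careful partitioning.

```python
class Node:
    """One shared-nothing node: private storage plus its own processing."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # local storage, never read by other nodes

    def local_sum(self, column):
        # Each node computes only over the rows it owns.
        return sum(row[column] for row in self.rows)


def owner_for(row_id, nodes):
    """Pick the owning node for an integer id with a simple modulo rule."""
    return nodes[row_id % len(nodes)]


nodes = [Node("laptop-1"), Node("laptop-2"), Node("laptop-3")]
rows = [{"id": i, "amount": 10 * i} for i in range(1, 7)]

# Distribute the rows: every row lives on exactly one node.
for row in rows:
    owner_for(row["id"], nodes).rows.append(row)

# A query is answered by combining each node's independent partial result.
partial_sums = [node.local_sum("amount") for node in nodes]
total = sum(partial_sums)
```

Only the final step, combining the partial sums, needs any communication between nodes.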
What are the Problems of Shared Nothing Architecture?
Increasing the number of resources adds to the cost of the overall system. Transmitting data across nodes also requires software coordination, which adds further to the cost.
Performance also suffers if the cross-communication layer does not stay consistent as the system scales up.
What are the Advantages of Shared Nothing Architecture?
Owing to the independent nature of this architecture, scaling up won’t disrupt the entire system.
Due to the existence of multiple nodes, a single point of failure is eliminated. Even if one or two nodes fail, the application won’t be affected.
Upgrades are easier because each node works independently and does not depend on the others.
What is Shared Disk Architecture?
As the name suggests, every node uses the same central disk but has independent memory of its own. A photo album on your mobile device is an example of a shared disk: it is stored under one account (in the cloud), but multiple users can access it from their own devices.
What are the Problems of Shared Disk Architecture?
Scalability is the obvious issue with this architecture. It depends on a token-passing model, in which messages report the status of the other nodes. The more nodes one adds, the longer the token takes to pass along (Lee, n.d.).
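A rough way to see why the wait grows with the node count is to model the nodes as a ring. The source only mentions a token-passing model; the ring layout and the rule that a node may touch the shared disk only while holding the token are assumptions made for this Python sketch.

```python
def hops_until_turn(num_nodes, holder, requester):
    """Token hops needed before `requester` may access the shared disk.

    Nodes are numbered 0..num_nodes-1 and the token moves to the next
    higher number, wrapping around at the end of the ring.
    """
    return (requester - holder) % num_nodes


def worst_case_wait(num_nodes):
    """Wait for a node that has just handed the token to its neighbour."""
    requester = 0
    holder = 1 % num_nodes  # the neighbour now holds the token
    return hops_until_turn(num_nodes, holder, requester)


# Worst-case wait, in hops, for a few ring sizes.
waits = {n: worst_case_wait(n) for n in (2, 4, 8, 16)}
```

In this simplified model the worst-case wait is always one hop less than the number of nodes, so it grows linearly as nodes are added.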
What are the Advantages of Shared Disk Architecture?
The number of CPUs can grow large, adding processing power. Availability is also very high in a shared-disk system, because if one node fails there are others to take over.
Snowflake’s Architecture
Having explained the two architectures, it is apparent that Snowflake takes the best of both worlds: it is a hybrid of the two. It uses a central data repository for persisted data (shared-disk), accessible from all the nodes. Similar to shared-nothing, it processes queries using MPP (Massively Parallel Processing), where each node stores a portion of the data locally (Snowflake, n.d.).
To understand more about the architecture, the Snowflake documentation is the best place to start (Snowflake, n.d.).
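The hybrid idea described above can be sketched as a toy model in Python: one central store that every node can read, plus compute nodes that each keep a local copy of their portion and work in parallel. The class name, partition names, and values are hypothetical, and this is not how Snowflake is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Central repository: the single persisted copy, visible to every node.
CENTRAL_STORAGE = {
    "part-0": [10, 20],
    "part-1": [30, 40],
    "part-2": [50, 60],
}


class ComputeNode:
    """A compute node that keeps a local copy of the partitions it scans."""

    def __init__(self, name, assigned_parts):
        self.name = name
        self.assigned_parts = assigned_parts
        self.local_cache = {}

    def scan_sum(self):
        # Pull assigned partitions from central storage on first use only.
        for part in self.assigned_parts:
            if part not in self.local_cache:
                self.local_cache[part] = list(CENTRAL_STORAGE[part])
        return sum(sum(values) for values in self.local_cache.values())


def run_query(nodes):
    """Run every node's scan in parallel, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(ComputeNode.scan_sum, nodes))
    return sum(partials)


nodes = [
    ComputeNode("node-a", ["part-0", "part-1"]),
    ComputeNode("node-b", ["part-2"]),
]
total = run_query(nodes)
```

Because the persisted data lives only in the central store, a new compute node can be added by assigning it partitions, without moving data between the existing nodes.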
Advantages of Snowflake Architecture
- Multi-cloud support: Azure, AWS, and Google Cloud.
- Server capability: no need to invest in servers, since Snowflake can scale out or in as needed.
- Complete SQL database.
- Cost effective: Snowflake charges according to usage. The credit system makes it easier to control cost, and the ability to monitor the resources used contributes to efficient usage of the application.
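As a rough illustration of usage-based billing with credits, the Python sketch below estimates the credits a warehouse consumes. The credits-per-hour table and the 60-second minimum reflect my understanding of standard Snowflake warehouses and should be checked against the current Snowflake documentation; the price per credit is a hypothetical figure, since it varies by edition and region.

```python
# Assumed hourly credit rates per warehouse size; verify against the docs.
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8}

# Assumed minimum billed time each time a warehouse starts running.
MINIMUM_BILLED_SECONDS = 60


def credits_used(size, runtime_seconds):
    """Credits consumed by one warehouse running for `runtime_seconds`."""
    billed_seconds = max(runtime_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


def cost(size, runtime_seconds, price_per_credit):
    """Monetary cost for a given price per credit (hypothetical input)."""
    return credits_used(size, runtime_seconds) * price_per_credit


half_hour_medium = credits_used("Medium", 1800)  # 4 credits/hour for half an hour
short_query = credits_used("X-Small", 10)        # billed as 60 seconds
example_cost = cost("Medium", 1800, 3.0)         # 3.0 per credit is made up
```

A calculation like this makes the trade-off visible: a larger warehouse burns credits faster, so it only pays off if it finishes the work proportionally sooner.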
References
1. https://nealanalytics.com/blog/snowflake-a-revolutionary-data-warehousing-experience/
2. Lee, S. (n.d.). https://gvpress.com/journals/IJEIC/vol2_no4/15.pdf
3. Snowflake. (n.d.). https://docs.snowflake.com/en/user-guide/intro-key-concepts.html