Cover photo by max_776 from Adobe Stock
Snowflake data platform is a popular data warehouse that separates storage, or the data itself, from compute, or the processing of that data. The core idea behind Snowflake is its Structured Query Language (SQL) query engine, a powerful tool that has much better performance for data analytics than running queries against your traditional (transactional) SQL database.
In order to manipulate and analyze your data using Snowflake’s SQL query engine, you need to allocate Snowflake warehouses. These virtual warehouses are like virtual machines that you set up in advance to handle your data analytics needs.
Any SQL operation, whether a “query” to fetch data or some type of “manipulation” to change data, requires a Snowflake warehouse to be running in order to work.
“A [Snowflake] warehouse must be started/running to process SQL statements submitted in a session.” —Snowflake Docs
While you have a Snowflake warehouse running, you’re able to:
Any of these data warehouse actions requires computing resources, such as CPU time, memory, and temporary storage. These “compute resources” (in Snowflake terminology) are governed by Snowflake’s warehouses, based on the size you’ve preconfigured for them.
Let’s look at some examples of the database operations that require virtual warehouses in Snowflake. If you’re familiar with Structured Query Language (SQL), these should look familiar.
For example, you may need to update rows in tables (with a DELETE, INSERT, or UPDATE command), load data into tables (with a COPY INTO <table> command), or unload data from tables (with a COPY INTO <location> command).
These are called data manipulation language (DML) actions, meaning actions that change, mutate, or manipulate your database tables. Any of these DML actions will require a warehouse to be set up and running in your Snowflake account.
Of course, the bulk of your analytics queries will be slicing and dicing your data in different ways, whether you’re trying to gather business intelligence or provide in-product data insights.
Snowflake also handles those operations (SQL SELECT statements) using virtual machines, which are abstracted into the concept of Snowflake’s virtual warehouses. A larger Snowflake warehouse doesn’t mean more storage; it means better performance for queries.
Querying data using SQL SELECT statements is where Snowflake shines. While you need a Snowflake warehouse running to run SELECT statements, the performance will be much better than you would get running the same SELECT statements on a traditional SQL database.
Traditional SQL databases are “transactional,” meaning they’re optimized for fast performance with simple DML operations like adding a new record (or transaction). In comparison, Snowflake is much faster when running SELECT statements, thanks to its proprietary SQL query engine.
At the end of the day, you’ll need to find some way to pay for Snowflake, and you do that via “credits” that keep your Snowflake warehouses running. And, when needed, you can pay for performance by choosing larger warehouses, meaning you’ll have more CPU power available.
However, it doesn’t always make financial sense to pay for a larger virtual warehouse, since you’ll be paying for Snowflake whether or not you’re actively running queries in the warehouse. As a rule of thumb, customers aren’t willing to wait more than 5-10 seconds for analytics, but a data engineer may be comfortable with a 5-10 minute query time for business intelligence data.
If scaling your virtual warehouses manually sounds like a hassle requiring new hires, and you’d rather build in-product analytics using the team you already have, then you should consider building with Propel Data. Our data analytics platform provides easy-to-use GraphQL APIs that let you build in-product analytics in minutes, not months, and you only pay for actual usage.
Snowflake’s virtual warehouses are computing clusters that process the data analytics commands you run on Snowflake data analytics platform.
In Snowflake, you allocate “virtual warehouses” (computing clusters) to execute the SQL database commands that you run on the data platform.