Snowflake Concepts

What Is a Multi-Cluster Virtual Warehouse in Snowflake Data Platform?

Multi-cluster virtual warehouses in Snowflake automatically scale compute resources up and down to handle fluctuating demand on the data warehouse. Here’s how they work – and how they might save you money compared to using a larger virtual warehouse.

Multi-cluster virtual warehouses in Snowflake are analogous to the server farm pictured, since they allocate additional compute resources compared to a single warehouse or virtual machine.

When Do You Need Multiple Virtual Warehouses in a Snowflake Account?

Let’s imagine your company has a single data analyst who handles everything Snowflake and is running complex database queries on a daily basis. He or she might be fine waiting a few minutes, but not a few hours, so they increase the size of a single virtual warehouse until the query processing speed is correct. In this case, a single Snowflake warehouse works great.

On the other hand, building in-product analytics would be a great example of when you’ll need to account for concurrency. If you offer in-product analytics or insights to your customers, then your customers might request dozens of unique data queries simultaneously. You don’t want any individual customer’s results to be slowed down because of other requests in the queue.

The preferred approach would be to just turn the dial up to 11 and spring for the 6X-Large warehouse, but your budget may not love that idea. At a hefty 512 credits per hour (costing about $1024 per hour at $2 per credit), you’ll need to pay $747,520 per month to keep that behemoth online. And if you only need user concurrency sometimes, not 24/7, you’ll be wasting money.

What Is a Multi-Cluster Virtual Warehouse in Snowflake?

When you’re dealing with complex concurrent requests, you’ll probably want to explore Snowflake’s multi-cluster warehouses, which offer fully-automated scaling for concurrency. Multi-cluster warehouses allow you to make a larger pool of processing power available for running queries, and they are a feature available only in the Enterprise version of Snowflake.

While a standard virtual warehouse has only a single cluster of compute resources, a multi-cluster warehouse can have up to 10 clusters inside of it. Like standard warehouses, there seems to be no limit on the number of multi-cluster warehouses you can have in your Snowflake account, but each multi-cluster warehouse is limited to a maximum of 10 compute clusters.

Multi-cluster warehouses give you the concurrency benefits of multiple warehouses, with the added advantage that Snowflake automatically starts and shuts down running clusters as needed to handle peaks and troughs in demand for user concurrency. In other words, the number of clusters used by the warehouse automatically scales based on the current demand on Snowflake for data analytics. But, you’ll likely need some additional data engineering to connect more than 1 multi-cluster warehouse of 10 compute resources into a single application.

Which Two Modes Are Available for Multi Cluster Warehouses?

There are two modes of operation available for multi-cluster virtual warehouses in Snowflake:

  1. Maximized mode – The minimum and maximum number of clusters are the same, so the multi-cluster virtual warehouse always has the maximum compute resources available.
  2. Auto-scale mode – Snowflake dynamically manages the clusters between the specified minimum and maximum according to the demand on the data warehouse.

The auto-scale mode is obviously better, but it’s also way more complicated. For example, you have to specify a scaling policy, where “Standard” always increases the number of clusters according to demand, while “Economy” only adds a cluster if there’s enough query load to keep the new cluster busy for 6 minutes. As you might guess from the name, the latter saves money, but it can lead to queries still queuing up, even though you are using a multi-cluster warehouse.

Of course, the alternative is complicated as well. To set up multiple individual warehouses manually, you’ll need to choose a size appropriate for each one, based on your own estimates of the CPU load of having multiple users and queries running simultaneously. That’s why it is so useful to have multi-cluster warehouses that auto-scale the amount of compute available up and down as needed, but your company will need to sign up for the Enterprise edition of Snowflake.

Example of When To Use a Multi-Cluster Warehouse in Snowflake

Building any data app with multiple users, especially in-product analytics, is the perfect example of when you might need a multi-cluster virtual warehouse in Snowflake. Let’s say a company builds a leaderboard metric inside its product, which has 1000s of customers. At first, they’re comfortable just resizing a single warehouse to be larger in order to improve slow queries and data loading issues, but then they see the bill. In response, they switch to a multi-cluster virtual warehouse in auto-scale mode, and they get improved performance at lower cost.

At first, the company tries the Standard scaling policy, but spinning up lots of warehouses is still costing them too much money. The customers of their SaaS (software as a service) product are comfortable seeing their leaderboard metrics delayed by 15 minutes, so the company is able to queue up the requests from their serverless functions efficiently. After a few experiments, they are able to provide fresh data to the leaderboards using a single multi-cluster virtual warehouse in auto-scale mode with the Economy scaling policy selecting between 1 and 6 clusters.

Of course, that company may have needed to hire an entire team of data engineers to build that in-products analytics app on top of their data warehouse. If you’re considering such a project, you should check out Propel Data, where you can build analytics with the team you have now.

At Propel, we provide developers with easy-to-use GraphQL APIs that allow frontend and full-stack developers to build analytics products that support massive user concurrency at scale. Our backend handles multi-cluster Snowflake warehouses, data access policies, and fast responsive times in a performant, cost-effective manner, without extra data engineering. Propel is the clear choice for building in-product analytics with Snowflake as a data warehouse or data lake, because we help developers query the data they already have in Snowflake via our GraphQL API.

Please consider signing up for our email newsletter, following us on Twitter @propeldatacloud, or joining our waiting list to stay in touch.

Related Content

Snowflake’s storage of data for analytics is complicated “under the hood” because it uses columnar storage, as illustrated by this close-up image of a snowflake perched vertically on a block of ice like a column.

Snowflake Concepts

How Does Snowflake Storage Work? (Databases & Schemas)

Databases and schemas ("namespaces") are used to organize data in Snowflake storage, which uses a columnar format internally for analytics.

Snowflake’s virtual warehouses are used to pay for the processing power you need to run data analytical queries, like would be need to power a virtual dashboard of real-time pricing information, like the one shown in this image.

Snowflake Concepts

What Is the Use of a Virtual Warehouse in Snowflake Analytics?

In Snowflake, you allocate “virtual warehouses” (computing clusters) to execute the SQL database commands that you run on the data platform.