Learn how to size engines initially and use engine observability to monitor and resize engines
Selecting an appropriate engine size depends on several factors: the size of your active dataset, the latency and throughput requirements of your workload, your price-performance goals, and the number of users and queries the workload is expected to handle concurrently. Our guidance is to start small with an engine size that fits your active dataset and monitor the workload using the engine observability metrics (see below). Based on these metrics, you can then dynamically resize your engine to meet the needs of your workload.
If your workload requires high processing power relative to its data size, use a compute-optimized node type. These nodes have approximately the same processing power as storage-optimized nodes but have less memory and cache space, and cost less.
Firebolt allows you to change the configuration of an existing engine. See the engine fundamentals page for details.
Firebolt provides engine observability metrics that show how your workloads use engine resources. Use the Information_Schema.engine_metrics_history view to see how much CPU, RAM, and disk your workloads consume. This view also shows how often your queries hit the local cache and how much query data spills to disk. These metrics can help you decide whether your engine needs a different node type or more nodes to improve query performance. Use the Information_Schema.engine_running_queries view to see how many queries are waiting in the queue. If queries are regularly left waiting, adding another cluster to your engine may improve query throughput.
The following tables list approximate specifications for each engine node type. They are provided for informational purposes only and are subject to change without notice.
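As a rough illustration of how these metrics might feed a resize decision, the sketch below averages a handful of metric samples and returns coarse suggestions. The field names, the thresholds, and the mapping from metric to suggestion are all assumptions made for this example; they are not the column names of the views or official Firebolt guidance, so adapt them to what your queries actually return.

```python
from dataclasses import dataclass


@dataclass
class MetricsSample:
    # Hypothetical field names: map them from whatever your metrics queries return.
    cpu_used: float         # fraction of CPU in use, 0.0 to 1.0
    memory_used: float      # fraction of RAM in use, 0.0 to 1.0
    disk_used: float        # fraction of disk in use, 0.0 to 1.0
    cache_hit_ratio: float  # fraction of reads served from local cache
    spilled_bytes: int      # query data spilled to disk
    queued_queries: int     # queries waiting to run when the sample was taken


def _average(samples, field):
    """Mean of one numeric field across all samples."""
    return sum(getattr(s, field) for s in samples) / len(samples)


def suggest_actions(samples, high_usage=0.8, low_cache_hit=0.5):
    """Return coarse suggestions; both thresholds are illustrative assumptions."""
    if not samples:
        return []
    actions = []
    resource_pressure = (
        _average(samples, "cpu_used") > high_usage
        or _average(samples, "memory_used") > high_usage
        or _average(samples, "disk_used") > high_usage
        or _average(samples, "cache_hit_ratio") < low_cache_hit
        or any(s.spilled_bytes > 0 for s in samples)
    )
    if resource_pressure:
        actions.append("review node type or node count")
    # Queries waiting in every sample suggests a throughput limit.
    if all(s.queued_queries > 0 for s in samples):
        actions.append("consider adding a cluster")
    return actions
```

Treat the output as a prompt to look more closely at the metrics, not as an automatic resize rule.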
Storage-optimized node types:

Node Type | vCPUs | Memory | Disk Size |
---|---|---|---|
S | 8 | 64 GB | 1875 GB |
M | 16 | 128 GB | 3750 GB |
L | 32 | 256 GB | 7500 GB |
XL | 64 | 512 GB | 15000 GB |

Compute-optimized node types:

Node Type | vCPUs | Memory | Disk Size |
---|---|---|---|
S | 8 | 16 GB | 475 GB |
M | 16 | 32 GB | 950 GB |
L | 32 | 64 GB | 1900 GB |
XL | 64 | 128 GB | 3800 GB |
Note: Firebolt reserves approximately 25% of the disk size for other system operations and caches.
TIP: We recommend loading a representative sample of your data into Firebolt to get a sense of the compression ratio you will achieve. This value can vary widely based on the data types and distribution of values.
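The disk sizes and the approximate 25% reservation above can be combined into a first-pass sizing estimate. The sketch below assumes that the first table describes storage-optimized nodes and the second compute-optimized nodes, that roughly 75% of the listed disk is available for your data, and that you are sizing a single node. The function names are hypothetical, and the input is the compressed size of your active dataset (for example, estimated from a representative sample).

```python
# Approximate disk sizes in GB, copied from the tables above.
# Assumption: first table = storage-optimized, second = compute-optimized.
DISK_GB = {
    "storage-optimized": {"S": 1875, "M": 3750, "L": 7500, "XL": 15000},
    "compute-optimized": {"S": 475, "M": 950, "L": 1900, "XL": 3800},
}

RESERVED_FRACTION = 0.25  # approximate share reserved by Firebolt, per the note above


def usable_disk_gb(family, node_type):
    """Disk left after the approximate reservation; an estimate, not a guarantee."""
    return DISK_GB[family][node_type] * (1 - RESERVED_FRACTION)


def smallest_fitting_node(active_dataset_gb, family="storage-optimized"):
    """Smallest node type whose usable disk holds the compressed active dataset.

    Returns None when even XL is too small for a single node.
    """
    for node_type in ("S", "M", "L", "XL"):
        if usable_disk_gb(family, node_type) >= active_dataset_gb:
            return node_type
    return None
```

For example, `smallest_fitting_node(2000)` returns `"M"`, because an S storage-optimized node has about 1406 GB of usable disk under these assumptions and an M node has about 2812 GB. Use the result only as a starting point and confirm it with the observability metrics.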
For ELT workloads, the engine size depends on the number and size of the files being ingested. You can parallelize ingestion by adding nodes, which can improve performance.
To correctly size an engine for querying data, there are several factors to consider:
For query processing, we recommend starting with an S or M storage-optimized instance type. Then, run a checksum over the dataset you expect to be queried frequently. Firebolt engines cache data locally, which helps serve queries at low latency. The cache size depends on the node type used in your engine: each size has twice the cache of the next smaller size, and compute-optimized instances have approximately one quarter of the cache of storage-optimized instances. After the checksum completes, use Information_Schema.engine_metrics_history to see the cache utilization percentage. If an acceptable percentage of your active dataset fits, you can then run queries at your expected QPS on the engine.
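The checksum step can be scripted when the active dataset spans several tables. The sketch below builds one warm-up statement per table. It assumes that a `CHECKSUM(*)` aggregate is what you use for the checksum, which you should verify against the Firebolt SQL reference. The helper name and the table names are hypothetical, and running the statements through your own client is left out.

```python
import re

# Accept only plain identifiers so table names cannot inject extra SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def checksum_queries(table_names):
    """Build one warm-up statement per table in the active dataset."""
    queries = []
    for name in table_names:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
        # Assumption: CHECKSUM(*) reads every column and so pulls the table into cache.
        queries.append(f'SELECT CHECKSUM(*) FROM "{name}";')
    return queries
```

After running the generated statements on the engine, check the cache utilization percentage as described above.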
TIP: You can use Multi-Cluster Engine Warmup to submit your checksum queries to all clusters in a multi-cluster engine.
Small and medium storage-optimized engines are available for use right away. Compute-optimized instance types are also available, but may have longer engine start times. If you want to use a large or extra-large engine, reach out to support@firebolt.io.
TIP: You also have the option to run your workload simultaneously on engines with different configurations and use these metrics to identify which configuration best fits your needs.
You need the appropriate RBAC permissions to use the engine observability metrics.