Amazon Redshift overview, pricing and cost optimization techniques

Posted on December 10, 2019 at 12:00 AM

Data is everywhere! Data is key to nearly every business decision and to business success. Having the data is not sufficient; great businesses use it effectively to make decisions. In a typical setup, a business gathers data into a data warehouse and analyzes it to compile information that feeds strategies for sales, marketing, and operations, as well as KPIs, performance reports, and HR activities.

A data warehouse is a specialized type of relational database into which data is pooled, optimized for high-performance analysis and reporting. It collects current and historical transactional data from many disparate operational systems across the business (manufacturing, finance, sales, shipping, etc.) and pulls it together in one place to guide analysis and decision-making.

Amazon Redshift is a fast, fully managed data warehouse from AWS that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. With Redshift, you can start small for just $0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions.

Why not traditional data warehouses?

In simple terms, a data warehouse is nothing but a “specialized relational database”. In a traditional on-premises setting, it is a combination of databases such as SQL Server and a few analytical tools on top of them. This setup takes time and resources to administer and manage. In addition, the financial cost of building, maintaining, and growing a self-managed, on-premises data warehouse is enormous.

As your data grows, you constantly have to trade off which data to load into the data warehouse and which data to archive in storage so that you can manage costs, keep ETL complexity low, and deliver good performance.

Why not an MPP data warehouse cluster on EC2?

Setting up an MPP data warehouse on EC2 instances is not much different from running a traditional data warehouse. Almost all the same challenges apply, except that a few scalability challenges can be handled more easily with techniques like Auto Scaling groups.

Why AWS Redshift?

Amazon Redshift, on the other hand, is a fully managed data warehouse. It integrates easily with existing BI tools and lets you run standard SQL queries in a cost-effective way. Redshift Spectrum makes it easy to analyze large amounts of data in its native format without first loading it.

Amazon Redshift automatically handles many of the time-consuming tasks associated with managing your own data warehouse, including:

Setup
Data Durability
Scaling
Automatic Updates and Patching
Scale Query capabilities

Amazon Redshift billing?

Billing commences for a data warehouse cluster as soon as the data warehouse cluster is available. Billing continues until the data warehouse cluster terminates, which would occur upon deletion or in the event of instance failure.

You are billed for the following components (pay for what you use):

Compute node hours: the total number of hours you run across all your compute nodes for the billing period. Node usage hours are billed for each hour your data warehouse cluster is running in an available state. If you no longer wish to be charged for your data warehouse cluster, you must terminate it to avoid being billed for additional node hours. Partial node hours consumed are billed as full hours. You are billed for 1 unit per node per hour, so a 3-node data warehouse cluster running persistently for an entire month would incur 2,160 instance hours. You will not be charged for leader node hours; only compute nodes will incur charges.
Backup Storage: the storage associated with your automated and manual snapshots for your data warehouse.
Data transfer: There is no data transfer charge for data transferred to or from Amazon Redshift and Amazon S3 within the same AWS Region. For all other data transfers into and out of Amazon Redshift, you will be billed at standard AWS data transfer rates.
Data scanned: applicable if you are using Redshift Spectrum, which bills for the amount of data scanned in Amazon S3.
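The compute node hours rule above can be checked with a few lines of arithmetic. This is a minimal sketch, not an official billing formula: the function name `billed_node_hours` is hypothetical, and it assumes partial hours round up per node and that the leader node is excluded, as described above.

```python
import math


def billed_node_hours(compute_nodes: int, hours_running: float) -> int:
    """Node hours billed for a cluster; partial hours count as full hours."""
    if compute_nodes < 0 or hours_running < 0:
        raise ValueError("inputs must be non-negative")
    # Only compute nodes are counted; the leader node is not billed.
    return compute_nodes * math.ceil(hours_running)


# A 3-node cluster running for a 30-day month: 3 * 24 * 30 node hours.
print(billed_node_hours(3, 24 * 30))  # 2160

# 2.25 hours of runtime is billed as 3 full hours on each of the 3 nodes.
print(billed_node_hours(3, 2.25))  # 9
```

Multiplying the result by the per-node hourly price of your node type gives a rough compute charge for the period.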

What can be done to reduce the Redshift bill?

Running a data warehouse is NOT going to be cheap unless you apply good cloud economics tactics. For example, a mid-size dc2.8xlarge cluster could cost as much as $3,500/month if you run it continuously.
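A rough way to sanity-check a figure like this is to multiply an hourly rate by the hours the cluster runs. The sketch below assumes a rate of $4.80 per node-hour and a single node; both are placeholders chosen because they land near the figure above, not quoted prices, so check the current AWS price list for your region and node type. The helper name `monthly_cost` is hypothetical.

```python
def monthly_cost(rate_per_node_hour: float, nodes: int,
                 hours_per_day: float = 24, days: int = 30) -> float:
    """Approximate monthly compute cost in dollars."""
    return rate_per_node_hour * nodes * hours_per_day * days


ASSUMED_RATE = 4.80  # assumption: placeholder $/node-hour, not a quoted price

always_on = monthly_cost(ASSUMED_RATE, nodes=1)
four_hours = monthly_cost(ASSUMED_RATE, nodes=1, hours_per_day=4)

print(f"24 h/day: ${always_on:,.2f}")
print(f" 4 h/day: ${four_hours:,.2f}")
print(f"saved:    ${always_on - four_hours:,.2f}")
```

Under these assumptions, the gap between the two lines is the money spent on hours when nothing is running.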

Applying proper cost optimization techniques can lower the Redshift bill while keeping resource usage optimal. Though not a comprehensive list, the strategies below can be used to reduce the costs associated with Redshift.

Stop/Pause when NOT needed: This is a universal best practice in the cloud computing world. When resources do not need to be running, stop them. Data analytics, or number crunching in general, is rarely a 24/7 activity. In a typical setup, batch jobs kick in at a specified time, crunch the numbers, generate reports, and finish. Teams can apply several strategies for stop/pause:

Manual: Someone on the team manually shuts down the Redshift cluster when it is not needed.

Scheduled: Shut down and start the cluster on a fixed schedule. Many companies use this strategy to save costs. For example, if they know jobs run for 2 hours between 1 AM and 5 AM, a shutdown scheduler runs at 5 AM to stop the Redshift cluster.

Idle Resource: This is a little tricky to implement, but once in place it yields great cost savings because the servers run only for “nearly” the time they are actually used. Saving 2 hours every day on these high-cost machines adds up to thousands of dollars.

In summary, whatever technique you implement, stop when NOT needed and launch when needed to control Redshift costs.
Scale up and down based on load (elastic resize or concurrency scaling): Whether or not you can stop the Redshift cluster, you should pay close attention to scaling up and down based on load. This ensures that you are using the right amount of resources to get tasks done.
Pick the right-sized “type” of Redshift cluster (memory, CPU or storage): Right sizing is another universal rule that applies across cloud resources. The second major source of wasted cloud spend is over-provisioned resources. Pick the right cluster size based on your performance requirements to get the best cloud economics.
Optimize queries and structure tables well: This technique is very specific to Redshift. Because cluster performance is tightly coupled to the number of compute nodes, using compute more efficiently yields better performance. Query tuning is one of the best techniques, yet many users do not pay much attention to it, and it will reduce your AWS bill.
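The scheduled stop strategy described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a production scheduler: `desired_state`, `apply_schedule`, and `RecordingClient` are hypothetical names, and the 1 AM to 5 AM window is the example used earlier. The `client` argument is assumed to behave like `boto3.client("redshift")`, whose `pause_cluster` and `resume_cluster` calls take a `ClusterIdentifier`; a stand-in client is used here so the sketch runs without AWS credentials.

```python
from datetime import datetime, time


def desired_state(now: datetime, start: time, stop: time) -> str:
    """Return "running" inside the job window [start, stop), else "paused"."""
    t = now.time()
    if start <= stop:
        in_window = start <= t < stop
    else:  # window wraps past midnight, e.g. 22:00 to 02:00
        in_window = t >= start or t < stop
    return "running" if in_window else "paused"


def apply_schedule(client, cluster_id: str, now: datetime,
                   start: time, stop: time, current_state: str) -> str:
    """Pause or resume the cluster when its state differs from the schedule."""
    target = desired_state(now, start, stop)
    if target == current_state:
        return "no-op"
    if target == "paused":
        client.pause_cluster(ClusterIdentifier=cluster_id)
        return "paused"
    client.resume_cluster(ClusterIdentifier=cluster_id)
    return "resumed"


class RecordingClient:
    """Hypothetical stand-in for a Redshift client; it only records calls."""

    def __init__(self):
        self.calls = []

    def pause_cluster(self, ClusterIdentifier):
        self.calls.append(("pause_cluster", ClusterIdentifier))

    def resume_cluster(self, ClusterIdentifier):
        self.calls.append(("resume_cluster", ClusterIdentifier))


# Dry run: at 5 AM the job window has closed, so a running cluster is paused.
client = RecordingClient()
action = apply_schedule(client, "demo-cluster", datetime(2019, 12, 10, 5, 0),
                        start=time(1, 0), stop=time(5, 0),
                        current_state="running")
print(action, client.calls)
```

In practice a function like this would be triggered periodically (for example from a cron job or a scheduled serverless function) and would read the cluster's current state from the service rather than take it as an argument.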

If you are looking for a solution to start and stop your Redshift cluster to save costs, contact us. Our solution, INVOKECloud, could help. If not, we may be able to guide you to an appropriate solution.
