- Ensure the source data and the Kyvos deployment are in the same AWS Region
Deploy Kyvos in the same AWS Region as the data source (S3, Snowflake, or Redshift). Accessing data across Regions incurs data transfer costs and adds latency.
- Query Engines, BI Server, and S3 storage must be in the same Region
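The Region rule above can be checked mechanically before a deployment. The following is a minimal sketch, not a Kyvos feature: the function name, the component names, and the Regions in the inventory are all hypothetical and only illustrate the comparison.

```python
from typing import Dict, Set


def find_region_mismatches(component_regions: Dict[str, str],
                           expected_region: str) -> Set[str]:
    """Return the names of components whose Region differs from the expected one."""
    return {
        name
        for name, region in component_regions.items()
        if region != expected_region
    }


# Hypothetical inventory; component names and Regions are illustrative only.
deployment = {
    "query_engines": "us-east-1",
    "bi_server": "us-east-1",
    "s3_storage": "us-east-1",
    "source_data": "us-west-2",
}

mismatches = find_region_mismatches(deployment, expected_region="us-east-1")
print(sorted(mismatches))
```

Any name reported here points to a component that would generate cross-Region traffic.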
- Configure Cluster and Query Engine Scheduling to save cost and use cloud resources only when needed.
You can create schedules for:
- Cluster shutdown, for any time interval.
- Cluster start, for any time interval.
- Query engines, for any time interval.
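When planning such schedules, it can help to reason about the intervals explicitly, in particular windows that cross midnight. The sketch below is only an illustration of that reasoning; it does not use or represent the Kyvos scheduling interface, and the 20:00 to 06:00 shutdown window is an assumed example.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps midnight, e.g. 20:00 -> 06:00.
    return now >= start or now < end


# Assumed shutdown window for illustration: 20:00 to 06:00.
SHUTDOWN_START = time(20, 0)
SHUTDOWN_END = time(6, 0)


def cluster_should_run(now: time) -> bool:
    """The cluster runs whenever the current time is outside the shutdown window."""
    return not in_window(now, SHUTDOWN_START, SHUTDOWN_END)
```

For example, `cluster_should_run(time(12, 0))` is true and `cluster_should_run(time(23, 0))` is false under the assumed window.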
- Elastic cube builds on EMR
Configure the EMR cluster to scale in and out so that it uses only the resources it needs. This lowers the cost of cube builds compared with running a fixed set of worker nodes.
- Use On-Demand EMR
The on-demand EMR cluster is created when a cube build is launched and is terminated when no cube builds are running or no cube build is scheduled within the next 30 minutes. This ensures that EMR is used only for the duration of cube builds.
- Use Spot instances to save EMR costs in the cube builds
To save resource costs while building Kyvos cubes, configure Spot instances. AWS offers Spot instances at a discounted price, which can significantly reduce cube build cost. Note that AWS can reclaim Spot instances when capacity is insufficient, which is fairly common. A task that was running on a reclaimed Spot node fails and is reattempted, which can increase cube build time.
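The trade-off between the Spot discount and the extra time spent on reattempted tasks can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: the node-hours, hourly rate, discount, and retry overhead below are hypothetical inputs, not measured Kyvos or AWS figures.

```python
def estimate_build_cost(node_hours: float,
                        on_demand_hourly_rate: float,
                        discount: float = 0.0,
                        retry_overhead: float = 0.0) -> float:
    """Estimate cube build cost.

    discount:       fraction off the On-Demand price (0.0 for On-Demand nodes).
    retry_overhead: extra fraction of node-hours spent re-running tasks that
                    were lost when Spot nodes were reclaimed.
    """
    effective_hours = node_hours * (1 + retry_overhead)
    effective_rate = on_demand_hourly_rate * (1 - discount)
    return effective_hours * effective_rate


# Hypothetical inputs for illustration only.
on_demand_cost = estimate_build_cost(node_hours=100, on_demand_hourly_rate=0.50)
spot_cost = estimate_build_cost(node_hours=100, on_demand_hourly_rate=0.50,
                                discount=0.60, retry_overhead=0.20)
print(f"On-Demand: {on_demand_cost:.2f}, Spot: {spot_cost:.2f}")
```

With these assumed inputs the Spot build stays cheaper despite the retries; a smaller discount or a higher reclaim rate narrows the gap.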
- Use Glue service for Hive table metadata storage
Using Glue as the HCatalog metastore avoids recreating HCatalog tables each time the EMR cluster is recreated for the same deployment. Table metadata is preserved in Glue even if the EMR cluster is terminated.
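At the EMR level, pointing Hive at the Glue Data Catalog is done through a `hive-site` configuration classification, as described in the AWS EMR documentation. The sketch below only builds and prints that configuration fragment; whether Kyvos applies it for you through its own deployment options is not covered here and should be checked against your setup.

```python
import json

# EMR configuration classification that makes Hive use the AWS Glue Data
# Catalog as its metastore (property name and class as documented by AWS).
glue_metastore_config = [
    {
        "Classification": "hive-site",
        "Properties": {
            "hive.metastore.client.factory.class": (
                "com.amazonaws.glue.catalog.metastore."
                "AWSGlueDataCatalogHiveClientFactory"
            )
        },
    }
]

# Render as JSON, the form EMR accepts for cluster configurations.
print(json.dumps(glue_metastore_config, indent=2))
```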
- Set the cuboid replication type to NONE for all cubes that are not eligible for querying.
- Ensure that enough local disk space is available on the Query Engines to replicate the built cubes.
- For environments without sufficient local disk (local disk smaller than the cube size), create a segment, create a dedicated metadata folder, allocate the production cubes to this segment, and allocate the remaining cubes to the default segment.
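The disk-space condition above can be checked on a Query Engine node before replication. This is a minimal sketch using the Python standard library; the function name, the replication path you pass in, and the 20% headroom are assumptions for illustration, not Kyvos settings.

```python
import shutil


def has_room_for_cubes(path: str, cube_size_bytes: int,
                       headroom: float = 0.2) -> bool:
    """Return True if free space at `path` covers the cubes plus a safety margin.

    headroom is an assumed margin (20% by default), not a Kyvos requirement.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= cube_size_bytes * (1 + headroom)
```

If the check fails for the production cubes, that is the situation in which the dedicated segment described above applies.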