Page tree
Skip to end of metadata
Go to start of metadata

There are some differences between using Kyvos with a materialized or a non-materialized cube. See MOLAP, ROLAP, and HOLAP for the distinction between the different cube types. 

Kyvos provides you with the ability to work with raw data that is not built into a cube. You must select Enable Raw Data Querying for a particular cube to support working with raw data. When you use raw data, Kyvos does a quick analysis of the data using an external SQL engine to know what types of data it contains. You can only run one external SQL engine at a time.

A non-materialized cube doesn't include pre-calculated query results or summarized data but uses an external SQL engine to calculate values at the time of a query.

Supported external SQL engines for querying and drill through:

  • Presto
  • Spark
  • Hive
  • Snowflake

To use a ROLAP and HOLAP cube for a Snowflake data source, the connection must have the checkbox Is Default SQL Engine enabled.

To use a Spark or Hive connection, the Hadoop connection must have the checkbox Is Default SQL Engine disabled.

For a Spark connection, you need to provide the Spark thrift server connection string URL.

For example :jdbc:hive2://172.26.43.11:10001/default;transportMode=http;httpPath=cliservice;principal=hive/intelli-i0056.kyvostest.com@KYVOSTEST.COM

Consider using raw data when:

  • The data is small (a few GB in size)
  • When dimensions have high cardinality and pre-aggregating the data is unlikely to improve performance
  • You want to quickly start exploring the data, and once you know how you will use it, you plan to build it later
  • You want to generate drill-through reports

There are certain limitations. For example:

  • Crosstabs are not available. However, you will be able to see a table view of raw data. 
    Once you build a cube, the crosstab option becomes available.
  • You won't see the calculated measures in raw data cubes.
  • If you are working with a time dimension with a pre-defined hierarchy such as Year - Month - Day, you can specify whether month data should combine years' worth of data into a single month.
  • You can't browse a non-built cube if the underlying cube involves a dataset with transformations.
    You need to materialize the dataset and publish it to HCatalog.
  • You cannot use a registered file over a DFS file directly for raw data browsing. To browse a non-built cube (using raw data SQL engine) register the file with HCatalog. 
  • Currently, Kyvos supports single file, star schema, multifact, and snowflake schemas for a non-built cube.

To allow the use of raw data queries

  1. When you first open a cube, the cube properties display in the right pane. 
  2. Set the Raw Data Querying option to control the ability to query the cube without building it.
    Select from the following options:
  • Enable: Allows raw data querying using an external SQL engine for all types of queries. This option is applicable only for non-materialized cubes.
  • Disable: Disables raw data querying for this cube.
  • Only For Drillthrough: Allows raw data querying only for drill through queries. This option is applicable only for materialized cubes.

You can use raw data like you use data in a cube, but you can also use data profiles of raw data to see how to create better data relationships or cube designs. When creating a worksheet using raw data, you can specify whether it should show summarized data or detailed data.

You can pre-cache metadata queries for a raw data cube. This allows you to auto-populate the metadata queries in advance for a non-materialized cube before exposing it to a business user.

If you don't see the option Raw Data Querying in the Properties pane, you need to create a connection to the raw data. See Configuring the connection setting for raw data.

Table view

When a cube with raw data is viewed as a table, all levels, attributes, and measures are seen as individual columns. Consider a cube design that has the following two measures defined over the same underlying column `sales` from the fact table:

min_sales  = min(sales)

max_sales = max(sales)

These are seen as two columns of min(sales) and max(sales).

If you create a query that asks for an aggregated value on a column which is an attribute or level in the cube, the query behaves differently before and after the cube is built.

Exporting raw data

You can export the table data in CSV format.

Related topics


  • No labels