In What Format Are AtScale Aggregate Instances Stored in HDFS?

By default, the AtScale engine creates all aggregates in Parquet format. Parquet is a highly compressed columnar format: compression is applied per column, which keeps file sizes small.

This default can be overridden with the engine setting `aggregates.tableConfig.preferredStorageFormat`, which makes the engine create aggregates in the specified format. For example, AtScale can also store aggregate instances as RC, ORC, or sequence files, and can even use Hive's SerDe interface.

AtScale stores aggregate data in Parquet by default because the supported SQL engines (Impala, Hive, Spark, etc.) can all work with that format. Any format chosen as an override must likewise be one that the SQL engine in use supports.

For example, Impala does not support the ORC file format, as documented in the Cloudera Impala Guide.
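
The compatibility constraint described above can be sketched as a small check. This is a hypothetical illustration, not AtScale's actual logic: the function name `choose_storage_format`, the `UNSUPPORTED_FORMATS` table, and the lowercase format names are all assumptions made for the example. The only engine/format fact it encodes is the one stated in this article (Impala and ORC).

```python
# Hypothetical sketch: names and logic are illustrative, not AtScale's implementation.
DEFAULT_FORMAT = "parquet"

# Formats a given SQL engine is known NOT to support. The single entry below
# comes from this article (Impala and ORC); anything else would have to be
# filled in from the documentation of the engine you actually use.
UNSUPPORTED_FORMATS = {
    "impala": {"orc"},
}


def choose_storage_format(engine, preferred=None):
    """Return the preferred format if the engine can use it, else the default."""
    if preferred is None:
        # No override configured: fall back to the Parquet default.
        return DEFAULT_FORMAT
    fmt = preferred.lower()
    if fmt in UNSUPPORTED_FORMATS.get(engine.lower(), set()):
        # The engine cannot work with the requested format.
        return DEFAULT_FORMAT
    return fmt


if __name__ == "__main__":
    print(choose_storage_format("impala", "ORC"))  # parquet
    print(choose_storage_format("hive", "ORC"))    # orc
    print(choose_storage_format("spark"))          # parquet
```

Falling back to the default is only one possible design; a real deployment might instead reject an incompatible override outright so the misconfiguration is noticed.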
