Configure Spark for Use with Materialized Views

When using Bundled/Standalone Spark, Materialized Views (MVs) run under the configuration detailed in “Using Spark Bundled with Incorta”. These default settings are inherited by all Materialized Views, both the Spark SQL and PySpark versions. With this in mind, configure the defaults so that they are sufficient to run most of your MVs, and use the override capability on a per-MV basis for those that require more resources (cores, memory, driver) than average. This helps ensure that smaller MVs are not allocated more resources than they need.
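
The relationship between the shared defaults and a per-MV override can be pictured as a simple key-by-key merge. The sketch below is only an illustration of that idea, not Incorta's API: the function name `effective_settings`, the sample values, and the assumption that an override replaces a default one key at a time are all hypothetical.

```python
# Hypothetical default values standing in for spark-defaults.conf.
DEFAULTS = {
    "spark.cores.max": "8",
    "spark.driver.memory": "4g",
    "spark.sql.shuffle.partitions": "4",
}


def effective_settings(defaults, overrides=None):
    """Return the defaults with any per-MV overrides applied on top.

    Assumption: an override replaces the default for the same key and
    leaves every other default untouched.
    """
    merged = dict(defaults)          # copy so the shared defaults are never mutated
    merged.update(overrides or {})   # per-MV values take precedence
    return merged


# A typical MV inherits the defaults unchanged.
small_mv = effective_settings(DEFAULTS)

# A heavier MV overrides only the settings it needs (values are illustrative).
large_mv = effective_settings(
    DEFAULTS,
    {"spark.cores.max": "16", "spark.driver.memory": "8g"},
)

print(small_mv)
print(large_mv)
```

Keeping the defaults modest and overriding only the few heavy MVs means the larger allocation applies to those MVs alone.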

Quick Start MV for Standalone Spark Configuration

While MV use cases vary widely, use the following guidelines if you are just getting started with Spark. You will need to iterate on the configuration until the system is tuned to your use cases.

<incorta_home>/IncortaNode/spark/conf/spark-defaults.conf

  • Number of Spark server cores: C
  • spark.cores.max: floor (C/4)
  • spark.executor.cores: floor (spark.cores.max / spark.executor.cores)
  • spark.sql.shuffle.partitions: default is 4; if the property is unset, Spark’s own default is 200
  • spark.driver.memory: 4GB
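
As a rough sketch of how the guidelines above translate into file contents, the following Python derives the values from a core count C and prints them in the whitespace-separated `key value` form that `spark-defaults.conf` uses. The function names, the 32-core example, and the `4g` spelling of 4GB are assumptions made for illustration. Because the guideline for `spark.executor.cores` is expressed in terms of itself, the sketch does not compute it and instead takes it as a value supplied by the caller.

```python
def quick_start_settings(server_cores, executor_cores=1):
    """Derive starting values from the number of Spark server cores (C)."""
    if server_cores < 4:
        raise ValueError("floor(C/4) would be 0; at least 4 server cores are needed")

    cores_max = server_cores // 4  # spark.cores.max: floor(C/4)

    # Assumption: executor_cores is chosen by the caller; it only has to fit
    # within spark.cores.max.
    if not 1 <= executor_cores <= cores_max:
        raise ValueError("executor_cores must be between 1 and spark.cores.max")

    return {
        "spark.cores.max": str(cores_max),
        "spark.executor.cores": str(executor_cores),
        "spark.sql.shuffle.partitions": "4",  # the default named in the guidelines
        "spark.driver.memory": "4g",          # 4GB
    }


def render_conf(settings):
    """Format the settings as spark-defaults.conf lines (key, whitespace, value)."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


# Hypothetical server with C = 32 cores and 2 cores per executor.
print(render_conf(quick_start_settings(server_cores=32, executor_cores=2)))
```

Treat the output as a starting point only; the values still need to be tuned iteratively against your own MVs.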
