Configure Spark for Use with Materialized Views

When using Bundled/Standalone Spark, Materialized Views run under the configuration detailed in “Using Spark Bundled with Incorta”. These are the default settings inherited by all Materialized Views, both Spark SQL and PySpark. With this in mind, it is recommended to configure the default settings so that they are sufficient to run most of your MVs, and to use the override capability on an MV-per-MV basis for those requiring more resources (cores, memory, driver) than average (see the sketch that follows). This helps ensure smaller MVs are not allocated more resources than they need.
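
For example, a single resource-heavy MV might override a handful of standard Spark properties while all other MVs continue to inherit the defaults. The property names below are standard Spark settings and the values are purely illustrative:

  spark.cores.max        8
  spark.executor.memory  8g
  spark.driver.memory    4g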

Quick Start MV for Standalone Spark Configuration

While each MV use case is highly variable, use the following guidelines if you are just getting started with Spark. You will need to iterate on the configuration until the system is tuned to your use cases. A filled-in example follows the list.

<incorta_home>/IncortaNode/spark/conf/spark-defaults.conf

  • Number of Spark server cores: C
  • spark.cores.max: floor(C/4)
  • spark.executor.cores: a divisor of spark.cores.max; each MV application will launch spark.cores.max / spark.executor.cores executors
  • spark.sql.shuffle.partitions: 4 (Spark’s default when this is left unset is 200)
  • spark.driver.memory: 4GB
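
As a filled-in example, assuming a hypothetical server with C = 16 cores and following the guidelines above (the server size and the choice of 2 executor cores are assumptions for illustration only), spark-defaults.conf might contain:

  # floor(16/4) = 4 cores available to each Spark application
  spark.cores.max               4
  # 2 divides 4 evenly, so each MV application launches 4 / 2 = 2 executors
  spark.executor.cores          2
  spark.sql.shuffle.partitions  4
  spark.driver.memory           4g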

Default Materialized View Application Settings

For the selected Cluster, you can set the default Materialized View values under Spark Integration:

  • Materialized view application cores
  • Materialized view application memory
  • Materialized view application executors

The Spark Integration settings are global to all tenants in a cluster.

Materialized view application cores

The number of CPU cores reserved for use by a materialized view application. The default value is 1. The cores allocated to all running Spark applications combined cannot exceed the cores dedicated to the cluster.

Materialized view application memory

The maximum amount of memory, in gigabytes, that a materialized view application can use. The default is 1 GB. The memory for all Spark applications combined cannot exceed the cluster memory (in gigabytes).

Materialized view application executors

The maximum number of executors that a single materialized view application can spawn. Each executor is allocated a share of the cores defined in sql.spark.mv.cores and consumes a share of the memory defined in sql.spark.mv.memory. Because cores and memory are split evenly across executors, the number of executors should be a divisor of both sql.spark.mv.cores and sql.spark.mv.memory. For example, configuring an application with cores=4, memory=8, and executors=2 results in Spark spawning 2 executors, each consuming 2 cores and 4 GB of memory from the cluster.
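
The same arithmetic is sketched below using the configuration keys named above, written in simple key/value form for illustration only; in practice these values are set through the steps under Edit Default Materialized View Settings below:

  sql.spark.mv.cores      4    # total cores per MV application
  sql.spark.mv.memory     8    # total memory (GB) per MV application
  sql.spark.mv.executors  2    # executors spawned per MV application

  # Resulting per-executor allocation (derived, not set directly):
  #   cores per executor  = 4 / 2 = 2
  #   memory per executor = 8 / 2 = 4 GB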

Edit Default Materialized View Settings

Here is how you can modify these settings and their default values:

  1. In the navigation bar, select Clusters.
  2. In the cluster list, select a Cluster name.
  3. In the canvas tabs, select Cluster Configurations.
  4. In the panel tabs, select Server Configurations.
  5. In the left pane, select Spark Integration.
  6. Set the value for a given Materialized view application setting:

    • Materialized view application cores
    • Materialized view application memory
    • Materialized view application executors
  7. Select Save.
