Server Configuration

Clustering

This section helps administrators configure the ZooKeeper settings used by the CMC (Cluster Management Console). The CMC is a tool developed by Incorta to manage Incorta’s clustered environments.

  • Kafka Consumer Service Name: Provide the name of the loader service intended to act as the sole Kafka consumer. The name must follow the format <NODE_NAME>.<SERVICE_NAME> (see the example below). You must explicitly configure the name of the node running the Kafka loader service; otherwise, the loader node name is assigned automatically, which can result in unexpected values. Changing this value requires restarting the loader services. Note: When moving the Kafka consumer role from one loader service (A) to another loader service (B), you must restart the current loader service (A) first, and then restart loader service (B).
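    Example:
    Assuming a node named node01 that runs a loader service named loaderService (both names are illustrative, not defaults), the value would be:
    node01.loaderService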

Important: Select Save before navigating away from this page, to avoid losing unsaved data.

SQL Interface

The SQL Interface (or SQLi) is a tool that makes Incorta act as a PostgreSQL database, enabling users to access Incorta’s engine performance and features from other BI tools (e.g., Tableau). Users can load their data into the Incorta engine (memory) to take advantage of its performance, or, if the data is too large for Incorta memory, load it into Incorta’s staging area and access it from other BI tools through the SQLi port set in the Admin UI. In this section, you can configure SQLi by setting the following properties (a minimal connection sketch follows the property list):

  • Default SQL interface port: Provide the number of the port used to connect to the Incorta engine from other BI tools and run queries against the data loaded in memory. If a query is not supported by the Incorta engine, it is automatically routed to Spark for execution. You can also bypass the Incorta engine and run queries directly through Spark against data loaded in the staging area by using the “Data Store (DS) port” property. Changing this value requires restarting the analytics service(s).
  • Data Store (DS) port: Provide the port number to use for running queries directly using Spark against data loaded in the staging area. Changing this value requires restarting the analytics service(s).
  • Enable connection pooling: Enable this option to create a pool of open connections between external BI tools and Incorta Analytics. Keeping multiple connections available improves query response time and saves the cost of establishing a new connection every time data is needed through the SQL interface. Changing this value requires restarting the analytics service(s).
  • Connection pool size: Provide the number of SQL interface connections to keep available when executing queries from external BI tools. The appropriate value depends on the multithreading support of the external BI tools, query complexity, the Incorta host machine specs, and the available resources. Choose this value carefully: setting it too high reserves machine resources that go unused, while setting it too low can degrade query execution performance. Changing this value requires restarting the analytics service(s).
  • External Tables CSV File Path: If you are using Spark on YARN, provide the CSV file path for the tables to be used. Changing this value requires restarting the analytics service(s).
  • Concurrency: This property sets the number of metadata gathering processes that Incorta can run in parallel when executing queries against the Incorta engine. Changing this value requires restarting the analytics service(s).
  • Default Schemas: Provide a comma-separated list of schemas to be used when table names are not fully qualified or when the SQL query does not specify a schema.
  • Enable Cache: Enable this option to cache repeated SQL operations and enhance query execution performance, provided there is enough available cache space.
  • Cache size (In gigabytes): Set the maximum caching size per user for the data returned by SQLi queries. When this size is exceeded, the least recently used (LRU) data is evicted to make room for newer cache entries. Choosing this value depends on the available memory on the Incorta host server and the size of the common query result-sets. For example, a result-set larger than this value will never be cached; in that case, it is recommended to increase the cache size.
  • Cached query result max size: Configure this property to set the maximum size of each cached query result, measured as the table cell count, that is, the number of rows multiplied by the number of columns.
  • Enable cache auto refresh: Enable this option to automatically refresh the cache at specified intervals.
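
The following is a minimal connection sketch in Python, not an official Incorta example: it assumes the SQLi endpoint behaves as a standard PostgreSQL server and that the psycopg2 driver is installed. The host, port, database, user, password, and table names are placeholders; replace the port with the value configured in “Default SQL interface port” (or in “Data Store (DS) port” to query staging data through Spark).

    import psycopg2

    # All connection values below are placeholders, not Incorta defaults.
    conn = psycopg2.connect(
        host="incorta.example.com",  # Incorta Analytics host (placeholder)
        port=5436,                   # the configured SQLi port or DS port (placeholder)
        dbname="demo_tenant",        # database name (placeholder)
        user="admin",                # Incorta user (placeholder)
        password="secret",           # password (placeholder)
    )
    try:
        with conn.cursor() as cur:
            # Query a schema.table pair loaded in Incorta (illustrative names).
            cur.execute("SELECT COUNT(*) FROM sales.orders")
            print(cur.fetchone()[0])
    finally:
        conn.close()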

Important: Select Save before navigating away from this page, to avoid losing unsaved data.

Spark Integration

Incorta Analytics utilizes Spark to execute complex queries that are not yet supported by the Incorta engine, and to run queries on data residing in the staging area without having to load it into Incorta memory. In this section, you can configure Spark using the following properties:

  • Spark master URL: Provide the Spark Master connection string for the Apache Spark instance to execute materialized views (or SQL) queries. This option is required to connect to Apache Spark. You can access this info by navigating to the Spark host server UI (from any browser), using the following format:
    <SPARK_HOST_SERVER>:<SPARK_PORT_NO>
    Copy the Spark Master connection string (usually found in the top center of the UI) in the format:
    spark://<CONNECTION_STRING>:<SPARK_PORT_NO>
    The default port number for Spark installed with Incorta is 7077. Changing this value requires restarting all the loader and analytics services.
  • Enable SQL App: The SQL App is an application that runs within Spark to handle all incoming SQLi queries. Enable this option to start the SQL App, and keep it up and running, to execute incoming SQL queries. Changing this value requires restarting the analytics service(s).
  • SQL App driver memory: Allocate the memory (in GB) used by the SQL interface Spark driver to construct (not calculate) the final results. Consult with the Spark admin to set this value.
  • SQL App cores: Set the number of CPU cores dedicated to the SQLi Spark App only. Ensure that enough cores remain reserved for the OS, applications, and other services.
  • SQL App memory: Provide the maximum memory that will be used by SQLi Spark queries, leaving extra memory for MVs if needed. The memory required for both applications combined cannot exceed the Worker Memory.
  • SQL App executors: Provide the maximum number of executors that can be spawned on a single worker. Each executor receives an equal share of the cores defined in the “SQL App cores” property and an equal share of the memory defined in the “SQL App memory” property. The number of executors must therefore be smaller than or equal to both values, and should divide evenly into them; if it does not divide evenly, some of the cores will not be utilized (see the sizing sketch after this list).
    Example:
    If the SQL App cores = 7 and the SQL App executors = 3, each executor will take 2 cores, and 1 core will not be used. If the number of executors is greater than the SQL App cores, the executor count is capped at the number of cores, and each executor uses a single core (e.g., if the SQL App cores = 7 and the SQL App executors = 10, then 7 executors will be created, each utilizing a single core). Similarly, if the number of executors is greater than the SQL App memory, the executor count is capped by the memory at 1 GB per executor (e.g., if the SQL App memory = 5 and the SQL App executors = 7, 5 executors will be created, with 1 GB each).
  • SQL App shuffle partitions: A single shuffle partition represents a block of data processed for join and/or aggregation execution. The optimal shuffle partition size is approximately 128 MB, so it is recommended to increase the number of partitions as the processed data size increases, at the cost of higher CPU utilization. On the other hand, if a query operates on a trivial amount of data, a large number of partitions leads to very small partitions, which can increase the query execution time due to the overhead of managing needless partitions. Too few partitions can cause a query to fail.
  • SQL App extra options: Extra Spark options can be passed to the SQL interface Spark application. These options can be used to override the default configurations. Sample value: spark.sql.shuffle.partitions=8;spark.executor.memory=4g;spark.driver.memory=4g.
  • Enable SQL App Dynamic Allocation: This property controls the dynamic allocation of the Data Hub Spark application. If it is enabled, Spark dynamically allocates executors depending on the workload, bounded by the resources assigned in other configurations (e.g., CPUs and memory per executor). When a query is fired, it starts with one executor and spawns additional ones as needed. This option helps optimize resource utilization, as Spark removes idle executors to save resources and claims them again when the workload increases. Changing this value requires restarting the analytics service(s).
  • Spark App port: This is the port used by Incorta to connect to Spark and access the Data Hub. Changing this value requires restarting the analytics service(s).
  • SQL App fetch size: This property sets the number of rows that Incorta fetches at a time from the Data Hub while aggregating the query result-set. Changing this value requires restarting the analytics service(s).
  • SQL App spark home (Optional): Provide the file system path of the Apache Spark instance used to execute queries sent to either the Incorta engine or the Data Hub. If this option is not set, the “SPARK_HOME” environment variable is used instead. If that is not set either, the Spark home of the Spark instance used for Incorta materialized views and compaction is used.
  • SQL App Spark master URL (Optional): Provide the Spark master server URL for executing the SQLi queries sent to the Incorta engine or Data Store. If not entered, the shipped Spark master URL will be used instead.
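
The following is a minimal sketch of the executor sizing arithmetic and shuffle partition estimate described above. It is illustrative only; the variable names and sample values are assumptions and do not correspond to actual Incorta or Spark configuration keys.

    # Illustrative sizing helper based on the rules described in the
    # "SQL App executors" and "SQL App shuffle partitions" properties above.
    sql_app_cores = 7        # value of "SQL App cores"
    sql_app_memory_gb = 5    # value of "SQL App memory" (GB)
    sql_app_executors = 3    # value of "SQL App executors"

    # The executor count is capped by the cores and by the memory (1 GB minimum each).
    executors = min(sql_app_executors, sql_app_cores, sql_app_memory_gb)
    cores_per_executor = sql_app_cores // executors
    memory_per_executor_gb = sql_app_memory_gb // executors
    unused_cores = sql_app_cores - cores_per_executor * executors

    print(executors, cores_per_executor, memory_per_executor_gb, unused_cores)
    # 3 executors, 2 cores and 1 GB each, 1 core unused.

    # Rough shuffle partition count for a given shuffled data size,
    # targeting roughly 128 MB per partition.
    shuffled_data_mb = 4096
    suggested_partitions = max(1, shuffled_data_mb // 128)
    print(suggested_partitions)  # 32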

Important: Select Save before navigating away from this page, to avoid losing unsaved data.

Tuning

In this section, you can enter the values for the following properties:

  • Max In-Memory Data (%): This property sets the maximum data allowed in memory as a percentage of the JVM memory. The default (and recommended) value is 75, which leaves enough room for the engine to perform calculations; setting it higher than 75 can result in server stability issues (see the sketch after this list). Changing this value requires restarting the analytics service(s).
  • Max Concurrent Queries: This property determines the maximum number of queries that can run at the same time. Note that dragging a column in the Incorta application UI (Analyzer mode) executes a query in the background. The default value is 0, meaning that the number of concurrent jobs is set automatically depending on the number of physical cores. Changing this value requires restarting the analytics service(s).
  • Enable Parquet File Compression: Toggle to enable Parquet file compression. Compression is required if you will be using materialized views. Changing this value requires restarting the loader and analytics services.
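
As a minimal sketch of how the Max In-Memory Data (%) setting translates into an absolute limit, assuming an illustrative 64 GB JVM heap (not a value from this document):

    jvm_heap_gb = 64          # illustrative JVM heap size, not a default
    max_in_memory_pct = 75    # recommended value of "Max In-Memory Data (%)"

    max_loaded_data_gb = jvm_heap_gb * max_in_memory_pct / 100
    print(max_loaded_data_gb)  # 48.0 GB for loaded data; the rest is engine headroom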

Important: Select Save before navigating away from this page, to avoid losing unsaved data.

UI Customizations

Use this section to customize the UI.

  • Color Palette mode: Choose a theme to change the mode of the Insights and Dashboards.

Important: Select Save before navigating away from this page, to avoid losing unsaved data.

Diagnostics

Use this section to configure diagnostics-related properties, e.g. Logging.

  • Logging: Set the logging level to specify the level of detail in the log files. The available logging levels are: OFF, SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, and ALL. It is not recommended to change this parameter without consulting Incorta’s support team.

Important: Select Save before navigating away from this page, to avoid losing unsaved data.


© Incorta, Inc. All Rights Reserved.