Release Notes 5.0

Release Highlights

This release introduces several major improvements to the Cluster Management Console (CMC), the Incorta Loader Service, and the Incorta Analytics Service, in addition to other services and add-ons. The goal of the Incorta 5.0 release is to enhance data management and analytics capabilities.

This release offers versioning of physical schemas, a new Incorta SQL table, a new RESTful Public API, new connectors, a new Radial Bar Chart, dynamic measures, and Apache Spark 3 compatibility. It also introduces a Save Menu for the Analyzer that allows you to save an insight as a business schema view, the ability to download and send a dashboard as a PDF, the ability to use a formula in an aggregate filter, the ability to export and download to data destinations in new formats, and much more.

Important new features and enhancements

There are several important features in this release:

Additional improvements and enhancements

Important

When you install or upgrade to this release, review the End User License Agreement (EULA) for various changes.

Upgrade considerations

There are several upgrade considerations for this release.

Warning: Enable Cross-Origin Access for IFRAME

If you are using IFRAMEs to embed Incorta or to embed another application within Incorta, a CMC Administrator must sign in to the Cluster Management Console and enable Cross-Origin Access for each tenant prior to the upgrade. This option requires SSL configuration and mandates access to Incorta Analytics using HTTPS.

Warning: Achieve upgrade readiness for physical schemas

This release does not support references to miscellaneous system variables, internal session variables, external session variables, or filter expression session variables in the following:

  • formula expression of a physical schema table formula column
  • formula expression of a materialized view formula column
  • filter expression of a physical schema table load filter
  • filter expression of a materialized view load filter
Warning: Achieve upgrade readiness for Incorta Nodes

The following configurations in the Cluster Management Console must be resolved before you can upgrade to this release:

  • Multiple Incorta Clusters share Incorta Nodes that run the same Loader Service or Analytics Service
  • An Incorta Node has more than one Notebook Add-on
  • An Incorta Node has no joined services or add-ons

Incorta Labs

An Incorta Labs feature is experimental and functionality may produce unexpected results. For this reason, an Incorta Labs feature is not ready for use in a production environment. Incorta Support will investigate issues with an Incorta Labs feature. In a future release, an Incorta Labs feature may be either promoted to a product feature ready for use in a production environment or be deprecated without notice.

In this release, there are two new Incorta Labs features:

There also are improvements for existing Incorta Labs features:

Incorta Labs: Deprecation Notice

The 4.9.5 release deprecated the Incorta Labs feature for enabling SQL Joins. This Incorta Labs feature is not available in this release.

Incorta SQL Table

In this release, in the Cluster Management Console you can enable the Incorta SQL Table, an Incorta Labs feature.

An Incorta SQL Table is a new type of derived table that a schema developer creates in a physical schema using a SELECT statement which includes a non-recursive Common Table Expression (CTE). It uses a new SQL engine that orchestrates complex query execution using the existing engine as a query processor. An Incorta SQL Table supports complex query shapes and SQL expressions such as multiple join types, correlated subqueries, and analytic queries.

In the SELECT statement of an Incorta SQL Table, all referenceable objects must be performance-optimized physical schema tables or materialized views.
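For illustration, the following is a minimal sketch of a SELECT statement for an Incorta SQL Table, assuming hypothetical CUSTOMERS and ORDERS tables in a performance-optimized SALES physical schema:

WITH customer_orders AS (
    -- non-recursive CTE that aggregates orders per customer
    SELECT customer_id, COUNT(*) AS order_count
    FROM SALES.ORDERS
    GROUP BY customer_id
)
SELECT c.customer_name, co.order_count
FROM SALES.CUSTOMERS c
JOIN customer_orders co ON co.customer_id = c.customer_id;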

To learn more, review Concepts → Incorta SQL Table.

Physical schema and dashboard versioning

A CMC administrator can enable version history tracking for dashboards and physical schemas. Once enabled, users can access the version history of a dashboard or physical schema. As an Incorta user, you can preview, restore, export, or create a copy of prior dashboard and physical schema versions.

Steps to enable dashboard and physical schema version history

The following are the steps for enabling the dashboard and physical schema version history:

  • As the CMC administrator, sign in to the CMC.
  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the canvas tab, select Cluster Configurations.
  • In the Cluster Configurations menu, select Default Tenant Configurations.
  • In the Default Tenant Configurations menu, select Incorta Labs.
  • Enable Schema and dashboard versioning.
  • Configure the Dashboard versioning properties.
  • Select Save.
Dashboard versioning properties

  • Schema and dashboard versioning (toggle): Enable to maintain prior versions of schemas and dashboards
  • Maximum number of versions per entity (text box): Set the maximum number of versions to maintain for each dashboard and schema
  • Backup frequency (drop down list): Set the cadence of version creation for each dashboard and schema
Note

When selecting a timed backup frequency, if an entity has not changed since the last stored version, Incorta will not add a redundant version to the dashboard or physical schema history.

Accessing dashboard version history

Here are the steps to access the dashboard version history:

  • Sign in to the Analytics Service.
  • In the Navigation bar, select Content.
  • In the Dashboard menu, select the desired dashboard.
  • In the upper right corner, select More Options (⋮ vertical ellipsis).
  • In the More Options menu, select Version History.
Search dashboard version

From the Version History menu, you can search for a specific version using the following search properties:

  • Version Date (drop down list): Select a time frame to filter the version history results
  • Version Date Between (calendar date picker): Available when Custom is selected in Version Date. Select a start and end date to filter the version history
  • Modified By (text box): Enter a username to filter by the user that edited the dashboard. The Modified By field is not case sensitive
Dashboard version history properties

  • Preview (preview window): Opens a preview window of the dashboard version. You can Restore, Export, or Create a copy from the preview window
  • Restore (button): When selected, the dashboard is updated to the selected dashboard version
  • Export (export window): Opens an export window in which you can optionally include Bookmarks and Scheduled Jobs in the export. The dashboard is exported as a package of XML files
  • Create a copy (copy dashboard window): Opens a Make a copy window. From this window, you can rename the dashboard copy and select the Content file location

Accessing physical schema version history

Here are the steps to access the physical schema version history:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the Schema menu, select the desired schema.
  • In the upper right corner, select Settings (gear icon).
  • In the Settings menu, select Version History.
Search physical schema version

From the Version History menu, you can search for a specific version. The following properties are available for each physical schema version:

  • Restore (button): When selected, the schema is updated to the selected schema version
  • Export (export window): Opens an export window in which you can optionally include Scheduled Jobs in the export. The schema is exported as a .zip package of XML files

Notebook sampling

In this release, there is a new notebook sampling feature that you can use to test a notebook against a subset of the data in a large table, making execution faster.

Here are the notebook sampling properties you can add:

  • notebook.dataframe.limit: Enter the maximum number of rows for the dataframe
  • notebook.dataframe.sampling.percentage: Enter the percentage of the dataframe to sample. Valid values are between 1 and 100
  • notebook.dataframe.sampling.seed: Optionally, enter the seed used in sampling

The notebook sampling properties you add will be applied to every dataframe, but will not affect the execution of the materialized views.

Notebook configurations precedence

If you add both the notebook.dataframe.limit and the notebook.dataframe.sampling.percentage properties, the notebook.dataframe.sampling.percentage property is applied first, and the notebook.dataframe.limit property is applied second.
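For example, assuming a dataframe of 1,000,000 rows with notebook.dataframe.sampling.percentage set to 10 and notebook.dataframe.limit set to 1000, sampling first reduces the dataframe to approximately 100,000 rows, and the limit then caps the result at 1000 rows.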

Add a notebook sampling property

Here are the steps to add a notebook sampling property to your materialized view:

  • Within the physical schema, open the materialized view.
  • In the Data Source dialog, under Properties, select Add Property.
  • In key, enter the notebook sampling property name.
  • In value, enter the notebook sampling property value.
  • Select Validate.
  • In the Action bar, select Done.

To learn more, please review Tools → Notebook Editor.


Cluster Management Console (CMC)

In this release, there are several enhancements to the CMC:

Queries and render requests timeout

To support running Incorta behind proxy servers that have a predefined connection timeout, the CMC Administrator can now set a time limit for executing queries and insight render requests. The timeout is in milliseconds, with a default value of -1, that is, no time limit is defined.

If the query executes within the specified time frame, the result is returned to the caller. Otherwise, a GUID is returned. As soon as the query execution is complete, the result is added to the local cache. This allows Incorta to render requests with the GUID and continue to fetch from the local cache within the predefined timeout.

Simplified relations

In this release, you can only add an Incorta Node to a single Incorta cluster. An Incorta cluster can have one or more Incorta Nodes.

The CMC will automatically add an Incorta Node’s related services and add-ons to the Incorta cluster.

Clusters Manager

The CMC administrator can enable the auto-reload feature for the child tabs on the Clusters Manager. Simply enable the Enable auto-reload for 5 minutes toggle in the Action menu.

In this release, you can manage an Incorta Node and related child services and Add-ons within the Clusters Manager, including stopping and starting child entities.

In addition, you can now monitor the off-heap memory for a given service on the Nodes tab in the Clusters Manager. The CMC monitors the off-heap active memory and sends an alert if it exceeds 90% of the total allocated off-heap memory.

Note

If pooled memory exceeds 90%, the CMC will not send an alert because this is not considered an issue. Pooled memory is available memory that is reserved for the service off-heap.


Incorta Analytics and Loader Service

The 5.0 release introduces several key improvements to the Incorta Analytics and Loader Services such as:

Download or send a dashboard as PDF

In this release, a dashboard consumer can download insights and tabs as PDF files. In addition, a dashboard consumer can schedule sending a given tab, selected tabs, or all tabs on a dashboard as a PDF file via Email.

Download insight as PDF

A dashboard consumer can download any insight as a PDF file. The downloaded file contains the given insight after applying dashboard runtime filters or filter options.

Here are the steps to download an insight as a PDF file:

  • Access a dashboard.
  • For a given insight on a tab, select More Options (⋮ vertical ellipsis).
  • Select Download → PDF.

Download tab as PDF

A dashboard consumer can download a given tab, selected tabs, or all tabs on a dashboard as a PDF file. The downloaded file contains all insights in the selected tab(s) after applying the dashboard runtime filters or filter options, if any. The downloaded file also contains the filter expressions of each applied dashboard runtime filter, if any.

Here are the steps to download one or more tabs as a PDF file:

  • In the Content Manager, select a dashboard to open.
  • For a given tab on the dashboard, select More Options (⋮ vertical ellipsis) next to the tab name.
  • Select Download → PDF.
  • In Download as PDF dialog, for the Included Tabs, do one of the following:

    • Select the Include all tabs check box to download all the tabs on the dashboard as a PDF file.
    • Select the tabs list, and then select the tab(s) you want to include in the downloaded PDF file.
    • Select X next to the tab name to remove a selected tab.
  • Select Download.

Schedule dashboard delivery as PDF file via Email

As a dashboard consumer, you can schedule sending a given tab, selected tabs, or all tabs on a given dashboard as a PDF file via Email.

Here are the steps to schedule sending one or more tabs on a given dashboard as a PDF file via Email:

  • In the Content Manager, select a dashboard to open.
  • In the Action bar, select the Share icon (3 connected dots icon).
  • Select Schedule Delivery.
  • Enter the dashboard schedule and delivery properties.
  • For the Included Tabs, select the Include all tabs check box or select the tab(s) you want to include in the PDF file.
  • For the Data Format, select PDF.
  • Select Schedule.

Send dashboard as PDF

As a dashboard consumer, you can send a given tab, selected tabs, or all tabs on a given dashboard as a PDF file via Email.

Here are the steps to send one or more tabs on a given dashboard as a PDF file via Email:

  • In the Content Manager, select a dashboard to open.
  • In the Action bar, select the Share icon (3 connected dots icon).
  • Select Send Now.
  • Enter the dashboard email properties.
  • For the Included Tabs, select the Include all tabs check box or select the tab(s) you want to include in the PDF file.
  • For the Data Format, select PDF.
  • Select Send.

Data Destinations

With this release, you can export a given insight, a tab on a dashboard, or the entire dashboard to Google Drive or Google Sheets data destinations. You can also schedule the delivery of all supported insights on a dashboard, including all tabs, to these data destinations, if applicable.

This release supports sending the following insights to data destinations:

  • Listings Tables
  • Aggregated Tables
  • Pivot Tables (in Excel and Google Sheets only)

With Google Drive data destinations, you can export and send supported insights in Comma Separated Values (CSV) or Excel (.xlsx) file formats. To export supported insights to the Google Sheets file format, you must send them to a Google Sheets data destination.

Important

The Export Server automatically converts Organizational, Advanced Map, and Sunburst insights to a tabular form and includes them in the generated file(s) when you send all supported insights on a tab or a dashboard, or when you schedule the delivery of supported insights on the dashboard. This applies to all supported file formats: CSV, Excel, and Google Sheets. However, you cannot send these types of visualizations individually.

To learn how to create and manage data destinations, please review the Data Manager.

Public API for the Analytics Service

This release introduces a RESTful Public Application Programming Interface (API) for the Analytics Service.

The Public API allows developers to interact with data within the Analytics Service for programmatic uses. In this initial release, the Public API contains authentication endpoints, dashboard prompt endpoints, and query endpoints.

For more details, see the Public API documentation.

Dynamic Measures

A dashboard developer can now implement dynamic measures for an insight that has two or more measures and is of the visualization type Pivot Table or Aggregated Table.

Here are the steps for a dashboard developer with Edit access rights for a given dashboard to implement dynamic measures for an insight:

  • In the Analyzer, in the Action bar, select Settings (gear icon).
  • In the Settings panel, in General, enable the Dynamic Measures toggle.
  • In the Dynamic Measures Default drop down list, optionally change the All value to the name of a measure pill.
  • Save your changes to the insight.

Here are the steps for a dashboard consumer with View access rights to select a dynamic measure:

  • To open the Dynamic Measure menu, in the insight header bar, select the listed measures.
  • In the Dynamic Measure menu, select Reset to apply the default configuration, or select one or more measures.

Support for a formula in an Aggregate Filter

This release supports using a formula in an aggregate filter to filter the results returned by an aggregation. This is synonymous with the HAVING clause in a SQL statement.

The formula expression must return a boolean value. Only aggregated groups that meet the formula conditions are included in the result set. You can use this feature when using the Analyzer to create or edit an Incorta Analyzer Table, Incorta View, dashboard insight, or data notification, and when exploring physical schema or business schema data.
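For example, an aggregate filter formula that keeps only groups whose summed measure exceeds a threshold corresponds to a HAVING clause such as the following sketch, which uses hypothetical table and column names:

SELECT country, SUM(revenue) AS total_revenue
FROM SALES.ORDERS
GROUP BY country
HAVING SUM(revenue) > 100000;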

To learn more about an Aggregate Filter for an insight, see Concepts → Aggregate Filter.

Presentation variable as filter value for the Between filter operator of an applied filter

In this release, a dashboard developer can create an applied filter with a Between filter operator and select a presentation variable as a filter value. A presentation variable can also be selected for a filterable column that is of the type Date or Timestamp in the applied filter.

Enhanced Prompts to support an empty default filter value

In this release, a dashboard developer does not need to define a default filter value for a Prompt. As a dashboard filter, a prompt with an empty default filter value will show as a Filter bar pill with an incomplete filter expression. The incomplete filter expression will not affect the dashboard.

Preview session variable values in the Filter dialog

In the Filter dialog of the Filter bar, you can now preview the value of a date system variable, miscellaneous system variable, internal session variable, external session variable, or filter expression session variable.

Select a date system variable for global variables

When you create a date or timestamp global variable, you can now select a date system variable as the global variable value. The value saved is a snapshot of the system variable's value.

Support only for date system variables in a physical schema table formula column or load filter

Starting with this release, both a formula expression for a physical schema table formula column and a filter expression for a load filter can no longer reference:

  • Miscellaneous system variable
  • Internal session variable
  • External session variable
  • Filter expression session variable
  • Global variable

You can still reference a date system variable in both a physical schema table formula column and a load filter filter expression.

Analyzer productivity enhancements

There are several enhancements and improvements to the Analyzer in this release:

The Analyzer also includes the following enhancements that improve and enrich the user experience:

  • Improved dashboard performance for large tables
  • Ability to easily arrange the order of pills within a tray in the Insight panel

Save Menu Options for the Analyzer

In the Analyzer, this release introduces new options for how to save an insight depending on the Analyzer tool context. The Save Menu options are:

  • Save
  • Save as Insight
  • Save as Business View
Applicable Analyzer contexts

There are various contexts for the Analyzer tool that affect the available Save Menu options in the Analyzer.

  • Explore Data with the Analyzer for a physical schema: Save to a new or existing dashboard or dashboard tab; Save as Insight is not available; Save as Business View to a new or existing business schema. A business view only supports a Listing Table.
  • Explore Data with the Analyzer for a business schema: Save to a new or existing dashboard or dashboard tab; Save as Insight is not available; Save as Business View to a new or existing business schema. A business view only supports a Listing Table.
  • Incorta View: Save to the existing business schema; Save as Insight and Save as Business View are not available.
  • Incorta Analyzer Table: Save to the existing physical schema; Save as Insight and Save as Business View are not available.
  • Create an insight on a new or existing dashboard tab: Save to the current dashboard tab; Save as Insight to a new or existing dashboard tab; Save as Business View to a new or existing business schema (only available for a Listing Table).
  • Edit an insight on a dashboard tab: Save to the current dashboard tab; Save as Insight to a new or existing dashboard tab; Save as Business View to a new or existing business schema (only available for a Listing Table).
Save as Insight

The Save as Insight option opens the Save as dialog. In the Save as dialog, you can:

  • Create a new folder and add a dashboard with the insight to it
  • Search for an existing folder or dashboard
  • Select an existing folder and add a new dashboard with the insight to it
  • Select an existing dashboard, create a new dashboard tab, and add the insight to it
  • Select an existing dashboard, select an existing dashboard tab, and add the insight to it
Save as Business View

The Save as Business View option is only available for a Listing Table insight. In the Save as Business View dialog, you can:

  • Specify the business schema view name
  • Enter an optional description
  • Select an existing business schema and add the new view to it
  • Create a new business schema and add the new view to it
Note

In this release, it is only possible to create a business schema view for a Listing Table insight and not an Incorta View, even if there is a grouping dimension pill for the Listing Table insight.

Undo and Redo

In the Action bar of the Analyzer, you can now select Undo or Redo for insight changes.

View the Information dialog for a pill

Both the Properties panel and the Filter panel allow you to open the Information dialog for a pill. In the panel header, simply select the icon for the fully qualified name.

Disable and enable insight filters

While in the Analyzer, you can now disable and enable certain insight filters for a given insight.

Here are the steps to disable a filter:

  • Select a pill in either the Individual Filter or the Aggregate Filter tray of the Insight panel.
  • In the Filters panel, in Advanced, enable the Disable Filter toggle.
  • Alternatively, in the Individual Filter or the Aggregate Filter tray, select View Filter Values.
  • In the Filter Value panel, for the given filter, select the Disable Filter icon.

Here are the steps to enable a filter:

  • Select a pill in either the Individual Filter or the Aggregate Filter tray of the Insight panel.
  • In the Filters panel, in Advanced, disable the Disable Filter toggle.
  • Alternatively, in the Individual Filter or the Aggregate Filter tray, select View Filter Values.
  • In the Filter Value panel, for the given filter, select the Disable Filter icon.
Important

When saving your changes to an insight, the Analyzer will delete all disabled insight filters.

New Visualizations and enhancements

In this release, there is the following new visualization:

In addition, this release includes several enhancements and improvements to existing visualizations:

Radial Bar Chart

This release includes the new Radial Bar Chart visualization. A Radial Bar Chart visualization is a Bar Chart plotted on a polar coordinate system, rather than on a Cartesian one. It is used to show comparisons among categories by using a circular shape. To learn more, please review Visualizations → Radial Bar Chart.

Advanced Map enhancements

You can now rotate an Advanced Map insight. In previous releases, you could only rotate an Advanced Map insight when previewing it using a device with a touchpad.

Grouping dimension plot bands

In this release, you have the option to add one or more plot bands to a Grouping Dimension pill. You can add plot bands to the following insight visualizations:

To add a plot band to a supported insight visualization, follow these steps:

  • For an insight, select a supported visualization in the Insight panel.
  • From the Data panel, add columns or formulas to the related Grouping Dimension tray and the Measure tray, as appropriate.
  • Select the arrow next to the name of the column or formula in the required tray (Grouping Dimension tray, for example) to access the pill properties.
  • In the Properties panel, in Format, select Add Plot Band.
  • In the Plot Band properties, in Start and Stop, add the start and end of the plot band, respectively.
  • Optionally, add a label to the plot band.
  • Select the color of the plot band.
  • Save your changes.

Max Rows Limit

For Listing Table, Aggregated Table, and Pivot Table visualizations, this release includes the re-introduction of the Insight Settings property, Max Rows Limit.

This property affects the total number of rows returned from the query without page size impacts. Set the value of this property to zero (0) to return all applicable rows.

Note

In this release, the Max Rows Limit property is also available for an Incorta Analyzer Table and an Incorta View.

New Connectors and enhancements

This release introduces the following new connectors:

In addition, this release includes the following enhancements and improvements to existing connectors:

Amazon DynamoDB Connector

Amazon DynamoDB is a fully managed, proprietary NoSQL database service that supports key-value and document data structures and is offered as part of the Amazon Web Services (AWS) portfolio. It is a multi-region, multi-active, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. To learn more, please review Connectors → Amazon Web Services (AWS) DynamoDB.

Apache Cassandra Connector

Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. The Cassandra connector uses the Datastax Java Driver for Apache Cassandra which provides Java clients with a way to create CQL sessions, discover keyspaces, tables, and columns as well as execute CQL queries on Cassandra.

To learn more, please review Connectors → Cassandra.

Support chunking methods for all SQL connectors

This release introduces support for parallel extraction of large tables using the different chunking methods (by chunking size and by chunking period) for the following connectors (see the sketch after the list):

  • NetSuite
  • Presto
  • SQL Server
  • SQL Server jTDS
  • SAP Hana
  • Google BigQuery
  • SQL Custom Connector
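As an illustration of the concept (these are not the connectors' literal queries), chunking by size splits a single extraction query against a hypothetical ORDERS table into parallel range queries over a numeric key, while chunking by period splits it over a date column:

-- chunking by size: each chunk extracts a key range in parallel
SELECT * FROM ORDERS WHERE order_id >= 1 AND order_id <= 500000;
SELECT * FROM ORDERS WHERE order_id > 500000 AND order_id <= 1000000;

-- chunking by period: each chunk extracts a date range in parallel
SELECT * FROM ORDERS WHERE order_date >= DATE '2020-01-01' AND order_date < DATE '2020-07-01';
SELECT * FROM ORDERS WHERE order_date >= DATE '2020-07-01' AND order_date < DATE '2021-01-01';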

Windows authentication with the SQL Server jTDS connector

In this release, the schema manager can connect to a SQL Server database using Active Directory user credentials. This authentication method is available when creating a data source for a SQL Server database using the SQL Server jTDS connector.

CData JDBC driver support for the Custom SQL connector

The SQL Custom connector now supports CData JDBC drivers. The schema manager can use any of the CData JDBC drivers to connect to different data sources.

Note: Custom CData connector

A future maintenance pack will introduce a Custom CData connector.

Extraction job timeout for Oracle and MySQL

For Oracle and MySQL datasets, you can now configure a time limit for the extraction job. The Loader Service terminates the extraction job for the SQL query in the dataset if the extraction time exceeds the configured limit.


Apache Spark, Materialized Views, and Incorta ML

In this release, there are several improvements related to Apache Spark:

  • SQLi enhancements for Spark queries
  • Enhanced error messages for materialized views
  • Incorta ML Time Series function enhancements
  • Quotation syntax for column names with special characters
  • Apache Spark 3 compatibility

SQLi enhancements for Spark queries

In this release, there are several SQLi enhancements:

  • Support for DISTINCT ON in a SELECT statement (see the example after this list).
  • SELECT query support for a physical schema table with a self-referential join and runtime security filter that uses a descendantOf() built-in function for a hierarchical filter expression.
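For example, DISTINCT ON keeps the first row of each set of rows that share the listed expressions; the following sketch, with hypothetical table and column names, returns each customer's most recent order:

SELECT DISTINCT ON (customer_id) customer_id, order_date, order_total
FROM SALES.ORDERS
ORDER BY customer_id, order_date DESC;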

Enhanced Job Errors for materialized views

In this release, a materialized view now shows Job Errors that include the related physical schema, the execution start time, and the error description.

In addition, you can View Details for the error. The detailed view provides means for further debugging including Incorta’s Stacktrace, Spark’s Stacktrace, Spark Application Configurations, and suggestions for resolution when applicable.

Improved discovery for materialized views

This release introduces enhancements to the materialized view execution time in terms of schema discovery for full loads and incremental loads.

As a schema manager, you can now configure a materialized view to process and validate the entire dataframe, or to sample and validate a subset of the dataframe.

To enable discovery sampling, follow these steps:

  • In the Data Source dialog, in Properties, if no properties exist, select Add Property.
  • In key, enter spark.dataframe.sampling.enabled.
  • In value, enter true.
  • Select Validate.

Incorta ML Time Series enhancements

In this release, there are new features added to the Time Series functions within Incorta ML:

  • The Exponential Smoothing method changed from Holt's Exponential Smoothing to Holt-Winters Exponential Smoothing, and the parameters are enhanced for optimal modeling.
  • Auto Arima enhancements widen the search space for the Arima order (p, d, q)(P, D, Q) to achieve better forecasts and infer the period for seasonal differencing m using autocorrelation.
  • Optional parameters are available in the form of JSON to tweak and enhance model attributes.
  • The Confidence Interval width parameter was added to the build model function.
  • A new function was added for Time Series Model Plotting with Confidence Interval.
  • Auto-complete was added in the Notebook Editor for all Incorta ML Time Series functions.

Quotation syntax for column names with special characters

In this release, a materialized view requires the Spark SQL or PostgreSQL quotation syntax for column names that have special characters.

When using special characters in names within a materialized view of either the Spark SQL or PostgreSQL type, adhere to the quoting contexts in the following table (an example follows the table):

Context Back Tick Double Quote Single Quote No Quotes
Schema Name without special character
Schema Name with special character
Table Name without special character
Table Name with special character
Column Name without special character
Column Name with special character
Column/Table/Schema/SubQuery Aliases without special character
Column/Table/Schema/SubQuery Aliases with special character
Literals (String Constants)
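As an example, assuming a hypothetical Order Total column whose name contains a space, a Spark SQL materialized view quotes the identifier with back ticks, a PostgreSQL materialized view quotes it with double quotes, and single quotes are reserved for string literals in both:

-- Spark SQL: back ticks for identifiers with special characters
SELECT `Order Total` FROM SALES.ORDERS WHERE STATUS = 'OPEN';

-- PostgreSQL: double quotes for identifiers with special characters
SELECT "Order Total" FROM SALES.ORDERS WHERE STATUS = 'OPEN';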

Apache Spark 3 compatibility

This release is compatible with the following releases of Apache Spark 3:

  • spark-3.0.1-bin-hadoop2.7.tgz
  • spark-3.0.1-bin-hadoop3.2.tgz

You can download Apache Spark at https://spark.apache.org/downloads.html.

Apache Spark 3 requires the following:

  • Oracle Java SE 8, OpenJDK 8, or OpenJDK 11
  • Python 2.7/3.6+
  • Scala 2.12.x
  • R 3.4+
Important

You must install Spark 3 in a directory other than the IncortaNode directory.

Warning

Do not change, replace, or remove the default spark directory bundled with the Incorta Node installation for your standalone Incorta cluster. Related services reference various libraries that reside in the spark directory.

Install and configure Apache Spark 3

Here is an overview of the steps to install Apache Spark 3 for an on-premises standalone Incorta Cluster:

  • Download and untar Apache Spark 3
  • Stop the Incorta version of Spark
  • Stop the Incorta related services
  • Copy the Incorta Node spark related files
  • Edit the node.properties file
  • Start Apache Spark 3
  • Update the Spark Master URL in the Cluster Management Console (CMC)
  • Start the Incorta related services

Installation of Apache Spark 3 requires:

  • a Linux system administrator with root access
  • a CMC Administrator
  • a Tenant Administrator (SuperUser account)
Download and untar Apache Spark 3

As the Linux system administrator, use a secure shell (ssh) to access the Incorta host.

As the incorta user, download the Apache Spark 3 tarball file to the /tmp directory for the host of your standalone Incorta cluster. The following example is for the spark-3.0.1-bin-hadoop2.7.tgz file from the UC Berkeley mirror:

su incorta
cd /tmp
wget  https://mirrors.ocf.berkeley.edu/apache/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz

As the incorta user, create shell variables for file names, folder names, folder paths, Incorta Services and any Notebook Add-on according to the installation of your standalone Incorta cluster and Apache Spark 3 download.

The following example is for spark-3.0.1-bin-hadoop2.7.tgz and the default installation path for Incorta, which is /home/incorta/IncortaAnalytics.

cd /tmp
SPARK3_DOWNLOAD_TAR_FILE=spark-3.0.1-bin-hadoop2.7.tgz
SPARK3_TAR_FOLDER=spark-3.0.1-bin-hadoop2.7
SPARK3_DOWNLOAD_PATH=/tmp
SPARK3_INSTALLATION_PATH=/home/incorta/spark3
INCORTA_INSTALLATION_PATH=/home/incorta/IncortaAnalytics
LOADER_SERVICE=loaderService
ANALYTICS_SERVICE=analyticsService
NOTEBOOK_ADDON=localNotebook

Create the spark3 installation directory.

mkdir $SPARK3_INSTALLATION_PATH

Untar and uncompress the downloaded tarball to the spark3 directory.

cd $SPARK3_DOWNLOAD_PATH
tar -xzvf $SPARK3_DOWNLOAD_TAR_FILE -C $SPARK3_INSTALLATION_PATH --strip-components=1 $SPARK3_TAR_FOLDER

Verify the untarred files.

ls -l $SPARK3_INSTALLATION_PATH

Create the eventlogs directory.

cd /tmp
mkdir $SPARK3_INSTALLATION_PATH/eventlogs
Stop the Incorta version of Spark

Stop Apache Spark from the command line.

cd $INCORTA_INSTALLATION_PATH/IncortaNode/
./stopSpark.sh

Stop the Analytics Service, Loader Service, and Notebook Add-on from either the command line or the Cluster Management Console.

cd $INCORTA_INSTALLATION_PATH/IncortaNode/
./stopService.sh $ANALYTICS_SERVICE
./stopService.sh $LOADER_SERVICE
./stopNotebook.sh $NOTEBOOK_ADDON

Copy the IncortaNode/spark/conf directory to the spark3 directory.

cp $INCORTA_INSTALLATION_PATH/IncortaNode/spark/conf/* ${SPARK3_INSTALLATION_PATH}/conf/

Copy the IncortaNode/spark/incorta directory to the spark3 directory.

cp -R $INCORTA_INSTALLATION_PATH/IncortaNode/spark/incorta ${SPARK3_INSTALLATION_PATH}/incorta

Edit the node.properties file.

vim $INCORTA_INSTALLATION_PATH/IncortaNode/node.properties

Change spark.home value to the path of the spark3 directory.

spark.home=/home/incorta/spark3

Add the spark.version property with a value of 3.

spark.version=3

Optional

If you have an environment variable for SPARK_HOME, change this also to the spark3 directory. For example, edit the custom.sh file and change SPARK_HOME.

sudo vim /etc/profile.d/custom.sh
SPARK_HOME=/home/incorta/spark3

Save your changes.

Start Apache Spark 3

Before starting Apache Spark 3, retrieve the required property values from both spark-env.sh and spark-defaults.conf.

SPARK_PUBLIC_DNS=`cat $SPARK3_INSTALLATION_PATH/conf/spark-env.sh | awk '/^SPARK_PUBLIC_DNS/' | cut -d '=' -f 2`


SPARK_MASTER_WEBUI_PORT=`cat $SPARK3_INSTALLATION_PATH/conf/spark-env.sh | awk '/^SPARK_MASTER_WEBUI_PORT/' | cut -d '=' -f 2`

SPARK_MASTER_URL=`cat $SPARK3_INSTALLATION_PATH/conf/spark-defaults.conf | awk '/^spark.master/' | cut -d ' ' -f 2`

echo $SPARK_PUBLIC_DNS
echo $SPARK_MASTER_WEBUI_PORT
echo $SPARK_MASTER_URL

Start Apache Spark 3 and the related services.

cd $SPARK3_INSTALLATION_PATH/sbin
./start-master.sh
./start-slave.sh $SPARK_MASTER_URL
./start-history-server.sh
./start-mesos-shuffle-service.sh

Verify that you can access the Apache Spark 3 Master Web UI. Open the URL from the following console output in a supported web browser.

echo "Copy the following into your web browser:   http://"${SPARK_PUBLIC_DNS}:${SPARK_MASTER_WEBUI_PORT}
Update the Spark Master URL in the Cluster Management Console (CMC)

Review the Spark Master URL and installation directory.

echo $SPARK_MASTER_URL
echo $SPARK3_INSTALLATION_PATH/

Open a web browser, and update the Spark Master URL in the CMC with these steps:

  • Sign in to the CMC as the CMC Administrator.
  • In the Navigation bar, select Clusters.
  • In the List View, select your cluster.
  • Select the Cluster Configurations tab.
  • In Server Configurations, in the left panel, select Spark Integration.
  • In the right panel, edit the following to match the previous shell output:

    • Spark master URL property
    • SQL App Spark home (Optional)
    • SQL App Spark master URL (Optional)
  • Select Save.

Start the Analytics Service, Loader Service, and Notebook Add-on from either the command line or the Cluster Management Console.

cd $INCORTA_INSTALLATION_PATH/IncortaNode/
./startService.sh $ANALYTICS_SERVICE
./startService.sh $LOADER_SERVICE
./startNotebook.sh $NOTEBOOK_ADDON
Verify Apache Spark 3

Here are the steps to verify that Apache Spark 3 is running correctly using the tenant Sample Data and the SALES schema:

  • Sign in to a tenant as the SuperUser.
  • In the Navigation bar, select Schema.
  • In the Schema Manager, in the Action bar, select + Add New → Create Schema.
  • In the Create Schema dialog, enter sch_Spark3 or similar for the schema name.
  • Select Save.
  • In the Schema Designer, select Materialized.
  • In the Data Source dialog, for Language select Spark SQL.
  • Select Edit Query.
  • In the Edit Query dialog, enter:
    select * from SALES.CUSTOMERS limit 10;
  • Save your changes.
  • For Table Name, enter mv_Spark3.
  • In the Action bar, select Done.
  • In the Schema Manager, in the Action bar, select Load → Load now → Full.
  • In the Data Loading dialog, select Load.
  • In the Spark Master Web UI, in Completed Applications, verify the success of the Application ID with the name mv_Spark3.
  • After you verify success, delete sch_Spark3.
Warning for future Incorta upgrades

You may need to repeat some of these steps after a future upgrade, such as:

  • Update related properties
  • Update the Spark Master URL in the Cluster Management Console (CMC)

Additional features and enhancements

Here are the new additional features and enhancements for this release:

Accessibility enhancements

This release introduces some accessibility enhancements that provide keyboard and VoiceOver accessibility for the Analytics Service screens and tools.

Data Agent

The Data Agent is an agent service that enables the extraction of data from an on-premises data source. Typically, an on-premises data source resides behind a firewall. A data agent facilitates the extraction from the data source behind a firewall to an Incorta cluster. The Incorta cluster can be on-premises, but is typically hosted by a cloud provider.

To learn more, please review Tools → Data Agent.

PK-Index Tool

The PK-Index tool is now included in this release. For this release, you are not required to run the PK-Index tool prior to upgrade. Doing so, however, may greatly reduce the time it takes to perform an upgrade.

Improved Tableau support

As part of this release, Incorta has tested Tableau connectivity to a database with a custom Incorta connector. These tests were performed using an automated testing tool called the Tableau Data Source Verification Tool (TDVT). This is an important verification step in signing the Incorta connector for distribution. To learn more about the integration between Incorta and Tableau, please review Integrations → Tableau.


© Incorta, Inc. All Rights Reserved.