Release Notes 5.0
Release Highlights
This release introduces several major improvements to the Cluster Management Console (CMC), Incorta Loader Service, and Incorta Analytics Service, in addition to the other services and Add-ons. The goal of the Incorta 5.0 release is to enhance data management and analytics capabilities.
This release offers versioning of physical schemas, a new Incorta SQL Table, a new RESTful Public API, new connectors, a new Radial Bar Chart, dynamic measures, Apache Spark 3 compatibility, a Save Menu for the Analyzer that allows you to save an insight as a business schema view, the ability to download and send a dashboard as a PDF, the ability to use a formula in an aggregate filter, the ability to export and download to data destinations in new formats, and much more.
Important new features and enhancements
There are several important features in this release:
- Download or send a dashboard as PDF
- Incorta SQL Table
- Data Destinations for Google Drive and Google Sheets
- Public API for the Analytics Service
- Physical schema and dashboard versioning
- Dynamic Measures
- Support for a formula in an Aggregate Filter
Additional improvements and enhancements
- Radial Bar Chart visualization
- Amazon DynamoDB Connector
- Apache Cassandra Connector
- Data Agent
- Save Menu options for the Analyzer
- Improved off-heap memory management
- Improved query performance for Pivot and Aggregated Table insights
- Apache Spark 3 compatibility
When you install or upgrade to this release, review the End User License Agreement (EULA) for various changes.
Upgrade considerations
There are several upgrade considerations for this release.
If you are using IFRAMEs to embed Incorta or embedding another application within Incorta, a CMC Administrator must sign in to the Cluster Management Console and Enable Cross-Origin Access prior to the upgrade for each tenant. This option requires SSL configuration and mandates access to Incorta Analytics using HTTPS.
This release does not support the reference to miscellaneous system variables, internal session variables, external session variables, and filter expression session variables in the following:
- formula expression of a physical schema table formula column
- formula expression of a materialized view formula column
- filter expression of a physical schema table load filter
- filter expression of a materialized view load filter
Prior to upgrade, the following configurations in the Cluster Management Console require resolution in order to upgrade to this release:
- Multiple Incorta Clusters share Incorta Nodes that run the same Loader Service or Analytics Service
- An Incorta Node has more than one Notebook Add-on
- An Incorta Node has no joined services or add-ons
Incorta Labs
An Incorta Labs feature is experimental and functionality may produce unexpected results. For this reason, an Incorta Labs feature is not ready for use in a production environment. Incorta Support will investigate issues with an Incorta Labs feature. In a future release, an Incorta Labs feature may be either promoted to a product feature ready for use in a production environment or be deprecated without notice.
In this release, there are two new Incorta Labs features:
There are also improvements to existing Incorta Labs features:
The 4.9.5 release deprecated the Incorta Labs feature for enabling SQL Joins. This Incorta Labs feature is not available in this release.
Incorta SQL Table
In this release, in the Cluster Management Console you can enable the Incorta SQL Table, an Incorta Labs feature.
An Incorta SQL Table is a new type of derived table that a schema developer creates in a physical schema using a SELECT statement which includes a non-recursive Common Table Expression (CTE). It uses a new SQL engine that orchestrates complex query execution using the existing engine as a query processor. An Incorta SQL Table supports complex query shapes and SQL expressions such as multiple join types, correlated subqueries, and analytic queries.
In the SELECT statement of an Incorta SQL Table, all referenceable objects must be performance-optimized physical schema tables or materialized views.
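For illustration, here is a minimal sketch of a SELECT statement with a non-recursive CTE that an Incorta SQL Table might use. The physical schema, table, and column names are hypothetical; the statement assumes SALES.ORDERS and SALES.CUSTOMERS are performance-optimized physical schema tables.
-- Hypothetical example: aggregate orders per customer in a CTE, then join back to customers
WITH customer_orders AS (
  SELECT CUSTOMER_ID, COUNT(*) AS ORDER_COUNT
  FROM SALES.ORDERS
  GROUP BY CUSTOMER_ID
)
SELECT c.CUSTOMER_ID, c.NAME, o.ORDER_COUNT
FROM SALES.CUSTOMERS c
JOIN customer_orders o ON o.CUSTOMER_ID = c.CUSTOMER_ID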
To learn more, review Concepts → Incorta SQL Table.
Physical schema and dashboard versioning
A CMC administrator can enable version history tracking of dashboards and physical schemas. Once enabled, users can access the version history of a dashboard or physical schema. As an Incorta user, you can preview, restore, export, or create a copy of prior dashboard and physical schema versions.
Steps to enable dashboard and physical schema version history
The following are the steps for enabling the dashboard and physical schema version history:
- As the CMC administrator, sign in to the CMC.
- In the Navigation bar, select Clusters.
- In the cluster list, select a cluster name.
- In the canvas tab, select Cluster Configurations.
- In the Cluster Configurations menu, select Default Tenant Configurations.
- In the Default Tenant Configurations menu, select Incorta Labs.
- Enable Schema and dashboard versioning.
- Configure the Dashboard versioning properties.
- Select Save.
Dashboard versioning properties
Property | Control | Description |
---|---|---|
Schema and dashboard versioning | toggle | Enable to maintain prior versions of schemas and dashboards |
Maximum number of versions per entity | text box | Set the maximum number of versions to maintain for each dashboard and schema |
Backup frequency | drop down list | Set the cadence of version creation for each dashboard and schema |
When selecting a timed backup frequency, if an entity has not changed since the last stored version, Incorta will not add a redundant version to the dashboard or physical schema history.
Accessing dashboard version history
Here are the steps to access the dashboard version history:
- Sign in to the Analytics Service.
- In the Navigation bar, select Content.
- In the Dashboard menu, select the desired dashboard.
- In the upper right corner, select More Options (⋮ vertical ellipsis).
- In the More Options menu, select Version History.
Search dashboard version
From the Version History menu, you can search for a specific version using the following search properties:
Property | Control | Description |
---|---|---|
Version Date | drop down list | Select a time frame to filter results of version history |
Version Date Between | calendar date picker | Available when Custom is selected in Version Date. Select a start and end date to filter version history |
Modified By | text box | Enter a username to filter by the user that edited the dashboard. The Modified By field is not case sensitive |
Dashboard version history properties
Property | Control | Description |
---|---|---|
Preview | preview window | Opens a preview window of the dashboard version. You can Restore, Export, or Create copy from the preview window |
Restore | button | When selected, the dashboard is updated to the selected dashboard version |
Export | export window | Opens an export window, in which you can optionally include Bookmarks and Scheduled Jobs in the export. The dashboard will be exported as a package of XML files. |
Create a copy | copy dashboard window | Opens a Make a copy window. From this window you can rename the dashboard copy and select the Content file location. |
Accessing physical schema version history
Here are the steps to access the physical schema version history:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the Schema menu, select the desired schema.
- In the upper right corner, select Settings (gear icon).
- In the Settings menu, select Version History.
Physical schema version history properties
From the Version History menu, the following properties are available for each physical schema version:
Property | Control | Description |
---|---|---|
Restore | button | When selected, the schema is updated to the selected schema version |
Export | export window | Opens an export window, in which you can optionally include Scheduled Jobs in the export. The schema will be exported as a .zip package of XML files. |
Notebook sampling
In this release, there is a new notebook sampling feature that you can use to test a Notebook with a subset of data from a large table, in order to make execution faster.
Here are the notebook sampling properties you can add:
Property | Description |
---|---|
notebook.dataframe.limit | Enter a value for the number of dataframe rows |
notebook.dataframe.sampling.percentage | Enter the percentage of dataframe sampling. Valid values are between 1 and 100. |
notebook.dataframe.sampling.seed | Optionally, enter the seed used in sampling |
The notebook sampling properties you add will be applied to every dataframe, but will not affect the execution of the materialized views.
Notebook configurations precedence
If you add both the notebook.dataframe.limit and the notebook.dataframe.sampling.percentage properties, the notebook.dataframe.sampling.percentage property is applied first, and the notebook.dataframe.limit property is applied second.
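For example, with the following illustrative values (each pair is entered as a separate key and value in the Data Source dialog), each dataframe is first sampled down to approximately 10% of its rows, and the sampled result is then limited to 1,000 rows:
notebook.dataframe.sampling.percentage=10
notebook.dataframe.limit=1000
notebook.dataframe.sampling.seed=42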
Add a notebook sampling property
Here are the steps to add a notebook sampling property to your materialized view:
- Within the physical schema, open the materialized view.
- In the Data Source dialog, under Properties, select Add Property.
- In key, enter the notebook sampling property name.
- In value, enter the notebook sampling property value.
- Select Validate.
- In the Action bar, select Done.
To learn more, please review Tools → Notebook Editor.
Cluster Management Console (CMC)
In this release, there are several enhancements to the CMC:
Queries and render requests timeout
To support running Incorta behind proxy servers that have a predefined connection timeout, the CMC Administrator can now set a time limit for executing queries and insight render requests. The timeout is specified in milliseconds, with a default value of -1, meaning no time limit is defined.
If the query executes within the specified time frame, the result is returned to the caller. Otherwise, a GUID is returned, and as soon as the query execution is complete, the result is added to the local cache. Incorta can then render the request using the GUID by fetching the result from the local cache within the predefined timeout.
Simplified relations
In this release, you can only add an Incorta Node to a single Incorta cluster. An Incorta cluster can have one or more Incorta Nodes.
The CMC will automatically add an Incorta Node’s related services and add-ons to the Incorta cluster.
Clusters Manager
The CMC administrator can enable the auto-reload feature for the child tabs on the Clusters Manager. Simply enable the Enable auto-reload for 5 minutes toggle in the Action menu.
In this release, you can manage an Incorta Node and related child services and Add-ons within the Clusters Manager, including stopping and starting child entities.
In addition, you can now monitor the off-heap memory for a given service on the Nodes tab in the Clusters Manager. The CMC monitors the off-heap active memory and sends an alert if it exceeds 90% of the total allocated off-heap memory.
If pooled memory exceeds 90%, the CMC will not send an alert, as this is not considered an issue. Pooled memory is available memory that is reserved for the service off-heap.
Incorta Analytics and Loader Service
The 5.0 release introduces several key improvements to the Incorta Analytics and Loader Services such as:
- Download or send a dashboard as PDF
- Data Destinations
- Public API for the Analytics Service
- Dynamic Measures
- Support for formulas in Aggregate Filters
- Applied filter supports specifying a presentation variable as filter value for the Between filter operator
- Enhanced Prompts to support an empty default filter value
- Preview session variable values in the Filter dialog
- Select a date system variable for global variables
- Support only for date system variables in a physical schema table formula column or load filter
- Analyzer productivity enhancements
- New Visualizations and enhancements
- New Connectors and enhancements
Download or send a dashboard as PDF
In this release, a dashboard consumer can download insights and tabs as PDF files. In addition, a dashboard consumer can schedule sending a given tab, selected tabs, or all tabs on a dashboard as a PDF file via Email.
Download insight as PDF
A dashboard consumer can download any insight as a PDF file. The downloaded file contains the given insight after applying dashboard runtime filters or filter options.
Here are the steps to download an insight as a PDF file:
- Access a dashboard.
- For a given insight on a tab, select More Options (⋮ vertical ellipsis).
- Select Download → PDF.
Download tab as PDF
A dashboard consumer can download a given tab, selected tabs, or all tabs on a dashboard as a PDF file. The downloaded file contains all insights in the selected tab(s) after applying the dashboard runtime filters or filter options, if any. The downloaded file also contains the filter expressions of each applied dashboard runtime filter, if any.
Here are the steps to download one or more tabs as a PDF file:
- In the Content Manager, select a dashboard to open.
- For a given tab on the dashboard, select More Options (⋮ vertical ellipsis) next to the tab name.
- Select Download → PDF.
- In the Download as PDF dialog, for the Included Tabs, do one of the following:
- Select the Include all tabs check box to download all the tabs on the dashboard as a PDF file.
- Select the tabs list, and then select the tab(s) you want to include in the downloaded PDF file.
- Select X next to the tab name to remove a selected tab.
- Select Download.
Schedule dashboard delivery as PDF file via Email
As a dashboard consumer, you can schedule sending a given tab, selected tabs, or all tabs on a given dashboard as a PDF file via Email.
Here are the steps to schedule sending one or more tabs on a given dashboard as a PDF file via Email:
- In the Content Manager, select a dashboard to open.
- In the Action bar, select the Share icon (3 connected dots icon).
- Select Schedule Delivery.
- Enter the dashboard schedule and delivery properties.
- For the Included Tabs, select the Include all tabs check box or select the tab(s) you want to include in the PDF file.
- For the Data Format, select PDF.
- Select Schedule.
Send dashboard as PDF
As a dashboard consumer, you can send a given tab, selected tabs, or all tabs on a given dashboard as a PDF file via Email.
Here are the steps to send one or more tabs on a given dashboard as a PDF file via Email:
- In the Content Manager, select a dashboard to open.
- In the Action bar, select the Share icon (3 connected dots icon).
- Select Send Now.
- Enter the dashboard email properties.
- For the Included Tabs, select the Include all tabs check box or select the tab(s) you want to include in the PDF file.
- For the Data Format, select PDF.
- Select Send.
Data Destinations
With this release, you can export a given insight, a tab on a dashboard, or the entire dashboard to Google Drive or Google Sheets data destinations. You can also schedule the delivery of all supported insights on a dashboard, including all tabs, to these data destinations, if applicable.
This release supports sending the following insights to data destinations:
- Listings Tables
- Aggregated Tables
- Pivot Tables (in Excel and Google Sheets only)
With Google Drive data destinations, you can export and send supported insights in Comma Separated Values (CSV) or Excel (.xlsx) file formats. To export supported insights to the Google Sheets file format, you must send them to a Google Sheets data destination.
The Export Server automatically converts Organizational, Advanced Map, and Sunburst insights to a tabular form and includes them in the generated file(s) when you send all supported insights on a tab or a dashboard, or when you schedule the delivery of supported insights on the dashboard. This applies to all supported file formats: CSV, Excel, and Google Sheets. However, you cannot send these types of visualizations individually.
To learn how to create and manage data destinations, please review the Data Manager.
Public API for the Analytics Service
This release introduces a RESTful Public Application Programming Interface (API) for the Analytics Service.
The Public API allows developers to interact with data within the Analytics Service for programmatic uses. In this initial release, the Public API contains authentication endpoints, dashboard prompt endpoints, and query endpoints.
For more details, see the Public API documentation.
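As a rough illustration of calling a RESTful API with curl, the sketch below shows the general pattern only; the host, endpoint path, and payload fields are placeholders and are not the documented Incorta endpoints. Refer to the Public API documentation for the actual authentication, dashboard prompt, and query endpoints.
# Placeholder only: <analytics-host>, the path, and the payload fields are illustrative,
# not the documented Incorta Public API
curl -X POST "https://<analytics-host>/<authentication-endpoint>" \
  -H "Content-Type: application/json" \
  -d '{"username": "<user>", "password": "<password>"}'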
Dynamic Measures
A dashboard developer can now implement dynamic measures for a Pivot Table or Aggregated Table insight that has two or more measures.
Here are the steps for a dashboard developer with Edit access rights for a given dashboard to implement dynamic measures for an insight:
- In the Analyzer, in the Action bar, select Settings (gear icon).
- In the Settings panel, in General, enable the Dynamic Measures toggle.
- In the Dynamic Measures Default drop down list, optionally change the All value to the name of a measure pill.
- Save your changes to the insight.
Here are the steps for a dashboard consumer with View access rights to select a dynamic measure:
- To open the Dynamic Measure menu, in the insight header bar, select the listed measures.
- In the Dynamic Measure menu, select one or more measures, or select Reset to apply the default configuration.
Support for a formula in an Aggregate Filter
This release supports using a formula in an aggregate filter. You can use a formula in an aggregate filter to filter the results returned by the aggregation, which is analogous to the HAVING clause in a SQL statement.
The formula expression must return a boolean value. Only aggregated groups that meet the formula conditions are included in the result set. You can use this feature when using the Analyzer to create or edit an Incorta Analyzer Table, Incorta View, dashboard insight, or data notification, and when exploring physical schema or business schema data.
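For illustration, the equivalent intent expressed in SQL is shown below; the table and column names are hypothetical. The boolean aggregate filter formula plays the role of the HAVING condition, so only the aggregated groups that satisfy it are returned.
-- Hypothetical example: keep only product groups whose total sales exceed 100,000
SELECT PRODUCT_ID, SUM(AMOUNT_SOLD) AS TOTAL_SOLD
FROM SALES.SALES
GROUP BY PRODUCT_ID
HAVING SUM(AMOUNT_SOLD) > 100000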
To learn more about an Aggregate Filter for an insight, see Concepts → Aggregate Filter.
Presentation variable as filter value for the Between filter operator of an applied filter
In this release, a dashboard developer can create an applied filter with a Between filter operator and select a presentation variable as a filter value. A presentation variable can also be selected for a filterable column that is of the type Date or Timestamp in the applied filter.
Enhanced Prompts to support an empty default filter value
In this release, a dashboard developer does not need to define a default filter value for a Prompt. As a dashboard filter, a prompt with an empty default filter value will show as a Filter bar pill with an incomplete filter expression. The incomplete filter expression will not affect the dashboard.
Preview session variable values in the Filter dialog
In the Filter dialog of the Filter bar, you can now preview the value of a date system variable, miscellaneous system variable, internal session variable, external session variable, or filter expression session variable.
Select a date system variable for global variables
When you create a date or timestamp global variable, you can now select a date system variable as the global variable value. The value saved is the snapshot value of the system variable.
Support only for date system variables in a physical schema table formula column or load filter
Starting with this release, both a formula expression for a physical schema table formula column and a filter expression for a load filter can no longer reference:
- Miscellaneous system variable
- Internal session variable
- External session variable
- Filter expression session variable
- Global variable
For both a physical schema table formula column and a filter expression for a load filter, you can reference a date system variable.
Analyzer productivity enhancements
There are several enhancements and improvements to the Analyzer in this release:
- Save Menu Options for the Analyzer
- Undo and Redo
- View the Information dialog for a pill
- Disable and enable insight filters
The Analyzer also includes the following enhancements that improve and enrich the user experience:
- Improved dashboard performance for large tables
- Ability to easily arrange the order of pills within a tray in the Insight panel
Save Menu Options for the Analyzer
In the Analyzer, this release introduces new options for how to save an insight depending on the Analyzer tool context. The Save Menu options are:
- Save
- Save as Insight
- Save as Business View
Applicable Analyzer contexts
There are various contexts for the Analyzer tool that affect the available Save Menu options in the Analyzer.
Context | Save | Save as Insight | Save as Business View | Comment |
---|---|---|---|---|
Explore Data with the Analyzer for a physical schema | New or existing dashboard or dashboard tab | - | New or existing business schema | Business view only supports Listing Table |
Explore Data with the Analyzer for a business schema | New or existing dashboard or dashboard tab | - | New or existing business schema | Business view only supports Listing Table |
Incorta View | Existing business schema | - | - | - |
Incorta Analyzer Table | Existing physical schema | - | - | - |
Create an insight on a new or existing dashboard tab | Current dashboard tab | New or existing dashboard tab | New or existing business schema | Only available for Listing Table |
Edit an insight on a dashboard tab | Current dashboard tab | New or existing dashboard tab | New or existing business schema | Only available for Listing Table |
Save as Insight
The Save as Insight option opens the Save as dialog. In the Save as dialog, you can:
- Create a new folder and add a dashboard with the insight to it
- Search for an existing folder or dashboard
- Select an existing folder and add a new dashboard with the insight to it
- Select an existing dashboard, create a new dashboard tab, and add the insight to it
- Select an existing dashboard, select an existing dashboard tab, and add the insight to it
Save as Business View
The Save as Business View option is only available for a Listing Table insight. In the Save as Business View dialog, you can:
- Specify the business schema view name
- Enter an optional description
- Select an existing business schema and add the new view to it
- Create a new business schema and add the new view to it
In this release, it is only possible to create a business schema view for a Listing Table insight and not an Incorta View, even if there is a grouping dimension pill for the Listing Table insight.
Undo and Redo
In the Action bar of the Analyzer, you can now select Undo or Redo for insight changes.
View the Information dialog for a pill
Both the Properties panel and the Filter panel allow you to open the Information dialog for a pill. In the panel header, simply select the icon for the fully qualified name.
Disable and enable insight filters
For a given insight in the Analyzer, you can now disable and enable certain insight filters.
Here are the steps to disable a filter:
- Select a pill in either the Individual Filter or the Aggregate Filter tray of the Insight panel.
- In the Filters panel, in Advanced, enable the Disable Filter toggle.
- Alternatively, in the Individual Filter or the Aggregate Filter tray, select View Filter Values.
- In the Filter Value panel, for the given filter, select the Disable Filter icon.
Here are the steps to enable a filter:
- Select a pill in either the Individual Filter or the Aggregate Filter tray of the Insight panel.
- In the Filters panel, in Advanced, disable the Disable Filter toggle.
- Alternatively, in the Individual Filter or the Aggregate Filter tray, select View Filter Values.
- In the Filter Value panel, for the given filter, select the Disable Filter icon.
When saving your changes to an insight, the Analyzer will delete all disabled insight filters.
New Visualizations and enhancements
In this release, there is the following new visualization:
In addition, this release includes several enhancements and improvements to existing visualizations:
Radial Bar Chart
This release includes the new Radial Bar Chart visualization. A Radial Bar Chart visualization is a Bar Chart plotted on a polar coordinate system, rather than on a Cartesian one. It is used to show comparisons among categories by using a circular shape. To learn more, please review Visualizations → Radial Bar Chart.
Advanced Map enhancements
You can now rotate an Advanced Map insight. In previous releases, you could only rotate the Advanced Map insight when previewing it on a device with a touchpad.
Grouping dimension plot bands
In this release, you have the option to add one or more plot bands to a Grouping Dimension pill. You can add plot bands to the following insight visualizations:
To add a plot band to a supported insight visualization, follow these steps:
- For an insight, select a supported visualization in the Insight panel.
- From the Data panel, add columns or formulas to the related Grouping Dimension tray and the Measure tray, as appropriate.
- Select the arrow next to the name of the column or formula in the required tray (Grouping Dimension tray, for example) to access the pill properties.
- In the Properties panel, in Format, select Add Plot Band.
- In the Plot Band properties, in Start and Stop, add the start and end of the plot band, respectively.
- Optionally, add a label to the plot band.
- Select the color of the plot band.
- Save your changes.
Max Rows Limit
For Listing Table, Aggregated Table, and Pivot Table visualizations, this release includes the re-introduction of the Insight Settings property, Max Rows Limit.
This property limits the total number of rows returned from the query, regardless of page size. Set the value of this property to zero (0) to return all applicable rows.
In this release, the Max Rows Limit property is also available for an Incorta Analyzer Table and an Incorta View.
New Connectors and enhancements
This release introduces the following new connectors:
In addition, this release includes the following enhancements and improvements to existing connectors:
- Support chunking methods for all SQL connectors
- Support for CData JDBC drivers for the Custom SQL connector
- Extraction job timeout for Oracle and MySQL connectors
Amazon DynamoDB Connector
Amazon DynamoDB is a fully managed proprietary NoSQL database service that supports key-value and document data structures and is offered as part of the Amazon Web Services (AWS) portfolio. It’s a fully managed, multi-region, multi-active, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. To learn more, please review Connectors → Amazon Web Services (AWS) DynamoDB
Apache Cassandra Connector
Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. The Cassandra connector uses the Datastax Java Driver for Apache Cassandra which provides Java clients with a way to create CQL sessions, discover keyspaces, tables, and columns as well as execute CQL queries on Cassandra.
To learn more, please review Connectors → Cassandra.
Support chunking methods for all SQL connectors
This release introduces support for parallel extraction of large tables using the different chunking methods (by chunking size and chunking period) for the following connectors:
- NetSuite
- Presto
- SQL Server
- SQL Server jTDS
- SAP Hana
- Google BigQuery
- SQL Custom Connector
Windows authentication with the SQL Server jTDS connector
In this release, the schema manager can connect to a SQL database using Active Directory user credentials. This authentication method is available when creating a data source for a SQL database using the SQL Server jTDS connector.
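For illustration, a jTDS connection string that uses Windows (Active Directory) authentication typically follows the pattern below; the host, port, database, and domain values in angle brackets are placeholders, and the exact properties required may vary with your environment.
jdbc:jtds:sqlserver://<host>:1433/<database>;domain=<AD_DOMAIN>;useNTLMv2=true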
CData JDBC driver support for the Custom SQL connector
The SQL Custom connector now supports the CData JDBC drivers. The schema manager can use any of the CData JDBC drivers to connect to different data sources.
A future maintenance pack will introduce a Custom CData connector.
Extraction job timeout for Oracle and MySQL
For Oracle and MySQL datasets, you can now configure a time limit for the extraction job. The Loader Service terminates the extraction job for the SQL query in the dataset if the extraction time exceeds the configured limit.
Apache Spark, Materialized Views, and Incorta ML
In this release, there are several improvements related to Apache Spark:
- SQLi enhancements for Spark queries
- Enhanced error messages for materialized views
- Incorta ML Time Series function enhancements
- Quotation syntax for column names with special characters
- Apache Spark 3 compatibility
SQLi enhancements for Spark queries
In this release, there are several SQLi enhancements:
- Support for DISTINCT ON in a SELECT statement (see the example below).
- SELECT query support for a physical schema table with a self-referential join and a runtime security filter that uses the descendantOf() built-in function for a hierarchical filter expression.
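For illustration, here is a DISTINCT ON example with hypothetical table and column names; it keeps a single row per customer, in this case the most recent order.
-- Hypothetical example: return only the latest order row for each customer
SELECT DISTINCT ON (CUSTOMER_ID) CUSTOMER_ID, ORDER_DATE, ORDER_TOTAL
FROM SALES.ORDERS
ORDER BY CUSTOMER_ID, ORDER_DATE DESC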
Enhanced Job Errors for materialized views
In this release, a materialized view now shows Job Errors that include the related physical schema, the execution start time, and the error description.
In addition, you can View Details for the error. The detailed view provides means for further debugging including Incorta’s Stacktrace, Spark’s Stacktrace, Spark Application Configurations, and suggestions for resolution when applicable.
Improved discovery for materialized views
This release improves materialized view execution time by enhancing schema discovery for full loads and incremental loads.
As a schema manager, you can now configure a materialized view to either process and validate the entire dataframe, or sample the dataframe and validate the sample.
To enable discovery sampling, follow these steps:
- In the Data Source dialog, in Properties, if no properties exist, select Add Property.
- In key, enter spark.dataframe.sampling.enabled.
- In value, enter true.
- Select Validate.
Incorta ML Time Series enhancements
In this release, there are new features added to the Time Series functions within Incorta ML:
- The Exponential Smoothing method has changed from Holt's Exponential Smoothing to Holt-Winters' Exponential Smoothing, and the parameters are enhanced for optimal modeling.
- Auto ARIMA enhancements widen the search space for the ARIMA order (p, d, q)(P, D, Q) to achieve better forecasts and infer the period for seasonal differencing, m, using autocorrelation.
- Optional parameters are available in JSON form to tweak and enhance the model attributes.
- The Confidence Interval width parameter is added to the build model function.
- Added a new function for Time Series Model Plotting with Confidence Interval.
- Added auto-complete in the Notebook Editor for all Incorta ML Time Series functions.
Quotation syntax for column names with special characters
In this release, a materialized view requires the Spark SQL and PostgreSQL quotation syntax for column names that have special characters.
When using special characters in names within a materialized view of either the Spark SQL or PostgreSQL type, adhere to the following:
Context | Back Tick | Double Quote | Single Quote | No Quotes |
---|---|---|---|---|
Schema Name without special character | ✔ | ✔ | ✔ | |
Schema Name with special character | ✔ | ✔ | ||
Table Name without special character | ✔ | ✔ | ✔ | |
Table Name with special character | ✔ | ✔ | ||
Column Name without special character | ✔ | ✔ | ✔ | |
Column Name with special character | ✔ | |||
Column/Table/Schema/SubQuery Aliases without special character | ✔ | ✔ | ✔ | |
Column/Table/Schema/SubQuery Aliases with special character | ✔ | ✔ | ||
Literals (String Constants) | | | ✔ | |
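For illustration, the following Spark SQL statement follows the table above with hypothetical table and column names: the column name containing a space is enclosed in back ticks, and the string literal uses single quotes.
-- Hypothetical example: back ticks for a column name with a space, single quotes for the literal
SELECT `Customer Name`, SUM(AMOUNT_SOLD) AS TOTAL_SOLD
FROM SALES.SALES_DETAILS
WHERE COUNTRY = 'US'
GROUP BY `Customer Name`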
Apache Spark 3 compatibility
This release is compatible with the following releases of Apache Spark 3:
spark-3.0.1-bin-hadoop2.7.tgz
spark-3.0.1-bin-hadoop3.2.tgz
You can download Apache Spark at https://spark.apache.org/downloads.html.
Apache Spark 3 requires the following:
- Oracle Java SE 8, OpenJDK 8, or OpenJDK 11
- Python 2.7/3.6+
- Scala 2.12.x
- R 3.4+
You must install Spark 3 in a directory other than IncortaNode.
Do not change, replace, or remove the default spark
directory bundled with the Incorta Node installation for your standalone Incorta cluster. Related services reference various libraries that reside in the spark
directory.
Install and configure Apache Spark 3
Here is an overview of the steps to install Apache Spark 3 for an on-premises standalone Incorta Cluster:
- Download and untar Apache Spark 3
- Stop the Incorta version of Spark
- Stop the Incorta related services
- Copy the Incorta Node spark related files
- Copy the Incorta Node spark related files
- Update the Spark Master URL in the Cluster Management Console (CMC)
- Start Apache Spark 3
- Start the Incorta related services
Installation of Apache Spark 3 requires:
- a Linux system administrator with root access
- a CMC Administrator
- a Tenant Administrator (SuperUser account)
Download and untar Apache Spark 3
As the Linux system administrator, use a secure shell (ssh) to access the Incorta host.
As the incorta user, download the Apache Spark 3 tarball file to the /tmp
directory for the host of your standalone Incorta cluster. The following example is for the spark-3.0.1-bin-hadoop2.7.tgz
file from the UC Berkeley mirror:
su incorta
cd /tmp
wget https://mirrors.ocf.berkeley.edu/apache/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
As the incorta user, create shell variables for file names, folder names, folder paths, Incorta Services and any Notebook Add-on according to the installation of your standalone Incorta cluster and Apache Spark 3 download.
The following example is for spark-3.0.1-bin-hadoop2.7.tgz and the default installation path for Incorta which is /home/incorta/IncortaAnalytics.
cd /tmp
SPARK3_DOWNLOAD_TAR_FILE=spark-3.0.1-bin-hadoop2.7.tgz
SPARK3_TAR_FOLDER=spark-3.0.1-bin-hadoop2.7
SPARK3_DOWNLOAD_PATH=/tmp
SPARK3_INSTALLATION_PATH=/home/incorta/spark3
INCORTA_INSTALLATION_PATH=/home/incorta/IncortaAnalytics
LOADER_SERVICE=loaderService
ANALYTICS_SERVICE=analyticsService
NOTEBOOK_ADDON=localNotebook
Create the spark3 installation directory.
mkdir $SPARK3_INSTALLATION_PATH
Untar and uncompress the downloaded tarball to the spark3 directory.
cd $SPARK3_DOWNLOAD_PATH
tar -xzvf $SPARK3_DOWNLOAD_TAR_FILE -C $SPARK3_INSTALLATION_PATH --strip-components=1 $SPARK3_TAR_FOLDER
Verify the untarred files.
ls -l $SPARK3_INSTALLATION_PATH
Create the eventlogs directory.
cd /tmp
mkdir $SPARK3_INSTALLATION_PATH/eventlogs
Stop the Incorta version of Spark
Stop Apache Spark from the command line.
cd $INCORTA_INSTALLATION_PATH/IncortaNode/
./stopSpark.sh
Stop the Incorta related services
Stop the Analytics Service, Loader Service, and Notebook Add-on from either the command line or the Cluster Management Console.
cd $INCORTA_INSTALLATION_PATH/IncortaNode/
./stopService.sh $ANALYTICS_SERVICE
./stopService.sh $LOADER_SERVICE
./stopNotebook.sh $NOTEBOOK_ADDON
Copy Incorta Node spark related files
Copy the IncortaNode/spark/conf
directory to the spark3
directory.
cp $INCORTA_INSTALLATION_PATH/IncortaNode/spark/conf/* ${SPARK3_INSTALLATION_PATH}/conf/
Copy the IncortaNode/spark/incorta
directory to the spark3 directory.
cp -R $INCORTA_INSTALLATION_PATH/IncortaNode/spark/incorta ${SPARK3_INSTALLATION_PATH}/incorta
Update related properties
Edit the node.properties
file.
vim $INCORTA_INSTALLATION_PATH/IncortaNode/node.properties
Change spark.home
value to the path of the spark3
directory.
spark.home=/home/incorta/spark3
Add the spark.version
property with a value of 3.
spark.version=3
Optional:
If you have an environment variable for SPARK_HOME
, change this also to the spark3
directory. For example, edit the custom.sh
file and change SPARK_HOME
.
sudo vim /etc/profile.d/custom.sh
SPARK_HOME=/home/incorta/spark3
Save your changes.
Start Apache Spark 3
Before starting Apache Spark 3, retrieve the required property values from both spark-env.sh
and spark-defaults.conf
.
SPARK_PUBLIC_DNS=`cat $SPARK3_INSTALLATION_PATH/conf/spark-env.sh | awk '/^SPARK_PUBLIC_DNS/' | cut -d '=' -f 2`
SPARK_MASTER_WEBUI_PORT=`cat $SPARK3_INSTALLATION_PATH/conf/spark-env.sh | awk '/^SPARK_MASTER_WEBUI_PORT/' | cut -d '=' -f 2`
SPARK_MASTER_URL=`cat $SPARK3_INSTALLATION_PATH/conf/spark-defaults.conf | awk '/^spark.master/' | cut -d ' ' -f 2`
echo $SPARK_PUBLIC_DNS
echo $SPARK_MASTER_WEBUI_PORT
echo $SPARK_MASTER_URL
Start the Apache Spark 3 and the related services.
cd $SPARK3_INSTALLATION_PATH/sbin
./start-master.sh
./start-slave.sh $SPARK_MASTER_URL
./start-history-server.sh
./start-mesos-shuffle-service.sh
Verify that you can access the Apache Spark 3 Master Web UI. Open the URL from the following console output in a supported web browser.
echo "Copy the following into your web browser: http://"${SPARK_PUBLIC_DNS}:${SPARK_MASTER_WEBUI_PORT}
Update the Spark Master URL in the Cluster Management Console (CMC)
Review the Spark Master URL and installation directory.
echo $SPARK_MASTER_URL
echo $SPARK3_INSTALLATION_PATH/
Open a web browser, and update the Spark Master URL in the CMC with these steps:
- Sign in to the CMC as the CMC Administrator.
- In the Navigation bar, select Clusters.
- In the List View, select your cluster.
- Select the Cluster Configurations tab.
- In Server Configurations, in the left panel, select Spark Integration.
- In the right panel, edit the following to match the previous Bash shell output:
- Spark master URL property
- SQL App Spark home (Optional)
- SQL App Spark master URL (Optional)
- Select Save.
Start the Incorta related services
Start the Analytics Service, Loader Service, and Notebook Add-on from either the command line or the Cluster Management Console.
cd $INCORTA_INSTALLATION_PATH/IncortaNode/
./startService.sh $ANALYTICS_SERVICE
./startService.sh $LOADER_SERVICE
./startNotebook.sh $NOTEBOOK_ADDON
Verify Apache Spark 3
Here are the steps to verify that Apache Spark 3 is running correctly using the tenant Sample Data and the SALES schema:
- Sign in to a tenant as the SuperUser.
- In the Navigation bar, select Schema.
- In the Schema Manager, in the Action bar, select + Add New → Create Schema
- In the Create Schema dialog, enter sch_Spark3 or similar for the schema name.
- Select Save.
- In the Schema Designer, select Materialized.
- In the Data Source dialog, for Language select Spark SQL.
- Select Edit Query.
- In the Edit Query dialog, enter:
select * from SALES.CUSTOMERS limit 10;
- Save your changes.
- For Table Name, enter mv_Spark3.
- In the Action bar, select Done.
- In the Schema Manager, in the Action bar, select Load → Load now → Full.
- In the Data Loading dialog, select Load.
- In the Spark Master Web UI, in Completed Applications, verify the success of the Application ID with the name mv_Spark3.
- After you verify success, delete sch_Spark3.
You may need to repeat some of these steps after a future upgrade such as:
- Update related properties
- Update the Spark Master URL in the Cluster Management Console (CMC)
Additional features and enhancements
Here are the new additional features and enhancements for this release:
Accessibility enhancements
This release introduces some accessibility enhancements that provide keyboard and VoiceOver accessibility for the Analytics Service screens and tools.
Data Agent
The Data Agent is an agent service that enables the extraction of data from an on-premises data source. Typically, an on-premises data source resides behind a firewall. A data agent facilitates the extraction from the data source behind a firewall to an Incorta cluster. The Incorta cluster can be on-premises, but is typically hosted by a cloud provider.
To learn more, please review Tools → Data Agent.
PK-Index Tool
The PK-Index tool is now included in this release. For this release, it is not required to run the PK-Index tool prior to upgrade. Doing so, however, may greatly reduce the time it takes to perform an upgrade.
Improved Tableau support
As part of this release, Incorta has tested Tableau connectivity to a database with a custom Incorta connector. These tests were performed using an automated testing tool called the Tableau Data Source Verification Tool (TDVT). This is an important verification step in signing the Incorta connector for distribution. To learn more about accessing the integration between Incorta and Tableau, please review Integrations → Tableau.