Release Notes 4.9

Release Highlights

The goal of the Incorta 4.9 release is to embrace the design themes of consistency, clarity, and efficiency with a strong focus on increasing user productivity and delight. As a result, this release showcases modern design and engineering at its best with the introduction of an all-new and greatly improved Analyzer experience.

This release introduces several major improvements to the Cluster Management Console (CMC), Incorta Loader Service, Incorta Analytics Service, and Incorta ML.

Important New Features and Enhancements

There are several important features in this release:

Additional Improvements and Enhancements

Upgrade to Incorta 4.9

Important

Prior to upgrading to Incorta 4.9, please review and follow the procedures outlined in the Upgrade to Incorta 4.9 documentation.

Note

Any changes to these properties require that you restart all services in the Incorta Cluster.


Cluster Management Console (CMC)

The following new configurations and enhancements are available in the Cluster Management Console (CMC) for this release:

Single Sign-on (SSO) Auto Provisioning

With Single Sign-on (SSO) Auto Provisioning, security administrators no longer need to manually create an Incorta user and assign the user to a group. Incorta will honor the SSO provider authentication, automatically create the Incorta user in the given tenant, and assign the user to a default group.

Note

You must have Single Sign-on (SSO) already configured for the default tenant configuration or the specific tenant configuration. You must also have created a security group for a specific tenant. You are not able to assign a group for the default tenant configuration.

Here are the steps to enable this option as the default tenant configuration in the CMC:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Cluster Configurations.
  • In the panel tabs, select Default Tenant Configurations.
  • In the left pane, select Security.
  • In the right pane, confirm that SSO is the Authentication Type.
  • Enable Auto provision SSO users.
  • Select Save.

Here are the steps to enable this option for a specific tenant configuration in the CMC:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select the Tenants tab.
  • In the Tenant list, for the given tenant, select Configure.
  • In the left pane, select Security.
  • In the right pane, confirm that SSO is the Authentication Type.
  • Enable Auto provision SSO users.
  • For Auto provisioned SSO users default group, select a group name.
  • Select Save.

Infrastructure Management

In this release, for a given cluster, the CMC now includes an Infrastructure section. In the Infrastructure section, an administrator can enable or disable the following infrastructure servers:

  • Apache Derby database server for the Incorta metadata database
  • Apache ZooKeeper server
  • Apache Spark standalone server
Important

This feature only supports a typical, standalone cluster configuration where the following are true:

  • Apache Derby serves as the Incorta Metadata Database
  • Apache ZooKeeper is the Incorta packaged (incorta-package.zip) distributed version
  • Apache Spark is the Incorta packaged (incorta-package.zip) distributed version

To enable or disable an infrastructure application in the cluster, follow these steps:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the canvas tabs, select Details.
  • In the Infrastructure section, enable or disable any of the following:

    • Metadata Database
    • Zookeeper
    • Spark

In this release, where the feature does not support the version or server, there is a disabled toggle control. For example, if the cluster uses a MySQL Server database for the Incorta metadata database, the Metadata Database toggle appears as disabled.

Warning for Mandatory Infrastructure Servers

The CMC will now show a warning indicating that a mandatory infrastructure server, such as the Incorta metadata database or Apache ZooKeeper, is not running.

To start the server, follow these steps:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the warning, select Start.

Terminated Unexpectedly as a Service State

The CMC will now report when a service unexpectedly terminates. In this release, the possible service states are now:

  • Stopping
  • Stopped
  • Processing
  • Starting Tenants
  • Started
  • Terminated Unexpectedly

To view the status of services in the cluster, follow these steps:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the canvas tabs, select Details.
  • In the Status section, review the state of the Analytics, Loader, and Add-on services.

Alternatively, you can view the status of services in the cluster in the Services tab:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the canvas tabs, select Services.
  • In the Status column, review the state of the Analytics, Loader, and Add-on services.
Note

Service Details also reports the Status of the service state.

CMC Scheduler

This release includes the CMC Scheduler. For a given cluster, the CMC Scheduler allows an administrator to create and manage jobs for:

  • Tenant backups
  • The Inspector Tool

To learn more, review Tools → CMC Scheduler.

Mapbox Integration

Mapbox is an open source mapping platform for custom designed maps. This release supports integration with Mapbox for the newly introduced Advanced Map insight visualization available in the new Analyzer.

Mapbox uses access tokens to associate API requests with an account. To learn more, please review Access Tokens | How Mapbox works.

Using your organization’s Mapbox Access Token is optional, as Incorta’s access token is the default in this release.

Important

Because of API request limits associated with Incorta’s default access token, an API request may not be processed. If you encounter an issue, please reach out to Support for more details.

Here are the steps to use your organization’s Mapbox Access Token in the default tenant configuration in the CMC:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Cluster Configurations.
  • In the panel tabs, select Default Tenant Configurations.
  • In the left pane, select Integration.
  • In the right pane, in Mapbox API Key, specify the token value.
  • Select Save.

Here are the steps to use your organization’s Mapbox Access Token for a specific tenant configuration in the CMC:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select the Tenants tab.
  • In the Tenant list, for the given tenant, select Configure.
  • In the left pane, select Integration.
  • In the right pane, in Mapbox API Key, specify the token value.
  • Select Save.

Incorta Analytics and Loader Service

The 4.9 release introduces several key improvements to the Incorta Analytics and Loader Services such as:

New Analyzer

Embracing the design themes of consistency, clarity, and efficiency, this release showcases modern analytics engineering with the introduction of an all-new and greatly improved Analyzer experience that promises increased productivity and delight.

You can use the new Analyzer for the following:

  • To create and edit an insight on a dashboard tab
  • To create and edit an Incorta Table in a physical schema
  • To create and edit an Incorta View in a business schema

Anatomy of the New Analyzer

The Analyzer now opens in full screen, and there is no Navigation bar. You must select Save or Cancel to close the Analyzer. Here is a high level description of the Analyzer’s anatomy:

  • Action bar
  • Data panel
  • Manage Data Sets panel
  • Insight panel
  • Properties panel
  • Filter panel
  • Filter Values panel
  • Visualization canvas
  • Settings panel
Action bar

In the Action bar, you can perform the following:

  • Download (download icon)
  • SQL
  • Settings
  • Cancel
  • Save

When supported by the configured insight visualization, you can view the Reference SQL and/or download a file as .csv or .xlsx.

Data panel

To filter and find items in the Data panel, enter a search term in the Search text box or use the Column Type drop down menu to narrow your results. Column Types include:

  • String
  • Numerical
  • Date
  • Timestamp
  • Boolean
  • Key

For a given column in the tree, select the information icon to view the column details and preview sample data.

You can also manage the tree hierarchy. The More Options (kebab icon) menu allows you to:

  • Collapse to Schema and Table Level
  • Sort by Name or Original Order
Multi-select

To select multiple columns in the Data panel, you must use the following keystrokes:

  • On Mac OS, use CMD
  • On Windows OS and Linux OS, use ALT

You must drag and drop the multi-selected columns to a tray or target box in another panel.

Manage Data Sets panel

You can use the Manage Data Sets panel to add selected schemas, business schemas, tables, or views to the Data panel.

The Manage Data Sets panel contains the Views and Tables tabs that can be filtered using search.

Insight panel

The Insight panel shows the selected visualization. The default selection is Listing Table. Simply select the downward arrow (V) to change the visualization type.

Visualizations

Here is a list of visualizations in this release:

Tables

  • Listing Table
  • Aggregate Table
  • Pivot Table

Charts

  • Column
  • Stacked Column
  • Percent Column
  • Bar
  • Stacked Bar
  • Percent Bar
  • Area
  • Stacked Area
  • Percent Area
  • Line
  • Stacked Line
  • Percent Line
  • Pie
  • Donut
  • Pie Donut
  • Sunburst
  • Combination
  • Spider
  • Line Time Series
  • Time Series
  • Area Range
  • Combo Dual Axis
  • Dual Axis
  • Dual X-Axis
  • Map
  • Bubble Map
  • Advanced Map
  • Funnel
  • Pyramid
  • Scatter
  • Treemap
  • Heatmap
  • Tag Cloud
  • Bubble
  • Packed Bubble
  • Organizational

Others

  • KPI
  • Rich Text
  • Gauge
  • Solid Gauge
Trays

The visualization selection determines the available trays within the Insight panel. Rich Text is the only visualization without available trays.

From the Data panel, you can add one or more columns to a tray. When applicable, you can also add a formula to a tray. All trays now have a Clear All command. A column or formula in a tray is a Pill. Each pill has configurable properties. The parent tray determines the available properties of a pill.

Important

About a pill name:
To change the name of the column or formula that is a Pill, you must double-click the pill. In the text box, you can modify the name. In this release, there is no visible Name or Label property for a pill.

Here is a list of some of the trays available within the Insight panel:

  • Grouping Dimension
  • Coloring Dimension
  • Row (only for Pivot Table)
  • Column (only for Pivot Table)
  • Measure (available in all visualizations except Advanced Maps and Rich Text)
  • Layers (only for Advanced Map)
  • Color By (only for Advanced Map)
  • Size By (only for Advanced Map)
  • Sort By (only for Listing and Aggregated Tables where all pills are in the Measure tray)
  • Source (only for Sankey)
  • Target (only for Sankey)
  • Individual Filter (can view Filter Values)
  • Aggregate Filter (can view Filter Values)
  • Distinct Filter (only for Listing and Aggregated Tables where all pills are in the Measure tray)
Properties panel

Using the Properties panel, you can easily modify the properties of a pill. Here are some examples:

  • easily apply and remove formatting, including conditional formatting
  • quickly select a drill down link using a tree view of dashboards and dashboards tabs
  • for a gauge, add a range of a specific color
  • copy and paste a bulk list of individual filter values
  • define a date part for a timestamp or date column

Depending on the tray, not all pills have configurable properties. For example, the Distinct Filter does not offer a Properties panel. Some pills allow for direct configuration, such as a pill in the Sort By tray where you can directly set the sort direction.

Grouping Dimension
  • Date Part for timestamp and date columns
  • URL
  • Show Empty Groups
  • Sort By with Clear All
  • Dashboards Drill Down
Measure

Because properties are specific to the given visualization, not all properties are applicable.

  • Date Part for timestamp and date columns
  • Aggregation
  • Scale
  • Running Total
  • Filter (this is a measure filter)
  • Format
  • Conditional Formatting (available for Listing Table, Aggregate Table, Pivot Table and KPI)
  • Abbreviate on Hover (Available for most chart visualizations)
  • Color (Available for most visualizations)
  • Plot Band (Available for most Column, Bar, Area, and Line charts)
  • Minimum (Only for gauges)
  • Maximum (Only for gauges)
  • Gauge Ranges (Only for gauges)
  • Dashboards Drill Down
  • Base Field (Not available for Advanced Map)
  • Average Lines (Available for most Column, Bar, Area, and Line charts)
  • Query Plan (read only)
Individual Filter

Specify the filter operator, select values, edit bulk values, and add individual values in this panel.

  • Date Part for timestamp and date columns
  • Operator
  • Values
  • Add
Note

If a pill does not have a defined filter, the pill will show a validation warning (red circle).

Aggregate Filter

Specify the aggregation type, the filter operator, and values. If needed, edit in bulk values, and add individual values in this panel.

  • Date Part for timestamp and date columns
  • Aggregation
  • Operator
  • Values
Note

If a pill does not have a defined filter, the pill will show a validation warning (red circle).

Color By

Applicable only for an Advanced Map, Color By is a measure, so it has the properties of a measure, except for conditional formatting, with the addition of:

  • Color Palette
Size By

Applicable only for an Advanced Map, Size By is for a measure, so it has the properties of a measure, except for conditional formatting, with the addition of:

  • Radius
Row

Applicable only for a Pivot Table, Row is for a dimension, and has the following properties:

  • Sort By
  • Dashboards Drill Down
Column

Applicable only for a Pivot Table, Column is for a dimension, and has the following properties:

  • Sort By
Coloring Dimension

Applicable for most charts, Coloring Dimension is for a dimension, and has the following properties:

  • Sort By
  • Format Color Palette
Source

Applicable only for Sankey, Source is for a dimension, and has the following properties:

  • Sort By
  • Dashboards Drill Down
Target

Applicable only for Sankey, Target is for a dimension, and has the following properties:

  • Sort By
  • Dashboards Drill Down
Additional Panels for detailed properties

There are several additional panels for adding a supported feature to a visualization or for formatting a value conditionally. In these instances, an additional Properties panel opens in the Analyzer.

Add Plot Band

To add one or more bands to an applicable chart, in the Properties panel, select Add Plot Band. The Add Plot Band panel contains the following properties:

  • Start
  • Stop
  • Label
  • Background (color selection)
Add Dashboard panel

To specify a drill down to a tab in the same or another dashboard, in the Properties panel, select Add Dashboard. The Add Dashboard panel contains the following properties:

  • Include Runtime Filters
  • Search
  • Tree control to select a dashboard tab
Add Average Line

Certain charts support adding an Average Line. This line can be an Average Line, Linear Trend, Simple Moving Average, or an Exponential Moving Average. To specify one or more Average Lines, in the Properties panel, select Add Average Line. The Add Average Line panel contains the following properties:

  • Average Line Type
  • Line Style
  • Period (Only for moving averages)
Conditional Formatting

For the Listing Table, Aggregated Table, Pivot Table, and KPI visualizations, you can specify one or more conditional formats for a given measure. In the Properties panel, in Conditional Formatting, select Add Conditional Format. The Conditional Formatting panel contains the following properties:

  • Aggregation
  • Value
  • Background
  • Text Color
Add Gauge Range

For the Gauge or Solid Gauge visualizations, you can specify one or more Gauge Ranges for a given measure. In the Properties panel, in Gauge Ranges, select Add Gauge Range. The Gauge Range panel contains the following properties:

  • Stop %
  • Background (color)
Filter Values panel

For a given insight, you can now view all specified insight filters in the Filter Values panel. Insight filters are:

  • Individual Filter
  • Aggregate Filter
  • Distinct Filter

To open the Filter Values panel, select the View Filter Values icon for any of these trays. To close the panel, select X.

The Filter Values panel shows a summary of the filter properties. You can collapse and expand Individual Filter, Aggregate Filter, and Distinct Filter in this panel.

Individual Filter

For each pill, there is a summary of the column or formula, operator, preview of values, and count of values. Select the summary to open the Individual Filter panel for the given pill.

Aggregate Filter

For each pill, the summary shows the aggregate filter expression. Select the summary to open the Aggregate Filter panel for the given pill.

Distinct Filter

For each pill, the summary shows the distinct filter.

Visualization canvas

The visualization canvas shows a preview of the insight. You can configure an insight title and insight description for all visualizations other than Rich Text.

In this release, for Table visualizations, you can select and copy rows, columns, and/or cells in the insight in the canvas. For the canvas, you can also select the full screen icon to collapse all open panels as well as the expand icon to open all closed panels.

Settings Panel

This release combines the General setting properties with the Layout properties for the selected insight visualization. Additional settings are Advanced, Format, and Map Settings.

General

Here are the properties for General settings:

  • Page Size
  • Max Rows
  • Logarithmic
  • Percentage of Column
  • Auto Refresh
  • Merge Columns (Aggregated Table)
  • Collapsed (Aggregated Table)
  • Merge Rows (Aggregated Table, Pivot Table)
  • Row Grand Total (Pivot Table)
  • Row Subtotal (Pivot Table)
  • Column Grand Total (Pivot Table)
  • Column Total At (Pivot Table)
  • Hide Columns (Pivot Table)
  • Subtotal (Listing Table, Aggregated Table)
  • Total (Listing Table, Aggregated Table)

Layout

Here are the properties for Layout settings:

  • Fix Columns
  • Headers
  • Transpose
  • Rotation
  • Legend
  • Data Labels
  • Values
  • Hide Zero Values
  • Connect Values
  • Fixed Placement
  • X-Axis Labels
  • X-Axis Title
  • Y-Axis Labels
  • Y-Axis Title
  • Y-Axis Min
  • Y-Axis Max

Format

Format settings are only for KPI insight visualizations. Select a format to apply to all pills with the property Auto Format selected. In this manner, you can easily apply a format to all KPI measures. For an individual pill, you can override the Format property, if so desired.

Map Settings

Map Settings are only for Advanced Maps. Here are the properties for Map Settings:

  • Style
  • Data Labels
  • Legend

Advanced

Here are the properties for Advanced settings:

  • Max Groups
  • Missing Value Text
  • Join Measures

Dashboard tabs

In this release, a dashboard now includes one or more tabs. A tab is an easy and convenient way to organize a dashboard into logical sections. You can create drill down links between tabs in a given dashboard or another dashboard.

A new dashboard has a single default tab, Tab 1. A dashboard can include a maximum of 10 tabs. You can easily add a new tab, rename a tab, change the order of a tab, and delete a tab.

For tabs with an existing insight visualization, you can duplicate the tab, edit the layout, personalize the tab, refresh the data for insight visualizations on the tab, as well as download the tab as either a .xlsx (MS Excel file) or as a .html file.

You can also hide all tabs or show all tabs for a given dashboard.

About tab names

A tab must have a name that:

  • is unique to the dashboard
  • has at least 1 character and no more than 255 characters in length
  • can contain spaces, special characters, and even Unicode emoji characters (utf8mb4)
Note

Only the first 20 characters of a tab name will appear in the tab.

Add a new tab

To add a new tab, simply select + next to the existing tab.

Rename an existing tab

  • To rename a tab, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Rename.
  • In the tab text box, enter a new name.
  • To save your changes, press Enter or Return.

Change the order of a tab

  • To change the order of a tab, simply drag and drop the tab to the left or right of an existing tab.

Delete a tab

  • To delete a tab, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Delete.
  • In the dialog, select OK.

Duplicate an existing tab

  • To duplicate a tab, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Duplicate.

Edit the tab layout

You can edit the layout of a tab that has two or more insights and has not been personalized.

  • To edit the layout of tab insights, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Edit Layout.
  • Make your layout changes.
  • In the Action bar, select Save.

Personalize the tab

To personalize dashboards and tabs, the user must belong to a group that has the Dashboard Analyzer role. A personalized tab includes the ability to edit the tab layout.

  • To personalize the tab insights, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Personalize.
  • Make your personalization changes.
  • In the Action bar, select Save.

Alternatively, for the selected tab, in the dashboard Action bar, you can select the Personalization icon, and in the menu, select Personalize.

Refresh data for a tab

  • To refresh the data for insights on a given tab, select More Options (kebab icon).
  • In the More Options menu, select Refresh Data.

Download

You can download a tab as either a .xlsx (MS Excel file) or as a .html file. The .xlsx download option only supports these insight visualizations types:

  • Listing Table
  • Aggregated Table
  • Pivot Table

To download a tab as a .xlsx file, follow these steps:

  • For the given tab, select More Options (kebab icon).
  • In the More Options menu, select Download → XLSX.

To download a tab as a .html file, follow these steps:

  • For the given tab, select More Options (kebab icon).
  • In the More Options menu, select Download → HTML.

Hide all tabs

To hide all tabs, the dashboard must be in its Original View with one or more tabs showing. Here are the steps to Hide all tabs:

  • In the Action bar, select More Options (kebab icon).
  • In the menu, select Hide Tabs.
Note

When you enable Hide Tabs, the insights for the currently selected tab are initially visible as the dashboard. However, when you close and return to the dashboard, only the insights for the first tab are visible as the dashboard.

Show all tabs

To show all tabs, the dashboard must be in its Original View with all tabs hidden. Here are the steps to Show all tabs:

  • In the Action bar, select More Options (kebab icon).
  • In the menu, select Show Tabs.

Advanced Map visualization with Mapbox

This release includes the Advanced Map visualization in the new Analyzer, which uses Mapbox.

Map Settings

Map Settings are only for Advanced Maps.

To open the Settings panel, in the Action bar, select Settings (Gear icon). In Map Settings, you can configure the following properties:

  • Style (Auto, Light, Dark, Satellite, Outdoors, Satellite Streets)
  • Data Labels
  • Legend

About Layers

A layer contains the data (Geo Data) for a specified geographical entity (Geo Entity) that Mapbox visualizes. An Advanced Map visualization can have one or more layers. Each layer has a unique name, configurable Layer Settings, and configurable Geo Data.

A dashboard user typically zooms (in and out) from one layer into another layer on an Advanced Map insight.

Layer Settings

Each layer has Layer Settings:

  • Data Labels: Enable the toggle to view geo data labels
  • Legend: Enable the toggle to view the map legend
  • Visible: Enable the toggle to view the layer
  • Type: Select the type:

    • Marker (select a Shape: Pin, Square, Cross, Times, or Circle)
    • Area
    • Bubble (define a Size By measure)
    • Heatmap
  • Zoom Range: Use the slider to narrow the initial zoom, from 0% to 100%
  • Opacity: Use the slider to set the opacity for the selected type, from 0% to 100%

About Geo Data

    Geo Data defines the data for the layer. It specifies the Geo Entity for the layer and the aggregation measures for the entity. The aggregations related to the Geo Entity appear in the map visualization.

    For example, the colors of a country can represent the various population sizes, where the Geo Entity is a country and the Color By measure is city populations.

    In Geo Data, Color By, Size By, and Tooltip contain the various aggregations. Color By is the only required measure. Bubble visualization types require a Size By measure. A Tooltip can contain multiple measures.

    About a Geo Entity

    A Geo Entity can be either a Geo Attribute or a Geospatial Point.

    Geospatial (Lat/Long)

    A Geospatial Point is a tuple of latitude and longitude. Geospatial has one column or formula for Latitude and another column or formula for Longitude.

    • Latitude: Column or formula that will map the latitude
    • Longitude: Column or formula that will map the longitude

    Geo Attribute Property

    A Geo Attribute is a column that conforms to a Geo Role. A Geo Role is a Country, County, County Subdivision, State, City, or Zip Code. Mapbox identifies Geo Roles using its own repository. A Geo Attribute has only one column or formula for each of the following:

    • Geo attribute: Column or formula that will map the Geo Role
    • Geo Role: Country, State, County, County Subdivision, City, or Zip Code
    • Dashboard Drill down
    Note

    Zip Code is a Geo Role that is applicable only for the United States.

    About Geo Data measures

    Geo Data requires one measure for aggregation. For additional aggregations, create additional layers.

    You can add only one column or formula that will serve as a measure for each of the following:

    • Color By
    • Size By
    • Tooltip

    Color By

    You can add only one column or formula as the Color By measure. You can define the properties for a Color By measure as follows:

    • Aggregation
    • Scale
    • Running Total
    • Filter (measure filter)
    • Format
    • Color Palette

    Size By

    Size By is only applicable for Geo Entity with a Layer Setting of the Bubble type. You can add only one column or formula as the Size By measure. Here are the properties for a Size By measure:

    • Aggregation
    • Scale
    • Running Total
    • Filter (measure filter)
    • Format
    • Color Palette

    Tooltip

    A Tooltip appears in the mouse hover state over the Geo Entity on the map. You can optionally add one or more columns or formulas as Tooltip measures for the Geo Data. Here are the properties of a Tooltip:

    • Aggregation
    • Scale
    • Running Total
    • Filter (measure filter)
    • Format
    • Color Palette
    • Enable Base field

    Create an Advanced Map visualization

    For an existing dashboard, to create an Advanced Map visualization insight, follow these steps:

    • If not already open, open a dashboard.
    • To add a new insight to the dashboard, in the Action bar, select +.
    • If needed, in the Analyzer, in the Data panel, select Add Data Set.
    • In the Manage Data Sets panel, select one or more business schema views and/or one or more physical schema tables.
    • To close the Manage Data Sets panel, select X or any other area of the Analyzer.
    • In the Insight panel, select Listing Table or V (down arrow).
    • In the Insight panel, in Charts, select Advanced Map.
    Managing Layers

    To modify the name of layer 1 in Insight panel, follow these steps:

    • To change the default layer 1 name, double-click the layer 1 pill.
    • In the text box, enter a layer name, and then press Enter or select any other area of the Analyzer.

    To add a new layer for an Advanced Map insight, follow these steps:

    • In the Insight panel, in Layer, select + Add Layer.

    To change the layer order of an Advanced Map insight, follow these steps:

    • In the Insight panel, in Layer, select the specific layer pill that you want to reorder.
    • In Layer, to move the layer up, select Up (up arrow icon ↑).
    • To move the layer down, select Down (down arrow icon ↓).

    To modify the Layer Settings, follow these steps:

    • In the Insight panel, for a given Layer pill, select > (right arrow).
    • In the Properties panel, in Layer Settings, modify the settings properties:

      • Data Labels
      • Legend
      • Visible
      • Type
      • Zoom Range
      • Opacity
    • To close the Properties panel, select X or any other area of the Analyzer.
    Specify and configure Geo Data

    To specify a Geo Attribute and define the properties of the Geo Data for a given layer, follow these steps:

    • In the Insight panel, in Layer, select a layer.
    • In Geo Data, select Geo Attribute.
    • From the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Geo Attribute target box.
    • In the Properties panel, define the Geo Role.

    To specify a Geospatial Point and define the properties of the Geo Data for a given layer, follow these steps:

    • In the Insight panel, in Layer, select a layer.
    • In Geo Data, select Lat/Long.
    • From the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Latitude target box.
    • From the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Longitude target box.

    Here are the steps to specify the Color By measure:

    • For the given layer’s Geo Data Geo Attribute, from the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Color By target box.
    • In the Properties panel, define the measure properties.

    Here are the steps to specify the Size By measure:

    • For the given layer’s Geo Data Geo Attribute, from the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Size By target box.
    • In the Properties panel, define the measure properties.

    Here are the steps to specify the Tooltip measure:

    • For the given layer’s Geo Data Geo Attribute, from the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Tooltip target box.
    • In the Properties panel, define the measure properties.

    Rich Text visualization

    This release includes a new visualization for Rich Text. Using a built-in, What-You-See-Is-What-You-Get (WYSIWYG) editor embedded in the Analyzer, you can easily create, edit, format, and preview rich text. Incorta stores the rich text as HTML for the insight.

    The WYSIWYG editor for the Rich Text visualization supports the following:

    • Text font selection and font size.
    • Text formatting
    • Text coloring, including custom colors, for text foreground and background.
    • Text and image alignment (left, right, center, and justified)
    • Text indentation
    • Unordered and ordered bullet lists
    • GIF, JPG, and PNG image embedding as an HTTP source <img src="https://www.mywebsite.com/myimage.jpg">
    • Copy and paste GIF, JPG, and PNG
    • Edit a copied image using the Edit Image controls such as brightness, contrast, gamma, cropping, orientation, and mirroring.
    • Link embedding as an HTTP source <a href="https://www.mywebsite.com">Link</a>
    • Referencing system, internal, and external session variables with the $$ syntax such as $$user and $$currentDate, even for attribute values of an HTML element
    • Viewing and editing the HTML source
    • <IFRAME> in HTML source
    Note

    It is possible to create a web link to another dashboard or dashboard tab using the full HTTP URL. However, these links will not function as drill down links to other dashboards with regard to optionally applying dashboard runtime filters.

    Create a Rich Text visualization

    For an existing dashboard, to create a Rich Text visualization insight, follow these steps:

    • To add a new insight to the dashboard tab, in the Action bar, select +.
    • In the Insight panel, select Listing Table or V (down arrow).
    • In the Insight pane, in Others, select Rich Text.
    • In the Rich Text editor, add and format your text.
    • To optionally preview your changes, in the Menu bar, select View → Preview, or in the Toolbar, select Preview (Eye icon), and then select X or Close.
    • To save your changes, in the Actions bar, select Save.

    Sankey visualization

    This release includes the new Sankey visualization. A Sankey chart visualizes the flow between two or more nodes. A node can be a source node, a target node, or both. Incorta dynamically determines an intermediary node, a node that is both a source and a target. For intermediary nodes, a user can select a node in the visualization, and Filter by Source or Target.

    The flow lines that link a source and a target appear as individual colored bands, where the width of the band visualizes the weight of the measure. In this sense, each link has three parameters: from, to, and weight.

    Example Data

    Country_Source, Country_Target, Measure
    Brazil, Portugal, 5
    Brazil, Spain, 1
    Canada, Portugal, 1
    Mexico, Portugal, 1
    Mexico, Spain, 5
    Portugal, Egypt, 2
    Portugal, Senegal, 1
    Portugal, Morocco, 1
    Spain, Senegal, 1
    Spain, Morocco, 3
    Egypt, China, 5
    Egypt, India, 1
    Egypt, Japan, 3
    Senegal, China, 5
    Senegal, Japan, 3
    Morocco, China, 5
    Morocco, India, 1
    Morocco, Japan, 3

    In the example above, several countries are both sources and targets: Portugal, Spain, Senegal, Egypt, and Morocco. Incorta dynamically calculates the weight for the bands between source and target nodes.

    Important

    In this release, all columns must be either a data-backed column or a persisted computed column (formula column) from a physical schema table. A Sankey insight will not recognize a runtime formula column for a measure. A runtime formula column exists in the insight itself or exists as a formula column in a business schema view. In addition, because of dynamic associations, Filter by Source and Target will invoke errors for a Source or Target that is a runtime formula column.

    Create a Sankey visualization

    For an existing dashboard, to create a Sankey visualization insight, follow these steps:

    • To add a new insight to the dashboard tab, in the Action bar, select +.
    • If needed, in the Analyzer, in the Data panel, select Add Data Set.
    • In the Manage Data Sets panel, select one or more business schema views and/or one or more physical schema tables.
    • To close the Manage Data Sets panel, select X or any other area of the Analyzer.
    • In the Insight panel, select Listing Table or V (down arrow).
    • In the Insight panel, in Charts, select Sankey.
    • From the Data panel to the Insight panel, add…

      • one column or formula to Source
      • one column or formula to Target
      • one data backed column to Measure
    • To save your changes, in the Actions bar, select Save.

    Sunburst Visualization

    A sunburst chart visualizes hierarchical data in a circular shape. Parent nodes in the hierarchy are inner elements, and the outer rings of elements are child nodes. Multiple grouping dimensions characterize a single aggregated measure. The Sunburst visualization supports both columns and runtime formula columns.

    The visualization supports the following user interactions:

    • Select an element in an outer ring to Expand or Filter by:

      • Filter by applies a dashboard runtime filter.
      • Expand drills-in-place; select the center element to return from the drill-in-place.

    Create a Sunburst Visualization

    In the Analyzer, in the Insight panel, the pill order of grouping dimensions affects the hierarchy of nodes: the first pill is the parent, and the last pill is the child.

    For an existing dashboard, to create a Sunburst visualization insight, follow these steps:

    • To add a new insight to the dashboard tab, in the Action bar, select +.
    • If needed, in the Analyzer, in the Data panel, select Add Data Set.
    • In the Manage Data Sets panel, select one or more business schema views and/or one or more physical schema tables.
    • To close the Manage Data Sets panel, select X or any other area of the Analyzer.
    • In the Insight panel, select Listing Table or V (down arrow).
    • In the Insight panel, in Charts, select Sunburst.
    • From the Data panel to the Insight panel, add…

      • one or more columns and formulas to Grouping Dimensions
      • only one column or formula to Measure.
    • To save your changes, in the Action bar, select Save.

    Date Part

    In the new Analyzer, for this release, you can now select a date part for a date or timestamp column in a Grouping Dimension or Measure. You can also specify a date part column as an individual filter for an insight. Using a date part is equivalent to using a built-in Date function. The options for date parts are:

    • Full, the column itself
    • Year, as year(date exp)
    • Quarter, as quarter(date exp)
    • Month, as month(date exp)
    • Day, as day(date exp)
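
    For illustration, here is a minimal sketch of this equivalence, assuming a hypothetical SALES schema with an ORDERS table that has an ORDER_DATE timestamp column and an AMOUNT column (these names are illustrative and not part of the release). Selecting the Month date part for ORDER_DATE in a Grouping Dimension groups the data in the same way as grouping by the month() function would in the insight's reference SQL:

    -- Hypothetical reference SQL: group by the Month date part of ORDER_DATE
    SELECT
        month(`ORDER_DATE`) AS `ORDER_MONTH`,
        sum(`AMOUNT`) AS `TOTAL_AMOUNT`
    FROM
        `SALES`.`ORDERS`
    GROUP BY
        month(`ORDER_DATE`)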

    Date parts are available in:

    • Analyzer for an Incorta Table
    • Analyzer for an Incorta View
    • Analyzer for an Insight

    In this release, a date part supports only data-backed date and timestamp columns. A date part does not support a runtime formula column in a business schema view or Incorta View.

    A date part column supports dashboard prompt and applied filters. You can sort a date part column by defining a Sort By date.

    Configure a column as a Date Part

    Here are the steps to configure a column as a date part.

    • In the Data panel, select either a date or timestamp column to add to one of the following trays in the Insight panel:

      • Grouping Dimension
      • Measure
      • Individual Filter
      • Aggregate Filter
    • Select the pill in the tray.
    • In the Properties panel, in Date Part, select one of the following:

      • Full
      • Year
      • Quarter
      • Month
      • Day
    • To save your changes, in the Action bar, select Save.

    Cisco Meraki Connector

    Cisco Meraki is cloud IT management software that provides users with a scalable and secure solution that can help them create and control their networks. Cisco Meraki’s products include wireless, switching, security, enterprise mobility management, and security cameras, all centrally managed from the web. To learn more, please review Connectors → Cisco Meraki.

    Splunk Connector

    Splunk is a software product that captures, indexes, and correlates real-time, machine-generated data in a searchable repository from which it can generate graphs, reports, alerts, dashboards, and visualizations. Currently, the Splunk connector extracts data represented as Splunk reports. To learn more, please review Connectors → Splunk.

    System variables for incremental extracts

    In this release, there are two new system variables that you can use to specify a window of time for incremental table loads for a table with a SQL data source. The variables are:

    • $$job_extract_start_time Dynamically evaluates to the start time of the query execution (extraction)
    • $$job_extract_reference_time Dynamically evaluates to the Last Successful Extract Time

    You can use these variables in the WHERE clause of a SQL query. The new system variables support both Query and Update Query configurations. Here is an example:

    SELECT
        `GUID`,
        `TENANT_ID`,
        `PARENT_GUID`,
        `TARGET_TYPE`,
        `TARGET_NAME`,
        `TARGET_ID`,
        `JOB_TYPE`,
        `STATE`,
        `LEADER_NODE_NAME`,
        `START_TIME`,
        `END_TIME`,
        `LAST_MODIFIED`,
        `DURATION`,
        `MESSAGE`
    FROM
        `incorta_metadata`.`JOB`
    WHERE `START_TIME` >= $$job_extract_reference_time
      AND `START_TIME` <= $$job_extract_start_time

    Global Variables

    This release introduces global variables. Unlike other objects in Incorta, a global variable is available to all tenant users. A global variable has a name, description, type, and value. A global variable is a static variable.

    To learn more, review Concepts → Global Variable.

    Schema Notifications

    This release introduces schema notifications. A schema notification is an email notification that contains the sender name, the schema name, the schema load status, and a direct link to the Load Job Viewer that contains the schema load job summary and details.

    A notification name:

    • is between 1 and 255 characters
    • can contain spaces and special characters

    You can also specify text up to 4,000 characters in an email body.

    Note

    Notifications require a tenant email configuration for an outgoing email server using SMTP or EWS in the Cluster Management Console (CMC).

    Create a schema notification

    In this release, you can create a schema notification using the Schema Manager or the Scheduler.

    With the Schema Manager, create a schema notification

    There are two ways to open the Create Notification via Email dialog from the Schema Manager:

    • In the Navigation bar, select Schema.
    • In the Schema Manager, in the Context tab, select the Schemas tab.
    • In the List View, select the checkbox for one or more schemas.
    • In the Search bar, select More Options (kebab icon).
    • In the More Options menu, select Create Notification.

    or

    • In the Navigation bar, select Schema.
    • In the Schema Manager, in the Context tab, select the Schemas tab.
    • In the List View, for a given schema row, select More Options (kebab icon).
    • In the More Options menu, select Create Notification.

    To create an email notification, follow these steps:

    • In the Create Notification via Email dialog, enter the Notification Name.
    • In Notify On, select Success and/or Failure.
    • In Select Schema(s), select one or more schemas.
    • In Recipients, specify at least one of the following:

      • Username
      • Group
      • Email address
    • For each Recipient, specify if this is TO, CC, or BCC.
    • In Body, optionally enter the email text.
    • To save, select Done.
    With the Scheduler, create a schema notification

    Here are the steps to create a schema notification with the Scheduler:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the Action bar, select + New → Create Notification.
    • In the Create Notification via Email dialog, enter the Notification Name.
    • In Notify On, select Success and/or Failure.
    • In Select Schema(s), select one or more schemas.
    • In Recipients, specify at least one of the following:

      • Username
      • Group
      • Email address
    • For each Recipient, specify if this is TO, CC, or BCC.
    • In Body, optionally enter the email text.
    • To save, select Done.

    Manage schema notifications

    Only the tenant super user, users that belong to a group assigned the SuperRole role, or users that belong to a group assigned the Schema Manager role, can both create and manage schema notifications for a given tenant.

    Search for a schema notification

    To search for a schema notification follow these steps:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the Search bar text box, enter a search term.
    Edit a schema notification

    To edit a schema notification, follow these steps:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the List View, select the schema notification row, or in the right row gutter, select Edit (pen icon).
    • In the Edit Notification dialog, modify any of the following:

      • Notification Name
      • Notify On
      • Select Schema(s)
      • Recipients
      • Body
    • To save, select Done.
    Delete one or more schema notifications

    To delete one or more schema notifications, follow these steps:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the List View, select the checkbox for each row for deletion.
    • In the Search bar, select Delete (trash icon).
    • In the Delete notification(s) dialog, select Delete.

    To delete a single schema notification, follow these steps:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the List View, highlight the specific schema notification row, and in the right row gutter, select Delete (trash icon).
    • In the Delete notification(s) dialog, select Delete.

    Schema Manager supports Filter by Load Status

    In the Schema Manager, for the Schemas tab, the List View of schemas now shows the following columns:

    • Name
    • Last Successful Load
    • Owner
    • Modified By
    • Status
    • Last Load
    • Next Load
    • Data on Disk
    • More Options (Kebab)

    In this release, it is now possible to filter by one or more schema load statuses. The schema load status options are:

    • Succeeded
    • Finished With Errors
    • Failed
    • Interrupted
    • Running
    • In Queue

    Load Job Viewer enhancements

    For a selected physical schema in a given tenant, the Load Job Viewer shows both current load jobs and previous load jobs. In this release, the Jobs section contains a summarized table of all load jobs for the selected schema. For the selected load job summary, the Job details section contains all the load details for each and every table in the load.

    Important

    Unlike previous releases, in this release, the Load Job Viewer only shows the status of a load job from the perspective of the Loader Service, not the Analytics Service. This change from previous releases means that the Load Job Viewer no longer waits for the Analytics Service to report the successful load of performance optimized tables into memory. For this reason, in certain cases, the Load Job Viewer may report the successful status of a load job, but a dashboard insight that references the successfully loaded schema will show the message, The data is being refreshed. This means that the Analytics Service is still loading into memory the related performance optimized tables required by the insight query.

    There are two ways to access the Load Job Viewer:

    • In the Schema Manager, in the List View, for a schema with a Load Status other than No Data Loaded, select the Load Status link:

      • Success
      • Finished with Errors
    • In the Schema Designer for a given schema, in the Summary section, in Last Load Status, select the link:

      • Please Load Data
      • Date Time
      • Load Status such as In Queue or Finished with Errors

    About a load job summary

    A load job summary contains the following details:

    • Service
    • Start Time
    • End Time
    • Load Status
    • Duration
    About a load status

    In this release, Load Status now shows the status for a given load job. The statuses are:

    • Succeeded
    • Finished With Errors
    • Failed
    • Interrupted
    • Running
    • In Queue
    Filter by Load Status

    New to this release is the ability to filter the summarized table of load jobs by Load Status. In the Job Summary section, in the Load Status column, you can now filter the load job summaries by one or more load status filters. The filter options are:

    • Succeeded
    • Finished with Errors
    • Failed
    • Interrupted
    • Running
    • In Queue
    Load status tooltip

    A tooltip for Load Status details the durations of parallelized activities and states within a load job. A given loader service executes many load job activities in parallel. As a result, the load status tooltip depicts activity durations that, when aggregated, are typically greater than the overall duration of the load job itself. The tooltip statuses are:

    • In Queue
    • Extraction
    • Enrichment
    • Load
    • Post-load

    About the Job Details section

    This release enhances the extraction and load details for each individual table in the selected load job. The Job Load Details table shows the following sortable columns:

    • Name: The name of the table with the option to filter by search term
    • Load Type: Full, Incremental, Staging; this is a Sortable column
    • Extraction - Start: the start time of the table extraction by the loader service; Sortable
    • Extraction - Duration: the duration time
    • Extraction - Extracted: the number of rows extracted
    • Extraction - Rejected: the number of rows rejected in the extraction
    • Load - Start: the start time of the table load into the analytics service
    • Load - Duration: the duration time
    • Load - Loaded: the number of rows loaded by the analytics service
    • Status: the current load status of the table with the option to filter by one or more statuses
    About the table load status

    In this release, the status column now shows the load status for a given table. While a load is running, these statuses are viewable. The table load statuses are:

    • In Queue
    • Extracting
    • Enriching
    • Loading
    • Post-loading
    • Success
    • Failed
    Filter by table load status

    New to this release is the ability to filter the tables for a specific load job by table load status. In the Job Details section, in the Status column, you can now filter by one or more status filters. The filter options are:

    • Succeeded
    • Failed
    • Interrupted
    • Running
    • In Queue
    Table load status tooltip

    A tooltip for the table load status details the durations of parallelized activities and states within the extraction and load phases of a table load. The tooltip statuses are:

    • In Queue
    • Extraction
    • Enrichment
    • Load
    • Post-load
    Note

    The activities and related states differ based on the load type (Full, Incremental, or Staging) as well as the table type such as a SQL Database table, Materialized View, or Incorta table.

    Continue on Error and Finished with Errors status

    Continue on Error is a new feature in this release. The feature allows a load job to continue even when there are errors. In other words, a load job for a schema will continue when there is an error or exception. The job will complete and the load job report will depict a load status of Finished with Errors.

    Here are the types of errors that will not stop the load job for a schema:

    • A join creation error
    • A formula calculation error
    • An error with the selected table reference of an alias table
    • An error with an Incorta Table (a table created with the Analyzer)

    Join errors will write an internal error flag within the direct data mapping (snapshot) files. Formula calculation errors will result in a column with null values. An error with an Incorta table results in empty table columns.

    Here are the steps to view the details of Finished with Errors:

    • In the Load Job Viewer, in Jobs, select a specific load job with errors.
    • For a job with errors, in Load Status, select Finished with Errors.
    • In the Job Errors dialog, review the error.
    • Optionally select Copy to Clipboard.
    • For each error, to review the specific error message and error trace, select View Details.
    • Optionally select Copy to Clipboard.
    • To close the details dialog, select Ok.
    • To close the Job Errors dialog, select Ok.

    Save without validation and discovery

    In this release, you can now modify and save a materialized view, single-source table, or multi-source table without validating any script changes or discovering changes to the output columns.

    Save without validation and discovery for materialized views

    You can now save script changes to a materialized view without validating the changes. In the Table Editor, in the Table summary section, a materialized view that has not been validated will show a status of Not validated. In the Schema Manager, in Tables, a materialized view that has not been validated will show the warning, Not validated.

    There are several ways to resolve the Not validated warning for a materialized view:

    • Manually validate the script changes in the Data Source dialog
    • If the materialized view does not support incremental loading, perform a successful full load for the materialized view
    • If the materialized view supports incremental loading, perform both a successful full load and incremental load for the materialized view
    Note

    A materialized view without validation will successfully load if the columns in the previous version and the Not validated version are the same. Discovery in this sense is for column names, not column data types.

    Here are the steps to save changes to a materialized view script without validation:

    • For the given schema in Schema Designer, select an existing Materialized View.
    • In the Table Editor, in the Table summary section, to open the Data Source dialog, select the table icon.
    • To edit the execution code, in Script, select one of the following:

      • with Notebook Integration enabled, to open the Script Editor, select Edit Query.
      • with Notebook Integration enabled, to open the Notebook Editor, select Edit in Notebook.
      • without Notebook Integration enabled, select the Script open icon or textbox.
    • Modify the execution code in either the Script Editor or the Notebook Editor.
    • To close the Script Editor or the Notebook Editor, select Done.
    • In the Data Source dialog, select V (down arrow) → Save Script Only.
    • In the Table summary section, verify Not validated.
    • In the Action bar, select Done.

    Save without validation and discovery for single-source and multi-source tables

    A single-source table is a physical schema table with only one defined data source. A multi-source table is a table with two or more defined data sources.

    This release supports editing a single-source or multi-source table without discovering the columns of a specified data source.

    In the Table Editor, in the Table summary section, a data source for a table that has not been validated will show a status of Not validated. In the Schema Manager, in Tables, a table that has not been validated will show the warning, Not validated.

    There are several ways to resolve the Not validated warning for a single-source or multi-source table:

    • In the Data Source dialog, select Validate.
    • If the table does not support incremental loading, perform a successful full load for the table
    • If the table supports incremental loading, perform both a successful full load and incremental load for the table
    Note

    For a multi-source table with a data source that is Not validated, it is not possible to manage the output columns. A table without validation will successfully load if there are common columns in the previous version and the Not validated version. Discovery in this sense is for column names, not column data types.

    Here are the steps to save changes to either an existing single-source or multi-source table without validating the changes:

    • For the given schema in Schema Designer, select an existing single-source or multi-source table.
    • In the Table Editor, in the Table summary section, to open the Data Source dialog, select a table icon.
    • Make the required changes.
    • In the Data Source dialog, select V (down arrow) → Save Without Discovery.
    • In the Table summary section, verify Not validated for the modified data source.
    • In the Action bar, select Done.

    Incorta PostgreSQL for materialized views

    You can now create a materialized view using PostgreSQL. Using the Script Editor, you can define a SQL SELECT statement using the PostgreSQL syntax. This new feature replaces the need to create a PostgreSQL data source in order to materialize a schema table or business schema view as Apache Parquet in shared storage.

    Important

    This release supports a single-threaded JDBC connection for a PostgreSQL materialized view. Because data is serialized from memory into the PostgreSQL protocol and then deserialized back into memory, you may run into performance issues and scalability limits for large tables with hundreds of millions of rows.

    Unlike the Spark SQL option for a materialized view, PostgreSQL enables the following use cases and scenarios:

    • Querying a business schema view
    • Querying formula columns in a physical schema view from another schema table
    • Using PostgreSQL built-in functions

    Create a Materialized View with PostgreSQL using the Script Editor

    In this release, for a PostgreSQL materialized view, only the Script Editor is available. Here are the steps to create a materialized view with PostgreSQL using the Script Editor:

    • For the given schema in Schema Designer, in the Action bar, select + New → Materialized View.
    • In the Data Source dialog, in Language, select Incorta PostgreSQL.
    • In Script…

      • without Notebook Integration enabled, to open the Script Editor, select the Script open icon or text box.
      • with Notebook Integration enabled, select Edit Query.
    • Enter your PostgreSQL SELECT statement.
    • Select Done.
    • Select Save.
    • Specify a Table Name.
    • In the Action bar, select Done.

    SparkR support for materialized views

    This release supports creating a materialized view using SparkR. SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. R is one of the most popular programming languages for statistical modeling and analysis.

    About R

    R is a freely available language and environment for statistical computing and graphical analysis. R provides support for a wide variety of statistical and graphical techniques: linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, and many more. R also supports data wrangling. For example, packages such as dplyr or readr can transform messy data into a structured form. R simplifies quality plotting and graphing with its native support for libraries such as ggplot2 and plotly. In addition, R has a rich set of packages with over 10,000 packages in the CRAN repository.

    There are many packages that provide support for machine learning algorithms related to classification, regression, and neural networks.

    About SparkR

    To learn more about SparkR, please review the SparkR documentation for Apache Spark 2.4.3, as this is the version that comes bundled with Incorta.

    Installation Requirements

    All Incorta Nodes in the given cluster require R 3.4 or above. You can find R available for download at https://cran.r-project.org/mirrors.html

    Additional R packages

    After confirming the successful installation of R 3.4 or above on all hosts, you must also install the following packages from the R shell:

    To install the Knitr package, enter the following command in R shell:

    install.packages("knitr")

    To install the Stringi package, enter the following command in R shell:

    install.packages("stringi")

    To install the Stringr package, enter the following command in R shell:

    install.packages("stringr")

    To install the httr package, enter the following command in R shell:

    install.packages("httr")

    To install the SparkR package, enter the following command in R shell:

    install.packages("https://cran.r-project.org/src/contrib/Archive/SparkR/SparkR_2.4.3.tar.gz", repos = NULL, type="source")

    SparkR Example

    The following example reads the rows from the SALES.CUSTOMERS table and then persists the results:

    s = read("SALES.CUSTOMERS")
    save(s)

    Available Helper Methods

    There are several helper methods available for creating a materialized view with R.

    Method | Description
    get_last_refresh_time(): Long | Returns the last refresh date for the materialized view
    save(dataFrame: DataFrame): Unit | Required method to persist the materialized view
    read(tableName: String): DataFrame | Reads a schema table and returns the table as a dataframe
    readFormat(format: String, path: String): DataFrame | Reads a data source and a path and returns a dataframe object

    Available Helper Methods with Notebook Integration enabled

    In addition to the existing helper methods, the following helper methods are available for R and Notebooks:

    Method | Description
    display(dataFrame: DataFrame): Unit | Displays the dataframe results
    incorta$show(df: DataFrame) | Shows the results of the dataframe
    incorta$printSchema(df: DataFrame) | Prints the schema of the dataframe
    incorta$describe(df: DataFrame): Unit | Displays the count, mean, standard deviation, min, and max values for each column in the schema
    incorta$head(dataFrame: DataFrame, n: Int=1): Unit | Displays the first N rows of the dataframe results, where N is optional
    incorta$put(key: String, value: Object): Unit | Adds a new property to the map of properties
    incorta$get(key: String): Object | Retrieves a property from the map of properties

    Create a materialized view with SparkR using the Script Editor

    With Notebook Integration disabled for a given tenant, you can only edit a materialized view using the Script Editor. In this release, with Notebook Integration enabled for the tenant, you can edit the materialized view using either the Script Editor or the Notebook Editor.

    You must call the save(dataframe) method to persist the materialized view.

    Here are the steps to create a materialized view with SparkR using the Script Editor:

    • For the given schema in Schema Designer, in the Action bar, select + New.
    • In the Add New menu, select Materialized View.
    • In the Data Source dialog, in Language, select Spark R.
    • In Script…

      • without Notebook Integration enabled, to open the Script Editor, select the Script open icon or text box.
      • with Notebook Integration enabled, select Edit Query.
    • Enter your R code.
    • Select Done.
    • To specify additional materialized view Spark properties, select Add Property.
    • Select Save.
    • Specify a Table Name.
    • In the Action bar, select Done.

    Create a materialized view with SparkR and the Notebook Editor

    A SparkR notebook has the %r declaration. You must call the save(dataframe) method to persist the materialized view.

    Here are the steps to create a materialized view with SparkR and the Notebook Editor:

    • For the given schema in Schema Designer, in the Action bar, select + New.
    • In the Add New menu, select Materialized View.
    • In the Data Source dialog, in Language, select Spark R.
    • In Script, select Edit in Notebook.
    • In one or more paragraphs, enter the R code for the materialized view.
    • Select Done.
    • To specify additional materialized view Spark properties, select Add Property.
    • Select Save.
    • Specify a Table Name.
    • In the Action bar, select Done.
    Additional Considerations for SparkR and the Notebook Editor

    In certain cases, a notebook paragraph in R will show a status of Finished even though the paragraph output reports an error. Some errors will appear in the Data Source dialog. Check the application logs for the root cause and the full stack trace.


    Incorta ML New Features

    In this release, there are several important new features added to Incorta’s Machine Learning (ML) Library:

    • Columns Encoding
    • Encoding Recommendations
    • Data Balancing
    • Model Building and Prediction
    • Model Evaluation
    • Time Series Forecasting

    Python Requirements

    The incorta_ml library supports Python 2.7, Python 3.5, Python 3.6, and Python 3.7. Pandas officially supports these versions of Python.

    Important

    Deprecation notice concerning Python 2.7:
    Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python installation, as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7.

    Required Incorta ML Python libraries

    Python 3.5 and higher requires the following libraries installed using pip:

    • pyspark
    • numpy
    • pandas
    • lime
    • fbprophet
    • statsmodels
    • pmdarima (python 3+)
    • imbalanced-learn

    Python 2.7 requires the following libraries installed using pip:

    • pystan==2.17
    • subprocess32==3.2.6
    • numpy==1.15.4
    • scipy==1.2.2
    • networkx==2.2
    • matplotlib==2.1.0
    • pywavelets==1.0.3
    • scikit-learn==0.20.3
    • scikit-image==0.14.3
    • lime==0.1.1.30
    • statsmodels==0.10.2
    • pyramid-arima
    • holidays==0.9.12
    • fbprophet==0.5.0
    • seaborn==0.9.1
    • cufflinks==0.17.0
    • imbalanced-learn==0.4.3
    • importlib_resources

    Columns Encoding

    Column encoding converts string and date type columns to numeric features.

    Signature

    from incorta_ml import encode_columns
    output_df = encode_columns(input_df, handle, columns, is_training=False)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains one or more string or date columns
    handle | a unique identifier
    columns | a list of the column names that you want to encode
    is_training | specify True to build the transformation in Spark’s Directed Acyclic Graph (DAG), or when preparing the training data for the first time; otherwise, specify False to apply the transformation to a testing or development data set

    Returns

    output_df: a dataframe with converted strings and date types as numeric features
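
    For example, here is a minimal sketch that assumes input_df is an existing Spark dataframe and that CUSTOMER_TYPE and SIGNUP_DATE are hypothetical string and date columns to encode:

    from incorta_ml import encode_columns

    # Encode the hypothetical CUSTOMER_TYPE and SIGNUP_DATE columns into numeric
    # features; "customer_encoding" is an arbitrary handle, and is_training=True
    # because this is the first pass over the training data.
    training_df = encode_columns(input_df, "customer_encoding",
                                 ["CUSTOMER_TYPE", "SIGNUP_DATE"], is_training=True)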

    Encoding recommendations

    Returns a recommended list of column names for column encoding.

    Signature

    from incorta_ml import recommend_encoding
    output_df = recommend_encoding(input_df, exclude, suppress_printing=True)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains one or more string or date columns
    exclude | a list of column names to exclude from the recommendations
    suppress_printing | a flag; False prints the recommended columns

    Returns

    output_df: dataframe that contains numeric features, transformed features, and labels
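
    For example, here is a minimal sketch that assumes input_df is an existing Spark dataframe and ORDER_ID is a hypothetical key column to exclude:

    from incorta_ml import recommend_encoding

    # Recommend columns to encode, excluding the hypothetical ORDER_ID key column;
    # suppress_printing=False also prints the recommended columns.
    recommendations = recommend_encoding(input_df, ["ORDER_ID"], suppress_printing=False)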

    Data Balancing

    Balances the training data of a classification problem based on a categorical column.

    Signature

    from incorta_ml import balance_data
    output_df = balance_data(input_df, label_column_name)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe
    label_column_name | the categorical column on which to base the balancing

    Returns

    output_df: balanced dataframe
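
    For example, here is a minimal sketch that assumes training_df is a Spark dataframe and CHURNED is a hypothetical categorical label column:

    from incorta_ml import balance_data

    # Balance the training data based on the hypothetical CHURNED label column.
    balanced_df = balance_data(training_df, "CHURNED")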

    Model building and prediction

    Builds the model and persists the model to disk. Incorta ML supports the following algorithms:

    • LogisticRegression
    • DecisionTreeClassifier
    • RandomForestClassifier
    • GBTClassifier*
    • MultilayerPerceptronClassifier*
    • LinearSVC*
    • NaiveBayes
    • LinearRegression
    • GeneralizedLinearRegression
    • DecisionTreeRegressor
    • RandomForestRegressor
    • GBTRegressor
    • IsotonicRegression

    GBTClassifier and LinearSVC work only with binary classification data. Support for MultilayerPerceptronClassifier is in manual mode: you must specify the layers as a parameter in the form of a two-dimensional (2-D) array. You can add any number of hidden layers and hidden layer sizes:

    [
      [number_of_input_features, hidden_layer_1, hidden_layer_2, hidden_layer_n, number_of_classes],
     ]
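
    As a concrete illustration of this template, a hypothetical network with 20 input features, two hidden layers of 16 and 8 neurons, and 3 output classes could be specified as:

    # Hypothetical layer specification: 20 input features, hidden layers of 16
    # and 8 neurons, and 3 output classes.
    layers = [
      [20, 16, 8, 3],
    ]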

    Signature

    from incorta_ml import auto_modeling, predict
    auto_modeling(input_df, model_name, label_column_name, params=None, mode=None)
    output_df = predict(input_df, model_name)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains the feature columns and the label column. All feature columns must be numeric
    model_name | a name to identify the model
    label_column_name | the name of the label column as a two-part qualified name, such as input_df.column
    params | set to None to enable Auto Mode; otherwise, specify the model and its parameters as in PySpark MLlib
    mode | a string such as 'classification' or 'regression', or None to specify Auto Mode. In Auto Mode, it is not necessary to specify params other than None or {}

    Returns

    Persists the model to disk.
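
    For example, here is a minimal sketch in Auto Mode, assuming features_df is an already encoded Spark dataframe, CHURNED is a hypothetical label column, new_data_df is another dataframe with the same feature columns, and churn_model is an arbitrary model name:

    from incorta_ml import auto_modeling, predict

    # Build and persist a classification model in Auto Mode (params=None);
    # the label is given as a two-part qualified name, per the table above.
    auto_modeling(features_df, "churn_model", "features_df.CHURNED",
                  params=None, mode='classification')

    # Score a new dataframe with the persisted churn_model.
    output_df = predict(new_data_df, "churn_model")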

    Model Evaluation

    Evaluates a model on a dataframe.

    Signature

    from incorta_ml import evaluate
    output_df=evaluate(input_df, model_name)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains the feature columns and the label column; the schema should match that of the training data
    model_name | the name of the model to evaluate

    Returns

    output_df: a dataframe that contains two columns, metric_name and value, where each row represents a metric value.
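
    For example, here is a minimal sketch that assumes test_df is a Spark dataframe whose schema matches the training data and churn_model is the name of a previously built model:

    from incorta_ml import evaluate

    # Compute evaluation metrics; each row of the result holds a metric_name
    # and its value.
    metrics_df = evaluate(test_df, "churn_model")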

    Time Series Forecasting

    Builds a Time Series model, persists the model to disk, and returns a dataframe that includes predictions and a number of future points. For more information about frequency formulas, please refer to pandas documentation for time series offset aliases.

    Signature

    from incorta_ml import train_time_series_model
    train_time_series_model(input_df, handle, time_column_name, value_column_name, algorithm_names, horizon=0, intervals=None)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains a date or timestamp column, and a values column
    handle | a name to identify the process. The recommendation is to have the name reflect the feature being predicted
    time_column_name | the names of the date or timestamp columns available in the specified dataframe, each as a qualified name such as input_df.column
    value_column_name | the names of the values columns, each as a qualified name such as input_df.column
    algorithm_names | the names of the algorithms to use: 'est' for simple exponential smoothing, 'auto_arima' for auto ARIMA, and 'fbprophet' for Facebook Prophet
    horizon | the number of points that will be selected as a testing set
    intervals | the size of each period: a frequency formula, an integer representing the number of seconds, or None for auto

    Returns

    Returns a dataframe that includes predictions and a number of future points.
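
    For example, here is a minimal sketch that assumes sales_df is a Spark dataframe with hypothetical ORDER_DATE and REVENUE columns, and that algorithm_names accepts a list of algorithm names:

    from incorta_ml import train_time_series_model

    # Forecast revenue with Facebook Prophet, holding out 30 points as a testing
    # set and letting the library choose the period size (intervals=None).
    forecast_df = train_time_series_model(sales_df, "revenue_forecast",
                                          "sales_df.ORDER_DATE", "sales_df.REVENUE",
                                          ["fbprophet"], horizon=30, intervals=None)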


    Additional Improvements and Enhancements

    In the 4.9 release, there are additional improvements and enhancements:

    Support for folders in the Google Sheets connector

    In this release, the Google Sheets connector now supports selecting a folder that contains one or more Google Sheets documents. To learn more, please review Connectors → Google Sheets.

    Support for more incremental data type columns for MS SQL Server

    In this release, for an incremental table of the type SQL Database with a Microsoft SQL Server data source, an incremental column now supports numeric types such as double, integer, and long, in addition to date and timestamp.

    Parallelize Apache Spark queries with split Parquet files

    In this release, you can configure the Loader Service to split Parquet files to take advantage of unused cores in Apache Spark. The splitting of files occurs only on a second full load.

    The num.of.parquet.parts property informs the Parquet writer to split an existing Parquet file into, at most, the specified number of parts.

    You can define this property in the node.properties file, or for the Loader Service, in the service.properties file.

    Syntax

    num.of.parquet.parts=<INTEGER>

    Example that splits the Parquet file into a maximum of 10 parts:

    num.of.parquet.parts=10

    Additional considerations

    To determine an appropriate number of parts, review the row count after the first full load, and then compare it against the core count available to Apache Spark so that the resulting parts can be processed in parallel.

    Note

    If migrating from one Apache Spark environment to another and the core counts differ, recalculate and reassign the num.of.parquet.parts property.

    Schema dependencies for inspector jobs

    In this release, an Inspector job will now generate a new file, schemaDependency.csv. The file contains schema dependencies as follows:

    • Tenant Name
    • Schema
    • Dependent Schema

    The new file identifies joins from one schema to another schema and includes alias tables that contain a formula column.

    Content Manager preserves the List view sorting

    In this release, the sort order of a selected column in the List view of the Content (Catalog) Manager will be preserved.


    Known Issues

    Here are the known issues in this 4.9.0 release:

    • Multiple suspended schedules will show up as not scheduled.
    • Certain dashboard insights may show the message, The data is being refreshed, even though the Loader Service reports a successful load of the schema.
    • A PostgreSQL materialized view that refers to a formula column in a physical schema may fail during a load job.
    • For an Advanced Map insight, it is not possible to set a Base Field for a Geo Data measure.

    © Incorta, Inc. All Rights Reserved.