Release Notes 4.9

Release Highlights

The goal of the Incorta 4.9 release is to embrace the design themes of consistency, clarity, and efficiency with a strong focus on increasing user productivity and delight. As a result, this release showcases modern design and engineering at its best with the introduction of an all-new and greatly improved Analyzer experience.

This release introduces several major improvements to the Cluster Management Console (CMC), Incorta Loader Service, Incorta Analytics Service, and Incorta ML.

Important New Features and Enhancements

There are several important features in this release:

Additional Improvements and Enhancements

Upgrade to Incorta 4.9

Important

Prior to upgrading to Incorta 4.9, please review and follow the procedures outlined in the Upgrade to Incorta 4.9 documentation.

Note

Any changes to these properties require that you restart all services in the Incorta Cluster.


Cluster Management Console (CMC)

The following new configurations and enhancements are available in the Cluster Management Console (CMC) for this release:

Single Sign-on (SSO) Auto Provisioning

With Single Sign-on (SSO) Auto Provisioning, security administrators no longer need to manually create an Incorta user and assign the user to a group. Incorta will honor the SSO provider authentication, automatically create the Incorta user in the given tenant, and assign the user to a default group.

Note

You must have Single Sign-on (SSO) already configured for the default tenant configuration or the specific tenant configuration. You must also have created a security group for a specific tenant. You are not able to assign a group for the default tenant configuration.

Here are the steps to enable this option as the default tenant configuration in the CMC:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Cluster Configurations.
  • In the panel tabs, select Default Tenant Configurations.
  • In the left pane, select Security.
  • In the right pane, confirm that SSO is the Authentication Type.
  • Enable Auto provision SSO users.
  • Select Save.

Here are the steps to enable this option for a specific tenant configuration in the CMC:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select the Tenants tab.
  • In the Tenant list, for the given tenant, select Configure.
  • In the left pane, select Security.
  • In the right pane, confirm that SSO is the Authentication Type.
  • Enable Auto provision SSO users.
  • For Auto provisioned SSO users default group, select a group name.
  • Select Save.

Infrastructure Management

In this release, for a given cluster, the CMC now includes an Infrastructure section. In the Infrastructure section, an administrator can enable or disable the following infrastructure servers:

  • Apache Derby database server for the Incorta metadata database
  • Apache ZooKeeper server
  • Apache Spark standalone server
Important

This feature only supports a typical, standalone cluster configuration where the following are true:

  • Apache Derby serves as the Incorta Metadata Database
  • Apache ZooKeeper is the Incorta packaged (incorta-package.zip) distributed version
  • Apache Spark is the Incorta packaged (incorta-package.zip) distributed version

To enable or disable an infrastructure application in the cluster, follow these steps:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the canvas tabs, select Details.
  • In the Infrastructure section, enable or disable any of the following:

    • Metadata Database
    • Zookeeper
    • Spark

In this release, where the feature does not support the version or server, there is a disabled toggle control. For example, if the cluster uses a MySQL Server database for the Incorta metadata database, the Metadata Database toggle appears as disabled.

Warning for Mandatory Infrastructure Servers

The CMC will now show a warning indicating that a mandatory infrastructure server, such as the Incorta metadata database or Apache ZooKeeper, is not running.

To start the server, follow these steps:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the warning, select Start.

Terminated Unexpectedly as a Service State

The CMC will now report when a service unexpectedly terminates. In this release, the possible service states are now:

  • Stopping
  • Stopped
  • Processing
  • Starting Tenants
  • Started
  • Terminated Unexpectedly

To view the status of services in the cluster, follow these steps:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the canvas tabs, select Details.
  • In the Status section, review the state of the Analytics, Loader, and Add-on services.

Alternatively, you can view the status of services in the cluster in the Services tab:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a cluster name.
  • In the canvas tabs, select Services.
  • In the Status column, review the state of the Analytics, Loader, and Add-on services.
Note

Service Details also reports the Status of the service state.

CMC Scheduler

This release includes the CMC Scheduler. For a given cluster, the CMC Scheduler allows an administrator to create and manage jobs for:

  • Tenant backups
  • The Inspector Tool

To learn more, review Tools → CMC Scheduler.

Mapbox Integration

Mapbox is an open source mapping platform for custom designed maps. This release supports integration with Mapbox for the newly introduced Advanced Map insight visualization available in the new Analyzer.

Mapbox uses access tokens to associate API requests with an account. To learn more, please review Access Tokens | How Mapbox works.

Using your organization’s Mapbox Access Token is optional, as Incorta’s access token is the default in this release.

Important

Because of API request limits associated with Incorta’s default access token, an API request may not be processed. If you encounter an issue, please reach out to Support for more details.

Here are the steps to use your organization’s Mapbox Access Token in the default tenant configuration in the CMC:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Cluster Configurations.
  • In the panel tabs, select Default Tenant Configurations.
  • In the left pane, select Integration.
  • In the right pane, in Mapbox API Key, specify the token value.
  • Select Save.

Here are the steps to use your organization’s Mapbox Access Token for a specific tenant configuration in the CMC:

  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select the Tenants tab.
  • In the Tenant list, for the given tenant, select Configure.
  • In the left pane, select Integration.
  • In the right pane, in Mapbox API Key, specify the token value.
  • Select Save.

Incorta Analytics and Loader Service

The 4.9 release introduces several key improvements to the Incorta Analytics and Loader Services such as:

New Analyzer

Embracing the design themes of consistency, clarity, and efficiency, this release showcases modern analytics engineering with the introduction of an all-new and greatly improved Analyzer experience that promises increased productivity and delight.

You can use the new Analyzer for the following:

  • To create and edit an insight on a dashboard tab
  • To create and edit an Incorta Table in a physical schema
  • To create and edit an Incorta View in a business schema

Anatomy of the New Analyzer

The Analyzer now opens in full screen, and there is no Navigation bar. You must select Save or Cancel to close the Analyzer. Here is a high level description of the Analyzer’s anatomy:

  • Action bar
  • Data panel
  • Manage Data Sets panel
  • Insight panel
  • Properties panel
  • Filter panel
  • Filter Values panel
  • Visualization canvas
  • Settings panel
Action bar

In the Action bar, you can perform the following:

  • Download (download icon)
  • SQL
  • Settings
  • Cancel
  • Save

When supported by the configured insight visualization, you can view the Reference SQL and/or download a file as .csv or .xlsx.

Data panel

To filter and find items in the Data panel, enter a search term in the Search text box or use the Column Type drop down menu to narrow your results. Column Types include:

  • String
  • Numerical
  • Date
  • Timestamp
  • Boolean
  • Key

For a given column in the tree, select the information icon to view the column details and preview sample data.

You can also manage the tree hierarchy. The More Options (kebab icon) menu allows you to:

  • Collapse to Schema and Table Level
  • Sort by Name or Original Order
Multi-select

To select multiple columns in the Data panel, you must use the following keystrokes:

  • On Mac OS, use CMD
  • On Windows OS and Linux OS, use ALT

You must drag and drop the multi-selected columns to a tray or target box in another panel.

Manage Data Sets panel

You can use the Manage Data Sets panel to add selected schemas, business schemas, tables, or views to the Data panel.

The Manage Data Sets panel contains the Views and Tables tabs that can be filtered using search.

Insight panel

The Insight panel shows the selected visualization. The default selection is Listing Table. Simply select the downward arrow (V) to change the visualization type.

Visualizations

Here is a list of visualizations in this release:

Tables

  • Listing Table
  • Aggregate Table
  • Pivot Table

Charts

  • Column
  • Stacked Column
  • Percent Column
  • Bar
  • Stacked Bar
  • Percent Bar
  • Area
  • Stacked Area
  • Percent Area
  • Line
  • Stacked Line
  • Percent Line
  • Pie
  • Donut
  • Pie Donut
  • Sunburst
  • Combination
  • Spider
  • Line Time Series
  • Time Series
  • Area Range
  • Combo Dual Axis
  • Dual Axis
  • Dual X-Axis
  • Map
  • Bubble Map
  • Advanced Map
  • Funnel
  • Pyramid
  • Scatter
  • Treemap
  • Heatmap
  • Tag Cloud
  • Bubble
  • Packed Bubble
  • Organizational

Others

  • KPI
  • Rich Text
  • Gauge
  • Solid Gauge
Trays

The visualization selection determines the available trays within the Insight panel. Rich Text is the only visualization without available trays.

From the Data panel, you can add one or more columns to a tray. When applicable, you can also add a formula to a tray. All trays now have a Clear All command. A column or formula in a tray is a Pill. Each pill has configurable properties. The parent tray determines the available properties of a pill.

Important

About a pill name:
To change the name of the column or formula that is a Pill, you must double-click the pill. In the text box, you can modify the name. In this release, there is no visible Name or Label property for a pill.

Here is a list of some of the trays available within the Insight panel:

  • Grouping Dimension
  • Coloring Dimension
  • Row (only for Pivot Table)
  • Column (only for Pivot Table)
  • Measure (available in all visualizations except Advanced Maps and Rich Text)
  • Layers (only for Advanced Map)
  • Color By (only for Advanced Map)
  • Size By (only for Advanced Map)
  • Sort By (only for Listing and Aggregated Tables where all pills are in the Measure tray)
  • Source (only for Sankey)
  • Target (only for Sankey)
  • Individual Filter (can view Filter Values)
  • Aggregate Filter (can view Filter Values)
  • Distinct Filter (only for Listing and Aggregated Tables where all pills are in the Measure tray)
Properties panel

Using the Properties panel, you can easily modify the properties of a pill. Here are some examples:

  • easily apply and remove formatting, including conditional formatting
  • quickly select a drill down link using a tree view of dashboards and dashboards tabs
  • for a gauge, add a range of a specific color
  • copy and paste a bulk list of individual filter values
  • define a date part for a timestamp or date column

Depending on the tray, not all pills have configurable properties. For example, the Distinct Filter does not offer a Properties panel. Some pills allow for direct configuration, such as a pill in the Sort By tray where you can directly set the sort direction.

Grouping Dimension
  • Date Part for timestamp and date columns
  • URL
  • Show Empty Groups
  • Sort By with Clear All
  • Dashboards Drill Down
Measure

Because properties are specific to the given visualization, not all properties are applicable.

  • Date Part for timestamp and date columns
  • Aggregation
  • Scale
  • Running Total
  • Filter (this is a measure filter)
  • Format
  • Conditional Formatting (available for Listing Table, Aggregate Table, Pivot Table and KPI)
  • Abbreviate on Hover (Available for most chart visualizations)
  • Color (Available for most visualizations)
  • Plot Band (Available for most Column, Bar, Area, and Line charts)
  • Minimum (Only for gauges)
  • Maximum (Only for gauges)
  • Gauge Ranges (Only for gauges)
  • Dashboards Drill Down
  • Base Field (Not available for Advanced Map)
  • Average Lines (Available for most Column, Bar, Area, and Line charts)
  • Query Plan (read only)
Individual Filter

Specify the filter operator, select values, edit bulk values, and add individual values in this panel.

  • Date Part for timestamp and date columns
  • Operator
  • Values
  • Add
Note

If a pill does not have a defined filter, the pill will show a validation warning (red circle).

Aggregate Filter

Specify the aggregation type, the filter operator, and values. If needed, edit in bulk values, and add individual values in this panel.

  • Date Part for timestamp and date columns
  • Aggregation
  • Operator
  • Values
Note

If a pill does not have a defined filter, the pill will show a validation warning (red circle).

Color By

Applicable only for an Advanced Map, Color By is a measure, so it has the properties of a measure, except for conditional formatting, with the addition of:

  • Color Palette
Size By

Applicable only for an Advanced Map, Size By is for a measure, so it has the properties of a measure, except for conditional formatting, with the addition of:

  • Radius
Row

Applicable only for a Pivot Table, Row is for a dimension, and has the following properties:

  • Sort By
  • Dashboards Drill Down
Column

Applicable only for a Pivot Table, Column is for a dimension, and has the following properties:

  • Sort By
Coloring Dimension

Applicable for most charts, Coloring Dimension is for a dimension, and has the following properties:

  • Sort By
  • Format Color Palette
Source

Applicable only for Sankey, Source is for a dimension, and has the following properties:

  • Sort By
  • Dashboards Drill Down
Target

Applicable only for Sankey, Target is for a dimension, and has the following properties:

  • Sort By
  • Dashboards Drill Down
Additional Panels for detailed properties

There are several additional panels for adding a supported feature to a visualization or for formatting a value conditionally. In these instances, an additional Properties panel opens in the Analyzer.

Add Plot Band

To add one or more bands to an applicable chart, in the Properties panel, select Add Plot Band. The Add Plot Band panel contains the following properties:

  • Start
  • Stop
  • Label
  • Background (color selection)
Add Dashboard panel

To specify a drill down to a tab in the same or another dashboard, in the Properties panel, select Add Dashboard. The Add Dashboard panel contains the following properties:

  • Include Runtime Filters
  • Search
  • Tree control to select a dashboard tab
Add Average Line

Certain charts support adding an Average Line. This line can be an Average Line, Linear Trend, Simple Moving Average, or an Exponential Moving Average. To specify one or more Average Lines, in the Properties panel, select Add Average Line. The Add Average Line panel contains the following properties:

  • Average Line Type
  • Line Style
  • Period (Only for moving averages)
Conditional Formatting

For the Listing Table, Aggregated Table, Pivot Table, and KPI visualizations, you can specify one or more conditional formats for a given measure. In the Properties panel, in Conditional Formatting, select Add Conditional Format. The Conditional Formatting panel contains the following properties:

  • Aggregation
  • Value
  • Background
  • Text Color
Add Gauge Range

For the Gauge or Solid Gauge visualizations, you can specify one or more Gauge Ranges for a given measure. In the Properties panel, in Gauge Ranges, select Add Gauge Range. The Gauge Range panel contains the following properties:

  • Stop %
  • Background (color)
Filter Values panel

For a given insight, you can now view all specified insight filters in the Filter Values panel. Insight filters are:

  • Individual Filter
  • Aggregate Filter
  • Distinct Filter

To open the Filter Values panel, select the View Filter Values icon for any of these trays. To close the panel, select X.

The Filter Values panel shows a summary of the filter properties. You can collapse and expand Individual Filter, Aggregate Filter, and Distinct Filter in this panel.

Individual Filter

For each pill, there is a summary of the column or formula, operator, preview of values, and count of values. Select the summary to open the Individual Filter panel for the given pill.

Aggregate Filter

For each pill, the summary shows the aggregate filter expression. Select the summary to open the Aggregate Filter panel for the given pill.

Distinct Filter

For each pill, the summary shows the distinct filter.

Visualization canvas

The visualization canvas shows a preview of the insight. You can configure an insight title and insight description for all visualizations other than Rich Text.

In this release, for Table visualizations, you can select and copy rows, columns, and/or cells in the insight in the canvas. For the canvas, you can also select the full screen icon to collapse all open panels as well as the expand icon to open all closed panels.

Settings Panel

This release combines the General setting properties with the Layout properties for the selected insight visualization. Additional settings are Advanced, Format, and Map Settings.

General

Here are the properties for General settings:

  • Page Size
  • Max Rows
  • Logarithmic
  • Percentage of Column
  • Auto Refresh
  • Merge Columns (Aggregated Table)
  • Collapsed (Aggregated Table)
  • Merge Rows (Aggregated Table, Pivot Table)
  • Row Grand Total (Pivot Table)
  • Row Subtotal (Pivot Table)
  • Column Grand Total (Pivot Table)
  • Column Total At (Pivot Table)
  • Hide Columns (Pivot Table)
  • Subtotal (Listing Table, Aggregated Table)
  • Total (Listing Table, Aggregated Table)

Layout

Here are the properties for Layout settings:

  • Fix Columns
  • Headers
  • Transpose
  • Rotation
  • Legend
  • Data Labels
  • Values
  • Hide Zero Values
  • Connect Values
  • Fixed Placement
  • X-Axis Labels
  • X-Axis Title
  • Y-Axis Labels
  • Y-Axis Title
  • Y-Axis Min
  • Y-Axis Max

Format

Format settings are only for KPI insight visualizations. Select a format to apply to all pills with the property Auto Format selected. In this manner, you can easily apply a format to all KPI measures. For an individual pill, you can override the Format property, if so desired.

Map Settings

Map Settings are only for Advanced Maps. Here are the properties for Map Settings:

  • Style
  • Data Labels
  • Legend

Advanced

Here are the properties for Advanced settings:

  • Max Groups
  • Missing Value Text
  • Join Measures

Dashboard tabs

In this release, a dashboard now includes one or more tabs. A tab is an easy and convenient way to organize a dashboard into logical sections. You can create drill down links between tabs in a given dashboard or another dashboard.

A new dashboard has a single default tab, Tab 1. A dashboard can include a maximum of 10 tabs. You can easily add a new tab, rename a tab, change the order of a tab, and delete a tab.

For tabs with an existing insight visualization, you can duplicate the tab, edit the layout, personalize the tab, refresh the data for insight visualizations on the tab, as well as download the tab as either a .xlsx (MS Excel file) or as a .html file.

You can also hide all tabs or show all tabs for a given dashboard.

About tab names

A tab must have a name that:

  • is unique to the dashboard
  • has at least 1 character and no more than 255 characters in length
  • can contain spaces, special characters, and even Unicode emoji characters (utf8mb4)
Note

Only the first 20 characters of a tab name will appear in the tab.

Add a new tab

To add a new tab, simply select + next to the existing tab.

Rename an existing tab

  • To rename a tab, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Rename.
  • In the tab text box, enter a new name.
  • To save your changes, press Enter or Return.

Change the order of a tab

  • To change the order of a tab, simply drag and drop the tab to the left or right of an existing tab.

Delete a tab

  • To delete a tab, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Delete.
  • In the dialog, select OK.

Duplicate an existing tab

  • To duplicate a tab, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Duplicate.

Edit the tab layout

You can edit the layout of a tab that has two or more insights and has not been personalized.

  • To edit the layout of tab insights, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Edit Layout.
  • Make your layout changes.
  • In the Action bar, select Save.

Personalize the tab

To personalize dashboards and tabs, the user must belong to a group that has the Dashboard Analyzer role. A personalized tab includes the ability to edit the tab layout.

  • To personalize the tab insights, for the given tab, select More Options (kebab icon).
  • In the More Options menu, select Personalize.
  • Make your personalization changes.
  • In the Action bar, select Save.

Alternatively, for the selected tab, in the dashboard Action bar, you can select the Personalization icon, and in the menu, select Personalize.

Refresh data for a tab

  • To refresh the data for insights on a given tab, select More Options (kebab icon).
  • In the More Options menu, select Refresh Data.

Download

You can download a tab as either a .xlsx (MS Excel file) or as a .html file. The .xlsx download option only supports these insight visualizations types:

  • Listing Table
  • Aggregated Table
  • Pivot Table

To download a tab as a .xlsx file, follow these steps:

  • For the given tab, select More Options (kebab icon).
  • In the More Options menu, select Download → XLSX.

To download a tab as a .html file, follow these steps:

  • For the given tab, select More Options (kebab icon).
  • In the More Options menu, select Download → HTML.

Hide all tabs

To hide all tabs, the dashboard must be in its Original View with one or more tabs showing. Here are the steps to Hide all tabs:

  • In the Action bar, select More Options (kebab icon).
  • In the menu, select Hide Tabs.
Note

When you enable Hide Tabs, the insights for the currently selected tab are initially visible as the dashboard. However, when you close and return to the dashboard, only the insights for the first tab are visible as the dashboard.

Show all tabs

To show all tabs, the dashboard must be in its Original View with all tabs hidden. Here are the steps to Show all tabs:

  • In the Action bar, select More Options (kebab icon).
  • In the menu, select Show Tabs.

Advanced Map visualization with Mapbox

This release includes the Advanced Map visualization in the new Analyzer, which uses Mapbox.

Map Settings

Map Settings are only for Advanced Maps.

To open the Settings panel, in the Action bar, select Settings (Gear icon). In Map Settings, you can configure the following properties:

  • Style (Auto, Light, Dark, Satellite, Outdoors, Satellite Streets)
  • Data Labels
  • Legend

About Layers

A layer contains the data (Geo Data) for a specified geographical entity (Geo Entity) that Mapbox visualizes. An Advanced Map visualization can have one or more layers. Each layer has a unique name, configurable Layer Settings, and configurable Geo Data.

A dashboard user typically zooms (in and out) from one layer into another layer on an Advanced Map insight.

Layer Settings

Each layer has Layer Settings:

  • Data Labels: Enable the toggle to view geo data labels
  • Legend: Enable the toggle to view the map legend
  • Visible: Enable the toggle to view the layer
  • Type: Select the type:

    • Marker (select a Shape: Pin, Square, Cross, Times, or Circle)
    • Area
    • Bubble (define a Size By measure)
    • Heatmap
  • Zoom Range: Use the slider to narrow the initial zoom, from 0% to 100%
  • Opacity: Use the slider to set the opacity for the selected type, from 0% to 100%

About Geo Data

    Geo Data defines the data for the layer. It specifies the Geo Entity for the layer and the aggregation measures for the entity. The aggregations related to the Geo Entity appear in the map visualization.

    For example, the colors of a country can represent the various population sizes, where the Geo Entity is a country and the Color By measure is city populations.

    In Geo Data, Color By, Size By, and Tooltip contain the various aggregations. Color By is the only required measure. Bubble visualization types require a Size By measure. A Tooltip can contain multiple measures.

    About a Geo Entity

    A Geo Entity can be either a Geo Attribute or a Geospatial Point.

    Geospatial (Lat/Long)

    A Geospatial Point is a tuple of latitude and longitude. Geospatial has one column or formula for Latitude and another column or formula for Longitude.

    • Latitude: Column or formula that will map the latitude
    • Longitude: Column or formula that will map the longitude

    Geo Attribute Property

    A Geo Attribute is a column that conforms to a Geo Role. A Geo Role is a Country, County, County Subdivision, State, City, or Zip Code. Mapbox identifies Geo Roles using its own repository. A Geo Attribute has only one column or formula for each of the following:

    • Geo attribute: Column or formula that will map the Geo Role
    • Geo Role: Country, State, County, County Subdivision, City, or Zip Code
    • Dashboard Drill down
    Note

    Zip Code is a Geo Role that is applicable only for the United States.

    About Geo Data measures

    Geo Data requires one measure for aggregation. For additional aggregations, create additional layers.

    You can add only one column or formula that will serve as a measure for each of the following:

    • Color By
    • Size By
    • Tooltip

    Color By

    You can add only one column or formula as the Color By measure. You can define the properties for a Color By measure as follows:

    • Aggregation
    • Scale
    • Running Total
    • Filter (measure filter)
    • Format
    • Color Palette

    Size By

    Size By is only applicable for Geo Entity with a Layer Setting of the Bubble type. You can add only one column or formula as the Size By measure. Here are the properties for a Size By measure:

    • Aggregation
    • Scale
    • Running Total
    • Filter (measure filter)
    • Format
    • Color Palette

    Tooltip

    A Tooltip appears in the mouse hover state over the Geo Entity on the map. You can optionally add one or more columns or formulas as Tooltip measures for the Geo Data. Here are the properties of a Tooltip:

    • Aggregation
    • Scale
    • Running Total
    • Filter (measure filter)
    • Format
    • Color Palette
    • Enable Base field

    Create an Advanced Map visualization

    For an existing dashboard, to create an Advanced Map visualization insight, follow these steps:

    • If not already open, open a dashboard.
    • To add a new insight to the dashboard, in the Action bar, select +.
    • If needed, in the Analyzer, in the Data panel, select Add Data Set.
    • In the Manage Data Sets panel, select one or more business schema views and/or one or more physical schema tables.
    • To close the Manage Data Sets panel, select X or any other area of the Analyzer.
    • In the Insight panel, select Listing Table or V (down arrow).
    • In the Insight panel, in Charts, select Advanced Map.
    Managing Layers

    To modify the name of layer 1 in Insight panel, follow these steps:

    • To change the default layer 1 name, double-click the layer 1 pill.
    • In the text box, enter a layer name, and then press Enter or select any other area of the Analyzer.

    To add a new layer for an Advanced Map insight, follow these steps:

    • In the Insight panel, in Layer, select + Add Layer.

    To change the layer order of an Advanced Map insight, follow these steps:

    • In the Insight panel, in Layer, select the specific layer pill that you want to reorder.
    • In Layer, to move the layer up, select Up (up arrow icon ↑).
    • To move the layer down, select Down (down arrow icon ↓).

    To modify the Layer Settings, follow these steps:

    • In the Insight panel, for a given Layer pill, select > (right arrow).
    • In the Properties panel, in Layer Settings, modify the settings properties:

      • Data Labels
      • Legend
      • Visible
      • Type
      • Zoom Range
      • Opacity
    • To close the Properties panel, select X or any other area of the Analyzer.
    Specify and configure Geo Data

    To specify a Geo Attribute and define the properties of the Geo Data for a given layer, follow these steps:

    • In the Insight panel, in Layer, select a layer.
    • In Geo Data, select Geo Attribute.
    • From the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Geo Attribute target box.
    • In the Properties panel, define the Geo Role.

    To specify a Geospatial Point and define the properties of the Geo Data for a given layer, follow these steps:

    • In the Insight panel, in Layer, select a layer.
    • In Geo Data, select Lat/Long.
    • From the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Latitude target box.
    • From the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Longitude target box.

    Here are the steps to specify the Color By measure:

    • For the given layer’s Geo Data Geo Attribute, from the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Color By target box.
    • In the Properties panel, define the measure properties.

    Here are the steps to specify the Size By measure:

    • For the given layer’s Geo Data Geo Attribute, from the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Size By target box.
    • In the Properties panel, define the measure properties.

    Here are the steps to specify the Tooltip measure:

    • For the given layer’s Geo Data Geo Attribute, from the Data panel to the Insight panel, drag and drop a column or formula column (Add Formula) to the Tooltip target box.
    • In the Properties panel, define the measure properties.

    Rich Text visualization

    This release includes a new visualization for Rich Text. Using a built-in, What-You-See-Is-What-You-Get (WYSIWYG) editor embedded in the Analyzer, you can easily create, edit, format, and preview rich text. Incorta stores the rich text as HTML for the insight.

    The WYSIWYG editor for the Rich Text visualization supports the following:

    • Text font selection and font size.
    • Text formatting
    • Text coloring, including custom colors, for text foreground and background.
    • Text and image alignment (left, right, center, and justified)
    • Text indentation
    • Unordered and ordered bullet lists
    • GIF, JPG, and PNG image embedding as an HTTP source <img src="https://www.mywebsite.com/myimage.jpg">
    • Copy and paste GIF, JPG, and PNG
    • Edit a copied image using the Edit Image controls such as brightness, contrast, gamma, cropping, orientation, and mirroring.
    • Link embedding as an HTTP source <a href="https://www.mywebsite.com">Link</a>
    • Referencing system, internal, and external session variables with the $$ syntax such as $$user and $$currentDate, even for attribute values of an HTML element
    • Viewing and editing the HTML source
    • <IFRAME> in HTML source
    Note

    It is possible to create a web link to another dashboard or dashboard tab using the full HTTP URL. However, these links will not function as drill down links to other dashboards with regard to optionally applying dashboard runtime filters.

    Create a Rich Text visualization

    For an existing dashboard, to create a Rich Text visualization insight, follow these steps:

    • To add a new insight to the dashboard tab, in the Action bar, select +.
    • In the Insight panel, select Listing Table or V (down arrow).
    • In the Insight pane, in Others, select Rich Text.
    • In the Rich Text editor, add and format your text.
    • To optionally preview your changes, in the Menu bar, select View → Preview, or in the Toolbar, select Preview (Eye icon), and then select X or Close.
    • To save your changes, in the Actions bar, select Save.

    Sankey visualization

    This release includes the new Sankey visualization. A Sankey chart visualizes the flow between two or more nodes. A node can be a source node, a target node, or both. Incorta dynamically determines an intermediary node, a node that is both a source and a target. For intermediary nodes, a user can select a node in the visualization, and Filter by Source or Target.

    The flow lines that link a source and a target appear as individual colored bands, where the width of the band visualizes the weight of the measure. In this sense, each link has three parameters: from, to, and weight.

    Example Data

    Country_Source, Country_Target, Measure
    Brazil, Portugal, 5
    Brazil, Spain, 1
    Canada, Portugal, 1
    Mexico, Portugal, 1
    Mexico, Spain, 5
    Portugal, Egypt, 2
    Portugal, Senegal, 1
    Portugal, Morocco, 1
    Spain, Senegal, 1
    Spain, Morocco, 3
    Egypt, China, 5
    Egypt, India, 1
    Egypt, Japan, 3
    Senegal, China, 5
    Senegal, Japan, 3
    Morocco, China, 5
    Morocco, India, 1
    Morocco, Japan, 3

    In the example above, several countries are both sources and targets: Portugal, Spain, Senegal, Egypt, and Morocco. Incorta dynamically calculates the weight for the bands between source and target nodes.

    Important

    In this release, all columns must be either a data-backed column or a persisted computed column (formula column) from a physical schema table. A Sankey insight will not recognize a runtime formula column for a measure. A runtime formula column exists in the insight itself or exists as a formula column in a business schema view. In addition, because of dynamic associations, Filter by Source and Target will invoke errors for a Source or Target that is a runtime formula column.

    Create a Sankey visualization

    For an existing dashboard, to create a Sankey visualization insight, follow these steps:

    • To add a new insight to the dashboard tab, in the Action bar, select +.
    • If needed, in the Analyzer, in the Data panel, select Add Data Set.
    • In the Manage Data Sets panel, select one or more business schema views and/or one or more physical schema tables.
    • To close the Manage Data Sets panel, select X or any other area of the Analyzer.
    • In the Insight panel, select Listing Table or V (down arrow).
    • In the Insight panel, in Charts, select Sankey.
    • From the Data panel to the Insight panel, add…

      • one column or formula to Source
      • one column or formula to Target
      • one data backed column to Measure
    • To save your changes, in the Actions bar, select Save.

    Sunburst Visualization

    A sunburst chart visualizes hierarchical data in a circular shape. Parent nodes in the hierarchy are inner elements, and the outer rings of elements are child nodes. Multiple grouping dimensions characterize a single aggregated measure. The Sunburst visualization supports both columns and runtime formula columns.

    The visualization supports the following user interactions:

    • Select an element in an outer ring to Expand or Filter by:

      • Filter by applies a dashboard runtime filter.
      • Expand drills-in-place; select the center element to return from the drill-in-place.

    Create a Sunburst Visualization

    In the Analyzer, in the Insight panel, the pill order of grouping dimensions affects the hierarchy of nodes: the first pill is the parent, and the last pill is the child.

    For an existing dashboard, to create a Sunburst visualization insight, follow these steps:

    • To add a new insight to the dashboard tab, in the Action bar, select +.
    • If needed, in the Analyzer, in the Data panel, select Add Data Set.
    • In the Manage Data Sets panel, select one or more business schema views and/or one or more physical schema tables.
    • To close the Manage Data Sets panel, select X or any other area of the Analyzer.
    • In the Insight panel, select Listing Table or V (down arrow).
    • In the Insight panel, in Charts, select Sunburst.
    • From the Data panel to the Insight panel, add…

      • one or more columns and formulas to Grouping Dimensions
      • only one column or formula to Measure.
    • To save your changes, in the Action bar, select Save.

    Date Part

    In the new Analyzer, for this release, you can now select a date part for a date or timestamp column in a Grouping Dimension or Measure. You can also specify a date part column as an individual filter for an insight. Using a date part is equivalent to using a built-in Date function. The options for date parts are:

    • Full, the column itself
    • Year, as year(date exp)
    • Quarter, as quarter(date exp)
    • Month, as month(date exp)
    • Day, as day(date exp)
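
    For illustration, here is a minimal sketch of this equivalence, assuming a hypothetical SALES schema with an ORDERS table that has an ORDER_DATE timestamp column and an AMOUNT column (these names are illustrative and not part of the release). Selecting the Month date part for ORDER_DATE in a Grouping Dimension groups the data in the same way as grouping by the month() function would in the insight's reference SQL:

    -- Hypothetical reference SQL: group by the Month date part of ORDER_DATE
    SELECT
        month(`ORDER_DATE`) AS `ORDER_MONTH`,
        sum(`AMOUNT`) AS `TOTAL_AMOUNT`
    FROM
        `SALES`.`ORDERS`
    GROUP BY
        month(`ORDER_DATE`)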

    Date parts are available in:

    • Analyzer for an Incorta Table
    • Analyzer for an Incorta View
    • Analyzer for an Insight

    In this release, a date part supports only data-backed date and timestamp columns. A date part does not support a runtime formula column in a business schema view or Incorta View.

    A date part column supports dashboard prompt and applied filters. You can sort a date part column by defining a Sort By date.

    Configure a column as a Date Part

    Here are the steps to configure a column as a date part.

    • In the Data panel, select either a date or timestamp column to add to one of the following trays in the Insight panel:

      • Grouping Dimension
      • Measure
      • Individual Filter
      • Aggregate Filter
    • Select the pill in the tray.
    • In the Properties panel, in Date Part, select one of the following:

      • Full
      • Year
      • Quarter
      • Month
      • Day
    • To save your changes, in the Action bar, select Save.

    Cisco Meraki Connector

    Cisco Meraki is cloud IT management software that provides users with a scalable and secure solution that can help them create and control their networks. Cisco Meraki’s products include wireless, switching, security, enterprise mobility management, and security cameras, all centrally managed from the web. To learn more, please review Connectors → Cisco Meraki.

    Splunk Connector

    Splunk is a software product that captures, indexes, and correlates real-time, machine-generated data in a searchable repository from which it can generate graphs, reports, alerts, dashboards, and visualizations. Currently, the Splunk connector extracts data represented as Splunk reports. To learn more, please review Connectors → Splunk.

    System variables for incremental extracts

    In this release, there are two new system variables that you can use to specify a window of time for incremental table loads for a table with a SQL data source. The variables are:

    • $$job_extract_start_time Dynamically evaluates to the start time of the query execution (extraction)
    • $$job_extract_reference_time Dynamically evaluates to the Last Successful Extract Time

    You can use these variables in the WHERE clause of a SQL query. The new system variables support both Query and Update Query configurations. Here is an example:

    SELECT
        `GUID`,
        `TENANT_ID`,
        `PARENT_GUID`,
        `TARGET_TYPE`,
        `TARGET_NAME`,
        `TARGET_ID`,
        `JOB_TYPE`,
        `STATE`,
        `LEADER_NODE_NAME`,
        `START_TIME`,
        `END_TIME`,
        `LAST_MODIFIED`,
        `DURATION`,
        `MESSAGE`
    FROM
        `incorta_metadata`.`JOB`
    WHERE `START_TIME` >= $$job_extract_reference_time
      AND `START_TIME` <= $$job_extract_start_time

    Global Variables

    This release introduces global variables. Unlike other objects in Incorta, a global variable is available to all tenant users. A global variable has a name, description, type, and value. A global variable is a static variable.

    To learn more, review Concepts → Global Variable.

    Schema Notifications

    This release introduces schema notifications. A schema notification is an email notification that contains the sender name, the schema name, the schema load status, and a direct link to the Load Job Viewer that contains the schema load job summary and details.

    A notification name:

    • is between 1 and 255 characters
    • can contain spaces and special characters

    You can also specify text up to 4,000 characters in an email body.

    Note

    Notifications require a tenant email configuration for an outgoing email server using SMTP or EWS in the Cluster Management Console (CMC).

    Create a schema notification

    In this release, you can create a schema notification using the Schema Manager or the Scheduler.

    With the Schema Manager, create a schema notification

    There are two ways to open the Create Notification via Email dialog from the Schema Manager:

    • In the Navigation bar, select Schema.
    • In the Schema Manager, in the Context tab, select the Schemas tab.
    • In the List View, select the checkbox for one or more schemas.
    • In the Search bar, select More Options (kebab icon).
    • In the More Options menu, select Create Notification.

    or

    • In the Navigation bar, select Schema.
    • In the Schema Manager, in the Context tab, select the Schemas tab.
    • In the List View, for a given schema row, select More Options (kebab icon).
    • In the More Options menu, select Create Notification.

    To create an email notification, follow these steps:

    • In the Create Notification via Email dialog, enter the Notification Name.
    • In Notify On, select Success and/or Failure.
    • In Select Schema(s), select one or more schemas.
    • In Recipients, specify at least one of the following:

      • Username
      • Group
      • Email address
    • For each Recipient, specify if this is TO, CC, or BCC.
    • In Body, optionally enter the email text.
    • To save, select Done.
    With the Scheduler, create a schema notification

    Here are the steps to create a schema notification with the Scheduler:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the Action bar, select + New → Create Notification.
    • In the Create Notification via Email dialog, enter the Notification Name.
    • In Notify On, select Success and/or Failure.
    • In Select Schema(s), select one or more schemas.
    • In Recipients, specify at least one of the following:

      • Username
      • Group
      • Email address
    • For each Recipient, specify if this is TO, CC, or BCC.
    • In Body, optionally enter the email text.
    • To save, select Done.

    Manage schema notifications

    Only the tenant super user, users that belong to a group assigned the SuperRole role, or users that belong to a group assigned the Schema Manager role, can both create and manage schema notifications for a given tenant.

    Search for a schema notification

    To search for a schema notification follow these steps:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the Search bar text box, enter a search term.
    Edit a schema notification

    To edit a schema notification, follow these steps:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the List View, select the schema notification row, or in the right row gutter, select Edit (pen icon).
    • In the Edit Notification dialog, modify any of the following:

      • Notification Name
      • Notify On
      • Select Schema(s)
      • Recipients
      • Body
    • To save, select Done.
    Delete one or more schema notifications

    To delete one or more schema notifications, follow these steps:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the List View, select the checkbox for each row for deletion.
    • In the Search bar, select Delete (trash icon).
    • In the Delete notification(s) dialog, select Delete.

    To delete a single schema notification, follow these steps:

    • In the Navigation bar, select Scheduler.
    • In the Scheduler, in the Context tab, select the Schema Notifications tab.
    • In the List View, highlight the specific schema notification row, and in the right row gutter, select Delete (trash icon).
    • In the Delete notification(s) dialog, select Delete.

    Schema Manager supports Filter by Load Status

    In the Schema Manager, for the Schemas tab, the List View of schemas now shows the following columns:

    • Name
    • Last Successful Load
    • Owner
    • Modified By
    • Status
    • Last Load
    • Next Load
    • Data on Disk
    • More Options (Kebab)

    In this release, it is now possible to filter by one or more schema load statuses. The schema load status options are:

    • Succeeded
    • Finished With Errors
    • Failed
    • Interrupted
    • Running
    • In Queue

    Load Job Viewer enhancements

    For a selected physical schema in a given tenant, the Load Job Viewer shows both current load jobs and previous load jobs. In this release, the Jobs section contains a summarized table of all load jobs for the selected schema. For the selected load job summary, the Job details section contains all the load details for each and every table in the load.

    Important

    Unlike previous releases, in this release, the Load Job Viewer only shows the status of a load job from the perspective of the Loader Service, not the Analytics Service. This change from previous releases means that the Load Job Viewer no longer waits for the Analytics Service to report the successful load of performance optimized tables into memory. For this reason, in certain cases, the Load Job Viewer may report the successful status of a load job, but a dashboard insight that references the successfully loaded schema will show the message, The data is being refreshed. This means that the Analytics Service is still loading into memory the related performance optimized tables required by the insight query.

    There are two ways to access the Load Job Viewer:

    • In the Schema Manager, in the List View, for a schema with a Load Status other than No Data Loaded, select the Load Status link:

      • Success
      • Finished with Errors
    • In the Schema Designer for a given schema, in the Summary section, in Last Load Status, select the link:

      • Please Load Data
      • Date Time
      • Load Status such as In Queue or Finished with Errors

    About a load job summary

    A load job summary contains the following details:

    • Service
    • Start Time
    • End Time
    • Load Status
    • Duration
    About a load status

    In this release, Load Status now shows the status for a given load job. The statuses are:

    • Succeeded
    • Finished With Errors
    • Failed
    • Interrupted
    • Running
    • In Queue
    Filter by Load Status

    New to this release is the ability to filter the summarized table of load jobs by Load Status. In the Job Summary section, in the Load Status column, you can now filter the load job summaries by one or more load status filters. The filter options are:

    • Succeeded
    • Finished with Errors
    • Failed
    • Interrupted
    • Running
    • In Queue
    Load status tooltip

    A tooltip for Load Status details the durations of parallelized activities and states within a load job. A given loader service executes many load job activities in parallel. As a result, the load status tooltip depicts activity durations that, when aggregated, are typically greater than the overall duration of the load job itself. The tooltip statuses are:

    • In Queue
    • Extraction
    • Enrichment
    • Load
    • Post-load

    About the Job Details section

    This release enhances the extraction and load details for each individual table in the selected load job. The Job Load Details table shows the following sortable columns:

    • Name: The name of the table with the option to filter by search term
    • Load Type: Full, Incremental, Staging; this is a Sortable column
    • Extraction - Start: the start time of the table extraction by the loader service; Sortable
    • Extraction - Duration: the duration time
    • Extraction - Extracted: the number of rows extracted
    • Extraction - Rejected: the number of rows rejected in the extraction
    • Load - Start: the start time of the table load into the analytics service
    • Load - Duration: the duration time
    • Load - Loaded: the number of rows loaded by the analytics service
    • Status: the current load status of the table with the option to filter by one or more statuses
    About the table load status

    In this release, the status column now shows the load status for a given table. While a load is running, these statuses are viewable. The table load statuses are:

    • In Queue
    • Extracting
    • Enriching
    • Loading
    • Post-loading
    • Success
    • Failed
    Filter by table load status

    New to this release is the ability to filter the tables for a specific load job by table load status. In the Job Details section, in the Status column, you can now filter by one or more status filters. The filter options are:

    • Succeeded
    • Failed
    • Interrupted
    • Running
    • In Queue
    Table load status tooltip

    A tooltip for the table load status details the durations of parallelized activities and states within the extraction and load phases of a table load. The tooltip statuses are:

    • In Queue
    • Extraction
    • Enrichment
    • Load
    • Post-load
    Note

    The activities and related states differ based on the load type (Full, Incremental, or Staging) as well as the table type such as a SQL Database table, Materialized View, or Incorta table.

    Continue on Error and Finished with Errors status

    Continue on Error is a new feature in this release. The feature allows a load job to continue even when there are errors. In other words, a load job for a schema will continue when there is an error or exception. The job will complete and the load job report will depict a load status of Finished with Errors.

    Here are the types of errors that will not stop the load job for a schema:

    • A join creation error
    • A formula calculation error
    • An error with the selected table reference of an alias table
    • An error with an Incorta Table (a table created with the Analyzer)

    Join errors will write an internal error flag within the direct data mapping (snapshot) files. Formula calculation errors will result in a column with null values. An error with an Incorta table results in empty table columns.

    Here are the steps to view the details of Finished with Errors:

    • In the Load Job Viewer, in Jobs, select a specific load job with errors.
    • For a job with errors, in Load Status, select Finished with Errors.
    • In the Job Errors dialog, review the error.
    • Optionally select Copy to Clipboard.
    • For each error, to review the specific error message and error trace, select View Details.
    • Optionally select Copy to Clipboard.
    • To close the details dialog, select Ok.
    • To close the Job Errors dialog, select Ok.

    Save without validation and discovery

    In this release, you can now modify and save a materialized view, single-source table, or multi-source table without validating any script changes or discovering changes to the output columns.

    Save without validation and discovery for materialized views

    You can now save script changes to a materialized view without validating the changes. In the Table Editor, in the Table summary section, a materialized view that has not been validated will show a status of Not validated. In the Schema Manager, in Tables, a materialized view that has not been validated will show the warning, Not validated.

    There are several ways to resolve the Not validated warning for a materialized view:

    • Manually validate the script changes in the Data Source dialog
    • If the materialized view does not support incremental loading, perform a successful full load for the materialized view
    • If the materialized view supports incremental loading, perform both a successful full load and incremental load for the materialized view
    Note

    A materialized view without validation will successfully load if the columns in the previous version and the Not validated version are the same. Discovery in this sense is for column names, not column data types.

    Here are the steps to save changes to a materialized view script without validation:

    • For the given schema in Schema Designer, select an existing Materialized View.
    • In the Table Editor, in the Table summary section, to open the Data Source dialog, select the table icon.
    • To edit the execution code, in Script, select one of the following:

      • with Notebook Integration enabled, to open the Script Editor, select Edit Query.
      • with Notebook Integration enabled, to open the Notebook Editor, select Edit in Notebook.
      • without Notebook Integration enabled, select the Script open icon or textbox.
    • Modify the execution code in either the Script Editor or the Notebook Editor.
    • To close the Script Editor or the Notebook Editor, select Done.
    • In the Data Source dialog, select V (down arrow) → Save Script Only.
    • In the Table summary section, verify Not validated.
    • In the Action bar, select Done.

    Save without validation and discovery for single-source and multi-source tables

    A single-source table is a physical schema table with only one defined data source. A multi-source table is a table with two or more defined data sources.

    This release supports editing a single-source or multi-source table without discovering the columns of a specified data source.

    In the Table Editor, in the Table summary section, a data source for a table that has not been validated will show a status of Not validated. In the Schema Manager, in Tables, a table that has not been validated will show the warning, Not validated.

    There are several ways to resolve the Not validated warning for a single-source or multi-source table:

    • In the Data Source dialog, select Validate.
    • If the table does not support incremental loading, perform a successful full load for the table
    • If the table supports incremental loading, perform both a successful full load and incremental load for the table
    Note

    For a multi-source table with a data source that is Not validated, it is not possible to manage the output columns. A table without validation will successfully load if there are common columns in the previous version and the Not validated version. Discovery in this sense is for column names, not column data types.

    Here are the steps to save changes to either an existing single-source or multi-source table without validating the changes:

    • For the given schema in Schema Designer, select an existing single-source or multi-source table.
    • In the Table Editor, in the Table summary section, to open the Data Source dialog, select a table icon.
    • Make the required changes.
    • In the Data Source dialog, select V (down arrow) → Save Without Discovery.
    • In the Table summary section, verify Not validated for the modified data source.
    • In the Action bar, select Done.

    Incorta PostgreSQL for materialized views

    You can now create a materialized view using PostgreSQL. Using the Script Editor, you can define a SQL SELECT statement using the PostgreSQL syntax. This new feature replaces the need to create a PostgreSQL data source in order to materialize a schema table or business schema view as Apache Parquet in shared storage.

    Important

    This release supports a single-threaded JDBC connection for a PostgreSQL materialized view. Because data is serialized from memory into the PostgreSQL protocol and then deserialized back into memory, you may run into performance issues and scalability limits for large tables with hundreds of millions of rows.

    Unlike the Spark SQL option for a materialized view, PostgreSQL enables the following use cases and scenarios:

    • Querying a business schema view
    • Querying formula columns in a physical schema view from another schema table
    • Using PostgreSQL built-in functions

    Create a Materialized View with PostgreSQL using the Script Editor

    In this release, for a PostgreSQL materialized view, only the Script Editor is available. Here are the steps to create a materialized view with PostgreSQL using the Script Editor:

    • For the given schema in Schema Designer, in the Action bar, select + New → Materialized View.
    • In the Data Source dialog, in Language, select Incorta PostgreSQL.
    • In Script…

      • without Notebook Integration enabled, to open the Script Editor, select the Script open icon or text box.
      • with Notebook Integration enabled, select Edit Query.
    • Enter your PostgreSQL SELECT statement.
    • Select Done.
    • Select Save.
    • Specify a Table Name.
    • In the Action bar, select Done.

    SparkR support for materialized views

    This release supports creating a materialized view using SparkR. SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. R is one of the most popular programming languages for statistical modeling and analysis.

    About R

    R is a freely available language and environment for statistical computing and graphical analysis. R provides support for a wide variety of statistical and graphical techniques: linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, and many more. R also supports data wrangling. For example, packages such as dplyr or readr can transform messy data into a structured form. R simplifies quality plotting and graphing with its native support for libraries such as ggplot2 and plotly. In addition, R has a rich set of packages with over 10,000 packages in the CRAN repository.

    There are many packages that provide support for machine learning algorithms related to classification, regression, and neural networks.

    About SparkR

    To learn more about SparkR, please review the SparkR documentation for Apache Spark 2.4.3, as this is the version that comes bundled with Incorta.

    Installation Requirements

    All Incorta Nodes in the given cluster require R 3.4 or above. You can find R available for download at https://cran.r-project.org/mirrors.html

    Additional R packages

    After confirming the successful installation of R 3.4 or above on all hosts, you must also install the following packages from the R shell:

    To install the Knitr package, enter the following command in R shell:

    install.packages("knitr")

    To install the Stringi package, enter the following command in R shell:

    install.packages("stringi")

    To install the Stringr package, enter the following command in R shell:

    install.packages("stringr")

    To install the httr package, enter the following command in R shell:

    install.packages("httr")

    To install the SparkR package, enter the following command in R shell:

    install.packages("https://cran.r-project.org/src/contrib/Archive/SparkR/SparkR_2.4.3.tar.gz", repos = NULL, type="source")

    SparkR Example

    The following example reads the rows from the SALES.CUSTOMERS table and then persists the results:

    s = read("SALES.CUSTOMERS")
    save(s)

    Available Helper Methods

    There are several helper methods available for creating a materialized view with R.

    Method | Description
    get_last_refresh_time(): Long | Returns the last refresh date for the materialized view
    save(dataFrame: DataFrame): Unit | Required method to persist the materialized view
    read(tableName: String): DataFrame | Reads a schema table and returns the table as a dataframe
    readFormat(format: String, path: String): DataFrame | Reads a data source and a path and returns a dataframe object

    Available Helper Methods with Notebook Integration enabled

    In addition to the existing helper methods, the following helper methods are available for R and Notebooks:

    Method | Description
    display(dataFrame: DataFrame): Unit | Displays the dataframe results
    incorta$show(df: DataFrame) | Shows the results of the dataframe
    incorta$printSchema(df: DataFrame) | Prints the schema of the dataframe
    incorta$describe(df: DataFrame): Unit | Displays the count, mean, standard deviation, min, and max values for each column in the schema
    incorta$head(dataFrame: DataFrame, n: Int=1): Unit | Displays the first N rows of the dataframe results, where N is optional
    incorta$put(key: String, value: Object): Unit | Adds a new property to the map of properties
    incorta$get(key: String): Object | Retrieves a property from the map of properties

    Create a materialized view with SparkR using the Script Editor

    With Notebook Integration disabled for a given tenant, you can only edit a materialized view using the Script Editor. In this release, with Notebook Integration enabled for the tenant, you can edit the materialized view using either the Script Editor or the Notebook Editor.

    You must call the save(dataframe) method to persist the materialized view.

    Here are the steps to create a materialized view with SparkR using the Script Editor:

    • For the given schema in Schema Designer, in the Action bar, select + New.
    • In the Add New menu, select Materialized View.
    • In the Data Source dialog, in Language, select Spark R.
    • In Script…

      • without Notebook Integration enabled, to open the Script Editor, select the Script open icon or text box.
      • with Notebook Integration enabled, select Edit Query.
    • Enter your R code.
    • Select Done.
    • To specify additional materialized view Spark properties, select Add Property.
    • Select Save.
    • Specify a Table Name.
    • In the Action bar, select Done.

    Create a materialized view with SparkR and the Notebook Editor

    A SparkR notebook has the %r declaration. You must call the save(dataframe) method to persist the materialized view.

    Here are the steps to create a materialized view with SparkR and the Notebook Editor:

    • For the given schema in Schema Designer, in the Action bar, select + New.
    • In the Add New menu, select Materialized View.
    • In the Data Source dialog, in Language, select Spark R.
    • In Script, select Edit in Notebook.
    • In one or more paragraphs, enter the R code for the materialized view.
    • Select Done.
    • To specify additional materialized view Spark properties, select Add Property.
    • Select Save.
    • Specify a Table Name.
    • In the Action bar, select Done.
    Additional Considerations for SparkR and the Notebook Editor

    In certain cases, a notebook paragraph in R will show a status of Finished even though the paragraph output reports an error. Some errors will appear in the Data Source dialog. Check the application logs for the root cause and the full stack trace.


    Incorta ML New Features

    In this release, there are several important new features added to Incorta’s Machine Learning (ML) Library:

    • Columns Encoding
    • Encoding Recommendations
    • Data Balancing
    • Model Building and Prediction
    • Model Evaluation
    • Time Series Forecasting

    Python Requirements

    The incorta_ml library supports Python 2.7, Python 3.5, Python 3.6, and Python 3.7. Pandas officially supports these versions of Python.

    Important

    Deprecation notice concerning Python 2.7:
    Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python installation, as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7.

    Required Incorta ML Python libraries

    Python 3.5 and higher requires the following libraries installed using pip:

    • pyspark
    • numpy
    • pandas
    • lime
    • fbprophet
    • statsmodels
    • pmdarima (python 3+)
    • imbalanced-learn

    Python 2.7 requires the following libraries installed using pip:

    • pystan==2.17
    • subprocess32==3.2.6
    • numpy==1.15.4
    • scipy==1.2.2
    • networkx==2.2
    • matplotlib==2.1.0
    • pywavelets==1.0.3
    • scikit-learn==0.20.3
    • scikit-image==0.14.3
    • lime==0.1.1.30
    • statsmodels==0.10.2
    • pyramid-arima
    • holidays==0.9.12
    • fbprophet==0.5.0
    • seaborn==0.9.1
    • cufflinks==0.17.0
    • imbalanced-learn==0.4.3
    • importlib_resources

    Columns Encoding

    Column encoding converts string and date type columns to numeric features.

    Signature

    from incorta_ml import encode_columns
    output_df = encode_columns(input_df, handle, columns, is_training=False)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains one or more string or date columns
    handle | a unique identifier
    columns | a list of the column names that you want to encode
    is_training | specify True to build the transformation in Spark’s Directed Acyclic Graph (DAG), or when preparing the training data for the first time; otherwise, specify False to apply the transformation to a testing or development data set

    Returns

    output_df: a dataframe with converted strings and date types as numeric features
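
    For example, here is a minimal sketch that assumes input_df is an existing Spark dataframe and that CUSTOMER_TYPE and SIGNUP_DATE are hypothetical string and date columns to encode:

    from incorta_ml import encode_columns

    # Encode the hypothetical CUSTOMER_TYPE and SIGNUP_DATE columns into numeric
    # features; "customer_encoding" is an arbitrary handle, and is_training=True
    # because this is the first pass over the training data.
    training_df = encode_columns(input_df, "customer_encoding",
                                 ["CUSTOMER_TYPE", "SIGNUP_DATE"], is_training=True)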

    Encoding recommendations

    Returns a recommended list of column names for column encoding.

    Signature

    from incorta_ml import recommend_encoding
    output_df = recommend_encoding(input_df, exclude, suppress_printing=True)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains one or more string or date columns
    exclude | a list of column names to exclude from the recommendations
    suppress_printing | a flag; False prints the recommended columns

    Returns

    output_df: dataframe that contains numeric features, transformed features, and labels
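
    For example, here is a minimal sketch that assumes input_df is an existing Spark dataframe and ORDER_ID is a hypothetical key column to exclude:

    from incorta_ml import recommend_encoding

    # Recommend columns to encode, excluding the hypothetical ORDER_ID key column;
    # suppress_printing=False also prints the recommended columns.
    recommendations = recommend_encoding(input_df, ["ORDER_ID"], suppress_printing=False)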

    Data Balancing

    Balances the training data of a classification problem based on a categorical column.

    Signature

    from incorta_ml import balance_data
    output_df = balance_data(input_df, label_column_name)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe
    label_column_name | the categorical column on which to base the balancing

    Returns

    output_df: balanced dataframe
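
    For example, here is a minimal sketch that assumes training_df is a Spark dataframe and CHURNED is a hypothetical categorical label column:

    from incorta_ml import balance_data

    # Balance the training data based on the hypothetical CHURNED label column.
    balanced_df = balance_data(training_df, "CHURNED")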

    Model building and prediction

    Builds the model and persists the model to disk. Incorta ML supports the following algorithms:

    • LogisticRegression
    • DecisionTreeClassifier
    • RandomForestClassifier
    • GBTClassifier*
    • MultilayerPerceptronClassifier*
    • LinearSVC*
    • NaiveBayes
    • LinearRegression
    • GeneralizedLinearRegression
    • DecisionTreeRegressor
    • RandomForestRegressor
    • GBTRegressor
    • IsotonicRegression

    GBTClassifier and LinearSVC work only with binary classification data. Support for MultilayerPerceptronClassifier is in manual mode: you must specify the layers as a parameter in the form of a two-dimensional (2-D) array. You can add any number of hidden layers and hidden layer sizes:

    [
      [number_of_input_features, hidden_layer_1, hidden_layer_2, hidden_layer_n, number_of_classes],
     ]
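
    As a concrete illustration of this template, a hypothetical network with 20 input features, two hidden layers of 16 and 8 neurons, and 3 output classes could be specified as:

    # Hypothetical layer specification: 20 input features, hidden layers of 16
    # and 8 neurons, and 3 output classes.
    layers = [
      [20, 16, 8, 3],
    ]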

    Signature

    from incorta_ml import auto_modeling, predict
    auto_modeling(input_df, model_name, label_column_name, params=None, mode=None)
    output_df = predict(input_df, model_name)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains the feature columns and the label column. All feature columns must be numeric
    model_name | a name to identify the model
    label_column_name | the name of the label column as a two-part qualified name, such as input_df.column
    params | set to None to enable Auto Mode; otherwise, specify the model and its parameters as in PySpark MLlib
    mode | a string such as 'classification' or 'regression', or None to specify Auto Mode. In Auto Mode, it is not necessary to specify params other than None or {}

    Returns

    Persists the model to disk.
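
    For example, here is a minimal sketch in Auto Mode, assuming features_df is an already encoded Spark dataframe, CHURNED is a hypothetical label column, new_data_df is another dataframe with the same feature columns, and churn_model is an arbitrary model name:

    from incorta_ml import auto_modeling, predict

    # Build and persist a classification model in Auto Mode (params=None);
    # the label is given as a two-part qualified name, per the table above.
    auto_modeling(features_df, "churn_model", "features_df.CHURNED",
                  params=None, mode='classification')

    # Score a new dataframe with the persisted churn_model.
    output_df = predict(new_data_df, "churn_model")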

    Model Evaluation

    Evaluates a model on a dataframe.

    Signature

    from incorta_ml import evaluate
    output_df=evaluate(input_df, model_name)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains the feature columns and the label column; the schema should match that of the training data
    model_name | the name of the model to evaluate

    Returns

    output_df: a dataframe that contains two columns, metric_name and value, where each row represents a metric value.
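
    For example, here is a minimal sketch that assumes test_df is a Spark dataframe whose schema matches the training data and churn_model is the name of a previously built model:

    from incorta_ml import evaluate

    # Compute evaluation metrics; each row of the result holds a metric_name
    # and its value.
    metrics_df = evaluate(test_df, "churn_model")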

    Time Series Forecasting

    Builds a Time Series model, persists the model to disk, and returns a dataframe that includes predictions and a number of future points. For more information about frequency formulas, please refer to pandas documentation for time series offset aliases.

    Signature

    from incorta_ml import train_time_series_model
    train_time_series_model(input_df, handle, time_column_name, value_column_name, algorithm_names, horizon=0, intervals=None)

    Parameters

    Parameter | Description
    input_df | a Spark dataframe that contains a date or timestamp column, and a values column
    handle | a name to identify the process. The recommendation is to have the name reflect the feature being predicted
    time_column_name | the names of the date or timestamp columns available in the specified dataframe, each as a qualified name such as input_df.column
    value_column_name | the names of the values columns, each as a qualified name such as input_df.column
    algorithm_names | the names of the algorithms to use: 'est' for simple exponential smoothing, 'auto_arima' for auto ARIMA, and 'fbprophet' for Facebook Prophet
    horizon | the number of points that will be selected as a testing set
    intervals | the size of each period: a frequency formula, an integer representing the number of seconds, or None for auto

    Returns

    Returns a dataframe that includes predictions and a number of future points.
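
    For example, here is a minimal sketch that assumes sales_df is a Spark dataframe with hypothetical ORDER_DATE and REVENUE columns, and that algorithm_names accepts a list of algorithm names:

    from incorta_ml import train_time_series_model

    # Forecast revenue with Facebook Prophet, holding out 30 points as a testing
    # set and letting the library choose the period size (intervals=None).
    forecast_df = train_time_series_model(sales_df, "revenue_forecast",
                                          "sales_df.ORDER_DATE", "sales_df.REVENUE",
                                          ["fbprophet"], horizon=30, intervals=None)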


    Additional Improvements and Enhancements

    In the 4.9 release, there are additional improvements and enhancements:

    Support for folders in the Google Sheets connector

    In this release, the Google Sheets connector now supports selecting a folder that contains one or more Google Sheets documents. To learn more, please review Connectors → Google Sheets.

    Support for more incremental data type columns for MS SQL Server

    In this release, for an incremental table of the type SQL Database with a Microsoft SQL Server data source, an incremental column now supports numeric types such as double, integer, and long, in addition to date and timestamp.

    Parallelize Apache Spark queries with split Parquet files

    In this release, you can configure the Loader Service to split Parquet files to take advantage of unused cores in Apache Spark. The splitting of files occurs only on a second full load.

    The num.of.parquet.parts property informs the Parquet writer to split an existing Parquet file into, at most, the specified number of parts.

    You can define this property in the node.properties file, or for the Loader Service, in the service.properties file.

    Syntax

    num.of.parquet.parts=<INTEGER>

    Example that splits the Parquet file into a maximum of 10 parts:

    num.of.parquet.parts=10

    Additional considerations

    To determine an appropriate number of parts, review the row count after the first full load, and then compare it against the core count available to Apache Spark so that the resulting parts can be processed in parallel.

    Note

    If migrating from one Apache Spark environment to another and the core counts differ, recalculate and reassign the num.of.parquet.parts property.

    Schema dependencies for inspector jobs

    In this release, an Inspector job will now generate a new file, schemaDependency.csv. The file contains schema dependencies as follows:

    • Tenant Name
    • Schema
    • Dependent Schema

    The new file identifies joins from one schema to another schema and includes alias tables that contain a formula column.

    Content Manager preserves the List view sorting

    In this release, the sort order of a selected column in the List view of the Content (Catalog) Manager will be preserved.


    Known Issues

    Here are the known issues in this 4.9.0 release:

    • Multiple suspended schedules will show up as not scheduled.
    • Certain dashboard insights may show the message, The data is being refreshed, even though the Loader Service reports a successful load of the schema.
    • A PostgreSQL materialized view that refers to a formula column in a physical schema may fail during a load job.
    • For an Advanced Map insight, it is not possible to set a Base Field for a Geo Data measure.

    © Incorta, Inc. All Rights Reserved.