Incorta Security Guide
This guide summarizes the Incorta security model and optional security configurations. It also describes the common considerations for securing the Incorta Direct Data Platform.
An Incorta Cluster may consist of one or more host machines. Each host may run one or more applications and services. The following diagram describes a three-host Incorta Cluster that requires Shared Storage.
As the diagram is for illustrative purposes, it does not show all applications and services. In addition, certain applications and services, such as the Notebook Add-on service, are optional. An enterprise cluster topology for Incorta typically supports high availability and disaster recovery. To learn more about configurations for high availability and disaster recovery, please review Configure High Availability.
Here is a detailed list of the applications and configurations for each host in the diagram:
Security for Hosts, Applications, and Services
Incorta encourages administrators to implement security for all hosts, applications, and services in an Incorta Cluster wherever possible.
Secure Access to Linux Host
To access a host in an Incorta Cluster, use only Secure Shell (SSH) access that requires a key file, such as a .pem or .ppk key.
Secure Linux Host Users
Create only the required Linux host users. Incorta does not recommend using the root user. Instead, create an Incorta user with the required permissions and access. Consider restricting the Bash Shell commands for the Incorta user.
Access to Shared Storage
Create a shared mount that only the Incorta Linux user has access to. The Incorta Linux user requires Read, Write, and Execute permissions.
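The permission requirement above can be checked with a short script. This is a minimal sketch, assuming a hypothetical mount path `/mnt/incorta` and a Linux user named `incorta`. It treats mode `0700` (owner Read, Write, and Execute; no access for group or others) as the target.

```python
import os
import pwd
import stat

# Hypothetical values: replace with your shared mount and Incorta Linux user.
SHARED_MOUNT = "/mnt/incorta"
INCORTA_USER = "incorta"


def owner_name(uid: int) -> str:
    """Map a UID to a user name, falling back to the numeric UID."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # UID without a passwd entry
        return str(uid)


def check_shared_mount(path: str, user: str) -> list[str]:
    """Return a list of ownership and permission problems for `path`."""
    problems = []
    st = os.stat(path)
    owner = owner_name(st.st_uid)
    if owner != user:
        problems.append(f"owned by {owner!r}, expected {user!r}")
    mode = stat.S_IMODE(st.st_mode)
    # The owner needs Read, Write, and Execute (directory search) permission.
    if (mode & stat.S_IRWXU) != stat.S_IRWXU:
        problems.append(f"owner lacks rwx (mode {oct(mode)})")
    # No other user or group should have any access to the mount.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"group/other have access (mode {oct(mode)})")
    return problems


if __name__ == "__main__":
    if os.path.isdir(SHARED_MOUNT):
        issues = check_shared_mount(SHARED_MOUNT, INCORTA_USER)
        print("OK" if not issues else "\n".join(issues))
    else:
        print(f"{SHARED_MOUNT} does not exist on this host")
```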
Secure Apache ZooKeeper
For more details about how to best secure Apache ZooKeeper, please review Secure ZooKeeper.
Secure Apache Tomcat
For more details about how to best secure Apache Tomcat, please review Secure Tomcat with TLS.
Secure Apache Spark
By default, security in Apache Spark is off. To learn more about security and Apache Spark, please review https://spark.apache.org/docs/2.4.3/security.html.
Secure MySQL 5.6
To learn more about securing MySQL 5.6, please review https://dev.mysql.com/doc/refman/5.6/en/security.html.
Required Host Ports
Incorta strongly encourages exposing only the required ports for an Incorta Cluster and limiting ports to a whitelist of private or public IPs. Please be aware that applications such as Apache Spark often require a range of ports for distributed processes such as broadcast and shuffle operations. Here is the list of common ports that an Incorta Cluster requires:
|Port Number||Application or Service|
|2049||NFS for Shared Storage|
|2888||Apache ZooKeeper internal|
|3306||MySQL 5.6 default|
|3888||Apache ZooKeeper internal|
|4040 - 4056||Spark Application Internal|
|4500||Incorta Node Agent|
|5436||SQLi Interface using the PostgreSQL protocol|
|5442||SQLi Interface direct to Apache Spark|
|5500||Apache Zeppelin Notebook Integration|
|6009||AJP Connector Port|
|6060||Web for Incorta Cluster Management Console|
|7077||Apache Spark Master Internal|
|7337||Apache Spark Shuffle Service (required if enabled)|
|8005||Tomcat Server port|
|8080||Web for Incorta Analytics Service|
|9091||Web for Apache Spark Master|
|9092||Web for Apache Spark Worker|
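Because the table lists the only ports that should be exposed, a quick TCP probe can confirm what is actually reachable from a given network location. This is a minimal sketch: the host name is passed on the command line, the port list is copied from the table, and it checks TCP connectivity only.

```python
import socket
import sys

# Port specifications copied from the table above; adjust to your topology.
PORT_SPECS = ["2049", "2888", "3306", "3888", "4040 - 4056", "4500", "5436",
              "5442", "5500", "6009", "6060", "7077", "7337", "8005",
              "8080", "9091", "9092"]


def expand(spec: str) -> list[int]:
    """Turn a spec such as '8080' or '4040 - 4056' into a list of ports."""
    if "-" in spec:
        low, high = (int(part) for part in spec.split("-"))
        return list(range(low, high + 1))
    return [int(spec)]


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]
    for spec in PORT_SPECS:
        for port in expand(spec):
            state = "open" if is_listening(host, port) else "closed or filtered"
            print(f"{host}:{port} {state}")
```

Run it from a host inside the whitelist to confirm the required ports are reachable. Run it from a host outside the whitelist to confirm they are blocked.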
Each application and service that comprises the Incorta Direct Data Platform requires security configuration and administration. Security configuration and administration cover the following functional areas:
- Secure Communications
- Authentication, Authorization, and Access
- Audit Access
- Data Security
To ensure secure communications between users and Incorta, administrators must implement HTTPS. One way to enable HTTPS is to configure an NGINX web server as a reverse proxy for Incorta. A common approach is to configure HTTPS with Let’s Encrypt SSL certificates.
Let’s Encrypt is a Certificate Authority (CA) that provides free TLS/SSL certificates to enable HTTPS on web servers. Let’s Encrypt provides a Certbot client that automates most of the steps required to obtain a certificate and to configure it within the NGINX web server. Incorta recommends that organizations use their own Trusted Certificates.
To learn more about HTTPS with Nginx, please review Configuring HTTPS Servers.
To learn more about Let’s Encrypt, please visit https://letsencrypt.org/.
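Whether the certificate comes from Let’s Encrypt (whose certificates are valid for 90 days) or from an organization's own trusted CA, it helps to watch the expiry date of the certificate that the proxy actually serves. This is a minimal sketch using only the Python standard library. The host name is passed on the command line, and port 443 is an assumption.

```python
import socket
import ssl
import sys
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def fetch_not_after(host: str, port: int = 443) -> str:
    """Return the notAfter field of the certificate that host:port presents."""
    context = ssl.create_default_context()  # verifies the chain and host name
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days from `now` (default: the current time) until the certificate expires."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]  # for example, the public host name of the NGINX proxy
    remaining = days_until_expiry(fetch_not_after(host))
    print(f"{host}: certificate expires in {remaining:.1f} days")
```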
It is possible to configure a load balancer in front of Incorta with a public URL that forwards traffic to Incorta. This prevents end users from accessing the Incorta Analytics Service URL directly. For an example of how to configure an Apache Web Server, please visit Configure an Apache Web Server.
Node Agent Communications
As a web application, the Cluster Management Console communicates with Node Agents to start and stop services. These communications consist of encoded and decoded Protocol Buffer messages sent over an NIO channel. Incorta does not send sensitive data over this channel.
Public and Private Key Management
Currently, Incorta does not share public or private keys between hosts in an Incorta cluster. Incorta uses the same 128-bit Advanced Encryption Standard (AES-128) cipher to encrypt passwords or secret keys for data sources.
Authentication, Authorization, and Access Configuration
There are two ways for a user to access data stored in the Incorta Direct Data Platform:
- Sign in to the Incorta Analytics Service web application
- Connect to the Incorta Analytics Service with the SQLi interface using the PostgreSQL protocol.
In both cases, Incorta authenticates the user using a tenant name, username, and password. Authentication is tenant-specific in Incorta and, as such, is a Tenant Configuration in the Cluster Management Console.
Metadata that describes objects in Incorta, such as tenants, schemas, business schemas, session variables, and dashboards, is accessible using the Incorta Command Line Interface (CLI). The Incorta CLI only allows for the import and export of metadata and does not expose data stored in Incorta.
Incorta supports various types of user authentication for a given tenant. To learn more, review Tenant Security Configurations.
Incorta supports various authentication types for the Incorta Analytics Service as a tenant configuration. In Incorta 4.9, the SQLi interface supports only mixed mode authentication, which applies to both the Incorta Analytics Service and the SQLi interface.
Incorta authentication consists of a username and password. For a given tenant, a CMC administrator can configure the password policy for Incorta Authentication. This includes the following policy properties:
- Minimum Password Length
- Password Cannot Include Username
- Require Lower Case Letters
- Require Upper Case Letters
- Require Digits
- Require Special Characters
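The policy properties above can be expressed as a simple validator. This is a hypothetical sketch that mirrors the listed properties. It is not Incorta's implementation. The default minimum length of 8, the case-insensitive username check, and treating ASCII punctuation as "special characters" are all assumptions.

```python
import string
from dataclasses import dataclass


@dataclass
class PasswordPolicy:
    """Hypothetical mirror of the tenant password policy properties above."""
    min_length: int = 8
    forbid_username: bool = True
    require_lower: bool = True
    require_upper: bool = True
    require_digits: bool = True
    require_special: bool = True


def violations(password: str, username: str, policy: PasswordPolicy) -> list[str]:
    """Return the names of the policy properties that `password` breaks."""
    broken = []
    if len(password) < policy.min_length:
        broken.append("Minimum Password Length")
    if policy.forbid_username and username and username.lower() in password.lower():
        broken.append("Password Cannot Include Username")
    if policy.require_lower and not any(c.islower() for c in password):
        broken.append("Require Lower Case Letters")
    if policy.require_upper and not any(c.isupper() for c in password):
        broken.append("Require Upper Case Letters")
    if policy.require_digits and not any(c.isdigit() for c in password):
        broken.append("Require Digits")
    if policy.require_special and not any(c in string.punctuation for c in password):
        broken.append("Require Special Characters")
    return broken


print(violations("luke1234", "luke", PasswordPolicy()))
```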
SSO (Single Sign On)
The Incorta Analytics Service supports Security Assertion Markup Language 2.0 (SAML2) for Single Sign-On.
LDAP (Lightweight Directory Access Protocol)
Incorta Analytics supports the Lightweight Directory Access Protocol (LDAP). You can also use SSO with LDAP.
To learn more, visit Configure LDAP in Incorta documentation.
Authorization and Access
Incorta’s security model is optimistic: when a user receives permissions or access rights from more than one role, group, or access grant, Incorta enforces the least restrictive of them. The Incorta security model is based on two common approaches to enterprise security:
- Role Based Access Control (RBAC)
- Discretionary Access Control (DAC)
Role Based Access Control
Role Based Access Control (RBAC) enforces access to certain features and functionality within the Incorta Analytics Service. The Incorta Loader Service is not directly accessible to users. The Incorta Cluster Management Console is a separate web interface and allows for a single administrator user.
There is no direct way to assign a Role to a user, with two exceptions:
- All users inherit the User role
- A tenant administrator inherits the SuperRole unless otherwise configured for the tenant
In Incorta, a user belongs to zero or more Groups, and a Group is assigned to zero or more Roles. Roles are immutable. You cannot create, edit, or delete a Role. Here are the available Roles in Incorta:
- Analyze User
Manages folders and dashboards and has access to the Analyzer screen. This role creates dashboards with shared schemas and with personal schemas (personal schemas require the Schema Manager role). This role can also share dashboards with the Share option, share them through email, or schedule dashboards to be sent by email.
- Individual Analyzer
Creates new dashboards using shared or personal schemas (requires Schema Manager). This role cannot share or send dashboards via email.
- Dashboard Analyzer
In addition to viewing and sharing the dashboards available to the User role, this role can also personalize the dashboards shared with it.
- Privileged User
Shares dashboards and schedules sending them by email.
- Schema Manager
Creates schemas and data sources and loads the data into the schemas. This role also shares the schemas with other users so they can create dashboards.
- SuperRole
Manages users, groups, and roles. Can create users and groups. This role also creates schemas and dashboards without requiring any additional roles. This is the master Admin role.
- User
The default role that every user inherits. Users with this role can view any dashboard shared with them. This role can apply filters but cannot change the underlying metadata.
- User Manager
Creates and manages groups and users. Creates groups and adds roles. Adds users to groups.
The following table describes the access rights to features and functionality for each role.
|Role||Access Rights|
|Analyze User||Can Manage: Catalog; Can View: Schema|
|Dashboard Analyzer||Can Share: Catalog|
|Individual Analyzer||Can Manage: Catalog; Can View: Schema|
|Privileged User||Can Share: Catalog|
|Schema Manager||Can Manage: Schema, Data|
|SuperRole||Can Manage: Security, Catalog, Schema, Data|
|User||Can View: Catalog|
|User Manager||Can Manage: Security|
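Because the security model is optimistic, a user's effective rights are the least restrictive combination of every role reachable through the user's groups. The following is a hypothetical sketch of that rule applied to the table above. It assumes that View < Share < Manage and that the User role is always included. The group names in the usage are illustrative only.

```python
from enum import IntEnum


class Right(IntEnum):
    """Ordered so that a larger value is less restrictive."""
    VIEW = 1
    SHARE = 2
    MANAGE = 3


# The role table above, as {role: {feature: right}}.
ROLE_RIGHTS = {
    "Analyze User": {"Catalog": Right.MANAGE, "Schema": Right.VIEW},
    "Dashboard Analyzer": {"Catalog": Right.SHARE},
    "Individual Analyzer": {"Catalog": Right.MANAGE, "Schema": Right.VIEW},
    "Privileged User": {"Catalog": Right.SHARE},
    "Schema Manager": {"Schema": Right.MANAGE, "Data": Right.MANAGE},
    "SuperRole": {f: Right.MANAGE for f in ("Security", "Catalog", "Schema", "Data")},
    "User": {"Catalog": Right.VIEW},
    "User Manager": {"Security": Right.MANAGE},
}


def effective_rights(user_groups, group_roles):
    """Combine roles from all of a user's groups, keeping the least restrictive right."""
    roles = {"User"}  # every user inherits the User role
    for group in user_groups:
        roles.update(group_roles.get(group, ()))
    result = {}
    for role in roles:
        for feature, right in ROLE_RIGHTS[role].items():
            result[feature] = max(result.get(feature, right), right)
    return result


# Hypothetical groups: a user in both gets Manage on Catalog, Schema, and Data.
GROUPS = {"Analysts": ["Individual Analyzer"], "Developers": ["Schema Manager"]}
print(effective_rights(["Analysts", "Developers"], GROUPS))
```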
Discretionary Access Control (DAC)
With Discretionary Access Control (DAC), a user who owns an object — schema, business schema, session variable, or dashboard — is able to control the access to the object. In other words, the object owner defines the Access Control List (ACL) associated with the object. An ACL is a list of users and groups. For each user and group, the owner can set and revoke the access rights. Only the owner of an object can delete the object.
For an object in Incorta, there are three possible access rights:
- Can View: Has view (read) access
- Can Share: Has view (read) and share access
- Can Edit/Manage: Has view (read), share, and edit access
Luke is the owner of a dashboard. Luke shares the dashboard with Jake, granting View access to Jake. Luke also gives Share access to the Analyst group and Edit access to Niki. Paul belongs to the Analyst group, and for that reason can both view and share the dashboard. Paul shares the dashboard with the Business group. Niki, who has Edit access to the dashboard, changes the access rights for the Business group, giving that group Edit access to the dashboard. Rachel belongs to the Business group and attempts to delete the dashboard. Because Luke is the owner of the dashboard, Incorta prevents the deletion. Luke then sets the access rights for both the Analyst and Business groups to Share.
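The scenario above can be replayed with a toy access control list. This is a hypothetical model for illustration, not Incorta's implementation. It assumes the following: a user's right is the least restrictive of their own entry and their groups' entries; sharing requires at least Share access; no one can grant more than they hold; and Paul grants the Business group View access.

```python
from enum import IntEnum


class Access(IntEnum):
    """Access rights; a larger value is less restrictive."""
    NONE = 0
    VIEW = 1
    SHARE = 2
    EDIT = 3


class Dashboard:
    """Toy ACL model of a dashboard; not Incorta's implementation."""

    def __init__(self, owner: str):
        self.owner = owner
        self.acl: dict[str, Access] = {}  # user or group name -> right

    def effective(self, user: str, groups: list[str]) -> Access:
        """Least restrictive right across the user's own entry and their groups."""
        if user == self.owner:
            return Access.EDIT
        return max(self.acl.get(p, Access.NONE) for p in [user, *groups])

    def grant(self, grantor: str, grantor_groups: list[str],
              principal: str, right: Access) -> None:
        """Share with a user or group; assumes no one can grant more than they hold."""
        held = self.effective(grantor, grantor_groups)
        if held < Access.SHARE or right > held:
            raise PermissionError(f"{grantor} cannot grant {right.name}")
        self.acl[principal] = right

    def can_delete(self, user: str) -> bool:
        return user == self.owner  # only the owner can delete


if __name__ == "__main__":
    d = Dashboard(owner="Luke")
    d.grant("Luke", [], "Jake", Access.VIEW)
    d.grant("Luke", [], "Analyst", Access.SHARE)
    d.grant("Luke", [], "Niki", Access.EDIT)
    d.grant("Paul", ["Analyst"], "Business", Access.VIEW)  # assumed: Paul grants View
    d.grant("Niki", [], "Business", Access.EDIT)
    print("Rachel:", d.effective("Rachel", ["Business"]).name,
          "can delete:", d.can_delete("Rachel"))
    for group in ("Analyst", "Business"):
        d.grant("Luke", [], group, Access.SHARE)
    print("Rachel after Luke's change:", d.effective("Rachel", ["Business"]).name)
```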
To learn more about best practices for DAC and RBAC in Incorta, please review Managing dashboards and folders.
It is also possible to review the permission history of objects in Incorta. Incorta captures access right assignments in the Incorta Metadata database. The Permissions Dashboard provides a view of permission grants and revocations for all objects in Incorta.
Incorta tracks all user activities for a given tenant in an Audit.csv file. User activities include when a user:
- Signs in
- Creates, edits, shares, and deletes an object such as a dashboard
- Loads data for a schema or a table
- Downloads data for an insight in a dashboard
- Signs out
To learn more about Incorta’s audit capabilities, please review SOX Compliance.
All update and delete actions performed by users against Incorta objects are also captured and stored in Incorta’s metadata database in the Action table. Incorta provides an Audit Action dashboard for tracking this history.
As a Unified Data Analytics Platform, Incorta ingests data from external data sources and allows users to upload local files and folders. In this regard, Incorta mirrors and copies data from the specified source systems. Incorta users are unable to modify or edit data that Incorta ingests.
By default, Incorta encrypts sensitive data such as user passwords, data source passwords, data source security tokens, and data on disk using AES-128. For more information about AES encryption please review Advanced Encryption Standard (AES).
Incorta stores the secret key internally within application code. This key is not exposed and cannot be modified.
For data ingested into Incorta, there are several security considerations:
- Encryption of data source credentials
- Encryption of data at rest
- Encryption of defined table columns
- Row Level Security (RLS)
Encryption of data source credentials
A data source such as a MySQL database requires a username and password. Incorta encrypts the password that Incorta stores for the connection. When making a data source connection, Incorta decrypts the encrypted value.
Encryption of data at rest
Incorta ingests data from external data sources and allows users to upload local files and folders.
Local files and folders
A user may upload one or more files and one or more folders to Incorta. Incorta copies the files from the local machine and stores them in the Tenants directory in Shared Storage. The default path to the data directory is:
Incorta does not encrypt uploaded local files and folders. Incorta does support password-protected Microsoft Excel files.
External Data Source
Some data source connectors natively encrypt ingested data. For example, an Apache Kafka data source automatically encrypts data that Incorta consumes from the specified Kafka topic. The encrypted data is in CSV format.
Other data source connectors that ingest data from an application or service may not encrypt data. Oracle Fusion, Google Drive, Box, and DropBox are examples of data source connectors that currently do not encrypt ingested data.
To find out more about encryption support for ingested data, please review Supported Data Source Connectors and connect with Incorta support at email@example.com.
Encryption of defined table columns
Using the Table Editor for a table in a given schema, a schema developer can explicitly define an Incorta column for encryption.
Incorta suggests storing all sensitive data using the Column Encrypt property for a column in a table in a schema.
When loading data, Incorta encrypts these columns using 128-Bit AES encryption and stores the data in encrypted form on disk in Shared Storage. Shared Storage consists of Direct Data Mapping (DDM snapshot) files and Apache Parquet files. Only when reading the encrypted data does Incorta decrypt the data.
Row Level Security (RLS)
Using the Table Editor for a table in a given schema, a schema developer can implement a Runtime Security Filter. With a runtime security filter, it is possible to implement Row Level Security (RLS) with Incorta. Row level security typically determines which user or group of users has view access to one or more rows. To learn more about runtime security filters and their practical application for RLS, please review the following documentation and community article:
- Runtime Security Filters
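Conceptually, a runtime security filter is a predicate evaluated for the current user that removes rows before they reach the user. The following Python sketch illustrates only the idea. It does not use Incorta's filter syntax. The `Row` type, the region column, and the group-to-region mapping are all hypothetical. In keeping with the optimistic model, a user sees the union of the rows their groups allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    order_id: int
    region: str
    amount: float


# Hypothetical mapping from group to the regions that group may see.
GROUP_REGIONS = {
    "Sales-EMEA": {"EMEA"},
    "Sales-AMER": {"AMER"},
    "Finance": {"EMEA", "AMER", "APAC"},
}


def visible_rows(rows, user_groups):
    """Keep only rows whose region at least one of the user's groups may see."""
    allowed = set()
    for group in user_groups:
        allowed |= GROUP_REGIONS.get(group, set())
    return [row for row in rows if row.region in allowed]


ROWS = [Row(1, "EMEA", 10.0), Row(2, "AMER", 20.0), Row(3, "APAC", 30.0)]
print([r.order_id for r in visible_rows(ROWS, ["Sales-EMEA", "Sales-AMER"])])
```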
Additional Security Considerations
Incorta is a Java Virtual Machine (JVM) web application that ingests potentially sensitive data and stores data in-memory and in shared storage. As a Unified Data Analytics Platform that potentially stores sensitive data, there are several additional security concerns that Incorta endeavors to address. In general, these security concerns are:
- Injection attacks
- Materialized views and Apache Spark
- Sensitive data in Heap Dumps
- User impersonation
An injection attack is an attacker’s attempt to send data to an application in a way that changes the meaning of commands sent to an interpreter. Every interpreter has a parser. An injection attack attempts to trick a parser into interpreting data as a command.
For example, consider the following change to SQL where an attacker appends “OR 1=1” to the predicate of an SQL query as in “WHERE ID=101 OR 1=1”. Now, instead of just one result being returned, all results are returned. The SQL parser interprets the untrusted data as a part of the SQL query command.
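The `OR 1=1` example can be reproduced with Python's built-in `sqlite3` module. The table and values below are hypothetical. The sketch contrasts pasting untrusted input into the SQL text with passing it as a bound parameter, which the parser never interprets as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, owner TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(101, "Luke"), (102, "Niki"), (103, "Paul")])

untrusted = "101 OR 1=1"  # attacker-controlled input

# Vulnerable: the input becomes part of the SQL text, so the parser sees
# "WHERE id=101 OR 1=1" and the predicate is true for every row.
unsafe_rows = conn.execute(f"SELECT * FROM accounts WHERE id={untrusted}").fetchall()

# Safer: the input is bound as a single value and compared as data.
safe_rows = conn.execute("SELECT * FROM accounts WHERE id = ?", (untrusted,)).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 3 0
```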
The injection context describes when an application uses untrusted data as part of a command, document, or other data structure. The goal of an injection attack is to:
- Break out of the command, document, or structure’s context
- Modify the meaning of the command, document, or structure so as to cause harm
As a web application whose purpose is to process enterprise application data for modern analytics, Incorta is subject to a variety of injection attacks based on the injection context such as:
- SQL queries
- LDAP queries (if configured)
- Operating system command interpreters
- XML documents (Incorta stores metadata as XML)
- HTML documents (Incorta renders HTML)
- JSON structures (Incorta ingests JSON, for example, using Kafka, as well as renders JSON for data visualizations)
- HTTP headers
- File paths
Incorta employs various types of preventive measures to thwart a variety of injection attacks.
In a reference injection attack, a reference can be a database key, a URL, a filename, or some other kind of lookup index. Incorta prevents this type of injection and ensures that it does not allow for command execution.
Incorta embeds the grammar of the guest languages, for example SQL, into that of the application language, in this case Java. In doing so, Incorta automatically generates code that maps the embedded language to constructs in the host language. As a result, Incorta reconstructs the embedded command, adding escaping functions where appropriate.
HTML Document Injection
Incorta prevents the insertion of untrusted data into the HTML Document Object Model. Incorta properly employs output escaping and HTML encoding in order to prevent Cross-Site Scripting (XSS) attacks. Escaping is a standard technique to ensure that characters are treated as data, not as characters that are relevant to the interpreter’s parser.
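Output escaping can be illustrated with Python's standard `html.escape`. This is a generic sketch of the technique, not Incorta's code. The markup characters in the untrusted value are converted to entities, so the browser renders them as text instead of parsing them as a tag.

```python
import html

untrusted = '<img src=x onerror="alert(document.cookie)">'

# quote=True (the default) also escapes double and single quotes, so the value
# cannot break out of an HTML attribute either.
safe = html.escape(untrusted)
page = f"<td>{safe}</td>"
print(page)
```

HTML escaping covers HTML element and attribute contexts. Values placed inside JavaScript or URLs need encoding specific to those contexts.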
Cross-Site Scripting (XSS)
Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications. XSS attacks enable an attacker to inject client-side scripts into web pages viewed by other users. Attackers exploit a cross-site scripting vulnerability so as to bypass access controls such as the same-origin policy.
In order to help prevent XSS attacks, Incorta sanitizes rendered data using common techniques for output escaping and input encoding. In addition, Incorta recommends the following:
- Enable HTTPS
- Enable Tomcat’s Secure Flag
Cross-Site Request Forgery (XSRF)
Unlike cross-site scripting (XSS), which exploits the trust a user has for a particular site, Cross-Site Request Forgery (XSRF), also known as one-click attack or session riding, is a type of malicious exploit of a website where unauthorized commands are transmitted from a user that the web application trusts. XSRF commonly has the following characteristics:
- It involves sites that rely on a user’s identity.
- It exploits the site’s trust in that identity.
- It tricks the user’s browser into sending HTTP requests to a target site.
To help protect against XSRF attacks, Incorta recommends that administrators implement HTTPS and set the Secure flag to true in Apache Tomcat so that the browser does not send the cookie value over HTTP. By default, Incorta does not set the Secure flag on the JSESSIONID and XSRF-TOKEN cookies.
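To see what the Secure flag changes on the wire, the following sketch builds a `Set-Cookie` header with Python's standard `http.cookies` module. It does not configure Tomcat. It only shows the attribute that a correctly configured server adds. The session ID value is hypothetical.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["JSESSIONID"] = "0123456789ABCDEF"  # hypothetical session id
cookie["JSESSIONID"]["path"] = "/"
cookie["JSESSIONID"]["secure"] = True    # browser sends it over HTTPS only
cookie["JSESSIONID"]["httponly"] = True  # page scripts cannot read it

header = cookie.output()
print(header)  # Set-Cookie: JSESSIONID=0123456789ABCDEF; HttpOnly; Path=/; Secure
```

A browser receiving this header stores the cookie but attaches it only to HTTPS requests. Without `Secure`, the same cookie would also travel on plain HTTP requests.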
The Secure flag is a Tomcat configuration setting. The flag informs a web browser that a cookie should only be sent to the web server application using HTTPS. Even with HTTPS enabled, if the Secure flag is not set to true in Tomcat, a web browser will send the cookie value over HTTP. To learn more about how to set the Secure flag for Tomcat, please review the following Incorta documentation:
There are several places where a user that inherits the Schema Manager role can specify a SQL statement.
To help prevent SQL injection, Incorta only accepts SELECT statements. A table that supports an incremental load uses a parameterized syntax. Incorta binds the parameter value using a bound parameter in Java Database Connectivity (JDBC) itself. Bound parameters in JDBC protect against SQL injection.
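The same binding principle applies to an incremental load filter. This sketch uses Python's `sqlite3` module in place of a JDBC source. Its `?` placeholder plays the role of a JDBC `PreparedStatement` parameter. The `orders` table, the `last_updated` column, and the timestamps are hypothetical and do not reflect Incorta's incremental syntax.

```python
import sqlite3

# Hypothetical source table standing in for a JDBC data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, last_updated TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2024-01-01"), (2, "2024-03-01"), (3, "2024-06-01")])

# The last-load timestamp is bound to ?, never concatenated into the SQL text.
INCREMENTAL_QUERY = "SELECT id FROM orders WHERE last_updated > ? ORDER BY id"


def incremental_rows(last_load: str) -> list[int]:
    """Return the ids of rows changed since `last_load`."""
    return [row[0] for row in conn.execute(INCREMENTAL_QUERY, (last_load,))]


print(incremental_rows("2024-02-01"))       # [2, 3]
print(incremental_rows("9999' OR '1'='1"))  # [] : the quotes are just data
```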
As additional protection, Incorta always recommends that data source connectors specify user credentials with read only access to the source data.
Materialized Views and Apache Spark
In Incorta, a materialized view is a type of table in a schema that requires Apache Spark for data materialization and enrichment. In Incorta 4.6, Incorta supports two programming languages for materialized views: Spark SQL and PySpark.
Spark SQL is a declarative language, and Incorta only supports SELECT statements. However, because PySpark is Python for Apache Spark, there is the potential for direct command and reference injection within a materialized view using Python code. For example, in Python, a programmer can import the os library and, in doing so, view all the contents of a directory:
import os
# Lists the Incorta installation directory from inside a materialized view
os.system('ls -ls /home/incorta/incorta/')
To limit this exposure, Incorta recommends the following:
- Assign the Schema Manager role sparingly to groups as this role allows users to create and edit schemas.
- Regularly analyze PySpark code within a materialized view
- For a Linux host that manages an Incorta Node that runs the Incorta Analytics Service and/or the Notebook Add-On service, restrict the available Bash Shell commands for the Incorta user. For example, remove the Secure Copy command, scp, from the Incorta user bash.
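Reviewing PySpark code regularly is easier with a small static scan that flags constructs worth a closer look. This is a heuristic sketch built on Python's standard `ast` module. The module and function deny lists are hypothetical starting points. A determined author can evade simple checks like this, so treat it as a review aid, not a sandbox.

```python
import ast

# Hypothetical deny lists; tune them for your environment.
RISKY_MODULES = {"os", "subprocess", "shutil", "socket", "pty", "sys"}
RISKY_CALLS = {"eval", "exec", "compile", "__import__", "open"}


def find_risky_code(source: str) -> list[str]:
    """Report imports and calls in PySpark source that deserve a manual review."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in RISKY_MODULES:
                    findings.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in RISKY_MODULES:
                findings.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append(f"line {node.lineno}: call {node.func.id}()")
    return findings


SAMPLE = "import os\nos.system('ls -ls /home/incorta/incorta/')\ndf = spark.sql('SELECT 1')\n"
print(find_risky_code(SAMPLE))  # ['line 1: import os']
```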
Sensitive data in Heap Dumps
A heap dump is a snapshot of all the objects in the Java Virtual Machine (JVM) heap at the time of collection. The JVM allocates memory for objects from the heap for all class instances and arrays. The garbage collector reclaims the heap memory when an object is no longer needed and there are no references to the object.
The Incorta Cluster Management Console offers On Heap and Off Heap configuration settings for both the Loader and Analytics Service.
By examining the heap, it is possible to locate created objects and their references in the source. Tools such as Eclipse Heap Analyzer, Eclipse Memory Analyzer, or Java VisualVM can help you view the objects in the heap using a heap dump file.
For this reason, depending on when the heap dump is generated, it is possible to reveal an object’s attribute value using many popular heap analysis tools.
JVM applications allow for both local and remote monitoring using Java Management Extensions (JMX).
A common approach is for administrators to query data from Managed Beans (MBeans) exposed on a JVM port. Many administrators are familiar with this type of monitoring using jconsole, a JMX-compliant graphical tool for monitoring a Java virtual machine. The jconsole tool can monitor both local and remote JVMs, and it can also monitor and manage an application. Tools such as jconsole can perform a heap dump and read a heap dump remotely.
By default, Incorta does not supply the start parameters for JMX monitoring. Security administrators should regularly monitor the Java processes for the Incorta Loader and Incorta Analytics services to determine the presence of unwanted monitoring parameters.
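On a Linux host, one way to check the running Java processes is to read each process's command line from `/proc` and look for JVM options that enable remote monitoring or debugging. This is a minimal sketch. The watched option prefixes are standard HotSpot options: `-Dcom.sun.management.jmxremote*` for JMX, `-agentlib:jdwp` and `-Xrunjdwp` for the debugger, and `-XX:+HeapDumpOnOutOfMemoryError` for automatic heap dumps. Which of them count as "unwanted" in a given cluster is an assumption to review.

```python
import os

# JVM options that enable remote monitoring, debugging, or automatic heap dumps.
WATCHED_PREFIXES = (
    "-Dcom.sun.management.jmxremote",
    "-agentlib:jdwp",
    "-Xrunjdwp",
    "-XX:+HeapDumpOnOutOfMemoryError",
)


def monitoring_flags(argv: list[str]) -> list[str]:
    """Return the arguments of a JVM command line that match a watched prefix."""
    return [arg for arg in argv if arg.startswith(WATCHED_PREFIXES)]


def scan_java_processes(proc: str = "/proc") -> dict[int, list[str]]:
    """Linux only: map each java PID to the watched options on its command line."""
    results = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                argv = f.read().decode(errors="replace").split("\0")
        except OSError:
            continue  # the process exited or is not readable
        if argv and os.path.basename(argv[0]) == "java":
            flags = monitoring_flags(argv)
            if flags:
                results[int(entry)] = flags
    return results


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, flags in scan_java_processes().items():
        print(pid, flags)
```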
A user that inherits the SuperRole has the ability to impersonate a user. An impersonated user receives an email notifying them of their impersonation. However, this requires SMTP configuration for the Incorta Cluster.
To limit the possibility of unwanted user impersonation, Incorta strongly encourages that security administrators limit the number of users that inherit the SuperRole as well as configure SMTP for the Incorta Cluster.
To learn more about SMTP configuration, please review Email Configuration.