Concepts → Physical Schema
About a Physical Schema
A physical schema defines one or more entity objects and the relationships between entity objects. It also defines how the Loader Service extracts, loads, and enriches data for the physical schema, and how the Analytics Service loads data into memory for the physical schema. The Incorta Metadata database captures this distinction. In the Incorta Metadata database, in the SCHEMA
table, the SCHEMADATA
field captures the XML metadata for the physical schema definition that includes all entity objects and join relationships. The LOADERDATA
field captures the XML metadata for how the Loader Service loads the entity objects into Shared Storage and how the Analytics Service loads data from Shared Storage into memory.
A physical schema serves as a base object for other dependent objects such as a join to another physical schema entity object, a business schema, an insight on a dashboard tab, or an internal session variable.
A physical schema owner or another privileged user can grant access rights to a physical schema to one or more users or groups.
About entity objects in a physical schema
There are five types of entity objects in a physical schema:
In this release, an Incorta SQL table is an Incorta Labs feature.
Depending on individual configurations and related data sources, an entity object in a physical schema may require significant resources for compute (CPU), memory (on-heap and off-heap), network, and shared storage.
As a result, schema developers and related administrators (tenant, Cluster Management Console (CMC), Apache Spark, and Linux host administrators) must optimize the design and load configuration of each physical schema design in a given tenant. The goal is to ensure that the Incorta cluster has adequate resources to efficiently and reliably support all workloads.
Because a physical schema represents a workload that will consume compute, memory, network, and shared storage, each physical schema requires testing and optimization before migration to a production environment.
About joins in a physical schema
A join defines how one or more entity objects relate. A join can be a self-join, a join between two entity objects in the same physical schema, or a join between two entity objects in different physical schemas.
To learn more about a join and how to create a join, start with Concepts → Join.
About a physical schema diagram
Using the Schema Diagram Viewer, a user can visually review the entity objects in the physical schema and the relationships between entity objects. To learn more, see Tools → Schema Diagram Viewer
Properties of a physical schema
Some properties for a physical schema are configurable. Other properties are read-only, but are helpful to monitor and understand. The three categories of properties for a physical schema are:
- Object properties
- Load job properties
- In-memory properties
Object properties
A physical schema has the following object properties:
Property | Description |
---|---|
Name | The name of the physical schema. This is immutable. |
Owner | The login name (username) of the user that initially creates the physical schema |
Description | The description for the physical schema |
Performance optimization | Enables each Analytics Service in an Incorta cluster to load the physical schema into memory |
Load order | Configure the load order of physical schema entity objects (except Aliases) by one or more groups and and materialized views by one or more groups |
Name property
A physical schema name must adhere to the following name validation rules:
- Must be unique to the tenant
- Must be between 1 and 250 characters in length
- Must begin with an alpha character, lower or upper case
- After the first alpha character, can contain zero or more alphanumeric characters in lower, upper, or mixed case
- After the first alpha character, can contain zero or more underscore (_) or Dollar sign ($) characters
- Besides underscore (_) and Dollar sign ($), cannot contain special characters, symbols, or spaces
Load job properties
Representing a workload, a physical schema also has aggregated load job properties:
Property | Description |
---|---|
Last Load Status | Describes the status of the last load as a date and time, Finished with errors or, during a load job, shows the current status such as In Queue, Extraction, Enrichment, Load, or Post-load |
Loading Time | Displays duration for the load |
Tables | The number of entity objects in the physical schema |
Joins | The number of joins between object in the physical schema |
Rows | The number of rows available to load into the memory of each Analytics Service in an Incorta cluster. When applicable, a Load Filter typically influences this value. |
Data Size (Data on Disk) | Size in KBs, MBs, or GBs for all files related to the physical schema load in shared storage, including Direct Data Mapping (DDM) files and Parquet files. |
In-Memory properties
In the Cluster Management Console, a CMC Administrator can view the memory usage details for an Analytics Service or Loader Service for one or more Incorta Nodes in a given Incorta cluster. At a high-level, the CMC depicts memory usage in terms of both On-Heap and Off-Heap memory. In this release, the CMC also details the Off-Heap segmentation between Active memory and Pooled memory.
Memory usage details
The Memory Usage details dialog show Off-Heap memory.
Property | Description |
---|---|
Name | The object name such as a tenant, a table, data column, calculated column, or join column |
Type | The data type for a column or join column such as Boolean, Date, Double, Integer, Long, Null, Timestamp, or String) |
Is In Memory | True, False, and Partial are the possible values. True means all data and all join columns from all applicable entity objects are in memory. False means that none of the data or the join columns from all applicable entity objects are in memory. Partial means that some of the data (indexes or data columns) or of the joins are in-memory. |
Loaded Memory | The total off-heap memory. |
It is also possible to view a more granular view of memory diagnostic details for Off-Heap memory using the memory.jsp
page for a given Analytics or Load Service on a given Incorta Node. The memory diagnostics page details the following:
Property | Description |
---|---|
Name | Describes the name of the object such as a tenant, a table, data column, calculate column, join column |
Type | The data type for a column or join column such as Boolean, Date, Double, Integer, Long, Null, Timestamp, or String) |
Data | The combined size of the data as an index to the dictionary, the dictionary, and the hash table. |
Dictionary | The size of the actual data (key columns) as an array of unique values |
Hash Table | The size of hash table for the hash of key columns. |
String Pool | The size of of all key columns that are of type String stored as as a String Pool. A String pool is an array of characters that contains the number of characters in string followed by the string of characters. |
Is In Memory | True, False, and Partial are the possible values. True means all data and all join columns from all applicable entity objects are in memory. False means that none of the data or the join columns from all applicable entity objects are in memory. Partial means that some of the data (indexes or data columns) or some of the joins are in-memory. |
Loaded Memory | The total off-heap memory. |
Access rights for a physical schema
A SuperUser (tenant administrator) or a user that belongs to a group assigned the SuperRole or the Schema Manager role can create, view, and edit the physical schemas in a given tenant.
Only a SuperUser (tenant administrator) or a user that belongs to a group assigned the SuperRole can view, edit, and share all physical schemas, regardless of ownership.
The following are the available access rights for a physical schema:
Property | Icon | Description |
---|---|---|
Can View | eye | The user can View the physical schema and the loaded data for specific entity objects, for example, in the Analyzer |
Can Share | node tree | The user is able to Share the physical with other users and groups |
Can Edit | pen and paper | The user is able to Edit the physical schema as well as execute a Load job for the physical schema |
The Edit access right for a physical schema also allows a user to execute a Load job for the physical schema or Load table job for given entity object.
Additional considerations for a physical schema
There are numerous considerations for designing and managing a physical schema ranging from referential integrity, data consistency, denormalization, formula column calculations, data duplication, pre-aggregation, enrichment, incremental loads, load job schedules, load durations, runtime security, migration, versioning, cross-schema joins, object dependency, and dashboard performance.