Connectors → Amazon Web Services (AWS) DynamoDB

About Amazon Web Services (AWS) DynamoDB

AWS DynamoDB is a fully managed, proprietary NoSQL database service that supports key-value and document data structures and is offered as part of the Amazon Web Services (AWS) portfolio. It is a multi-region, multi-active, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications.

About The AWS DynamoDB Connector

The AWS DynamoDB connector uses the incorta.connector.dynamodb.jar driver and supports the following Incorta-specific functionality:

Feature Supported
Chunking
Data Agent
Encryption at Ingest
Incremental Load
Multi-Source
OAuth
Performance Optimized
Remote
Single-Source
Spark Extraction
Webhook Callbacks

The AWS DynamoDB connector authentication methods

The AWS DynamoDB connector supports two methods for authentication:

  • Access Key
  • Temporary Session Token

For more information, see Understanding and getting your AWS credentials.

Steps to connect AWS DynamoDB and Incorta

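To connect AWS DynamoDB and Incorta, here are the high-level steps, tools, and procedures:

  • Create an external data source
  • Create a schema with the Schema Wizard or the Schema Designer
  • View the schema diagram with the Schema Diagram Viewer
  • Load the schema
  • Explore the schema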

Create an external data source

Here are the steps to create an external data source with the AWS DynamoDB connector:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Data.
  • In the Action bar, select + New → Add Data Source.
  • In the Choose a Data Source dialog, in Application, select DynamoDB.
  • In the New Data Source dialog, specify the applicable connector properties.
  • To test, select Test Connection.
  • Select OK to save your changes.

AWS DynamoDB connector properties

Here are the properties for the AWS DynamoDB connector:

Property Control Description
Data Source Name text box Enter the name of the data source
Authentication Method drop down list Select the authentication method that Incorta uses to access resources in your AWS account: Access Key or Temporary Session Token.
Access Key ID text box Enter the Access Key ID for your AWS account or the temporary Access Key ID depending upon the authentication method you select
Secret Access Key text box Enter the Secret Access Key for your AWS account or the temporary Secret Access Key depending upon the authentication method you select
Temporary Session Token text box Select Temporary Session Token for the Authentication Method to configure this property. Enter the temporary session token associated with the temporary Access Key ID and Secret Access Key you entered.
Region drop down list Select the AWS region where your DynamoDB tables reside.
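To confirm that a set of credentials works before you enter it in the New Data Source dialog, you can try the same values with an AWS SDK. The following minimal Python sketch assumes boto3 as that SDK and shows which properties each authentication method needs. The helper session_kwargs is hypothetical; aws_access_key_id, aws_secret_access_key, aws_session_token, and region_name are real keyword arguments of boto3.Session.

```python
from typing import Optional


def session_kwargs(method: str, access_key_id: str, secret_access_key: str,
                   region: str, session_token: Optional[str] = None) -> dict:
    """Map the data source properties to boto3.Session keyword arguments.

    Hypothetical helper for checking credentials outside Incorta.
    """
    if not access_key_id or not secret_access_key:
        raise ValueError("Access Key ID and Secret Access Key are required")
    kwargs = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "region_name": region,
    }
    if method == "Temporary Session Token":
        # Temporary keys are only valid together with their session token.
        if not session_token:
            raise ValueError("Temporary Session Token requires a session token")
        kwargs["aws_session_token"] = session_token
    elif method != "Access Key":
        raise ValueError(f"unknown authentication method: {method!r}")
    # For Access Key, any session token is ignored: the Temporary Session
    # Token property applies only to the temporary method.
    return kwargs
```

With real credentials, boto3.Session(**kwargs).client("dynamodb").list_tables() should return your table names, provided the region is correct and the identity is allowed to list tables.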

Create a schema with the Schema Wizard

Here are the steps to create an AWS DynamoDB schema with the Schema Wizard:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Schema Wizard.
  • In (1) Choose a Source, specify the following:

    • For Enter a name, enter the schema name.
    • For Select a Datasource, select the DynamoDB external data source.
    • Optionally, create a description.
  • In the Schema Wizard footer, select Next.
  • In (2) Manage Tables, in the Data Panel, first select the name of the Data Source, and then check the Select All checkbox.
  • In the Schema Wizard footer, select Next.
  • In (3) Finalize, in the Schema Wizard footer, select Create Schema.
Important

Because DynamoDB is not a relational database and does not support join operations, Incorta does not automatically create joins between the schema tables. You must define joins manually.

Create a schema with the Schema Designer

Here are the steps to create an AWS DynamoDB schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Create Schema.
  • In Name, specify the schema name, and select Save.
  • In Start adding tables to your schema, select DynamoDB.
  • In the Data Source dialog, specify the DynamoDB table data source properties.
  • Select Add.
  • In the Table Editor, in the Table Summary section, enter the table name.
  • To save your changes, select Done in the Action bar.

DynamoDB table data source properties

For a schema table in Incorta, you can define the following DynamoDB-specific data source properties:

Property Control Description Comment / Example
Type drop down list Default is DynamoDB
Data Source drop down list Select the DynamoDB external data source
Select Table drop down list Select the table from the selected data source
Filter Expression text box Selecting this text box opens the Query Editor. Enter the filter expression to refine the table query results. Rows that do not match the filter conditions are not returned. Use the : (colon) prefix to reference a value placeholder that you define in Expression Attribute Values. Example: Price >= :num and ProductStatus IN (:avail, :back, :disc)
Expression Attribute Values text box Selecting this text box opens the Query Editor. Enter a JSON map that defines one or more values to substitute into an expression. Example: {":num":{"N":"200"}, ":city":{"S":"New York"}, ":active":{"BOOL":"true"}}
Expression Attribute Names text box Selecting this text box opens the Query Editor. Enter a JSON map that defines one or more substitution tokens for attribute names in an expression. Example: {"#P":"Percentile"}, where #P is the placeholder and Percentile is the actual attribute name
Projection Expression text box Selecting this text box opens the Query Editor. Enter a string that identifies one or more attributes (columns) to retrieve from the specified table or index, with attribute names separated by commas. Example: ProductCategory, Description, Price
Timestamp and Date Columns text box Selecting this text box opens the Query Editor. Enter a JSON-formatted string that describes the date and timestamp columns in the table. Example: {"OrderDate": {"type": "timestamp", "format": "epoch-milliseconds", "timezone": "-05:00"}}
Incremental toggle Enable the incremental load configuration for the schema table
Incremental Column drop down list Enable Incremental to configure this property. Select the column to use for incremental loading. You can select only columns that you have defined in Timestamp and Date Columns.
Number of Workers text box The number of threads that load the table’s data in parallel. The default is 5.
Maximum Number of Items per Worker text box The maximum number of items to load from this table. This is similar to the SQL LIMIT clause. Leave this property blank to retrieve all items.
Sample Size text box The number of items (rows) to sample from the table while discovering the table schema. The default is 5000. Enter -1 to include all items.
Support Multisource Tables toggle Enable this option if you plan to use this DynamoDB data set with other data sets for the same table. Disabling this option makes the table load faster.
Callback toggle Enable a post-extraction callback: Incorta invokes the callback URL with parameters that contain details about the load job for the data source data set(s).
Callback URL text box Enable Callback to configure this property. Specify the callback URL.

For more information about the table properties, see Additional Considerations.
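The Filter Expression, Expression Attribute Values, Expression Attribute Names, and Projection Expression properties share their names and syntax with the FilterExpression, ExpressionAttributeValues, ExpressionAttributeNames, and ProjectionExpression request parameters of DynamoDB's Scan and Query APIs. To test an expression outside Incorta, you could assemble the same request yourself. This Python sketch only builds the request dictionary; the helper scan_request and the table name Products are hypothetical, and the sketch assumes, without confirmation, that the connector reads tables with Scan.

```python
import json


def scan_request(table: str, filter_expression: str = "",
                 attribute_values: str = "", attribute_names: str = "",
                 projection: str = "") -> dict:
    """Build DynamoDB Scan parameters from the four expression properties.

    The two JSON properties are parsed into dictionaries; properties left
    empty are simply not included in the request.
    """
    request = {"TableName": table}
    if filter_expression:
        request["FilterExpression"] = filter_expression
    if attribute_values:
        request["ExpressionAttributeValues"] = json.loads(attribute_values)
    if attribute_names:
        request["ExpressionAttributeNames"] = json.loads(attribute_names)
    if projection:
        request["ProjectionExpression"] = projection
    return request
```

You could then pass the result to boto3.client("dynamodb").scan(**request) to see which items the same expressions return.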

View the schema diagram with the Schema Diagram Viewer

Here are the steps to view the schema diagram using the Schema Diagram Viewer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the DynamoDB schema.
  • In the Schema Designer, in the Action bar, select Diagram.
Note

The diagram shows only the joins that you create manually, because Incorta does not automatically create joins between DynamoDB schema tables.

Load the schema

Here are the steps to perform a Full Load of the DynamoDB schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the DynamoDB schema.
  • In the Schema Designer, in the Action bar, select Load → Load Now → Full.
  • To review the load status, in Last Load Status, select the date.

Explore the schema

With the full load of the DynamoDB schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.

To open the Analyzer from the schema, follow these steps:

  • In the Navigation bar, select Schema.
  • In the Schema Manager, in the List view, select the DynamoDB schema.
  • In the Schema Designer, in the Action bar, select Explore Data.

For more information about how to use the Analyzer to create insights, see Analyzer and Visualizations.

Additional Considerations

Filter Expressions

  • A filter expression is applied after a query finishes, but before the results are returned. Therefore, a query consumes the same amount of read capacity, regardless of whether a filter expression is present.
  • A Query operation can retrieve a maximum of 1 MB of data. This limit applies before the filter expression is evaluated.
  • In the table properties, the filter expression can reference partition key and sort key attributes.
  • A filter expression consists of attributes (columns or fields), operators or functions, and placeholders for the values that you define in Expression Attribute Values.
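Because of the 1 MB limit, DynamoDB returns large results in pages: a response includes LastEvaluatedKey when more data remains, and the next request passes that value back as ExclusiveStartKey. The Python sketch below illustrates that loop with a hypothetical fetch_page callable in place of a real SDK call. Since the filter is applied after each page is read, a page can contain no items and still not be the last page, so the loop stops only when LastEvaluatedKey is absent.

```python
from typing import Callable, Iterator, Optional


def scan_all(fetch_page: Callable[[Optional[dict]], dict]) -> Iterator[dict]:
    """Yield every item from a paginated DynamoDB-style read.

    fetch_page is a hypothetical stand-in for one Scan or Query call: it takes
    the ExclusiveStartKey (None on the first call) and returns a response dict
    with "Items" and, when more data remains, "LastEvaluatedKey".
    """
    start_key = None
    while True:
        page = fetch_page(start_key)
        # A page may hold zero matching items after filtering; keep going.
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return
```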

The following is an example of a filter expression:

Price >= :num and ProductStatus IN (:avail, :back, :disc)

  • Price and ProductStatus are the columns or attributes
  • >= (greater than or equal to), and (logical AND), and IN (match any value in a list) are the operators in the expression
  • :num, :avail, :back, and :disc are placeholders for values that you define in Expression Attribute Values

For more information, refer to Working with Queries in DynamoDB.
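DynamoDB rejects a request in which the expression uses a placeholder that is not defined, or in which a placeholder is defined but never used. A quick local check before you paste the two properties into the Query Editor can catch both mistakes. The following Python sketch is hypothetical (check_placeholders is not part of Incorta) and recognizes a placeholder as a colon followed by alphanumeric characters.

```python
import json
import re

# A value placeholder: a colon followed by one or more alphanumeric characters.
VALUE_TOKEN = re.compile(r":[A-Za-z0-9]+")


def check_placeholders(filter_expression: str, attribute_values_json: str) -> tuple:
    """Return (undefined, unused) placeholder sets for a filter expression."""
    used = set(VALUE_TOKEN.findall(filter_expression))
    defined = set(json.loads(attribute_values_json))
    return used - defined, defined - used
```

For example, checking the filter expression above against a values map that defines only :num, :avail, and :active reports :back and :disc as undefined and :active as unused.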

Expression Attribute Values

If you need to compare an attribute with a value, define an expression attribute value as a placeholder. Expression attribute values in AWS DynamoDB are substitutes for the actual values that you want to compare. An expression attribute value must begin with a colon (:) followed by one or more alphanumeric characters.

Examples of expression attribute values:

{":num":{"N":"200"}, ":avail":{"S":"Available"}, ":active":{"BOOL":"true"}}

  • :num, :avail, and :active are placeholders for the attribute values.
  • N, S and BOOL are the data types of the attribute values, which are number, string, and boolean, respectively. For more information, see Supported Data Types.
  • 200, Available, and true are the dereferenced attribute values.

You can then use these expression attribute values in an expression, for example, Price >= :num and ProductStatus = :avail.
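Each entry in Expression Attribute Values pairs a placeholder with exactly one type descriptor and its value, and numbers are written as strings. The following Python sketch, a hypothetical helper named dereference, shows how the three types described above resolve to plain values. It handles only N, S, and BOOL. DynamoDB's own JSON format writes BOOL as a JSON boolean (true), while the examples above use the string "true", so the sketch accepts both forms.

```python
import json
import re
from decimal import Decimal

PLACEHOLDER = re.compile(r":[A-Za-z0-9]+")


def dereference(attribute_values_json: str) -> dict:
    """Map each placeholder to a plain Python value (N, S, and BOOL only)."""
    plain = {}
    for placeholder, typed in json.loads(attribute_values_json).items():
        if not PLACEHOLDER.fullmatch(placeholder):
            raise ValueError(f"{placeholder!r}: must be ':' plus alphanumerics")
        (type_code, raw), = typed.items()  # exactly one type descriptor per value
        if type_code == "N":
            plain[placeholder] = Decimal(raw)  # numbers are written as strings
        elif type_code == "S":
            plain[placeholder] = raw
        elif type_code == "BOOL":
            # Accept a JSON boolean or the string form "true"/"false".
            plain[placeholder] = raw if isinstance(raw, bool) else str(raw).lower() == "true"
        else:
            raise ValueError(f"{placeholder}: type {type_code!r} not handled here")
    return plain
```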

Expression Attribute Names

An expression attribute name is a placeholder that you use in an AWS DynamoDB expression as an alternative to an actual attribute name. An expression attribute name must begin with a pound sign (#) followed by one or more alphanumeric characters.

You can use expression attribute names:

  • To access an attribute whose name conflicts with a DynamoDB reserved word.
  • To create a placeholder for repeating occurrences of an attribute name in an expression.
  • To prevent special characters in an attribute name from being misinterpreted in an expression.

{"#P":"Percentile"} and {"#N":"Name"} are examples of expression attribute names:

  • #P and #N are the attribute substitutions.
  • Percentile and Name are the actual attributes.

For more information, see Expression Attribute Names in DynamoDB.
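Conceptually, DynamoDB replaces each # token with the attribute name it maps to before it evaluates the expression. The following Python sketch (expand_names is hypothetical) performs that substitution locally so that you can read what an expression refers to. You still send the placeholders to DynamoDB, because a reserved word such as Name cannot appear directly in an expression.

```python
import json
import re

# A name placeholder: a pound sign followed by one or more alphanumeric characters.
NAME_TOKEN = re.compile(r"#[A-Za-z0-9]+")


def expand_names(expression: str, attribute_names_json: str) -> str:
    """Replace each #placeholder in an expression with its actual attribute name."""
    names = json.loads(attribute_names_json)

    def lookup(match: re.Match) -> str:
        token = match.group(0)
        if token not in names:
            raise KeyError(f"{token} is not defined in Expression Attribute Names")
        return names[token]

    return NAME_TOKEN.sub(lookup, expression)
```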

Projection Expressions

A Projection Expression is a string that identifies one or more attributes (columns or fields) to retrieve from the specified table or index. These attributes can include scalars, sets, or elements of a JSON document. The attributes in the expression must be separated by commas. If you do not specify any attributes, all attributes will be returned. If any of the requested attributes are not found, they will not appear in the result. For more information, see Projection Expressions.
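The following Python sketch, with a hypothetical project function, mimics the projection rules described above on items that are already in memory: only the listed attributes are kept, an attribute missing from an item is simply absent from the result, and an empty projection returns all attributes. It handles only top-level attribute names, not nested document paths.

```python
def project(items: list, projection_expression: str) -> list:
    """Keep only the listed top-level attributes of each item."""
    if not projection_expression.strip():
        return [dict(item) for item in items]  # no projection: all attributes
    wanted = [name.strip() for name in projection_expression.split(",")]
    # Attributes that an item lacks are left out rather than filled with nulls.
    return [{name: item[name] for name in wanted if name in item} for item in items]
```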

Timestamp and Date Columns

Since DynamoDB doesn’t natively support date or timestamp data types, date and timestamp data is represented using either string or number attributes. For more information, see Naming Rules and Data Types.

To work with date and timestamp data in Incorta, you must define each date or timestamp column in the Timestamp and Date Columns table property.

For any date or timestamp column, you have to specify the following information:

  • Type: date or timestamp
  • Format:

    • epoch-seconds
    • epoch-milliseconds
    • Any Java DateTimeFormatter pattern, for example yyyy-MM-dd'T'HH:mm:ss.SSS. For more information, see the Patterns for Formatting and Parsing section on the Class DateTimeFormatter page.
  • Timezone: Optional. Set a valid Java timezone ID. If you do not set it, Incorta uses the timezone of the Incorta server. For more information about timezones, see the of method on the Class ZoneOffset page.

The following example defines three date and timestamp columns in the Timestamp and Date Columns table property:

{
    "UpdateTime": {
        "type": "timestamp",
        "format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
        "timezone": "-07:00"
    },
    "OrderTime": {
        "type": "timestamp",
        "format": "epoch-milliseconds",
        "timezone": "-07:00"
    },
    "Birthdate": {
        "type": "date",
        "format": "yyyy-MM-dd",
        "timezone": "-07:00"
    }
}
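The following Python sketch checks a Timestamp and Date Columns definition and converts epoch values the way a definition describes. It rests on several assumptions: load_columns and convert are hypothetical helpers, timezones are ±HH:MM offsets like those in the example (Java region IDs are not handled), and Java DateTimeFormatter patterns are left unconverted because Python uses a different pattern syntax.

```python
import json
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
OFFSET = re.compile(r"([+-])(\d{2}):(\d{2})")


def parse_offset(text: str) -> timezone:
    """Parse a +HH:MM or -HH:MM offset; region IDs are not handled here."""
    match = OFFSET.fullmatch(text)
    if not match:
        raise ValueError(f"unsupported timezone {text!r}")
    delta = timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
    return timezone(-delta if match.group(1) == "-" else delta)


def load_columns(spec_json: str) -> dict:
    """Parse the property value and check each column definition."""
    columns = json.loads(spec_json)
    for name, spec in columns.items():
        if spec.get("type") not in ("date", "timestamp"):
            raise ValueError(f"{name}: type must be date or timestamp")
        if not spec.get("format"):
            raise ValueError(f"{name}: format is required")
        if "timezone" in spec:
            parse_offset(spec["timezone"])  # raises if the offset is malformed
    return columns


def convert(spec: dict, raw):
    """Convert one epoch value (number or numeric string) per its definition."""
    if spec["format"] == "epoch-seconds":
        instant = EPOCH + timedelta(seconds=float(raw))
    elif spec["format"] == "epoch-milliseconds":
        instant = EPOCH + timedelta(milliseconds=int(raw))
    else:
        # Patterns follow Java's DateTimeFormatter syntax; not handled in this sketch.
        raise NotImplementedError(f"pattern format {spec['format']!r}")
    # Without a timezone, astimezone(None) uses this machine's local zone,
    # standing in for the Incorta server's default.
    tz = parse_offset(spec["timezone"]) if "timezone" in spec else None
    local = instant.astimezone(tz)
    return local.date() if spec["type"] == "date" else local
```

Converting 0 as a date with the -07:00 offset yields 1969-12-31, which shows why the timezone matters for date columns whose values fall near midnight UTC.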

Supported Incremental Loads

You can enable Incremental Load for a DynamoDB table data source. An incremental load depends on the maximum value in the Incremental Column that you select in the table properties. Before you use incremental loads, define a key column in the table; otherwise, each incremental load appends both new and updated items to the existing data, and every updated item results in a duplicate row.
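The following Python sketch models why the key column matters. It assumes, as a simplification, that an incremental load picks up source items whose Incremental Column value exceeds the maximum already loaded, and that a key column causes a loaded item to replace the existing row with the same key. The function incremental_load and the sample column names are hypothetical.

```python
def incremental_load(existing: list, source: list, incremental_column: str,
                     key_column=None) -> list:
    """Hypothetical model of one incremental load.

    Items whose incremental column exceeds the current maximum are loaded.
    With a key column, a loaded item replaces the existing row with the same
    key; without one, it is appended, so an updated item appears twice.
    """
    last_max = max((row[incremental_column] for row in existing), default=None)
    new_rows = [item for item in source
                if last_max is None or item[incremental_column] > last_max]
    if key_column is None:
        return existing + new_rows
    merged = {row[key_column]: row for row in existing}
    for row in new_rows:
        merged[row[key_column]] = row  # upsert by key
    return list(merged.values())
```

For example, if item 1 is updated after the previous load, loading without a key column leaves two rows for item 1, while loading with id as the key column leaves one.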


© Incorta, Inc. All Rights Reserved.