Importing and Loading Data
When you bring data into Incorta, there are two steps:
- Import data (also called extracting data) from your data source.
- Load data to write the data into memory.
You can extract data from your data sources using these strategies:
- All at once
- Incrementally, using a chunking strategy
If you use a chunking strategy, Incorta extracts large tables chunk by chunk, according to the chunk size. Incorta determines the number of chunks from the number of threads available to run in parallel (you configure this setting in the Cluster Management Console, or CMC).
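As a rough illustration of the idea (not Incorta's actual implementation), the sketch below splits a table into contiguous row ranges, one per available thread, and extracts them in parallel. The function names `chunk_ranges` and `extract_in_chunks` are hypothetical, and the "table" is just an in-memory list standing in for a real data source.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total_rows, threads):
    """Split row offsets [0, total_rows) into at most `threads` contiguous ranges."""
    if total_rows < 0 or threads < 1:
        raise ValueError("total_rows must be >= 0 and threads must be >= 1")
    n_chunks = min(threads, total_rows)  # never create empty chunks
    if n_chunks == 0:
        return []
    base, extra = divmod(total_rows, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def extract_in_chunks(rows, threads):
    """Hypothetical extract: read each chunk on its own thread, then reassemble in order."""
    ranges = chunk_ranges(len(rows), threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = list(pool.map(lambda r: rows[r[0]:r[1]], ranges))
    return [row for part in parts for row in part]
```

With 10 rows and 4 threads, `chunk_ranges(10, 4)` yields four ranges of 3, 3, 2, and 2 rows, so no thread is left idle and no range is empty.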
Incremental extracts from source tables, especially in replicated source systems, can miss records: if a transaction is updated in the source system after an incremental extract in Incorta completes, later incremental loads can skip that record. To avoid this, choose the timestamp column to use for incremental extracts. Incorta then extracts records based on the maximum value of that incremental timestamp column. By default, incremental extracts are based on the last successful extract time for a table.
The following limitations apply:
- Incremental column support is currently limited to timestamp columns only.
- Support in the first phase is limited to Oracle and MySQL databases.
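The difference between the two watermarks can be sketched as follows. This is a simplified model, not Incorta's code: the helper names `incremental_extract` and `max_timestamp`, the `updated_at` column, and the sample rows are all assumptions made for illustration.

```python
from datetime import datetime


def incremental_extract(source_rows, watermark, ts_column="updated_at"):
    """Return only the rows whose timestamp is later than the watermark."""
    return [row for row in source_rows if row[ts_column] > watermark]


def max_timestamp(rows, ts_column="updated_at", default=None):
    """Return the largest timestamp among already-extracted rows."""
    return max((row[ts_column] for row in rows), default=default)


# Rows visible in the source when the first extract ran at 10:00.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 50)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 9, 55)},
]
loaded = list(source)
last_extract_time = datetime(2024, 1, 1, 10, 0)

# A replicated row stamped 09:58 reaches the source only after the extract finished.
source.append({"id": 3, "updated_at": datetime(2024, 1, 1, 9, 58)})

# Watermark = last successful extract time: the late row is skipped.
missed = incremental_extract(source, last_extract_time)

# Watermark = maximum value of the timestamp column already extracted: the late row is picked up.
caught = incremental_extract(source, max_timestamp(loaded))
```

In this scenario `missed` is empty, while `caught` contains the row with `id` 3, because 09:58 is earlier than the extract time but later than the latest timestamp already extracted (09:55).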
After you load data, you can create schemas and joins and visualize your data.
When you load data into Incorta, you can choose one of the following strategies:
- Full load to load all the data at once. This option overwrites any data with the same name in Incorta. You can disable this option so you do not accidentally overwrite existing data.
- Incremental load to add only new data to Incorta. Use this option to avoid overwriting existing data.
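The two load strategies can be contrasted with a small in-memory model. This is only a sketch under stated assumptions: the `InMemoryStore` class, its `allow_full_load` flag, and the `id` key used to recognize existing rows are hypothetical and do not represent Incorta's API.

```python
class InMemoryStore:
    """Toy model of named tables held in memory, keyed by a row's `id`."""

    def __init__(self, allow_full_load=True):
        self.tables = {}
        # Mirrors the idea that full loads can be disabled to prevent accidental overwrites.
        self.allow_full_load = allow_full_load

    def full_load(self, name, rows):
        """Replace everything stored under `name` with the given rows."""
        if not self.allow_full_load:
            raise PermissionError("full load is disabled for this store")
        self.tables[name] = {row["id"]: row for row in rows}

    def incremental_load(self, name, rows):
        """Add only rows whose `id` is not already stored; return how many were added."""
        table = self.tables.setdefault(name, {})
        added = 0
        for row in rows:
            if row["id"] not in table:
                table[row["id"]] = row
                added += 1
        return added
```

For example, after `store.full_load("orders", [{"id": 1}, {"id": 2}])`, calling `store.incremental_load("orders", [{"id": 2}, {"id": 3}])` adds only the row with `id` 3 and leaves the existing rows untouched, whereas a second `full_load` would discard them.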
You can schedule loads, upload data manually, or write a Python script to import and load data from a source you specify.
When you run an incremental load, you must specify which column defines the incremental timestamp. Incorta extracts records based on the maximum value of that column, which it uses to determine when the schema was last loaded.