Connectors → File System

About the File System connector

A file system is a way of organizing information on an electronic storage device such as a computer hard drive. In this sense, a file system stores not only files, but also information about each file such as the name, size, type, location within a directory hierarchy, and additional attributes.

For Incorta, the term Shared Storage best characterizes the various usages of a file system for a given tenant in an Incorta cluster. To learn more, please review Concepts → Shared Storage.

About the Data Manager and the LocalFiles data source

Using the Data Manager, a user can upload one or more data files and folders to shared storage. An uploaded file is a local data file and an uploaded folder is a local data folder.

Incorta exposes the local data files and local data folders as the LocalFiles data source.

With the File System connector, you can connect to the LocalFiles data source. The File System connector supports the following file extensions:

| File Extension | File Type | Example | Notes |
| --- | --- | --- | --- |
| .csv | Comma-separated values | sales.csv | Can contain a header row |
| .tsv | Tab-separated values | sales.tsv | Can contain a header row |
| .tab | Tab-separated values | sales.tab | Can contain a header row |
| .txt | Values separated by a custom delimiter | sales.txt | Can contain a header row |
| .xlsx | Microsoft Excel 2000 and above | sales.xlsx | Must be .xlsx. Supports Worksheet selection. |
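
As a rough illustration of the extension rules in the table above, the following Python sketch sorts candidate file names into supported and unsupported groups before an upload. The function name `split_by_support` and the case-insensitive comparison are assumptions made for this example; they do not describe Incorta's own implementation.

```python
from pathlib import PurePath

# Extensions listed in the table above.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".tab", ".txt", ".xlsx"}


def split_by_support(file_names):
    """Return (supported, unsupported) lists, preserving input order."""
    supported, unsupported = [], []
    for name in file_names:
        # Assumption: extensions are compared case-insensitively.
        suffix = PurePath(name).suffix.lower()
        if suffix in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported
```

A check like this catches a misspelled extension such as `sales.xslx` before the file reaches shared storage.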

You can access all folders and files that you own and any folders or files that someone shares with you.

The File System connector supports the following Incorta-specific functionality:

Feature Supported
Directory with Union All
Encryption at Ingest
Incremental Loading
Multi-Source
Performance Optimization
Single-Source
Webhook Callbacks


Steps to use File System connector

Here are the high-level steps, tools, and procedures to use the File System connector with the LocalFiles data source:

Upload one or more data files and folders, including subfolders and files

A folder can contain zero or more files and zero or more subfolders. Incorta preserves the folder hierarchy. Incorta only uploads files with the supported file extensions listed above. After upload, Incorta unzips compressed folders and files.

Here are the steps to create one or more local data folders or local data files, including subfolders and their files:

  • In the Navigation bar, select Data.
  • In the Action bar, select + New → Add Data Source.
  • In the Choose a Data Source dialog, in Data Files, select Upload Data Folder.
  • In the Upload Data Folder dialog, in Upload Options, optionally select Overwrite existing file.
  • Drag and drop one or more files or parent folders to the Upload Data Folder dialog.
Note

The Upload Data Folder option and dialog allow you to upload both data files and folders.

Create a schema with the Schema Wizard

Here are the steps to create a File System schema with the Schema Wizard:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Schema Wizard.
  • In (1) Choose a Source, specify the following:

    • For Enter a name, enter the schema name.
    • For Select a Datasource, select LocalFiles.
    • Optionally create a description.
  • In the Schema Wizard footer, select Next.
  • In (2) Manage Tables, in the Data panel, navigate the directory tree as necessary and select your folder or file. For an .xlsx file, select a worksheet.
  • In the Schema Wizard footer, select Next.
  • In (3) Finalize, in the Schema Wizard footer, select Create Schema.

Create a schema with the Schema Designer

Here are the steps to create a File System schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Create Schema.
  • In Name, specify the schema name, and select Save.
  • In Start adding tables to your schema, select File System.
  • In the Data Source dialog, specify the File System table data source properties.
  • Select Add.
  • In the Table Editor, in the Table Summary section, enter the table name.
  • To save your changes, select Done in the Action bar.

File System table data source properties

You can specify a single file or a single folder in the Data Source dialog. Both the Schema Designer and the Table Editor represent a single file or folder data source as a single-source table. To select a folder, you must enable Union Files.

Note

This release has limited support for Union Files for Excel (.xlsx) Workbook files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties.

Common properties for a local data file and a local data folder

Here are the properties common to both a file selection and a folder selection:

| Property | Control | Description |
| --- | --- | --- |
| Type | drop-down list | Default is File System |
| Data Source | drop-down list | Select LocalFiles |
| File Type | drop-down list | Select Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx) |
| Has Header? | toggle | Enable if the first row contains column header values |
| Callback | toggle | Enables the Callback URL field |
| Callback URL | text box | Appears when the Callback toggle is enabled. Specify the URL. |

Common local data file properties

Here are the common properties for selecting a file of either type, Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx):

| Property | Control | Description |
| --- | --- | --- |
| Incremental | toggle | Enable to support incremental loading. For a single file, you must specify both a File and an Update File. |
| File | button | Selecting File opens the Add File from Local dialog. The dialog shows the files from your local data files and local data folders in shared storage. Select a single file, and then select Add. |
| Update File | button | Available when Incremental is enabled. Selecting Update File opens the Add File from Local dialog. The dialog shows the files from your local data files and local data folders in shared storage. Select a single file, and then select Add. |
Note

With Incremental enabled, if there is not a Key column defined, new rows will be appended and no existing rows will be updated.
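
The note above can be pictured with a small Python sketch. It is only a model of the described behavior, not Incorta's loader: rows are plain dictionaries, and the function `apply_update` and its `key` parameter are hypothetical names. The keyed branch reflects what the note implies (matching rows are updated) rather than a documented algorithm.

```python
def apply_update(existing_rows, update_rows, key=None):
    """Merge update_rows into existing_rows.

    With a key column, a row whose key already exists replaces the old row
    and rows with new keys are appended. Without a key, every update row
    is appended and no existing row changes.
    """
    if key is None:
        # No key column: nothing can be matched, so rows are only appended.
        return existing_rows + update_rows

    merged = {row[key]: row for row in existing_rows}
    for row in update_rows:
        merged[row[key]] = row  # replace a matching row or add a new one
    return list(merged.values())
```

Without a key, an update file that repeats an existing record produces a duplicate row, which is why defining a Key column matters for files that carry corrections.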

Properties for an Excel Workbook file

Here are the specific properties for an Excel Workbook (.xlsx) file:

| Property | Control | Description |
| --- | --- | --- |
| Worksheet | drop-down list | Select a given worksheet for the Excel Workbook |

Properties for a Text file

Here are the properties specific to a Text (.csv, .tsv, .tab, .txt) file:

| Property | Control | Description |
| --- | --- | --- |
| Date Format | drop-down list | Select a specific format for date columns. Date formats follow Java date format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
| Timestamp Format | drop-down list | Select a specific format for timestamp columns. Timestamp formats follow Java date and time format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
| Character Set | drop-down list | Select a supported character set. |
| Separator | drop-down list | Available when the selected File Type is Text. Specify the separator between column values in a row. Comma and Tab are the standard delimiters. Other requires that you specify a value such as the pipe character (\|). |
| Other | text box | Available when the Separator is Other. Enter one or more characters to specify the column separator or delimiter between values in a row. |
| Enable Chunking | toggle | Enable for large file sizes. |
| Chunk Size (MB) | text box | Enter the chunk size in megabytes (MB). |
| Enable Spark Based Extraction | toggle | Enable to have Apache Spark parallelize the ingest of the file. |
| Max Number of Parallel File Extractors | text box | Enter the number of Extractors, which is typically at most the number of available cores. |
| Memory Per Extractor | text box | Enter the memory in gigabytes (GB). This is typically the amount of dedicated memory divided by the number of available cores. |
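
To make the Has Header?, Separator, and Other properties more concrete, here is a minimal Python sketch that reads delimited text with a custom separator. It uses the standard `csv` module; the function name `read_delimited` and the sample data are made up for illustration. Python's `csv` module accepts only a single-character delimiter, so this sketch covers just the single-character case of the Other property.

```python
import csv
import io


def read_delimited(text, separator=",", has_header=True):
    """Parse delimited text into (header, rows).

    header is None when has_header is False.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    rows = [row for row in reader if row]  # skip blank lines
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows
```

For example, `read_delimited("region|amount\nEast|100\n", separator="|")` treats the first line as column names and the remaining lines as data rows.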

Common folder properties

Folder properties are available when you enable Union Files. It is not possible to select a parent folder.

Here are the properties specifically related to selecting a folder:

| Property | Control | Description |
| --- | --- | --- |
| Incremental | toggle | Enable to support incremental loading |
| Union Files | toggle | Enable to select all files within a given folder. When enabled, you will only be able to select a folder from LocalFiles. |
| Directory | button | Select a folder from your LocalFiles. It is not possible to select a parent folder. |
| Include | text box | Enter a keyword with a wildcard * symbol to include specific named files within the folder |
| Exclude | text box | Enter a keyword with a wildcard * symbol to exclude specific named files within the folder |
| Include Sub-Directories Files | toggle | Enable to include files from sub-folders |
| Add Filename as a column | toggle | Enable to add the filename of the file as a column. You will then need to specify a column name. |
| Filename column | text box | Enter a column name for the filename such as source_file_name |
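
The Include and Exclude properties can be thought of as wildcard filters over file names. The sketch below uses Python's `fnmatch` module to show the idea. The exact matching rules that Incorta applies (case sensitivity, and whether Exclude takes precedence over Include) are not stated here, so treat those choices, along with the function name `select_files`, as assumptions.

```python
from fnmatch import fnmatchcase


def select_files(file_names, include="*", exclude=None):
    """Keep names that match include and do not match exclude."""
    selected = []
    for name in file_names:
        if not fnmatchcase(name, include):
            continue
        # Assumption: an Exclude match wins over an Include match.
        if exclude is not None and fnmatchcase(name, exclude):
            continue
        selected.append(name)
    return selected
```

With an Include value of `sales_*` and an Exclude value of `*backup*`, a file named `sales_backup.csv` is skipped while `sales_2024.csv` is kept.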
Note

With Incremental enabled, if there is not a Key column defined, new rows will be appended and no existing rows will be updated.
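
Conceptually, Union Files combined with Add Filename as a column behaves like appending the rows of every file in the folder and tagging each row with the file it came from. The following Python sketch models that idea on in-memory CSV text. The function `union_with_filename` is hypothetical, and the default column name `source_file_name` is taken from the example in the table above.

```python
import csv
import io


def union_with_filename(files, filename_column="source_file_name"):
    """Union rows from several CSV texts, tagging each row with its file name.

    files maps a file name to its CSV text; each text starts with a header row.
    """
    combined = []
    for name, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            row[filename_column] = name
            combined.append(row)
    return combined
```

The added column makes it possible to trace any row in the unioned table back to its source file.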

Folder properties for Excel Workbook files
Important

This release has limited support for Union Files for Excel Workbook (.xlsx) files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties. For this reason, each Excel Workbook file in the selected folder must have a common Worksheet tab name. You must select this common Worksheet name in the drop down list.
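
Because every workbook in the folder must contain the selected Worksheet name, it can help to check the folder before loading. The Python sketch below assumes you have already collected the worksheet names for each workbook (for example, with a spreadsheet library of your choice) into a dictionary. The function name `workbooks_missing_sheet` and the input shape are assumptions for illustration.

```python
def workbooks_missing_sheet(sheets_by_workbook, worksheet):
    """Return the workbook names that do not contain the given worksheet.

    sheets_by_workbook maps a workbook file name to its list of worksheet names.
    """
    return [
        name
        for name, sheets in sheets_by_workbook.items()
        if worksheet not in sheets
    ]
```

An empty result means every workbook in the folder shares the worksheet name.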

Here are the properties specific to selecting a folder of Excel Workbook (.xlsx) files:

| Property | Control | Description |
| --- | --- | --- |
| Worksheet | drop-down list | Select a tab for a worksheet |

View the schema diagram with the Schema Diagram Viewer

Here are the steps to view the schema diagram using the Schema Diagram Viewer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the File System schema.
  • In the Schema Designer, in the Action bar, select Diagram.

Load the schema

Here are the steps to perform a Full Load of the File System schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the File System schema.
  • In the Schema Designer, in the Action bar, select Load → Load Now → Full.
  • To review the load status, in Last Load Status, select the date.

Explore the schema

With the full load of the File System schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.

To open the Analyzer from the schema, follow these steps:

  • In the Navigation bar, select Schema.
  • In the Schema Manager, in the List view, select the File System schema.
  • In the Schema Designer, in the Action bar, select Explore Data.

© Incorta, Inc. All Rights Reserved.