public interface DataSet
A data set represents an extraction unit for a single table. It is the Incorta equivalent of a JDBC prepared statement.
A data set is used during data extraction to extract data records from the data source.
A data set is also used during manual schema editing for discovering table columns.
Modifier and Type | Method and Description |
---|---|
void | cancelQuery(): This method is used during data extraction if the job is canceled while queryData(...) or queryDataUpdates(...) is running. |
com.incorta.io.Record.ColumnDef[] | discover(): This method is used during manual schema editing to discover table columns. |
default long | getTimeDifference(): This method should only be implemented when there is a data discrepancy caused by time zone difference. |
java.util.List<com.incorta.io.Record.RecordSet> | queryData(com.incorta.io.Record.ColumnDef[] columns): This method is used during data extraction, during a full load job. |
java.util.List<com.incorta.io.Record.RecordSet> | queryDataUpdates(com.incorta.io.Record.ColumnDef[] columns, long lastUpdated): This method is used during data extraction, during an incremental load job. |
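com.incorta.io.Record.ColumnDef[] discover() throws ConnectorException
This method is used during manual schema editing to discover table columns.
Returns:
An array of Record.ColumnDef objects
Throws:
ConnectorException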
java.util.List<com.incorta.io.Record.RecordSet> queryData(com.incorta.io.Record.ColumnDef[] columns) throws ConnectorException
This method is used during data extraction, during a full load job. It returns a list of Record.RecordSet objects.
If this connector supports parallel extraction, it should return one record set for each parallel extraction thread.
In a typical (sequential) data set implementation, the returned list should contain a single record set.
Parameters:
columns - Columns to be queried
Returns:
A list of Record.RecordSet objects, typically containing just one
Throws:
ConnectorException
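The rule of one record set per parallel extraction thread can be illustrated with a minimal sketch. It assumes the source table has a numeric key that can be divided into contiguous ranges. `RangeRecordSet` and `split` are hypothetical names for illustration only; they are not part of the Incorta API, and a real implementation would return `Record.RecordSet` objects instead.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelSplitSketch {
    /** Hypothetical stand-in for a record set that covers keys in [fromId, toId). */
    static final class RangeRecordSet {
        final long fromId;
        final long toId;

        RangeRecordSet(long fromId, long toId) {
            this.fromId = fromId;
            this.toId = toId;
        }
    }

    /**
     * Splits [minId, maxId) into contiguous, non-overlapping ranges,
     * one per extraction thread. With threads == 1 (the sequential case)
     * the list contains a single range covering everything.
     */
    static List<RangeRecordSet> split(long minId, long maxId, int threads) {
        if (threads < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or thread count");
        }
        long total = maxId - minId;
        // Never create more ranges than there are keys, but always at least one.
        int parts = (int) Math.max(1, Math.min(threads, total));
        long base = total / parts;
        long remainder = total % parts;

        List<RangeRecordSet> sets = new ArrayList<>();
        long start = minId;
        for (int i = 0; i < parts; i++) {
            // The first `remainder` ranges take one extra key each.
            long size = base + (i < remainder ? 1 : 0);
            sets.add(new RangeRecordSet(start, start + size));
            start += size;
        }
        return sets;
    }
}
```

Calling `split(0, 10, 3)` in this sketch yields three ranges, while `split(0, 10, 1)` yields the single range that a sequential data set would return.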
java.util.List<com.incorta.io.Record.RecordSet> queryDataUpdates(com.incorta.io.Record.ColumnDef[] columns, long lastUpdated) throws ConnectorException
This method is used during data extraction, during an incremental load job. It returns a list of Record.RecordSet objects.
If this connector supports parallel extraction, it should return one record set for each parallel extraction thread.
In a typical (sequential) data set implementation, the returned list should contain a single record set.
Parameters:
columns - Columns to be queried
lastUpdated - The timestamp (as a Unix epoch in milliseconds) of the previous successful extraction
Returns:
A list of Record.RecordSet objects, typically containing just one
Throws:
ConnectorException
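As an illustration of how `lastUpdated` might drive an incremental load, the sketch below keeps only rows modified after that timestamp. The `Row` type and the `changedSince` helper are hypothetical and not part of the Incorta API. The strict "greater than" comparison is also an assumption, since the source does not say whether rows stamped exactly at `lastUpdated` should be extracted again.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalFilterSketch {
    /** Hypothetical source row carrying its last-modified time in epoch milliseconds. */
    static final class Row {
        final String id;
        final long modifiedAtMillis;

        Row(String id, long modifiedAtMillis) {
            this.id = id;
            this.modifiedAtMillis = modifiedAtMillis;
        }
    }

    /**
     * Returns the rows modified strictly after lastUpdated.
     * Assumption: strict comparison; a real connector may need >= instead.
     */
    static List<Row> changedSince(List<Row> rows, long lastUpdated) {
        List<Row> changed = new ArrayList<>();
        for (Row row : rows) {
            if (row.modifiedAtMillis > lastUpdated) {
                changed.add(row);
            }
        }
        return changed;
    }
}
```

A connector backed by a database would normally push this filter into the query itself rather than filter rows in memory; the in-memory loop only makes the comparison explicit.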
void cancelQuery() throws ConnectorException
This method is used during data extraction if the job is canceled while queryData(...) or queryDataUpdates(...) is running.
It is used to notify the DataSet object to terminate the running query if possible.
Throws:
ConnectorException
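One common way to honor such a cancellation request is a thread-safe flag that the extraction loop checks between rows. The sketch below assumes this pattern; the class and its `extract` method are hypothetical and only mirror the `cancelQuery()` name from the interface.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CancellableExtractionSketch {
    // Set from the thread that cancels the job, read from the extraction thread.
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    /**
     * Hypothetical extraction loop: "reads" up to totalRows rows and
     * stops early once cancellation has been requested.
     * Returns the number of rows read.
     */
    long extract(long totalRows) {
        long read = 0;
        while (read < totalRows && !cancelled.get()) {
            read++; // a real data set would fetch and emit one record here
        }
        return read;
    }

    /** Requests termination of the running extraction, if any. */
    void cancelQuery() {
        cancelled.set(true);
        // A JDBC-backed data set could additionally call
        // java.sql.Statement.cancel() on the running statement here.
    }
}
```

The flag alone cannot interrupt a call that is blocked inside the driver, which is why the source describes termination as happening only "if possible".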
default long getTimeDifference() throws ConnectorException
This method should only be implemented when there is a data discrepancy caused by a time zone difference. Such a discrepancy can occur when the machine hosting Incorta and the machine hosting the data being extracted are in different time zones. The purpose of this method is to return a time offset (in milliseconds).
Note that a time zone difference does not automatically mean that a data discrepancy will occur or that this method must be implemented. That is only the case when the driver used to connect to the data source cannot handle the time zone difference correctly (for example, an Oracle database where the incremental column type is DATE or TIMESTAMP instead of TIMESTAMP WITH TIME ZONE).
By default, this method returns 0.
Throws:
ConnectorException
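A time offset in milliseconds between two time zones can be computed with `java.time`, as in the minimal sketch below. The helper name `offsetMillis` is hypothetical. The sign convention (source zone minus host zone) is an assumption, because the source does not state which direction Incorta expects.

```java
import java.time.Instant;
import java.time.ZoneId;

class TimeDifferenceSketch {
    /**
     * Returns the UTC offset of sourceZone minus the UTC offset of hostZone
     * at the given instant, in milliseconds.
     * Assumption: the sign convention is illustrative only.
     */
    static long offsetMillis(ZoneId sourceZone, ZoneId hostZone, Instant at) {
        int sourceSeconds = sourceZone.getRules().getOffset(at).getTotalSeconds();
        int hostSeconds = hostZone.getRules().getOffset(at).getTotalSeconds();
        return (sourceSeconds - hostSeconds) * 1000L;
    }
}
```

For a source at UTC+02:00 and a host at UTC, this sketch returns 7200000. Because offsets of region-based zones change with daylight saving time, the instant passed in affects the result.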