Data Connectors
Source Connectors
AWS S3 Reader
Reads a delimited file (such as a CSV) from Amazon S3 and writes it to a datastream.
Connection Parameters
{
"accessKey": "ACCESS_KEY_HERE",
"secretKey": "SECRET_KEY_HERE"
}
Data Source Parameters
{
"fileWithFullPath": "s3://path/to/file.csv",
"delimiter": ",",
"ignoreAndContinue": true,
"outputTopicAliasName": "topicName",
"calculateColumnSize": true,
"maxReadLines": 100,
"hasHeader": true
}
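The CSV-style parameters above (delimiter, hasHeader, maxReadLines, ignoreAndContinue) recur in several connectors. The following Python sketch shows one plausible reading of them; it is not the connector's actual implementation. It assumes that maxReadLines caps the number of data rows emitted, and that ignoreAndContinue skips rows whose field count does not match the header instead of failing. The function name read_rows is hypothetical. calculateColumnSize and outputTopicAliasName are left out because their behavior is not described here.

```python
import csv
import io


def read_rows(text, params):
    """Parse delimited text using the data source parameters (assumed semantics)."""
    reader = csv.reader(io.StringIO(text), delimiter=params["delimiter"])
    # Assumption: with hasHeader, the first line supplies the column names.
    header = next(reader, None) if params["hasHeader"] else None
    rows = []
    for line_no, fields in enumerate(reader, start=1):
        # Assumption: maxReadLines limits the number of data rows emitted.
        if len(rows) >= params["maxReadLines"]:
            break
        if header is not None and len(fields) != len(header):
            # Assumption: malformed rows are skipped when ignoreAndContinue is true.
            if params["ignoreAndContinue"]:
                continue
            raise ValueError(
                f"row {line_no} has {len(fields)} fields, expected {len(header)}"
            )
        rows.append(dict(zip(header, fields)) if header else fields)
    return rows


params = {
    "delimiter": ",",
    "ignoreAndContinue": True,
    "maxReadLines": 100,
    "hasHeader": True,
}
sample = "id,name\n1,alice\n2\n3,carol\n"
rows = read_rows(sample, params)
```

In this sketch the short row ("2") is dropped because ignoreAndContinue is true; with it set to false, the same input raises an error instead.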
Binary Folder - Google Drive
This source connector reads binary files from a Google Drive folder.
Data Source Parameters
{
"folderId": "<folder_id>",
"maxFilesToRead": 10,
"fileNameRegex": ""
}
Output Parameters
{
"dataOutputColumn": "file_data",
"filePathColumn": "filePath",
"outputTopicAliasName": "drive_data"
}
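As a rough illustration of how these parameters might fit together, the Python sketch below filters a list of file names and builds one output record per file. It rests on assumptions the text above does not confirm: an empty fileNameRegex matches every file, the pattern is applied as a search anywhere in the name, and maxFilesToRead is applied after filtering. The helper names select_files and to_record are hypothetical, and no Google Drive API call is made.

```python
import re


def select_files(file_names, source_params):
    """Pick the files to read from a folder listing (assumed semantics)."""
    pattern = source_params["fileNameRegex"]
    # Assumption: an empty pattern means "no filtering".
    matcher = re.compile(pattern) if pattern else None
    selected = [
        name for name in file_names
        if matcher is None or matcher.search(name)
    ]
    # Assumption: the cap applies to the filtered list.
    return selected[: source_params["maxFilesToRead"]]


def to_record(path, data, output_params):
    """Build one output record keyed by the configured column names."""
    return {
        output_params["dataOutputColumn"]: data,
        output_params["filePathColumn"]: path,
    }


listing = ["a.pdf", "b.png", "c.pdf"]
pdf_only = select_files(listing, {"fileNameRegex": r"\.pdf$", "maxFilesToRead": 10})
record = to_record(
    "a.pdf",
    b"%PDF-",
    {"dataOutputColumn": "file_data", "filePathColumn": "filePath"},
)
```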
CSV - HDFS
This source connector reads a CSV file from an HDFS instance of your choice.
Data Source Parameters
{
"fileWithFullPath": "/path/to/file.csv",
"delimiter": ",",
"ignoreAndContinue": true,
"outputTopicAliasName": "topicName",
"calculateColumnSize": true,
"maxReadLines": 100,
"hasHeader": true
}
Connection Parameters
{
"port": "50070",
"hostName": "localhost"
}
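Port 50070 is the default NameNode HTTP port in Hadoop 2.x, which is also where the WebHDFS REST API is served. Whether this connector actually uses WebHDFS is an assumption; if it does, the connection and data source parameters would combine into a URL roughly as in this Python sketch. The function name webhdfs_open_url is hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_open_url(connection, source):
    """Build a WebHDFS OPEN URL from the connector parameters (assumed usage)."""
    # quote() keeps "/" intact and escapes characters such as spaces.
    path = quote(source["fileWithFullPath"])
    query = urlencode({"op": "OPEN"})
    return (
        f"http://{connection['hostName']}:{connection['port']}"
        f"/webhdfs/v1{path}?{query}"
    )


url = webhdfs_open_url(
    {"port": "50070", "hostName": "localhost"},
    {"fileWithFullPath": "/path/to/file.csv"},
)
```

On Hadoop 3.x clusters the default NameNode HTTP port is 9870, so the port value would need to change accordingly.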
CSV - My Space
This source connector will read a CSV file from the My Space of your Workspace.
Data Source Parameters
{
"fileName": "file_name.csv",
"delimiter": ",",
"ignoreAndContinue": true,
"outputTopicAliasName": "topicName",
"maxReadLines": 100,
"calculateColumnSize": true,
"hasHeader": true
}
CSV - Community Space
This source connector will read a CSV file made public in the Community of your Workspace.
User Details
{
"emailId": "username@domain.com"
}
Data Source Parameters
{
"fileName": "filename.csv",
"maxReadLines": 100,
"delimiter": ",",
"ignoreAndContinue": true,
"outputTopicAliasName": "topicName",
"calculateColumnSize": true,
"hasHeader": true
}
CSV/Sheet - Google Drive
This source connector reads a Google Sheets spreadsheet or a CSV file from Google Drive.
Connection Parameters
{
"email": "username@domain.com"
}
Data Source Parameters
{
"fileId": "<file_id>",
"hasHeader": true,
"outputTopicAliasName": "gSheetData",
"ignoreAndContinue": true,
"calculateColumnSize": true,
"sheet": "sheet_number",
"delimiter": ","
}
MySQL Reader
Reads data from MySQL and pushes it to a datastream.
Data Extraction Query Parameters
{
"query": "select * from TestTable;",
"maxReadLines": 100
}
Connection Parameters
{
"host": "localhost",
"port": "3306",
"username": "root",
"password": "root",
"databaseName": "Test",
"ssl": "false"
}
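To illustrate how query and maxReadLines might interact, the Python sketch below runs the example query and fetches at most maxReadLines rows. It uses the standard-library sqlite3 module with an in-memory database purely as a stand-in for MySQL, so that it runs without a server. The assumption that maxReadLines caps the rows pushed to the datastream, and the function name fetch_limited, are not taken from the connector itself.

```python
import sqlite3


def fetch_limited(connection, params):
    """Run the extraction query and return at most maxReadLines rows as dicts."""
    cursor = connection.execute(params["query"])
    columns = [description[0] for description in cursor.description]
    # Assumption: maxReadLines is an upper bound on the rows read.
    rows = cursor.fetchmany(params["maxReadLines"])
    return [dict(zip(columns, row)) for row in rows]


# Stand-in database: SQLite in memory instead of a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?)",
    [(i, f"row{i}") for i in range(250)],
)

records = fetch_limited(
    conn, {"query": "select * from TestTable;", "maxReadLines": 100}
)
```

Although the table holds 250 rows, only the first 100 are fetched; adding a LIMIT clause to the query itself would push the cap down to the database.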
Structured Data Generator
Generates random data and pushes it to a datastream. In allowed_values, specify only discrete values, for example ["1-100"]. For INTEGER and FLOAT columns, an expression can be specified as in_range(low, high).
Data Source Parameters
{
"maxReadLines": 1000,
"seed": 1234
}
Output Configuration
{
"outputTopicAliasName": "topicName"
}
Column Configuration
{
  "columns": [
    {
      "name": "COLUMN_1",
      "type": "INTEGER",
      "allowed_values": "",
      "expression": "in_range(1, 25)"
    },
    {
      "name": "COLUMN_2",
      "type": "FLOAT",
      "allowed_values": [
        1.2,
        3.4
      ],
      "expression": "in_range(1.2, 3.4)"
    },
    {
      "name": "COLUMN_3",
      "type": "STRING",
      "allowed_values": [
        "MALE",
        "FEMALE"
      ]
    }
  ]
}
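The column configuration above can be read as a recipe for each generated row. The Python sketch below is one possible interpretation, not the generator's actual logic. It assumes that an expression, when present, takes precedence over allowed_values, that in_range bounds are inclusive, that seed makes the output reproducible, and that maxReadLines is the number of rows generated. The names generate_value and generate_rows are hypothetical.

```python
import random
import re

# Matches expressions such as in_range(1, 25) or in_range(1.2, 3.4).
IN_RANGE = re.compile(r"in_range\(\s*([-\d.]+)\s*,\s*([-\d.]+)\s*\)")


def generate_value(column, rng):
    """Produce one value for a column (assumed semantics)."""
    expression = column.get("expression")
    if expression:
        # Assumption: an expression overrides allowed_values.
        match = IN_RANGE.fullmatch(expression.strip())
        if match is None:
            raise ValueError(f"unsupported expression: {expression}")
        low, high = match.groups()
        if column["type"] == "INTEGER":
            return rng.randint(int(low), int(high))  # both bounds included
        return rng.uniform(float(low), float(high))
    # No expression: pick one of the discrete allowed values.
    return rng.choice(column["allowed_values"])


def generate_rows(columns, source_params):
    """Generate maxReadLines rows, reproducibly for a given seed."""
    rng = random.Random(source_params["seed"])
    return [
        {column["name"]: generate_value(column, rng) for column in columns}
        for _ in range(source_params["maxReadLines"])
    ]


columns = [
    {"name": "COLUMN_1", "type": "INTEGER", "allowed_values": "",
     "expression": "in_range(1, 25)"},
    {"name": "COLUMN_2", "type": "FLOAT", "allowed_values": [1.2, 3.4],
     "expression": "in_range(1.2, 3.4)"},
    {"name": "COLUMN_3", "type": "STRING", "allowed_values": ["MALE", "FEMALE"]},
]
rows = generate_rows(columns, {"maxReadLines": 1000, "seed": 1234})
```

Because the random generator is seeded, calling generate_rows again with the same configuration yields the same rows in this sketch.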