Data Connectors

Source Connectors

AWS S3 Reader

Reads a CSV file from Amazon S3 and writes it to a data stream.

Connection Parameters

{
  "accessKey": "ACCESS_KEY_HERE",
  "secretKey": "SECRET_KEY_HERE"
}

Data Source Parameters

{
  "fileWithFullPath": "s3://path/to/file.csv",
  "delimiter": ",",
  "ignoreAndContinue": true,
  "outputTopicAliasName": "topicName",
  "calculateColumnSize": true,
  "maxReadLines": 100,
  "hasHeader": true
}
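fileWithFullPath appears to use the usual s3://bucket/key form, in which the first path segment is the bucket name and the rest is the object key. The following Python sketch shows how the placeholder value above would split under that assumption, using only urllib.parse from the standard library. split_s3_uri is a hypothetical helper, not part of the connector.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The key is everything after the bucket, without the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# The placeholder value from the Data Source Parameters above
bucket, key = split_s3_uri("s3://path/to/file.csv")
print(bucket, key)  # path to/file.csv
```

Read this way, the placeholder refers to bucket "path" and key "to/file.csv", so a real value should start with your own bucket name.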

Binary Folder - Google Drive

This source connector will read binary files inside a Google Drive folder.

Data Source Parameters

{
  "folderId": "<folder_id>",
  "maxFilesToRead": 10,
  "fileNameRegex": ""
}

Output Parameters

{
  "dataOutputColumn": "file_data",
  "filePathColumn": "filePath",
  "outputTopicAliasName": "drive_data"
}
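How fileNameRegex, maxFilesToRead and the output column names interact is not spelled out above. The following Python sketch encodes one plausible reading, with the assumptions marked in the comments: an empty fileNameRegex means "read every file", and the pattern may match anywhere in the file name (re.search) rather than the whole name. select_files and to_rows are hypothetical helpers, not the connector's implementation.

```python
import re


def select_files(names, file_name_regex="", max_files_to_read=10):
    """Return the file names that would be read, in folder order. Hypothetical helper."""
    # Assumption: an empty fileNameRegex means "read every file".
    pattern = re.compile(file_name_regex) if file_name_regex else None
    selected = []
    for name in names:
        if len(selected) >= max_files_to_read:
            break  # maxFilesToRead reached
        # Assumption: the regex may match anywhere in the name (re.search).
        if pattern is None or pattern.search(name):
            selected.append(name)
    return selected


def to_rows(files, data_column="file_data", path_column="filePath"):
    """Turn (path, bytes) pairs into records keyed by the configured output columns."""
    return [{path_column: path, data_column: data} for path, data in files]


names = ["scan_01.pdf", "notes.txt", "scan_02.pdf", "scan_03.pdf"]
print(select_files(names, r"\.pdf$", 2))  # ['scan_01.pdf', 'scan_02.pdf']
```

The default column names in to_rows mirror dataOutputColumn and filePathColumn from the Output Parameters above.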

CSV - HDFS

This source connector will read a CSV file from an HDFS instance of your choice.

Data Source Parameters

{
  "fileWithFullPath": "/path/to/file.csv",
  "delimiter": ",",
  "ignoreAndContinue": true,
  "outputTopicAliasName": "topicName",
  "calculateColumnSize": true,
  "maxReadLines": 100,
  "hasHeader": true
}

Connection Parameters

{
  "port": "50070",
  "hostName": "localhost"
}
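Port 50070 is the default NameNode HTTP port in Hadoop 2.x (Hadoop 3.x moved it to 9870), which suggests the connector reads the file through the WebHDFS REST API. Assuming that is the case, the following Python sketch shows how hostName, port and fileWithFullPath would combine into a WebHDFS OPEN request URL. webhdfs_open_url is a hypothetical helper, not part of the connector.

```python
from urllib.parse import quote, urlencode


def webhdfs_open_url(host_name: str, port: str, path: str) -> str:
    """Build a WebHDFS OPEN URL for a file on the NameNode's HTTP endpoint."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"http://{host_name}:{port}/webhdfs/v1{quote(path)}?{urlencode({'op': 'OPEN'})}"


print(webhdfs_open_url("localhost", "50070", "/path/to/file.csv"))
# http://localhost:50070/webhdfs/v1/path/to/file.csv?op=OPEN
```

An HTTP GET on this URL returns the file contents; the NameNode answers with a redirect to a DataNode, which urllib.request.urlopen follows automatically.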

CSV - My Space

This source connector will read a CSV file from My Space in your Workspace.

Data Source Parameters

{
  "fileName": "file_name.csv",
  "delimiter": ",",
  "ignoreAndContinue": true,
  "outputTopicAliasName": "topicName",
  "maxReadLines": 100,
  "calculateColumnSize": true,
  "hasHeader": true
}
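The parameters delimiter, hasHeader, maxReadLines and ignoreAndContinue are shared by the CSV connectors in this section, but their exact semantics are not documented here. The following Python sketch encodes one plausible reading as explicit assumptions: maxReadLines caps the number of data rows (the header is not counted), and ignoreAndContinue skips rows whose field count differs from the header instead of failing. read_delimited is a hypothetical helper built on the standard csv module.

```python
import csv
import io


def read_delimited(text, delimiter=",", has_header=True,
                   max_read_lines=100, ignore_and_continue=True):
    """Return (header, rows) from delimited text, honoring the CSV parameters."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None) if has_header else None
    width = len(header) if header else None  # expected number of fields per row
    rows = []
    for row in reader:
        if len(rows) >= max_read_lines:  # assumption: counts data rows only
            break
        if width is None:
            width = len(row)  # no header: the first data row sets the width
        if len(row) != width:
            if ignore_and_continue:
                continue  # assumption: malformed rows are skipped, not fatal
            raise ValueError(f"line {reader.line_num}: expected {width} fields, got {len(row)}")
        rows.append(row)
    return header, rows


sample = "id;name\n1;Ada\n2;Grace;extra\n3;Linus\n"
header, rows = read_delimited(sample, delimiter=";")
print(header, rows)  # ['id', 'name'] [['1', 'Ada'], ['3', 'Linus']]
```

With ignore_and_continue=False the same sample raises a ValueError on line 3 instead of dropping the malformed row.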

CSV - Community Space

This source connector will read a CSV file made public in the Community of your Workspace.

User Details

{
  "emailId": "username@domain.com"
}

Data Source Parameters

{
  "fileName": "filename.csv",
  "maxReadLines": 100,
  "delimiter": ",",
  "ignoreAndContinue": true,
  "outputTopicAliasName": "topicName",
  "calculateColumnSize": true,
  "hasHeader": true
}

CSV/Sheet - Google Drive

This source connector will read a Google Sheets spreadsheet or a CSV file from Google Drive.

Connection Parameters

{
  "email": "username@domain.com"
}

Data Source Parameters

{
  "fileId": "<file_id>",
  "hasHeader": true,
  "outputTopicAliasName": "gSheetData",
  "ignoreAndContinue": true,
  "calculateColumnSize": true,
  "sheet": "sheet_number",
  "delimiter": ","
}
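The fileId here (and the folderId of the Binary Folder connector) is presumably the identifier Google Drive shows in a file's or folder's URL, for example https://docs.google.com/spreadsheets/d/<file_id>/edit or https://drive.google.com/drive/folders/<folder_id>. The following Python sketch extracts that identifier from a pasted URL. file_id_from_url is a hypothetical convenience helper, and the IDs in the example are made up.

```python
import re
from urllib.parse import parse_qs, urlparse

# File and folder IDs follow "/d/" or "/folders/" in the URL path.
_ID_IN_PATH = re.compile(r"/(?:d|folders)/([A-Za-z0-9_-]+)")


def file_id_from_url(url: str) -> str:
    """Extract a Google Drive file or folder ID from a sharing URL. Hypothetical helper."""
    parsed = urlparse(url)
    match = _ID_IN_PATH.search(parsed.path)
    if match:
        return match.group(1)
    # Some links carry the ID as a query parameter: .../open?id=<id>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"no Drive ID found in {url!r}")


print(file_id_from_url("https://docs.google.com/spreadsheets/d/1AbC_dEf-123/edit#gid=0"))
# 1AbC_dEf-123
```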

MySQL Reader

Reads data from MySQL and pushes it to a data stream.

Data Extraction Query Parameters

{
  "query": "select * from TestTable;",
  "maxReadLines": 100
}

Connection Parameters

{
  "host": "localhost",
  "port": "3306",
  "username": "root",
  "password": "root",
  "databaseName": "Test",
  "ssl": "false"
}
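maxReadLines caps how many rows of the query result are pushed to the data stream. The following Python sketch shows one way to enforce such a cap with the DB-API 2.0 (PEP 249) cursor method fetchmany, so that no more rows than needed are fetched from the cursor. sqlite3 from the standard library stands in for MySQL so the example runs without a server; read_query and batch_size are hypothetical. MySQL drivers that implement PEP 249, such as PyMySQL, expose the same cursor interface.

```python
import sqlite3


def read_query(conn, query, max_read_lines=100, batch_size=50):
    """Run a query and return at most max_read_lines rows via fetchmany."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        rows = []
        while len(rows) < max_read_lines:
            # Never ask for more rows than are still allowed.
            batch = cursor.fetchmany(min(batch_size, max_read_lines - len(rows)))
            if not batch:
                break  # result set exhausted
            rows.extend(batch)
        return rows
    finally:
        cursor.close()


# sqlite3 stands in for MySQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO TestTable VALUES (?, ?)", [(i, f"row{i}") for i in range(250)])
rows = read_query(conn, "select * from TestTable;", max_read_lines=100)
print(len(rows))  # 100
```

Adding a LIMIT clause to the query itself (for example select * from TestTable LIMIT 100;) would push the same cap to the MySQL server.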

Structured Data Generator

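Generates random data and pushes it to a data stream. The allowed_values field accepts only discrete values; a range such as ["1-100"] is not allowed. To draw values from a range, use an expression of the form in_range(low, high), which is supported for INTEGER and FLOAT columns.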

Data Source Parameters

{
  "maxReadLines": 1000,
  "seed": 1234
}

Output Configuration

{
  "outputTopicAliasName": "topicName"
}

Column Configuration

{
  "columns": [
    {
      "name": "COLUMN_1",
      "type": "INTEGER",
      "allowed_values": "",
      "expression": "in_range(1, 25)"
    },
    {
      "name": "COLUMN_2",
      "type": "FLOAT",
      "allowed_values": [
        1.2,
        3.4
      ],
      "expression": "in_range(1.2, 3.4)"
    },
    {
      "name": "COLUMN_3",
      "type": "STRING",
      "allowed_values": [
        "MALE",
        "FEMALE"
      ]
    }
  ]
}
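The following Python sketch shows how a generator could interpret the column configuration above. It rests on several assumptions, repeated in the comments: in_range(low, high) is inclusive on both ends for INTEGER (random.Random.randint) and uniform for FLOAT (random.Random.uniform); expression takes precedence over allowed_values when both are set, as for COLUMN_2; an empty allowed_values string means "no list"; and seed makes the output reproducible. make_sampler and generate are hypothetical names, not the connector's implementation.

```python
import random
import re

_IN_RANGE = re.compile(r"^\s*in_range\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)\s*$")


def make_sampler(column, rng):
    """Return a zero-argument function that draws one value for a column config."""
    ctype = column["type"]
    expression = column.get("expression", "")
    allowed = column.get("allowed_values") or []  # assumption: "" means no list
    if expression:  # assumption: expression wins over allowed_values
        match = _IN_RANGE.match(expression)
        if match is None:
            raise ValueError(f"unsupported expression: {expression!r}")
        low, high = match.groups()
        if ctype == "INTEGER":
            lo_i, hi_i = int(low), int(high)
            return lambda: rng.randint(lo_i, hi_i)  # inclusive on both ends
        if ctype == "FLOAT":
            lo_f, hi_f = float(low), float(high)
            return lambda: rng.uniform(lo_f, hi_f)
        raise ValueError(f"in_range() applies to INTEGER and FLOAT, not {ctype}")
    if allowed:
        return lambda: rng.choice(allowed)  # pick one of the discrete values
    raise ValueError(f"column {column['name']!r} needs an expression or allowed_values")


def generate(columns, max_read_lines, seed):
    """Generate max_read_lines rows; the same seed yields the same rows."""
    rng = random.Random(seed)
    samplers = {c["name"]: make_sampler(c, rng) for c in columns}
    return [{name: draw() for name, draw in samplers.items()} for _ in range(max_read_lines)]


columns = [
    {"name": "COLUMN_1", "type": "INTEGER", "allowed_values": "", "expression": "in_range(1, 25)"},
    {"name": "COLUMN_2", "type": "FLOAT", "allowed_values": [1.2, 3.4], "expression": "in_range(1.2, 3.4)"},
    {"name": "COLUMN_3", "type": "STRING", "allowed_values": ["MALE", "FEMALE"]},
]
rows = generate(columns, max_read_lines=5, seed=1234)
print(rows[0])
```

Using a dedicated random.Random instance rather than the module-level functions keeps the seed local to one generator, so other code that draws random numbers does not change its output.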