Row Blocks

Arithmetic Operation

This Block lets you run Python arithmetic expressions and functions, such as sin(), on your data.

Arithmetic Expression

[
  {
    "outputColumn": "ID_new",
    "outputType": "FLOAT",
    "onErrorDefaultValue": 0,
    "expression": "np.sin(column('ID'))"
  }
]
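The configuration above can be read as a per-row computation with a fallback value. The following is a minimal sketch of that reading, not the Block's implementation: `apply_arithmetic` and the `column` lookup are hypothetical names, `math.sin` stands in for `np.sin`, and the expression is passed as a Python callable rather than as a string.

```python
import math


def apply_arithmetic(rows, output_column, expression, on_error_default):
    """Return copies of rows with output_column computed from expression."""
    result = []
    for row in rows:
        column = row.__getitem__  # hypothetical stand-in for column('NAME')
        try:
            value = float(expression(column))  # outputType FLOAT
        except (KeyError, TypeError, ValueError):
            value = on_error_default  # mirrors onErrorDefaultValue
        result.append({**row, output_column: value})
    return result


rows = [{"ID": 0}, {"ID": "abc"}]
out = apply_arithmetic(rows, "ID_new", lambda column: math.sin(column("ID")), 0)
```

In this sketch the second row cannot be evaluated, so it receives the default value instead of stopping the run.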

Constant Key

{
  "key1": "val1",
  "key2": "val2"
}

Data Join

This Block takes two data inputs and joins them on the given columns. The current version supports joining on a single column from each input only. The join type can be inner, outer, left, or right.

Join Parameters

{
  "firstDatasourceColumnName": "id",
  "secondDatasourceColumnName": "id",
  "targetColumnName": "CREDIT_SCORE_JOINED",
  "joinType": "left"
}
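To illustrate how the four join types differ, here is a small single-column join over rows held as dictionaries. It is a sketch under assumptions: `join_rows` is a hypothetical helper, rows from the first input win when column names collide, and the role of `targetColumnName` is not described in this page, so it is left out.

```python
def join_rows(first, second, first_key, second_key, join_type="inner"):
    """Join two lists of dict rows on one column from each input."""
    # Index the second input by its join column for quick lookup.
    index = {}
    for row in second:
        index.setdefault(row[second_key], []).append(row)

    joined = []
    matched = set()
    for row in first:
        matches = index.get(row[first_key], [])
        if matches:
            matched.add(row[first_key])
            for match in matches:
                joined.append({**match, **row})
        elif join_type in ("left", "outer"):
            joined.append(dict(row))  # keep unmatched rows from the first input

    if join_type in ("right", "outer"):
        for row in second:
            if row[second_key] not in matched:
                joined.append(dict(row))  # keep unmatched rows from the second input
    return joined


first = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
second = [{"id": 1, "score": 700}, {"id": 3, "score": 650}]
left_joined = join_rows(first, second, "id", "id", "left")
```

Unmatched rows in this sketch simply lack the other input's columns; a real join would usually fill them with nulls.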

Date Operation

This Block performs operations on dates and pushes the result to the queue.

Date Expression

[
  {
    "outputColumn": "Date_New",
    "outputType": "DATE",
    "onErrorDefaultValue": 0,
    "expression": "change_format('DATE1', '%y')"
  }
]
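The expression above suggests re-rendering a date column with a strftime-style format. The sketch below shows that idea with the standard library only. `reformat_date` is a hypothetical helper, not the Block's `change_format`, and the input format `%Y-%m-%d` is an assumption because this page does not state how source dates are parsed.

```python
from datetime import datetime


def reformat_date(value, output_format, input_format="%Y-%m-%d", on_error_default=0):
    """Parse value with input_format and render it with output_format."""
    try:
        return datetime.strptime(value, input_format).strftime(output_format)
    except (TypeError, ValueError):
        return on_error_default  # mirrors onErrorDefaultValue


rows = [{"DATE1": "2021-03-15"}, {"DATE1": "not a date"}]
converted = [{**row, "Date_New": reformat_date(row["DATE1"], "%y")} for row in rows]
```

With `%y` the first row yields the two-digit year, and the unparseable second row falls back to the default.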

Drop Duplicates

This Block allows you to drop duplicate rows.

Drop Parameters

{
  "columns": [
    "<column_name>"
  ]
}
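A minimal sketch of duplicate removal keyed on the listed columns follows. `drop_duplicates` is a hypothetical helper, and keeping the first occurrence of each key is an assumption; this page does not say which duplicate the Block retains.

```python
def drop_duplicates(rows, columns):
    """Keep the first row seen for each combination of values in columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(name) for name in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"id": 1, "city": "Pune"},
    {"id": 1, "city": "Delhi"},
    {"id": 2, "city": "Pune"},
]
deduplicated = drop_duplicates(rows, ["id"])
```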

Filter Records

This Block lets you keep only the rows of your data that match the expression you provide.

Filter Expression

[
  {
    "expression": "column('id') > 4",
    "outputTopicAliasName": "filter_op2"
  }
]
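Each entry in the filter list pairs an expression with an output alias. One way to picture this is a function that evaluates every predicate against the rows and collects the matches under the alias. This is a sketch: `filter_records` is a hypothetical name, the `predicate` key holds a Python callable in place of the expression string, and routing matches by `outputTopicAliasName` is an assumption about how the alias is used.

```python
def filter_records(rows, filters):
    """Return a dict mapping each output alias to the rows its predicate keeps."""
    outputs = {}
    for spec in filters:
        predicate = spec["predicate"]
        outputs[spec["outputTopicAliasName"]] = [
            row for row in rows if predicate(row.__getitem__)
        ]
    return outputs


rows = [{"id": 3}, {"id": 5}, {"id": 9}]
filters = [
    # Callable form of the expression "column('id') > 4".
    {"predicate": lambda column: column("id") > 4, "outputTopicAliasName": "filter_op2"},
]
filtered = filter_records(rows, filters)
```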

Constant Key

{
  "key1": "val1",
  "key2": "val2"
}

Impute Missing Values

This Block allows you to replace missing values (such as null, ?, or blank) or any user-defined missing value (for example, <null>) with a custom value of your choice or with a value inferred from a previous Block.

Missing Value Parameters

[
  {
    "column": "ID",
    "replaceValue": "23"
  }
]
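The replacement described above can be sketched as a lookup against a set of missing-value markers. `impute_missing` and `MISSING_MARKERS` are hypothetical names, and the marker set simply collects the examples this section mentions; the Block's actual defaults may differ.

```python
# Markers treated as "missing" in this sketch: null, blank, ?, and a user-defined <null>.
MISSING_MARKERS = frozenset({None, "", "?", "<null>"})


def impute_missing(rows, column, replace_value, missing=MISSING_MARKERS):
    """Replace missing markers in one column with replace_value."""
    return [
        {**row, column: replace_value} if row.get(column) in missing else row
        for row in rows
    ]


rows = [{"ID": "7"}, {"ID": "?"}, {"ID": None}, {"ID": "<null>"}]
imputed = impute_missing(rows, "ID", "23")
```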

Constant Key

{
  "key1": "val1",
  "key2": "val2"
}

Merge

This Block reads data from multiple datastreams and writes it to a single datastream.

Data Source Parameters

[
  {
    "queueTopicName": "a3422ed4-85af-46d2-aa2f-851ff9c186a2"
  },
  {
    "queueTopicName": "2aa1f73d-c6f0-4810-89b4-d11030e53334"
  }
]

Merge Parameters

{
  "MergeTopicsInSequence": true
}
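One plausible reading of `"MergeTopicsInSequence": true` is that the inputs are drained one after another, in the order they are listed. The sketch below shows only that case; `merge_in_sequence` is a hypothetical name, in-memory lists stand in for the queue topics, and the behavior when the flag is false is not described on this page.

```python
from itertools import chain


def merge_in_sequence(*streams):
    """Yield every record of the first stream, then the second, and so on."""
    return chain.from_iterable(streams)


stream_a = [{"id": 1}, {"id": 2}]
stream_b = [{"id": 3}]
merged = list(merge_in_sequence(stream_a, stream_b))
```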

Normalize

This Block normalizes the column values using the type you choose. The type can be ZSCORE or MIN_MAX.

Normalization Parameters

{
  "type": "ZSCORE",
  "columns": [
    "MOVIEID",
    "TITLE"
  ]
}

Aggregates Parameters

{
  "AVERAGE": {
    "MOVIEID": {
      "result": 2
    },
    "TITLE": {
      "result": 2
    }
  },
  "STD_DEV": {
    "MOVIEID": {
      "result": 4.031128874149275
    },
    "TITLE": {
      "result": 2
    }
  }
}
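The two normalization types correspond to the standard formulas (x - mean) / std and (x - min) / (max - min). The following sketch applies them to a list of numbers. `normalize` is a hypothetical helper, and using the population standard deviation is an assumption; this page does not say whether the STD_DEV aggregate is a population or sample value.

```python
import statistics


def normalize(values, kind="ZSCORE"):
    """Normalize a list of numbers with z-score or min-max scaling."""
    if kind == "ZSCORE":
        mean = statistics.mean(values)
        std = statistics.pstdev(values)  # population std; an assumption
        return [(v - mean) / std if std else 0.0 for v in values]
    if kind == "MIN_MAX":
        low, high = min(values), max(values)
        span = high - low
        return [(v - low) / span if span else 0.0 for v in values]
    raise ValueError(f"unknown normalization type: {kind}")


z_scores = normalize([1, 2, 3], "ZSCORE")
scaled = normalize([1, 2, 3], "MIN_MAX")
```

A constant column has zero spread, so the sketch maps it to 0.0 rather than dividing by zero.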

One Hot Encode

This Block one-hot encodes the columns you list from your data.

Encode Details

{
  "columnList": [
    "gender"
  ],
  "stopOnLimit": 1000
}
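One-hot encoding replaces a categorical column with one 0/1 column per distinct value. The sketch below shows the idea on dictionary rows. `one_hot_encode` is a hypothetical helper, the `column_value` naming of the new columns is an assumption, and so is the treatment of `stopOnLimit` as a cap on the number of distinct values per column.

```python
def one_hot_encode(rows, column_list, stop_on_limit=1000):
    """Replace each listed column with one indicator column per distinct value."""
    encoded = [dict(row) for row in rows]
    for name in column_list:
        categories = sorted({row[name] for row in rows})
        if len(categories) > stop_on_limit:
            # Assumed meaning of stopOnLimit: refuse to explode wide columns.
            raise ValueError(f"{name} has more than {stop_on_limit} distinct values")
        for row in encoded:
            value = row.pop(name)
            for category in categories:
                row[f"{name}_{category}"] = 1 if value == category else 0
    return encoded


rows = [{"gender": "F"}, {"gender": "M"}]
encoded_rows = one_hot_encode(rows, ["gender"])
```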

Output Transformer

This Block allows you to randomize the given data.

Transformer Parameters

{
  "expression": "key('input[0]')"
}

Randomized Splits

This Block allows you to randomly divide the given data into the given number of splits.

Randomization Parameters

{
  "random_seed": null,
  "no_of_splits": 1
}
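A seeded shuffle followed by slicing is one simple way to produce such splits. This is a sketch, not the Block's algorithm: `randomized_splits` is a hypothetical helper, and near-equal split sizes are an assumption. Passing `None` as the seed, as in the sample configuration, gives a different order on each run.

```python
import random


def randomized_splits(rows, no_of_splits, random_seed=None):
    """Shuffle rows and deal them into no_of_splits lists of near-equal size."""
    if no_of_splits < 1:
        raise ValueError("no_of_splits must be at least 1")
    shuffled = list(rows)
    random.Random(random_seed).shuffle(shuffled)
    # Every n-th element goes to the same split, like dealing cards.
    return [shuffled[i::no_of_splits] for i in range(no_of_splits)]


splits = randomized_splits(range(10), 3, random_seed=42)
```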

Schema Modifier

This Block allows you to modify the schema of your data with operations such as dropping, selecting, or renaming columns, or changing their data type.

Schema

{
  "DROP_COLUMN": [
    "col1",
    "id"
  ],
  "SELECT": [
    "CREDIT_SCORE"
  ],
  "RENAME_COLUMN": [
    {
      "oldColName": "col1",
      "newColName": "hello"
    },
    {
      "oldColName": "name",
      "newColName": "world"
    }
  ],
  "UPDATE_DATATYPE": [
    {
      "colName": "ID",
      "dataType": "STRING"
    },
    {
      "colName": "gender",
      "dataType": "str"
    }
  ]
}
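The four schema operations can be pictured as a single pass over each row. In this sketch, `modify_schema` and `CASTS` are hypothetical names, every operation refers to the original column names, and only the STRING and FLOAT type names that appear elsewhere on this page are mapped; the order in which the Block applies the operations is not stated here.

```python
CASTS = {"STRING": str, "FLOAT": float}  # assumed mapping of type names to casts


def modify_schema(rows, schema):
    """Apply drop, select, datatype update, and rename to every row."""
    dropped = set(schema.get("DROP_COLUMN", []))
    selected = schema.get("SELECT")
    renames = {r["oldColName"]: r["newColName"] for r in schema.get("RENAME_COLUMN", [])}
    casts = {u["colName"]: CASTS[u["dataType"]] for u in schema.get("UPDATE_DATATYPE", [])}

    result = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if name in dropped or (selected and name not in selected):
                continue
            if name in casts:
                value = casts[name](value)
            new_row[renames.get(name, name)] = value
        result.append(new_row)
    return result


rows = [{"ID": 7, "name": "Ann", "col1": "x"}]
schema = {
    "DROP_COLUMN": ["col1"],
    "RENAME_COLUMN": [{"oldColName": "name", "newColName": "world"}],
    "UPDATE_DATATYPE": [{"colName": "ID", "dataType": "STRING"}],
}
modified = modify_schema(rows, schema)
```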

Text Operation

This Block allows you to apply Python string functions to a column in your data.

Text Expression

[
  {
    "outputColumn": "ID_new",
    "outputType": "STRING",
    "onErrorDefaultValue": 0,
    "expression": "str(column('input_column'))"
  }
]
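Because the expression is given as a string, one way to picture its evaluation is a restricted `eval` that exposes only `str` and a per-row `column` lookup. This is a sketch under assumptions: `apply_text_expression` is a hypothetical name, the Block's real evaluator is not described on this page, and `eval` should not be used on untrusted input.

```python
def apply_text_expression(rows, output_column, expression, on_error_default):
    """Evaluate a string expression per row with str() and column() available."""
    result = []
    for row in rows:
        names = {"str": str, "column": row.__getitem__}
        try:
            # Builtins are emptied so only the two names above are reachable.
            value = eval(expression, {"__builtins__": {}}, names)
        except Exception:
            value = on_error_default  # mirrors onErrorDefaultValue
        result.append({**row, output_column: value})
    return result


rows = [{"input_column": 42}, {"other": 1}]
texts = apply_text_expression(rows, "ID_new", "str(column('input_column'))", 0)
```

The second row has no `input_column`, so the lookup fails and the default value is used.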
