Row Blocks
Arithmetic Operation
This Block lets you run Python arithmetic expressions and functions, such as sin(), on your data.
Arithmetic Expression
[
{
"outputColumn": "ID_new",
"outputType": "FLOAT",
"onErrorDefaultValue": 0,
"expression": "np.sin(column('ID'))"
}
]
Constant Key
{
"key1": "val1",
"key2": "val2"
}
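As a rough illustration of how outputColumn, expression and onErrorDefaultValue relate, the sketch below applies a function to every row and falls back to a default value when the evaluation fails. It uses plain Python with math.sin in place of np.sin; apply_expression and the sample rows are hypothetical and are not part of the product.

```python
import math

def apply_expression(rows, output_column, func, default):
    """Add output_column to each row; use default if func raises."""
    result = []
    for row in rows:
        new_row = dict(row)
        try:
            new_row[output_column] = float(func(row))
        except (KeyError, TypeError, ValueError):
            # Assumption: this mirrors the role of onErrorDefaultValue.
            new_row[output_column] = default
        result.append(new_row)
    return result

rows = [{"ID": math.pi / 2}, {"ID": "not a number"}]
out = apply_expression(rows, "ID_new", lambda r: math.sin(r["ID"]), default=0)
```

The second row cannot be passed to math.sin, so it receives the default value instead of stopping the run.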
Data Join
This Block takes two data inputs and joins them on the given columns. The current version supports joining on a single column from each input. The join type can be inner, outer, left or right.
Join Parameters
{
"firstDatasourceColumnName": "id",
"secondDatasourceColumnName": "id",
"targetColumnName": "CREDIT_SCORE_JOINED",
"joinType": "left"
}
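The difference between join types can be sketched in plain Python. The example below covers only left and inner joins on one column per input; join_rows and the sample data are hypothetical, and targetColumnName is not modeled because its exact behavior is not described here.

```python
def join_rows(first, second, first_col, second_col, join_type="left"):
    """Join two lists of dicts on one column each (left or inner only)."""
    if join_type not in ("left", "inner"):
        raise ValueError("this sketch only covers 'left' and 'inner'")
    # Index the second input by its join column for fast lookup.
    index = {}
    for row in second:
        index.setdefault(row[second_col], []).append(row)
    joined = []
    for row in first:
        matches = index.get(row[first_col], [])
        if matches:
            for match in matches:
                joined.append({**match, **row})
        elif join_type == "left":
            # Left join keeps unmatched rows from the first input.
            joined.append(dict(row))
    return joined

customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
scores = [{"id": 1, "CREDIT_SCORE": 700}]
left = join_rows(customers, scores, "id", "id", "left")
inner = join_rows(customers, scores, "id", "id", "inner")
```

With these inputs the left join returns both customers, while the inner join returns only the customer that has a matching score.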
Date Operation
This Block performs operations on date columns and pushes the result to the queue.
Date Expression
[
{
"outputColumn": "Date_New",
"outputType": "DATE",
"onErrorDefaultValue": 0,
"expression": "change_format('DATE1', '%y')"
}
]
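To show what a format string such as '%y' produces, here is a minimal sketch using Python's datetime module. The change_format function below is a hypothetical stand-in that takes a value rather than a column name, and the input format '%Y-%m-%d' is an assumption.

```python
from datetime import datetime

def change_format(value, output_format, input_format="%Y-%m-%d"):
    """Parse a date string and render it with output_format."""
    return datetime.strptime(value, input_format).strftime(output_format)

row = {"DATE1": "2021-03-15"}
# '%y' is the two-digit year directive in strftime.
row["Date_New"] = change_format(row["DATE1"], "%y")
```

In Python's strftime, '%y' yields the year without the century, while '%Y' yields the full four-digit year.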
Drop Duplicates
This Block allows you to drop duplicate rows.
Drop Parameters
{
"columns": [
"<column_name>"
]
}
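The effect of the columns list can be sketched as follows: two rows count as duplicates when they agree on every listed column. Keeping the first occurrence is an assumption, and drop_duplicates is a hypothetical helper.

```python
def drop_duplicates(rows, columns):
    """Keep the first row for each distinct combination of columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [
    {"id": 1, "city": "Pune"},
    {"id": 1, "city": "Delhi"},
    {"id": 2, "city": "Pune"},
]
by_id = drop_duplicates(rows, ["id"])
by_both = drop_duplicates(rows, ["id", "city"])
```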
Filter Records
This Block allows you to keep only the rows of your data that match an expression you provide.
Filter Expression
[
{
"expression": "column('id') > 4",
"outputTopicAliasName": "filter_op2"
}
]
Constant Key
{
"key1": "val1",
"key2": "val2"
}
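A filter expression such as column('id') > 4 amounts to a per-row predicate. The sketch below expresses the same condition in plain Python; filter_rows is a hypothetical helper and the routing to outputTopicAliasName is not modeled.

```python
def filter_rows(rows, predicate):
    """Return only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

rows = [{"id": 3}, {"id": 4}, {"id": 5}]
kept = filter_rows(rows, lambda row: row["id"] > 4)
```

Because the comparison is strict, the row with id 4 is not selected.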
Impute Missing Values
This Block allows you to replace missing values such as null, ?, blank, or any user-defined missing value (for example <null>) with a custom value of your choice or with a value inferred from a previous Block.
Missing Value Parameters
[
{
"column": "ID",
"replaceValue": "23"
}
]
Constant Key
{
"key1": "val1",
"key2": "val2"
}
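A minimal sketch of the replacement logic, assuming the missing markers are None, an empty string, "?" and "<null>". The impute function and the marker set are hypothetical; the Block's actual marker list may differ.

```python
MISSING_MARKERS = {None, "", "?", "<null>"}

def impute(rows, column, replace_value, missing=MISSING_MARKERS):
    """Replace missing markers in one column with replace_value."""
    return [
        {**row, column: replace_value} if row.get(column) in missing else dict(row)
        for row in rows
    ]

rows = [{"ID": "7"}, {"ID": None}, {"ID": "?"}, {}]
filled = impute(rows, "ID", "23")
```

A row that lacks the column entirely is treated as missing here, because row.get returns None for it.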
Merge
This Block reads data from multiple datastreams and writes it to a single datastream.
Data Source Parameters
[
{
"queueTopicName": "a3422ed4-85af-46d2-aa2f-851ff9c186a2"
},
{
"queueTopicName": "2aa1f73d-c6f0-4810-89b4-d11030e53334"
}
]
Merge Parameters
{
"MergeTopicsInSequence": true
}
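One plausible reading of MergeTopicsInSequence set to true is that the input streams are consumed one after another, in the order listed. The sketch below illustrates that reading only; it is an assumption, and merge_in_sequence is a hypothetical helper.

```python
from itertools import chain

def merge_in_sequence(*streams):
    """Yield every record of the first stream, then the second, and so on."""
    return chain.from_iterable(streams)

first = [{"id": 1}, {"id": 2}]
second = [{"id": 3}]
merged = list(merge_in_sequence(first, second))
```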
Normalize
This Block normalizes the column values using the type you choose. The type can be ZSCORE or MIN_MAX.
Normalization Parameters
{
"type": "ZSCORE",
"columns": [
"MOVIEID",
"TITLE"
]
}
Aggregates Parameters
{
"AVERAGE": {
"MOVIEID": {
"result": 2
},
"TITLE": {
"result": 2
}
},
"STD_DEV": {
"MOVIEID": {
"result": 4.031128874149275
},
"TITLE": {
"result": 2
}
}
}
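The two normalization types correspond to standard formulas: z-score subtracts the mean and divides by the standard deviation, and min-max rescales values to the range 0 to 1. The sketch below assumes the AVERAGE and STD_DEV aggregates play the role of the mean and standard deviation, and it uses the population standard deviation, which is also an assumption. The function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """(v - mean) / std for each value; 0.0 if all values are equal."""
    avg, std = mean(values), pstdev(values)
    return [(v - avg) / std if std else 0.0 for v in values]

def min_max(values):
    """(v - min) / (max - min) for each value; 0.0 if all values are equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi != lo else 0.0 for v in values]

values = [1.0, 2.0, 3.0]
z = zscore(values)
scaled = min_max(values)
```

After z-score normalization the values are centered on zero; after min-max normalization the smallest value maps to 0 and the largest to 1.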
One Hot Encode
This Block one-hot encodes the columns of your data that you list.
Encode Details
{
"columnList": [
"gender"
],
"stopOnLimit": 1000
}
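One-hot encoding replaces a categorical column with one indicator column per distinct value. The sketch below shows the idea for a single column; the "<column>_<value>" naming scheme and the one_hot_encode helper are assumptions, and stopOnLimit is not modeled.

```python
def one_hot_encode(rows, column):
    """Replace column with one 0/1 indicator column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

rows = [{"id": 1, "gender": "F"}, {"id": 2, "gender": "M"}]
encoded = one_hot_encode(rows, "gender")
```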
Output Transformer
This Block allows you to randomize the given data.
Transformer Parameters
{
"expression": "key('input[0]')"
}
Randomized Splits
This Block allows you to randomize the given data and divide it into a given number of splits.
Randomization Parameters
{
"random_seed": null,
"no_of_splits": 1
}
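The role of random_seed and no_of_splits can be sketched as a shuffle followed by a round-robin partition. This is an illustration under assumptions: the Block's actual partitioning scheme is not documented here, and randomized_splits is a hypothetical helper.

```python
import random

def randomized_splits(rows, no_of_splits, random_seed=None):
    """Shuffle a copy of rows, then deal it into no_of_splits lists."""
    rng = random.Random(random_seed)  # None seeds from system entropy
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i::no_of_splits] for i in range(no_of_splits)]

data = list(range(10))
splits = randomized_splits(data, 2, random_seed=42)
again = randomized_splits(data, 2, random_seed=42)
```

Passing the same seed reproduces the same splits, while a null seed gives a different shuffle on each run.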
Schema Modifier
This Block allows you to modify the schema of your data with operations such as dropping columns, renaming columns or changing their data type.
Schema
{
"DROP_COLUMN": [
"col1",
"id"
],
"SELECT": [
"CREDIT_SCORE"
],
"RENAME_COLUMN": [
{
"oldColName": "col1",
"newColName": "hello"
},
{
"oldColName": "name",
"newColName": "world"
}
],
"UPDATE_DATATYPE": [
{
"colName": "ID",
"dataType": "STRING"
},
{
"colName": "gender",
"dataType": "str"
}
]
}
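The drop, rename and datatype operations can be sketched on rows held as dictionaries. The order in which the Block applies these operations is not stated, so the order used below (drop, then cast, then rename) is an assumption; SELECT is not modeled, and modify_schema is a hypothetical helper.

```python
def modify_schema(rows, drop=(), rename=None, casts=None):
    """Drop columns, cast values, then rename columns in each row."""
    rename = rename or {}
    casts = casts or {}
    result = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if name in drop:
                continue
            if name in casts:
                value = casts[name](value)
            new_row[rename.get(name, name)] = value
        result.append(new_row)
    return result

rows = [{"ID": 7, "name": "x", "col1": 1}]
out = modify_schema(
    rows,
    drop={"col1"},
    rename={"name": "world"},
    casts={"ID": str},
)
```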
Text Operation
This Block allows you to apply Python string functions to a column in your data.
Text Expression
[
{
"outputColumn": "ID_new",
"outputType": "STRING",
"onErrorDefaultValue": 0,
"expression": "str(column('input_column'))"
}
]
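As an illustration of applying string functions per row, the sketch below converts a column with str() and, separately, with a chained strip and upper. The text_operation helper is hypothetical, and the fallback to a default value on error is an assumption about how onErrorDefaultValue behaves.

```python
def text_operation(rows, input_column, output_column, func, default):
    """Apply func to input_column and store the result in output_column."""
    result = []
    for row in rows:
        try:
            value = func(row[input_column])
        except (KeyError, AttributeError, TypeError, ValueError):
            value = default
        result.append({**row, output_column: value})
    return result

rows = [{"input_column": 42}, {"input_column": " ab "}]
as_text = text_operation(rows, "input_column", "ID_new", str, default=0)
upper = text_operation(
    rows, "input_column", "ID_new", lambda v: v.strip().upper(), default=0
)
```

The integer 42 has no strip method, so the second call stores the default for that row and the cleaned string for the other.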