Databricks ML Evaluator (deprecated)
Supported pipeline types: Data Collector
With the Databricks ML Evaluator processor, you can create pipelines that produce data-driven insights in real time. For example, you can design pipelines that detect fraudulent transactions or that perform natural language processing as data passes through the pipeline.
To use the Databricks ML Evaluator processor, you first build and train the model with Apache Spark MLlib. You then export the trained model with Databricks ML Model Export and save the exported model directory on the Data Collector machine that runs the pipeline.
When you configure the Databricks ML Evaluator processor, you specify the path to the exported model saved on the Data Collector machine. You also specify the root field in the input data to send to the model, the output columns to return from the model, and the record field in which to store the model output.
Prerequisites
- Build and train a machine learning model with Apache Spark MLlib, as sketched after this list.
- Export the trained model with Databricks ML Model Export. For more information, see the Databricks documentation.
- Save the exported directory on the Data Collector machine that runs the pipeline. StreamSets recommends storing the model directory in the Data Collector resources directory, $SDC_RESOURCES.
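For reference, a minimal PySpark sketch of the first two prerequisites might look like the following. The dataset, feature columns, and choice of logistic regression are illustrative, and the export step is summarized in a comment because the exact call is described in the Databricks ML Model Export documentation:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("ground-cover").getOrCreate()

# Hypothetical training data with soil, topography, and coverage columns.
df = spark.read.csv("ground_cover.csv", header=True, inferSchema=True)

# Assemble the input columns into the feature vector the model trains on.
assembler = VectorAssembler(
    inputCols=["soil_type", "elevation", "slope", "tree_coverage"],
    outputCol="features",
)
lr = LogisticRegression(labelCol="label", featuresCol="features")

model = Pipeline(stages=[assembler, lr]).fit(df)

# Export the trained model with Databricks ML Model Export (see the
# Databricks documentation for the API), then copy the exported model
# directory into $SDC_RESOURCES on the Data Collector machine.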
Databricks Model as a Microservice
When you include the Databricks ML Evaluator processor in a microservice pipeline, external clients can use the model exported with Databricks ML Model Export to perform computations.
For example, in the following microservice pipeline, a REST API client sends a request with input data to the REST Service origin. The Databricks ML Evaluator processor uses a machine learning model to generate predictions from the data. The processor passes records that contain the model's predictions to the Send Response to Origin destination, labeled Send Predictions, which sends the records back to the REST Service origin. The origin then transmits JSON-formatted responses back to the originating REST API client.
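As a rough sketch, a Python client might call such a microservice pipeline as follows. The host, port, endpoint path, application ID header, and input record are assumptions; use the values configured on the REST Service origin and an input structure that matches your model:

import requests

# Illustrative input record; the shape depends on the model.
record = {"features": [1.0, 2.0, 3.0]}

# Hypothetical endpoint; the REST Service origin defines the actual
# listening port and application ID.
response = requests.post(
    "http://sdc-host:8000/",
    json=record,
    headers={"X-SDC-APPLICATION-ID": "my-app-id"},
)
print(response.json())  # JSON response containing the model's predictions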
Example: Ground Cover Model
For example, suppose you use Apache Spark MLlib to build and train a model that predicts ground cover in a forest, and then you export the model with Databricks ML Model Export. The model predicts the ground cover based on inputs about soil types, topography, and tree coverage. A record sent to the model might contain the following input data:
{
  "origLabel": -1.0,
  "features": {
    "type": 0,
    "size": 13,
    "indices": [0, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12],
    "values": [74.0, 2.0, 120.0, 269.0, 2.0, 121.0, 1.0, 0.2, 1.0, 1.0, 3.0]
  }
}
Given this input, the model returns the predicted label and the probability of each class:

| Label | Prediction | Probability |
|---|---|---|
| Moss | 0 | 0 – 0.86, 1 – 0.14 |
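Assuming the processor is configured to write the model output to an /output field, the record leaving the processor might look like the following; the field name and output column names here are illustrative:

{
  "origLabel": -1.0,
  "features": {
    "type": 0,
    "size": 13,
    "indices": [0, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12],
    "values": [74.0, 2.0, 120.0, 269.0, 2.0, 121.0, 1.0, 0.2, 1.0, 1.0, 3.0]
  },
  "output": {
    "prediction": 0,
    "probability": [0.86, 0.14]
  }
}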
To include this model in a pipeline, save the model on the Data Collector machine, add the Databricks ML Evaluator processor to the pipeline, and then configure the processor to use the saved model, read the needed input fields, and write the generated output columns to a field in the record.
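As a rough illustration of that configuration, the properties for this example might be set along these lines; the property names are paraphrased and the path and field names are assumptions:

Saved Model Path: gcmodel (a directory under $SDC_RESOURCES)
Input Root Field: /
Output Columns: prediction, probability
Model Output Field: /output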