Supervised Classification

In supervised classification, you are given an input dataset in which instances are labeled with a certain class. The goal is to build a model that predicts the class for future unlabeled instances. The model is evaluated using a train-test procedure, e.g. cross-validation.

To make results from different users comparable, you are given the exact train-test folds to use, and you need to return at least the predictions generated by your model for each of the test instances. OpenML will use these predictions to calculate a range of evaluation measures on the server.
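As a minimal sketch of predicting on fixed folds, the Python below trains a trivial majority-class baseline on the training rows of each fold and records one prediction per test instance. The fold representation (a mapping from fold number to test row indices), the function names, and the `(fold, row_id, prediction)` tuple layout are assumptions made for illustration; they are not the file formats OpenML itself uses.

```python
from collections import Counter


def majority_class(labels):
    """Return the most frequent label in `labels`."""
    return Counter(labels).most_common(1)[0][0]


def predict_with_fixed_folds(labels, folds):
    """Produce one prediction per test instance using predefined folds.

    `folds` is a hypothetical mapping: fold number -> list of row indices
    held out as that fold's test set. All other rows form the training set.
    Returns a list of (fold, row_id, prediction) tuples.
    """
    rows = []
    for fold, test_idx in sorted(folds.items()):
        held_out = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = majority_class(train_labels)  # "training" the baseline
        for i in test_idx:
            rows.append((fold, i, model))
    return rows


# Toy data: seven labeled instances split into three fixed folds.
labels = ["a", "a", "b", "a", "b", "b", "a"]
folds = {0: [0, 1], 1: [2, 3], 2: [4, 5, 6]}
predictions = predict_with_fixed_folds(labels, folds)
```

Because every user receives the same folds, the resulting predictions cover exactly the same test instances and can be compared directly.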

You can also upload your own evaluation measures, provided that the code for doing so is available from the implementation used. For extremely large datasets, it may be infeasible to upload all predictions. In those cases, you need to compute and provide the evaluations yourself.
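When evaluations are computed locally, they can be collected as key-value pairs. The following Python is a rough sketch under that assumption; the function name `evaluate` and the keys `accuracy` and `error_rate` are hypothetical and not taken from OpenML's list of measure names.

```python
def evaluate(true_labels, predicted_labels):
    """Compute simple evaluation measures as a dict of key-value pairs."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot evaluate an empty set of predictions")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    accuracy = correct / len(true_labels)
    # Key names are illustrative assumptions.
    return {"accuracy": accuracy, "error_rate": 1 - accuracy}


# Toy example: three of four predictions match the true labels.
true_labels = ["a", "b", "a", "b"]
predicted_labels = ["a", "b", "b", "b"]
evaluations = evaluate(true_labels, predicted_labels)
```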

Optionally, you can upload the model trained on all the input data. There is no restriction on the file format, but please use a well-known format or PMML.

Inputs

cost_matrix	CostMatrix	A matrix describing the cost of misclassifications per type.	optional
estimation_procedure	Estimation Procedure	The estimation procedure used to validate the generated models	required
evaluation_measures	String	The evaluation measures to optimize for, e.g., cpu time, accuracy	optional
source_data	Dataset	The input data for this task	required
target_feature	String	The name of the dataset feature to be used as the target feature.	required
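To illustrate how a cost matrix of the kind described for cost_matrix might be applied, the Python sketch below sums the cost of each (actual, predicted) pair. The nested-dictionary layout indexed as `cost[actual][predicted]`, the class names, and the cost values are assumptions for illustration only; the actual CostMatrix format is not specified here.

```python
def total_cost(actual, predicted, cost):
    """Sum misclassification costs over all instances.

    `cost[a][p]` is the (assumed) cost of predicting class `p`
    when the true class is `a`; correct predictions usually cost 0.
    """
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    return sum(cost[a][p] for a, p in zip(actual, predicted))


# Hypothetical costs: missing a "pos" instance is five times as
# expensive as raising a false alarm on a "neg" instance.
cost = {
    "pos": {"pos": 0, "neg": 5},
    "neg": {"pos": 1, "neg": 0},
}
actual = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "pos", "neg"]
cost_sum = total_cost(actual, predicted, cost)
```

With asymmetric costs like these, two models with equal accuracy can differ in total cost, which is why the cost per type of misclassification is given separately.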

Outputs

evaluations	KeyValue	A list of user-defined evaluations of the task as key-value pairs.	optional
model	File	A file containing the model built on all the input data.	optional
predictions	Predictions	The desired output format	optional

Attribution

Author(s)	Joaquin Vanschoren, Jan van Rijn, Luis Torgo, Bernd Bischl
Contributor(s)	Bo Gao, Simon Fischer, Venkatesh Umaashankar, Michael Berthold, Bernd Wiswedel, Patrick Winter
