Joaquin Vanschoren

Joaquin's datasets

This is the famous Australian dataset, retrieved 2014-11-14 from the libSVM site. It was normalized. The original version is from…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
690 instances - 15 features - 2 classes - 0 missing values
Data on tree growth used in the Case Study published in the September 1995 issue of the Canadian Journal of Statistics. This data set has been provided by Dr. Fernando Camacho, Ontario Hydro…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
2796 instances - 34 features - 6 classes - 68100 missing values
Process delays known as cylinder banding in rotogravure printing were substantially mitigated using control rules discovered by decision tree induction. Attribute Information: 1. timestamp:…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
540 instances - 38 features - 2 classes - 999 missing values
Data from the Kaggle Amazon Employee Access Challenge: https://www.kaggle.com/c/amazon-employee-access-challenge When an employee at any company starts work, they first need to obtain the computer…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
32769 instances - 10 features - 2 classes - 0 missing values
Data from the Kaggle Bioresponse challenge: https://www.kaggle.com/c/bioresponse The objective of the competition is to help us build as good a model as possible so that we can, as optimally as this…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
3751 instances - 1777 features - 2 classes - 0 missing values
The prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
48842 instances - 15 features - 2 classes - 6465 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 0.1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
39948 instances - 10 features - 2 classes - 0 missing values
This dataset represents a set of possible advertisements on Internet pages. The features encode the geometry of the image (if available) as well as phrases occurring in the URL, the image's URL and…
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
Dataset from the MLRR repository: http://axon.cs.byu.edu:5000/
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
19020 instances - 11 features - 2 classes - 0 missing values
Dataset from the MLRR repository: http://axon.cs.byu.edu:5000/
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
6598 instances - 168 features - 2 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php). KDD Cup 2009: http://www.kddcup-orange.com. Converted to ARFF format by TunedIT. Customer Relationship Management (CRM) is a key element…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
The KDD Cup 2009 offers the opportunity to work on large marketing databases from the French Telecom company Orange to predict the propensity of customers to switch provider (churn). Churn (wikipedia…
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
1109 instances - 22 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
2109 instances - 22 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
65 runs - 0 likes - 0 downloads - 0 reach - 11 impact
522 instances - 22 features - 2 classes - 0 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering. If you publish material based…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
10885 instances - 22 features - 2 classes - 25 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable,…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
1563 instances - 38 features - 2 classes - 0 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable,…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
1458 instances - 38 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable,…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
15545 instances - 6 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch). Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php. Modified by TunedIT (converted to ARFF…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
4562 instances - 49 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch). Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php. Modified by TunedIT (converted to ARFF…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
3468 instances - 971 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch). Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php. Modified by TunedIT (converted to ARFF…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
14395 instances - 217 features - 2 classes - 0 missing values
The following are data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled "Variations in Written English: Characterizing Authors' Rhetorical Language Choices…
65 runs - 0 likes - 0 downloads - 0 reach - 11 impact
500 instances - 22 features - 15 classes - 0 missing values
PRO FOOTBALL SCORES (raw data appears after the description below). How well do the oddsmakers of Las Vegas predict the outcome of professional football games? Is there really a home field advantage -…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
672 instances - 10 features - 2 classes - 1200 missing values
analcatdata: A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
797 instances - 5 features - 6 classes - 0 missing values
analcatdata: A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
841 instances - 71 features - 4 classes - 0 missing values
Irish Educational Transitions Data. Shown below are data on educational transitions for a sample of 500 Irish schoolchildren aged 11 in 1967. The data were collected by Greaney and Kelleghan (1984),…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
500 instances - 6 features - 2 classes - 32 missing values
This data consists of synthetically generated control charts. This dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999). There are…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
600 instances - 61 features - 6 classes - 0 missing values
This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers. The data was collected for examining our newly developed classifier for multidimensional curves…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
9961 instances - 15 features - 9 classes - 0 missing values
This file contains 9 sets of sanitized user data drawn from the command histories of 8 UNIX computer users at Purdue over the course of up to 2 years (USER0 and USER1 were generated by the same…
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
9100 instances - 3 features - 9 classes - 0 missing values
The Monk's Problems: Problem 3. This is a merged version of the separate train and test sets that are usually distributed. On OpenML, this train-test split can be found as one of the possible tasks.…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
554 instances - 7 features - 2 classes - 0 missing values
The Monk's Problems: Problem 2. This is a merged version of the separate train and test sets that are usually distributed. On OpenML, this train-test split can be found as one of the possible tasks.…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
601 instances - 7 features - 2 classes - 0 missing values
The Monk's Problems: Problem 1. This is a merged version of the separate train and test sets that are usually distributed. On OpenML, this train-test split can be found as one of the possible tasks.…
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
556 instances - 7 features - 2 classes - 0 missing values
In my work on context-sensitive learning, I used the "Deterding Vowel Recognition Data", but I found it necessary to reformulate the data. Implicit in the original data is contextual information on…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
990 instances - 13 features - 11 classes - 0 missing values