OpenML

optdigits

Status: active. Format: ARFF. Visibility: public (publicly available). Uploaded 06-04-2014 by Jan van Rijn.

Author:
Source: Unknown
Please cite:

1. Title of Database: Optical Recognition of Handwritten Digits

2. Source:
   E. Alpaydin, C. Kaynak
   Department of Computer Engineering
   Bogazici University, 80815 Istanbul, Turkey
   alpaydin@boun.edu.tr
   July 1998

3. Past Usage:
   C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their Applications to Handwritten Digit Recognition, MSc Thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University.
   E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika, to appear. ftp://ftp.icsi.berkeley.edu/pub/ai/ethem/kyb.ps.Z

4. Relevant Information:
   We used preprocessing programs made available by NIST to extract normalized bitmaps of handwritten digits from a preprinted form. From a total of 43 people, 30 contributed to the training set and a different 13 to the test set. The 32x32 bitmaps are divided into nonoverlapping blocks of 4x4, and the number of on pixels is counted in each block. This generates an input matrix of 8x8 where each element is an integer in the range 0..16. This reduces dimensionality and gives invariance to small distortions.
   For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G. T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C. L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469, 1994.

5. Number of Instances:
   optdigits.tra	Training	3823
   optdigits.tes	Testing	1797
   The way we used the dataset was to use half of training for actual training, one-fourth for validation and one-fourth for writer-dependent testing. The test set was used for writer-independent testing and is the actual quality measure.

6. Number of Attributes: 64 input + 1 class attribute

7. For Each Attribute: All input attributes are integers in the range 0..16. The last attribute is the class code 0..9.

8. Missing Attribute Values: None

9. Class Distribution (number of examples per class):
   Class	Training set	Testing set
   0	376	178
   1	389	182
   2	380	177
   3	389	183
   4	387	181
   5	376	182
   6	377	181
   7	387	179
   8	380	174
   9	382	180

Accuracy on the testing set with k-nn using Euclidean distance as the metric:
   k	Accuracy (%)
   1	98.00
   2	97.38
   3	97.83
   4	97.61
   5	97.89
   6	97.77
   7	97.66
   8	97.66
   9	97.72
   10	97.55
   11	97.89
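The block-counting step in the description (32x32 bitmap, nonoverlapping 4x4 blocks, count of on pixels per block) can be sketched as below. This is a minimal illustration, not the authors' NIST-based preprocessing: the function name `block_counts`, the random bitmap, and the row-major flattening into 64 inputs are all assumptions made for the example.

```python
import numpy as np

def block_counts(bitmap: np.ndarray, block: int = 4) -> np.ndarray:
    """Count the 'on' pixels in each non-overlapping block x block tile."""
    rows, cols = bitmap.shape
    if rows % block or cols % block:
        raise ValueError("bitmap sides must be multiples of the block size")
    # Reshape to (tile_row, row_in_tile, tile_col, col_in_tile), then sum
    # over the two within-tile axes.
    tiles = bitmap.reshape(rows // block, block, cols // block, block)
    return tiles.sum(axis=(1, 3))

# Hypothetical 32x32 binary bitmap (random noise, not a real digit).
rng = np.random.default_rng(0)
bitmap = rng.integers(0, 2, size=(32, 32))

features = block_counts(bitmap)   # 8x8 matrix, each entry in 0..16
flat = features.reshape(-1)       # 64 inputs; row-major order is an assumption
```

Each 4x4 block holds 16 pixels, which is why every input attribute lies in the range 0..16.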

65 features

class (target)	nominal	10 unique values 0 missing
input48	numeric	7 unique values 0 missing
input34	numeric	16 unique values 0 missing
input47	numeric	17 unique values 0 missing
input46	numeric	17 unique values 0 missing
input45	numeric	17 unique values 0 missing
input44	numeric	17 unique values 0 missing
input43	numeric	17 unique values 0 missing
input42	numeric	17 unique values 0 missing
input41	numeric	8 unique values 0 missing
input40	numeric	1 unique value 0 missing
input39	numeric	15 unique values 0 missing
input38	numeric	17 unique values 0 missing
input37	numeric	17 unique values 0 missing
input36	numeric	17 unique values 0 missing
input35	numeric	17 unique values 0 missing
input33	numeric	2 unique values 0 missing
input49	numeric	9 unique values 0 missing
input50	numeric	17 unique values 0 missing
input51	numeric	17 unique values 0 missing
input52	numeric	17 unique values 0 missing
input53	numeric	17 unique values 0 missing
input54	numeric	17 unique values 0 missing
input55	numeric	17 unique values 0 missing
input56	numeric	13 unique values 0 missing
input57	numeric	2 unique values 0 missing
input58	numeric	11 unique values 0 missing
input59	numeric	17 unique values 0 missing
input60	numeric	17 unique values 0 missing
input61	numeric	17 unique values 0 missing
input62	numeric	17 unique values 0 missing
input63	numeric	17 unique values 0 missing
input64	numeric	17 unique values 0 missing
input17	numeric	5 unique values 0 missing
input2	numeric	9 unique values 0 missing
input3	numeric	17 unique values 0 missing
input4	numeric	17 unique values 0 missing
input5	numeric	17 unique values 0 missing
input6	numeric	17 unique values 0 missing
input7	numeric	17 unique values 0 missing
input8	numeric	17 unique values 0 missing
input9	numeric	4 unique values 0 missing
input10	numeric	17 unique values 0 missing
input11	numeric	17 unique values 0 missing
input12	numeric	17 unique values 0 missing
input13	numeric	17 unique values 0 missing
input14	numeric	17 unique values 0 missing
input15	numeric	17 unique values 0 missing
input16	numeric	15 unique values 0 missing
input1	numeric	1 unique value 0 missing
input18	numeric	17 unique values 0 missing
input19	numeric	17 unique values 0 missing
input20	numeric	17 unique values 0 missing
input21	numeric	17 unique values 0 missing
input22	numeric	17 unique values 0 missing
input23	numeric	17 unique values 0 missing
input24	numeric	9 unique values 0 missing
input25	numeric	2 unique values 0 missing
input26	numeric	17 unique values 0 missing
input27	numeric	17 unique values 0 missing
input28	numeric	17 unique values 0 missing
input29	numeric	17 unique values 0 missing
input30	numeric	17 unique values 0 missing
input31	numeric	17 unique values 0 missing
input32	numeric	3 unique values 0 missing


107 properties

NumberOfInstances

5620

Number of instances (rows) of the dataset.

NumberOfFeatures

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

Number of instances with at least one value missing.

NumberOfNumericFeatures

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

AutoCorrelation

0.09

Average class difference between consecutive instances.

CfsSubsetEval_DecisionStumpAUC

0.69

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpErrRate

0.8

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpKappa

0.11

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesAUC

0.99

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesKappa

0.9

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NAUC

0.99

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NErrRate

0.02

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NKappa

0.98

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

ClassEntropy

3.32

Entropy of the target attribute values.
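As a rough check, the ClassEntropy value can be recomputed from the class distribution given in the description. The sketch assumes that the training and testing counts are pooled (their sum, 3823 + 1797 = 5620, matches NumberOfInstances) and that entropy is measured in bits; the page states neither explicitly.

```python
import math

# Per-class counts copied from the class distribution in the description.
train = [376, 389, 380, 389, 387, 376, 377, 387, 380, 382]
test = [178, 182, 177, 183, 181, 182, 181, 179, 174, 180]
counts = [a + b for a, b in zip(train, test)]

def entropy_bits(counts):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

class_entropy = entropy_bits(counts)
print(round(class_entropy, 2))  # 3.32
```

The result sits just below log2(10), the maximum for ten classes, which reflects how close to balanced the classes are.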

DecisionStumpAUC

0.69

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpErrRate

0.8

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpKappa

0.11

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
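The page does not define the Kappa coefficient. Assuming it is the standard Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement), it can be computed from a confusion matrix as in this sketch. The function name and the toy matrix are hypothetical.

```python
def cohen_kappa(confusion):
    """confusion[i][j] = number of instances of true class i predicted as j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of true-class and predicted-class marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Hypothetical two-class confusion matrix (not taken from this dataset).
toy = [[40, 10],
       [20, 30]]
kappa = cohen_kappa(toy)  # observed 0.7, expected 0.5, so kappa is about 0.4
```

With ten nearly balanced classes, chance agreement is close to 0.1, so an error rate of about 0.8 gives roughly (0.2 - 0.1) / 0.9, or about 0.11. Under these assumptions that is consistent with the DecisionStump figures listed here.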

Dimensionality

0.01

Number of attributes divided by the number of instances.

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

J48.00001.AUC

0.95

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.ErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.Kappa

0.87

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.0001.AUC

0.95

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.ErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.Kappa

0.87

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.001.AUC

0.95

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.ErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.Kappa

0.86

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MajorityClassPercentage

10.18

Percentage of instances belonging to the most frequent class.

MajorityClassSize

572

Number of instances belonging to the most frequent class.

MaxAttributeEntropy

Maximum entropy among attributes.

MaxKurtosisOfNumericAtts

2807.5

Maximum kurtosis among attributes of the numeric type.

MaxMeansOfNumericAtts

11.99

Maximum of means among attributes of the numeric type.

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MaxSkewnessOfNumericAtts

Maximum skewness among attributes of the numeric type.

MaxStdDevOfNumericAtts

6.52

Maximum standard deviation of attributes of the numeric type.

MeanAttributeEntropy

Average entropy of the attributes.

MeanKurtosisOfNumericAtts

168.55

Mean kurtosis among attributes of the numeric type.

MeanMeansOfNumericAtts

4.91

Mean of means among attributes of the numeric type.

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
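The two ratio-based properties, EquivalentNumberOfAtts and MeanNoiseToSignalRatio, follow directly from their stated definitions. The page lists no value for either on this dataset, so the inputs below are hypothetical apart from ClassEntropy, and the function names are illustrative only.

```python
def equivalent_number_of_atts(class_entropy, mean_mutual_information):
    """ClassEntropy divided by MeanMutualInformation."""
    if mean_mutual_information == 0:
        raise ValueError("undefined when mean mutual information is zero")
    return class_entropy / mean_mutual_information

def mean_noise_to_signal_ratio(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    if mean_mutual_information == 0:
        raise ValueError("undefined when mean mutual information is zero")
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information

class_entropy = 3.32           # ClassEntropy as listed on this page
mean_mi = 0.5                  # hypothetical value, not from the page
mean_attribute_entropy = 2.0   # hypothetical value, not from the page

ena = equivalent_number_of_atts(class_entropy, mean_mi)
nsr = mean_noise_to_signal_ratio(mean_attribute_entropy, mean_mi)
```

With these made-up inputs, about 6.64 attributes would be needed to describe the class, and the attributes would carry three times as much irrelevant as relevant information.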

MeanNominalAttDistinctValues

Average number of distinct values among the attributes of the nominal type.

MeanSkewnessOfNumericAtts

5.45

Mean skewness among attributes of the numeric type.

MeanStdDevOfNumericAtts

3.69

Mean standard deviation of attributes of the numeric type.

MinAttributeEntropy

Minimal entropy among attributes.

MinKurtosisOfNumericAtts

-1.65

Minimum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

Minimum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-1.3

Minimum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

Minimum standard deviation of attributes of the numeric type.

MinorityClassPercentage

9.86

Percentage of instances belonging to the least frequent class.

MinorityClassSize

554

Number of instances belonging to the least frequent class.
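The majority and minority class figures can be reproduced from the class distribution in the description, assuming the training and testing counts are pooled into the 5620 instances. The variable names are illustrative.

```python
# Per-class counts copied from the class distribution in the description.
train = [376, 389, 380, 389, 387, 376, 377, 387, 380, 382]
test = [178, 182, 177, 183, 181, 182, 181, 179, 174, 180]

totals = {digit: a + b for digit, (a, b) in enumerate(zip(train, test))}
n_instances = sum(totals.values())

majority_digit = max(totals, key=totals.get)  # first digit with the largest count
minority_digit = min(totals, key=totals.get)  # first digit with the smallest count

majority_pct = round(100 * totals[majority_digit] / n_instances, 2)
minority_pct = round(100 * totals[minority_digit] / n_instances, 2)
```

Under that assumption the largest class is digit 3 with 572 instances (10.18%), and digits 0 and 8 tie for the smallest with 554 instances each (9.86%), matching the listed values.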

NaiveBayesAUC

0.98

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesKappa

0.9

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NumberOfBinaryFeatures

Number of binary attributes.

PercentageOfBinaryFeatures

Percentage of binary attributes.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

PercentageOfMissingValues

Percentage of missing values.

PercentageOfNumericFeatures

98.46

Percentage of numeric attributes.

PercentageOfSymbolicFeatures

1.54

Percentage of nominal attributes.

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile1KurtosisOfNumericAtts

-1.37

First quartile of kurtosis among attributes of the numeric type.

Quartile1MeansOfNumericAtts

0.26

First quartile of means among attributes of the numeric type.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

Quartile1SkewnessOfNumericAtts

-0.33

First quartile of skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

0.97

First quartile of standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

Quartile2KurtosisOfNumericAtts

0.08

Second quartile (Median) of kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

4.57

Second quartile (Median) of means among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

0.56

Second quartile (Median) of skewness among attributes of the numeric type.

Quartile2StdDevOfNumericAtts

4.3

Second quartile (Median) of standard deviation of attributes of the numeric type.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

Quartile3KurtosisOfNumericAtts

20.3

Third quartile of kurtosis among attributes of the numeric type.

Quartile3MeansOfNumericAtts

9.05

Third quartile of means among attributes of the numeric type.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

Quartile3SkewnessOfNumericAtts

4.07

Third quartile of skewness among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

5.87

Third quartile of standard deviation of attributes of the numeric type.

REPTreeDepth1AUC

0.69

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1ErrRate

0.8

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1Kappa

0.11

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth2AUC

0.8

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2ErrRate

0.65

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2Kappa

0.28

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth3AUC

0.89

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3ErrRate

0.43

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3Kappa

0.52

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

RandomTreeDepth1AUC

0.69

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1ErrRate

0.82

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1Kappa

0.09

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth2AUC

0.79

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2ErrRate

0.69

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2Kappa

0.24

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth3AUC

0.86

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3ErrRate

0.49

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3Kappa

0.46

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

StdvNominalAttDistinctValues

Standard deviation of the number of distinct values among attributes of the nominal type.

kNN1NAUC

0.99

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NErrRate

0.02

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NKappa

0.98

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
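The kNN1N landmarkers, like the k-nn accuracies quoted in the description, rest on nearest-neighbour classification with Euclidean distance. A minimal NumPy sketch of the 1-nearest-neighbour case follows. It is not the Weka IBk implementation, it omits any attribute normalization that implementation may apply, and the toy data is hypothetical.

```python
import numpy as np

def predict_1nn(train_x, train_y, query_x):
    """Label each query row with the class of its nearest training row."""
    # Squared Euclidean distance from every query to every training row;
    # the square root is skipped because it does not change the ranking.
    diffs = query_x[:, None, :] - train_x[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    nearest = sq_dist.argmin(axis=1)
    return train_y[nearest]

# Hypothetical 2-feature toy data with values in the 0..16 range.
train_x = np.array([[0, 0], [0, 16], [16, 0], [16, 16]], dtype=float)
train_y = np.array([0, 1, 2, 3])
query_x = np.array([[1, 2], [15, 14], [3, 12]], dtype=float)

predicted = predict_1nn(train_x, train_y, query_x)  # array([0, 3, 1])
```

For the real data, `train_x` would have 64 columns, one per input attribute.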


11 tasks

Supervised Classification on optdigits

0 runs - estimation_procedure: 5 times 2-fold Crossvalidation - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: 20% Holdout (Ordered) - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: 10% Holdout set - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: 33% Holdout set - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: Leave one out - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: Test on Training Data - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10 times 10-fold Learning Curve - target_feature: class

Supervised Data Stream Classification on optdigits

0 runs - estimation_procedure: Interleaved Test then Train - target_feature: class
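Several of the tasks above use k-fold cross-validation as the estimation procedure. The sketch below shows how 10 train/test index splits over the 5620 instances could be generated. It is a plain shuffled split under an assumed seed: it is not stratified and does not reproduce the exact fold assignments the tasks use.

```python
import random

def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(5620, k=10))
```

Each instance appears in exactly one test fold, so every fold here holds 562 test instances and 5058 training instances. Repeated variants such as "10 times 10-fold" would rerun this with different seeds.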
