98 LED-display-domain-7digit 1 1. Title of Database: LED display domain 2. Sources: (a) Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group: Belmont, California. (see pages 43-49). (b) Donor: David Aha (c) Date: 11/10/1988 3. Past Usage: (many) 1. CART book (above): -- Optimal Bayes classification rate: 74% -- CART decision tree algorithm: 71% (resubstitution estimate) -- Nearest Neighbor Algorithm: 71% -- 200 training and 5000 test instances 2. Quinlan,J.R. (1987). Simplifying Decision Trees. In International Journal of Man-Machine Studies (to appear). -- C4 decision tree algorithm: 72.6% (using pessimistic pruning) -- 2000 training and 500 test instances 3. Tan,M. & Eshelman,L. (1988). Using Weighted Networks to Represent Classification Knowledge in Noisy Domains. In Proceedings of the 5th International Conference on Machine Learning, 121-134, Ann Arbor, Michigan: Morgan Kaufmann. -- IWN system: 73.3% (using the And-OR classification algorithm) -- 400 training and 500 test cases 4. Relevant Information Paragraph: This simple domain contains 7 Boolean attributes and 10 concepts, the set of decimal digits. Recall that LED displays contain 7 light-emitting diodes -- hence the reason for 7 attributes. The problem would be easy if not for the introduction of noise. In this case, each attribute value has the 10% probability of having its value inverted. It's valuable to know the optimal Bayes rate for these databases. In this case, the misclassification rate is 26% (74% classification accuracy). 5. Number of Instances: 500. But in the original URL you can find a C script and run it choosing the number of instances to be generated. 6. Number of Attributes: 7 (all Boolean-valued) 7. Attribute Information: -- All attribute values are either 0 or 1, according to whether the corresponding light is on or not for the decimal digit. -- Each attribute (excluding the class attribute, which is an integer ranging between 0 and 9 inclusive) has a 10% percent chance of being inverted. 8. Missing Attribute Values: None 9. Class Distribution: 10% (Theoretical) -- Each concept (digit) has the same theoretical probability distribution. The program randomly selects the attribute. 1 ARFF 2016-07-29T20:36:10 Public https://test.openml.org/data/v1/download/98/LED-display-domain-7digit.arff 98 Class 1 study_14 public http://sci2s.ugr.es/keel/dataset.php?cod=63, https://archive.ics.uci.edu/ml/datasets/LED+Display+Domain active 2024-01-10 13:53:03 4509a210be57d02a8e468ad8c95ea277