lowbwt


active, ARFF, Publicly available, Visibility: public, Uploaded 23-04-2014 by Jan van Rijn
  • study_182 study_615 study_293 study_251 study_380 study_163 study_109 study_134
Author: (none listed)
Source: Unknown
Please cite: (none listed)

Identification code deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning with encoding length selection. In Progress in Connectionist-Based Information Systems. Singapore: Springer-Verlag.

NAME: LOW BIRTH WEIGHT DATA
KEYWORDS: Logistic Regression
SIZE: 189 observations, 11 variables

NOTE: These data come from Appendix 1 of Hosmer and Lemeshow (1989). These data are copyrighted and must be acknowledged and used accordingly.

DESCRIPTIVE ABSTRACT: The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.

SOURCE: Data were collected at Baystate Medical Center, Springfield, Massachusetts, during 1986.

NOTE: This data set consists of the complete data. A paired data set created from this low birth weight data may be found in plowbwt.dat, and a 3 to 1 matched data set created from the low birth weight data may be found in mlowbwt.dat.

Table: Code Sheet for the Variables in the Low Birth Weight Data Set.

Columns   Variable                                                 Abbreviation
-----------------------------------------------------------------------------
2-4       Identification Code                                      ID
10        Low Birth Weight                                         LOW
          (0 = Birth Weight >= 2500g, 1 = Birth Weight < 2500g)
17-18     Age of the Mother in Years                               AGE
23-25     Weight in Pounds at the Last Menstrual Period            LWT
32        Race (1 = White, 2 = Black, 3 = Other)                   RACE
40        Smoking Status During Pregnancy (1 = Yes, 0 = No)        SMOKE
48        History of Premature Labor (0 = None, 1 = One, etc.)     PTL
55        History of Hypertension (1 = Yes, 0 = No)                HT
61        Presence of Uterine Irritability (1 = Yes, 0 = No)       UI
67        Number of Physician Visits During the First Trimester    FTV
          (0 = None, 1 = One, 2 = Two, etc.)
73-76     Birth Weight in Grams                                    BWT
-----------------------------------------------------------------------------

PEDAGOGICAL NOTES: These data have been used as an example of fitting a multiple logistic regression model.

STORY BEHIND THE DATA: Low birth weight is an outcome that has been of concern to physicians for years, because infant mortality rates and birth defect rates are very high for low birth weight babies. A woman's behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight. The variables identified in the code sheet given in the table have been shown to be associated with low birth weight in the obstetrical literature. The goal of the current study was to ascertain whether these variables were important in the population being served by the medical center where the data were collected.

References:
1. Hosmer and Lemeshow (1989). Applied Logistic Regression. Wiley.
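The code sheet describes a fixed-width layout, so a record can be read by slicing each line at the listed column positions. The following is a minimal sketch under that assumption; the names `CODE_SHEET`, `parse_record` and `format_record` are illustrative, and the example record is made up for demonstration rather than taken from the data set. The ARFF version on this page has the identification code deleted, so this layout applies only to the original fixed-width file.

```python
# Column ranges (1-indexed, inclusive) copied from the code sheet.
CODE_SHEET = {
    "ID": (2, 4),
    "LOW": (10, 10),
    "AGE": (17, 18),
    "LWT": (23, 25),
    "RACE": (32, 32),
    "SMOKE": (40, 40),
    "PTL": (48, 48),
    "HT": (55, 55),
    "UI": (61, 61),
    "FTV": (67, 67),
    "BWT": (73, 76),
}
RECORD_WIDTH = 76


def parse_record(line):
    """Slice one fixed-width record into a dict of integer fields."""
    padded = line.ljust(RECORD_WIDTH)
    return {
        name: int(padded[start - 1:end])
        for name, (start, end) in CODE_SHEET.items()
    }


def format_record(values):
    """Inverse of parse_record; used here only to build a made-up line."""
    chars = [" "] * RECORD_WIDTH
    for name, (start, end) in CODE_SHEET.items():
        text = str(values[name]).rjust(end - start + 1)
        chars[start - 1:end] = text
    return "".join(chars)


# Hypothetical record, NOT a row from the real data set.
example = {
    "ID": 1, "LOW": 1, "AGE": 25, "LWT": 120, "RACE": 1, "SMOKE": 0,
    "PTL": 0, "HT": 0, "UI": 0, "FTV": 2, "BWT": 2400,
}
line = format_record(example)
parsed = parse_record(line)
print(parsed)
```

Keeping the column ranges in one dictionary means the layout is stated once and both the reader and the test-line writer stay in agreement with the code sheet.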

10 features

class (target): numeric, 133 unique values, 0 missing
LOW: nominal, 2 unique values, 0 missing
AGE: numeric, 24 unique values, 0 missing
LWT: numeric, 75 unique values, 0 missing
RACE: nominal, 3 unique values, 0 missing
SMOKE: nominal, 2 unique values, 0 missing
PTL: nominal, 4 unique values, 0 missing
HT: nominal, 2 unique values, 0 missing
UI: nominal, 2 unique values, 0 missing
FTV: nominal, 6 unique values, 0 missing
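The relationship between the numeric target and the LOW indicator can be sketched as follows. This assumes that the numeric `class` feature holds the birth weight in grams (BWT in the code sheet), which the page does not state explicitly; the function name `low_indicator` is illustrative. The group counts are the ones given in the descriptive abstract.

```python
LOW_THRESHOLD_GRAMS = 2500  # cut-off stated in the code sheet


def low_indicator(birth_weight_grams):
    """Return 1 for a low birth weight (< 2500 g), otherwise 0."""
    return 1 if birth_weight_grams < LOW_THRESHOLD_GRAMS else 0


# Group sizes reported in the descriptive abstract.
low_count = 59
normal_count = 130
total = low_count + normal_count

low_share = low_count / total
print(total, round(low_share, 3))
```

Because LOW is derived from the birth weight under this assumption, including it as a predictor when modelling `class` would leak the target.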

107 properties

189
Number of instances (rows) of the dataset.
10
Number of attributes (columns) of the dataset.
0
Number of distinct values of the target attribute (if it is nominal).
0
Number of missing values in the dataset.
0
Number of instances with at least one value missing.
3
Number of numeric attributes.
7
Number of nominal attributes.
-44.39
Average class difference between consecutive instances.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Entropy of the target attribute values.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
0.05
Number of attributes divided by the number of instances.
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
Percentage of instances belonging to the most frequent class.
Number of instances belonging to the most frequent class.
Maximum entropy among attributes.
2.4
Maximum kurtosis among attributes of the numeric type.
2944.66
Maximum of means among attributes of the numeric type.
Maximum mutual information between the nominal attributes and the target attribute.
6
The maximum number of distinct values among attributes of the nominal type.
1.4
Maximum skewness among attributes of the numeric type.
729.02
Maximum standard deviation of attributes of the numeric type.
Average entropy of the attributes.
0.98
Mean kurtosis among attributes of the numeric type.
1032.57
Mean of means among attributes of the numeric type.
Average mutual information between the nominal attributes and the target attribute.
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
3
Average number of distinct values among the attributes of the nominal type.
0.64
Mean skewness among attributes of the numeric type.
254.97
Mean standard deviation of attributes of the numeric type.
Minimal entropy among attributes.
-0.08
Minimum kurtosis among attributes of the numeric type.
23.24
Minimum of means among attributes of the numeric type.
Minimal mutual information between the nominal attributes and the target attribute.
2
The minimal number of distinct values among attributes of the nominal type.
-0.21
Minimum skewness among attributes of the numeric type.
5.3
Minimum standard deviation of attributes of the numeric type.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
4
Number of binary attributes.
40
Percentage of binary attributes.
0
Percentage of instances having missing values.
0
Percentage of missing values.
30
Percentage of numeric attributes.
70
Percentage of nominal attributes.
First quartile of entropy among attributes.
-0.08
First quartile of kurtosis among attributes of the numeric type.
23.24
First quartile of means among attributes of the numeric type.
First quartile of mutual information between the nominal attributes and the target attribute.
-0.21
First quartile of skewness among attributes of the numeric type.
5.3
First quartile of standard deviation of attributes of the numeric type.
Second quartile (Median) of entropy among attributes.
0.62
Second quartile (Median) of kurtosis among attributes of the numeric type.
129.81
Second quartile (Median) of means among attributes of the numeric type.
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
0.72
Second quartile (Median) of skewness among attributes of the numeric type.
30.58
Second quartile (Median) of standard deviation of attributes of the numeric type.
Third quartile of entropy among attributes.
2.4
Third quartile of kurtosis among attributes of the numeric type.
2944.66
Third quartile of means among attributes of the numeric type.
Third quartile of mutual information between the nominal attributes and the target attribute.
1.4
Third quartile of skewness among attributes of the numeric type.
729.02
Third quartile of standard deviation of attributes of the numeric type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
1.53
Standard deviation of the number of distinct values among attributes of the nominal type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
Error rate achieved by the landmarker weka.classifiers.lazy.IBk
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
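Several of the simpler properties above follow directly from the feature list and the instance count. The sketch below recomputes them, assuming that "binary" means a nominal attribute with exactly two distinct values and that the standard deviation is the sample standard deviation; the variable names are illustrative and this is not the code OpenML uses.

```python
from statistics import mean, stdev

N_INSTANCES = 189

# (type, number of unique values) for each feature in the feature list.
FEATURES = {
    "class": ("numeric", 133),
    "LOW": ("nominal", 2),
    "AGE": ("numeric", 24),
    "LWT": ("numeric", 75),
    "RACE": ("nominal", 3),
    "SMOKE": ("nominal", 2),
    "PTL": ("nominal", 4),
    "HT": ("nominal", 2),
    "UI": ("nominal", 2),
    "FTV": ("nominal", 6),
}

n_features = len(FEATURES)
nominal_counts = [n for kind, n in FEATURES.values() if kind == "nominal"]
n_nominal = len(nominal_counts)
n_numeric = n_features - n_nominal
n_binary = sum(1 for n in nominal_counts if n == 2)

properties = {
    "dimensionality": round(n_features / N_INSTANCES, 2),
    "pct_numeric": 100 * n_numeric / n_features,
    "pct_nominal": 100 * n_nominal / n_features,
    "pct_binary": 100 * n_binary / n_features,
    "mean_nominal_distinct": mean(nominal_counts),
    "max_nominal_distinct": max(nominal_counts),
    "min_nominal_distinct": min(nominal_counts),
    "std_nominal_distinct": round(stdev(nominal_counts), 2),
}
for name, value in properties.items():
    print(name, value)
```

Under these assumptions the results agree with the listed values: 0.05, 30, 70, 40, 3, 6, 2 and 1.53.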

7 tasks

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - target_feature: class
0 runs - estimation_procedure: Test on Training Data - target_feature: class
0 runs - estimation_procedure: 33% Holdout set - target_feature: class
0 runs - estimation_procedure: Leave one out - target_feature: class
0 runs - estimation_procedure: 5 times 2-fold Crossvalidation - target_feature: class
0 runs - estimation_procedure: 10% Holdout set - target_feature: class
0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: class
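To illustrate what the cross-validation procedures listed above mean for a data set of 189 instances, here is a minimal sketch of an unstratified k-fold split. The function `k_fold_indices` and the seed are illustrative assumptions; the actual OpenML tasks ship their own predefined splits, which this does not reproduce.

```python
import random


def k_fold_indices(n_instances, n_folds, seed=0):
    """Shuffle row indices and deal them into n_folds disjoint test folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


N_INSTANCES = 189

# 10-fold cross-validation: each fold serves once as the test set.
folds = k_fold_indices(N_INSTANCES, 10)
fold_sizes = sorted(len(fold) for fold in folds)
print(fold_sizes)

# Leave-one-out is the special case with one fold per instance.
loo_folds = k_fold_indices(N_INSTANCES, N_INSTANCES)
print(len(loo_folds))
```

Since 189 is not divisible by 10, nine folds hold 19 instances and one holds 18. Repeated procedures such as "10 times 10-fold" would rerun the split with a different seed each time.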