lowbwt

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 23-04-2014 by Jan van Rijn
Source: Unknown

Identification code deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning with encoding length selection. In Progress in Connectionist-Based Information Systems. Singapore: Springer-Verlag.

NAME: Low Birth Weight Data
KEYWORDS: Logistic Regression
SIZE: 189 observations, 11 variables

NOTE: These data come from Appendix 1 of Hosmer and Lemeshow (1989). These data are copyrighted and must be acknowledged and used accordingly.

DESCRIPTIVE ABSTRACT: The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.

SOURCE: Data were collected at Baystate Medical Center, Springfield, Massachusetts, during 1986.

NOTE: This data set consists of the complete data. A paired data set created from this low birth weight data may be found in plowbwt.dat, and a 3 to 1 matched data set created from the low birth weight data may be found in mlowbwt.dat.

Table: Code Sheet for the Variables in the Low Birth Weight Data Set.

Columns   Variable                                               Abbreviation
-----------------------------------------------------------------------------
2-4       Identification Code                                    ID
10        Low Birth Weight (0 = Birth Weight >= 2500g,           LOW
          1 = Birth Weight < 2500g)
17-18     Age of the Mother in Years                             AGE
23-25     Weight in Pounds at the Last Menstrual Period          LWT
32        Race (1 = White, 2 = Black, 3 = Other)                 RACE
40        Smoking Status During Pregnancy (1 = Yes, 0 = No)      SMOKE
48        History of Premature Labor (0 = None, 1 = One, etc.)   PTL
55        History of Hypertension (1 = Yes, 0 = No)              HT
61        Presence of Uterine Irritability (1 = Yes, 0 = No)     UI
67        Number of Physician Visits During the First Trimester  FTV
          (0 = None, 1 = One, 2 = Two, etc.)
73-76     Birth Weight in Grams                                  BWT
-----------------------------------------------------------------------------

PEDAGOGICAL NOTES: These data have been used as an example of fitting a multiple logistic regression model.

STORY BEHIND THE DATA: Low birth weight is an outcome that has been of concern to physicians for years, because infant mortality rates and birth defect rates are very high for low birth weight babies. A woman's behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight. The variables identified in the code sheet given in the table have been shown to be associated with low birth weight in the obstetrical literature. The goal of the current study was to ascertain whether these variables were important in the population being served by the medical center where the data were collected.

References:
1. Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).
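The LOW coding in the code sheet and the class counts in the descriptive abstract can be checked with a few lines of Python. This is a minimal sketch: the function name `low_indicator` and the constant name are illustrative choices, not part of the dataset, and only the 2500 g cutoff and the counts 59 and 130 come from the description.

```python
# Cutoff stated in the code sheet: LOW = 1 when birth weight is below 2500 g.
LOW_BIRTH_WEIGHT_THRESHOLD_G = 2500


def low_indicator(bwt_grams: int) -> int:
    """Return the LOW coding (1 = below 2500 g, 0 = 2500 g or more)."""
    return 1 if bwt_grams < LOW_BIRTH_WEIGHT_THRESHOLD_G else 0


# Class counts reported in the descriptive abstract.
n_low, n_normal = 59, 130
n_total = n_low + n_normal
low_share = n_low / n_total

print(n_total)              # 189
print(f"{low_share:.1%}")   # 31.2%
```

The two counts add up to the 189 observations given under SIZE, so roughly three in ten women in the sample had a low birth weight baby.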

10 features

Feature          Type      Unique values   Missing
--------------------------------------------------
class (target)   numeric   133             0
LOW              nominal   2               0
AGE              numeric   24              0
LWT              numeric   75              0
RACE             nominal   3               0
SMOKE            nominal   2               0
PTL              nominal   4               0
HT               nominal   2               0
UI               nominal   2               0
FTV              nominal   6               0
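The unique-value counts of the seven nominal features are enough to reproduce the nominal-attribute summary figures reported among the properties on this page (minimum 2, maximum 6, average 3, standard deviation 1.53, four binary attributes). The sketch below is illustrative only; it assumes the page uses the sample (n - 1) standard deviation, which is the variant that matches the reported 1.53.

```python
from statistics import mean, stdev

# Unique-value counts of the nominal features, copied from the feature list.
nominal_cardinalities = {
    "LOW": 2, "RACE": 3, "SMOKE": 2, "PTL": 4, "HT": 2, "UI": 2, "FTV": 6,
}

values = list(nominal_cardinalities.values())

summary = {
    "min_distinct": min(values),
    "max_distinct": max(values),
    "mean_distinct": mean(values),
    # Assumption: sample standard deviation (divides by n - 1).
    "std_distinct": round(stdev(values), 2),
    "binary_attributes": sum(1 for v in values if v == 2),
}

print(summary)
```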

107 properties

Property                                                                      Value
-----------------------------------------------------------------------------------
Number of instances (rows) of the dataset                                     189
Number of attributes (columns) of the dataset                                 10
Number of attributes divided by the number of instances                       0.05
Number of distinct values of the target attribute (if it is nominal)          0
Number of numeric attributes                                                  3
Number of nominal attributes                                                  7
Number of binary attributes                                                   4
Percentage of numeric attributes                                              30
Percentage of nominal attributes                                              70
Percentage of binary attributes                                               40
Number of missing values in the dataset                                       0
Number of instances with at least one value missing                           0
Percentage of instances having missing values                                 0
Percentage of missing values                                                  0
Average class difference between consecutive instances                        -44.39
Minimal number of distinct values among attributes of the nominal type        2
Maximum number of distinct values among attributes of the nominal type        6
Average number of distinct values among attributes of the nominal type        3
Standard deviation of the number of distinct values among nominal attributes  1.53

Statistics among attributes of the numeric type:

Statistic            Min     Q1      Median   Mean      Q3        Max
----------------------------------------------------------------------
Means                23.24   23.24   129.81   1032.57   2944.66   2944.66
Standard deviation   5.3     5.3     30.58    254.97    729.02    729.02
Skewness             -0.21   -0.21   0.72     0.64      1.4       1.4
Kurtosis             -0.08   -0.08   0.62     0.98      2.4       2.4

The page shows no value for the remaining properties:

  • Entropy of the target attribute values.
  • Number and percentage of instances belonging to the most frequent class.
  • Number and percentage of instances belonging to the least frequent class.
  • Entropy among attributes: minimal, first quartile, second quartile (median), average, third quartile, maximum.
  • Mutual information between the nominal attributes and the target attribute: minimal, first quartile, second quartile (median), average, third quartile, maximum.
  • An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
  • Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
  • Area Under the ROC Curve, error rate, and Kappa coefficient achieved by each of the following landmarkers:
      weka.classifiers.bayes.NaiveBayes
      weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
      weka.classifiers.lazy.IBk
      weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
      weka.classifiers.trees.DecisionStump
      weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
      weka.classifiers.trees.J48 -C .00001
      weka.classifiers.trees.J48 -C .0001
      weka.classifiers.trees.J48 -C .001
      weka.classifiers.trees.RandomTree -depth 1
      weka.classifiers.trees.RandomTree -depth 2
      weka.classifiers.trees.RandomTree -depth 3
      weka.classifiers.trees.REPTree -L 1
      weka.classifiers.trees.REPTree -L 2
      weka.classifiers.trees.REPTree -L 3
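Several of the properties above are simple ratios of the counts, and two are defined by explicit formulas in their descriptions. The sketch below restates those definitions in Python. The function names are illustrative, and the entropy and mutual-information inputs in the last two calls are hypothetical placeholders, since the page shows no values for them on this dataset.

```python
def dimensionality(n_attributes: int, n_instances: int) -> float:
    """Number of attributes divided by the number of instances."""
    return n_attributes / n_instances


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


def noise_to_signal_ratio(mean_attribute_entropy: float,
                          mean_mutual_information: float) -> float:
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


def equivalent_number_of_attributes(class_entropy: float,
                                    mean_mutual_information: float) -> float:
    """ClassEntropy / MeanMutualInformation."""
    return class_entropy / mean_mutual_information


# Counts taken from the page: 10 attributes, 189 instances,
# 3 numeric, 7 nominal, and 4 binary attributes.
print(round(dimensionality(10, 189), 2))  # 0.05
print(percentage(3, 10))                  # 30.0
print(percentage(7, 10))                  # 70.0
print(percentage(4, 10))                  # 40.0

# Hypothetical inputs, for illustration only (not values from this dataset).
print(noise_to_signal_ratio(1.0, 0.25))            # 3.0
print(equivalent_number_of_attributes(1.0, 0.25))  # 4.0
```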

7 tasks

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - target_feature: class
0 runs - estimation_procedure: 10% Holdout set - target_feature: class
0 runs - estimation_procedure: 33% Holdout set - target_feature: class
0 runs - estimation_procedure: Test on Training Data - target_feature: class
0 runs - estimation_procedure: 5 times 2-fold Crossvalidation - target_feature: class
0 runs - estimation_procedure: Leave one out - target_feature: class
0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: class
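To give a sense of what these estimation procedures mean for a dataset of 189 rows, the sketch below computes the test-set sizes each one would produce. It is an illustration under stated assumptions, not OpenML's actual splitting code: folds are assumed to be as equal in size as possible, and holdout sizes are assumed to be the rounded fraction of the rows.

```python
N_INSTANCES = 189  # number of rows reported for this dataset


def kfold_test_sizes(n: int, k: int) -> list[int]:
    """Test-fold sizes when n rows are split into k near-equal folds."""
    base, extra = divmod(n, k)
    # The first `extra` folds receive one additional row.
    return [base + 1] * extra + [base] * (k - extra)


def holdout_test_size(n: int, fraction: float) -> int:
    """Test-set size for a holdout split (assumption: simple rounding)."""
    return round(n * fraction)


ten_fold = kfold_test_sizes(N_INSTANCES, 10)
two_fold = kfold_test_sizes(N_INSTANCES, 2)
leave_one_out = kfold_test_sizes(N_INSTANCES, N_INSTANCES)

print(ten_fold)                              # nine folds of 19 rows, one of 18
print(two_fold)                              # [95, 94]
print(len(leave_one_out))                    # 189 folds of one row each
print(holdout_test_size(N_INSTANCES, 0.10))  # 19
print(holdout_test_size(N_INSTANCES, 0.33))  # 62
```

The repeated procedures (10 times 10-fold, 5 times 2-fold) reuse the same fold sizes with a different shuffle on each repetition.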