{ "data_id": "17", "name": "credit-g", "exact_name": "credit-g", "version": 1, "version_label": "1", "description": "**Author**: \n**Source**: Unknown - \n**Please cite**: \n\nDescription of the German credit dataset.\n \n 1. Title: German Credit data\n \n 2. Source Information\n \n Professor Dr. Hans Hofmann \n Institut für Statistik und Ökonometrie \n Universität Hamburg \n FB Wirtschaftswissenschaften \n Von-Melle-Park 5 \n 2000 Hamburg 13 \n \n 3. Number of Instances: 1000\n \n Two datasets are provided. The original dataset, in the form provided\n by Prof. Hofmann, contains categorical\/symbolic attributes and\n is in the file \"german.data\". \n \n For algorithms that need numerical attributes, Strathclyde University \n produced the file \"german.data-numeric\". This file has been edited \n and several indicator variables added to make it suitable for \n algorithms which cannot cope with categorical variables. Several\n attributes that are ordered categorical (such as attribute 17) have\n been coded as integer. This was the form used by StatLog.\n \n \n 6. Number of Attributes german: 20 (7 numerical, 13 categorical)\n Number of Attributes german.numer: 24 (24 numerical)\n \n \n 7. Attribute description for german\n \n Attribute 1: (qualitative)\n \t Status of existing checking account\n A11 : ... < 0 DM\n \t A12 : 0 <= ... < 200 DM\n \t A13 : ... 
>= 200 DM \/\n \t\t salary assignments for at least 1 year\n A14 : no checking account\n \n Attribute 2: (numerical)\n \t Duration in months\n \n Attribute 3: (qualitative)\n \t Credit history\n \t A30 : no credits taken\/\n \t\t all credits paid back duly\n A31 : all credits at this bank paid back duly\n \t A32 : existing credits paid back duly till now\n A33 : delay in paying off in the past\n \t A34 : critical account\/\n \t\t other credits existing (not at this bank)\n \n Attribute 4: (qualitative)\n \t Purpose\n \t A40 : car (new)\n \t A41 : car (used)\n \t A42 : furniture\/equipment\n \t A43 : radio\/television\n \t A44 : domestic appliances\n \t A45 : repairs\n \t A46 : education\n \t A47 : (vacation - does not exist?)\n \t A48 : retraining\n \t A49 : business\n \t A410 : others\n \n Attribute 5: (numerical)\n \t Credit amount\n \n Attribute 6: (qualitative)\n \t Savings account\/bonds\n \t A61 : ... < 100 DM\n \t A62 : 100 <= ... < 500 DM\n \t A63 : 500 <= ... < 1000 DM\n \t A64 : ... >= 1000 DM\n A65 : unknown\/ no savings account\n \n Attribute 7: (qualitative)\n \t Present employment since\n \t A71 : unemployed\n \t A72 : ... < 1 year\n \t A73 : 1 <= ... < 4 years \n \t A74 : 4 <= ... < 7 years\n \t A75 : ... 
>= 7 years\n \n Attribute 8: (numerical)\n \t Installment rate in percentage of disposable income\n \n Attribute 9: (qualitative)\n \t Personal status and sex\n \t A91 : male : divorced\/separated\n \t A92 : female : divorced\/separated\/married\n A93 : male : single\n \t A94 : male : married\/widowed\n \t A95 : female : single\n \n Attribute 10: (qualitative)\n \t Other debtors \/ guarantors\n \t A101 : none\n \t A102 : co-applicant\n \t A103 : guarantor\n \n Attribute 11: (numerical)\n \t Present residence since\n \n Attribute 12: (qualitative)\n \t Property\n \t A121 : real estate\n \t A122 : if not A121 : building society savings agreement\/\n \t\t\t\t life insurance\n A123 : if not A121\/A122 : car or other, not in attribute 6\n \t A124 : unknown \/ no property\n \n Attribute 13: (numerical)\n \t Age in years\n \n Attribute 14: (qualitative)\n \t Other installment plans \n \t A141 : bank\n \t A142 : stores\n \t A143 : none\n \n Attribute 15: (qualitative)\n \t Housing\n \t A151 : rent\n \t A152 : own\n \t A153 : for free\n \n Attribute 16: (numerical)\n Number of existing credits at this bank\n \n Attribute 17: (qualitative)\n \t Job\n \t A171 : unemployed\/ unskilled - non-resident\n \t A172 : unskilled - resident\n \t A173 : skilled employee \/ official\n \t A174 : management\/ self-employed\/\n \t\t highly qualified employee\/ officer\n \n Attribute 18: (numerical)\n \t Number of people being liable to provide maintenance for\n \n Attribute 19: (qualitative)\n \t Telephone\n \t A191 : none\n \t A192 : yes, registered under the customer's name\n \n Attribute 20: (qualitative)\n \t foreign worker\n \t A201 : yes\n \t A202 : no\n \n \n \n 8. 
Cost Matrix\n \n This dataset requires use of a cost matrix (see below)\n \n \n 1 2\n ----------------------------\n 1 0 1\n -----------------------\n 2 5 0\n \n (1 = Good, 2 = Bad)\n \n the rows represent the actual classification and the columns\n the predicted classification.\n \n It is worse to class a customer as good when they are bad (5), \n than it is to class a customer as bad when they are good (1).\n \n\n\n\n\n Relabeled values in attribute checking_status\n From: A11 To: '<0' \n From: A12 To: '0<=X<200' \n From: A13 To: '>=200' \n From: A14 To: 'no checking' \n\n\n Relabeled values in attribute credit_history\n From: A30 To: 'no credits\/all paid'\n From: A31 To: 'all paid' \n From: A32 To: 'existing paid' \n From: A33 To: 'delayed previously'\n From: A34 To: 'critical\/other existing credit'\n\n\n Relabeled values in attribute purpose\n From: A40 To: 'new car' \n From: A41 To: 'used car' \n From: A42 To: furniture\/equipment \n From: A43 To: radio\/tv \n From: A44 To: 'domestic appliance'\n From: A45 To: repairs \n From: A46 To: education \n From: A47 To: vacation \n From: A48 To: retraining \n From: A49 To: business \n From: A410 To: other \n\n\n Relabeled values in attribute savings_status\n From: A61 To: '<100' \n From: A62 To: '100<=X<500' \n From: A63 To: '500<=X<1000' \n From: A64 To: '>=1000' \n From: A65 To: 'no known savings' \n\n\n Relabeled values in attribute employment\n From: A71 To: unemployed \n From: A72 To: '<1' \n From: A73 To: '1<=X<4' \n From: A74 To: '4<=X<7' \n From: A75 To: '>=7' \n\n\n Relabeled values in attribute personal_status\n From: A91 To: 'male div\/sep' \n From: A92 To: 'female div\/dep\/mar'\n From: A93 To: 'male single' \n From: A94 To: 'male mar\/wid' \n From: A95 To: 'female single' \n\n\n Relabeled values in attribute other_parties\n From: A101 To: none \n From: A102 To: 'co applicant' \n From: A103 To: guarantor \n\n\n Relabeled values in attribute property_magnitude\n From: A121 To: 'real estate' \n From: A122 
To: 'life insurance' \n From: A123 To: car \n From: A124 To: 'no known property' \n\n\n Relabeled values in attribute other_payment_plans\n From: A141 To: bank \n From: A142 To: stores \n From: A143 To: none \n\n\n Relabeled values in attribute housing\n From: A151 To: rent \n From: A152 To: own \n From: A153 To: 'for free' \n\n\n Relabeled values in attribute job\n From: A171 To: 'unemp\/unskilled non res'\n From: A172 To: 'unskilled resident'\n From: A173 To: skilled \n From: A174 To: 'high qualif\/self emp\/mgmt'\n\n\n Relabeled values in attribute own_telephone\n From: A191 To: none \n From: A192 To: yes \n\n\n Relabeled values in attribute foreign_worker\n From: A201 To: yes \n From: A202 To: no \n\n\n Relabeled values in attribute class\n From: 1 To: good \n From: 2 To: bad", "format": "ARFF", "uploader": "Jan van Rijn", "uploader_id": 1, "visibility": "public", "creator": null, "contributor": null, "date": "2014-04-06 23:21:47", "update_comment": null, "last_update": "2014-04-06 23:21:47", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/31\/dataset_31_credit-g.arff", "default_target_attribute": "class", "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "credit-g", "Description of the German credit dataset. 1. Title: German Credit data 2. Source Information Professor Dr. Hans Hofmann Institut f\"ur Statistik und \"Okonometrie Universit\"at Hamburg FB Wirtschaftswissenschaften Von-Melle-Park 5 2000 Hamburg 13 3. Number of Instances: 1000 Two datasets are provided. the original dataset, in the form provided by Prof. Hofmann, contains categorical\/symbolic attributes and is in the file \"german.data\". 
For algorithms that need numerical attributes, Strathclyde Unive " ], "weight": 5 }, "qualities": { "NumberOfInstances": 1000, "NumberOfFeatures": 21, "NumberOfClasses": 2, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 7, "NumberOfSymbolicFeatures": 14, "AutoCorrelation": 0.5695695695695696, "CfsSubsetEval_DecisionStumpAUC": 0.7208285714285714, "CfsSubsetEval_DecisionStumpErrRate": 0.273, "CfsSubsetEval_DecisionStumpKappa": 0.3056968463886062, "CfsSubsetEval_NaiveBayesAUC": 0.7208285714285714, "CfsSubsetEval_NaiveBayesErrRate": 0.273, "CfsSubsetEval_NaiveBayesKappa": 0.3056968463886062, "CfsSubsetEval_kNN1NAUC": 0.7208285714285714, "CfsSubsetEval_kNN1NErrRate": 0.273, "CfsSubsetEval_kNN1NKappa": 0.3056968463886062, "ClassEntropy": 0.8812908992306927, "DecisionStumpAUC": 0.6647619047619048, "DecisionStumpErrRate": 0.3, "DecisionStumpKappa": 0, "Dimensionality": 0.021, "EquivalentNumberOfAtts": 43.593036016305035, "J48.00001.AUC": 0.6617476190476189, "J48.00001.ErrRate": 0.279, "J48.00001.Kappa": 0.24431202600216673, "J48.0001.AUC": 0.6617476190476189, "J48.0001.ErrRate": 0.279, "J48.0001.Kappa": 0.24431202600216673, "J48.001.AUC": 0.6617476190476189, "J48.001.ErrRate": 0.279, "J48.001.Kappa": 0.24431202600216673, "MajorityClassPercentage": 70, "MajorityClassSize": 700, "MaxAttributeEntropy": 2.6666777598518516, "MaxKurtosisOfNumericAtts": 4.292590308048501, "MaxMeansOfNumericAtts": 3271.257999999999, "MaxMutualInformation": 0.09473884155264, "MaxNominalAttDistinctValues": 10, "MaxSkewnessOfNumericAtts": 1.9496276798326246, "MaxStdDevOfNumericAtts": 2822.7368759604396, "MeanAttributeEntropy": 1.433893225502841, "MeanKurtosisOfNumericAtts": 0.9242775257981102, "MeanMeansOfNumericAtts": 476.58385714285697, "MeanMutualInformation": 0.02021632305905615, "MeanNoiseToSignalRatio": 69.92749860170595, "MeanNominalAttDistinctValues": 4, "MeanSkewnessOfNumericAtts": 0.9203791257169068, "MeanStdDevOfNumericAtts": 
407.04761882821174, "MinAttributeEntropy": 0.2283640258405646, "MinKurtosisOfNumericAtts": -1.3814485027493755, "MinMeansOfNumericAtts": 1.1549999999999998, "MinMutualInformation": 0.00096366001491, "MinNominalAttDistinctValues": 2, "MinSkewnessOfNumericAtts": -0.5313481143125632, "MinStdDevOfNumericAtts": 0.3620857717531919, "MinorityClassPercentage": 30, "MinorityClassSize": 300, "NaiveBayesAUC": 0.786047619047619, "NaiveBayesErrRate": 0.248, "NaiveBayesKappa": 0.3724696356275304, "NumberOfBinaryFeatures": 3, "PercentageOfBinaryFeatures": 14.285714285714285, "PercentageOfInstancesWithMissingValues": 0, "PercentageOfMissingValues": 0, "PercentageOfNumericFeatures": 33.33333333333333, "PercentageOfSymbolicFeatures": 66.66666666666666, "Quartile1AttributeEntropy": 0.9089779148834296, "Quartile1KurtosisOfNumericAtts": -1.2104731179379757, "Quartile1MeansOfNumericAtts": 1.407, "Quartile1MutualInformation": 0.005310005973835, "Quartile1SkewnessOfNumericAtts": -0.2725698140337198, "Quartile1StdDevOfNumericAtts": 0.5776544682460991, "Quartile2AttributeEntropy": 1.5321036281187235, "Quartile2KurtosisOfNumericAtts": 0.9197813600546327, "Quartile2MeansOfNumericAtts": 2.9730000000000003, "Quartile2MutualInformation": 0.01275318647802, "Quartile2SkewnessOfNumericAtts": 1.09418417155554, "Quartile2StdDevOfNumericAtts": 1.1187146743126737, "Quartile3AttributeEntropy": 1.8749120301493503, "Quartile3KurtosisOfNumericAtts": 1.6492736936699308, "Quartile3MeansOfNumericAtts": 35.54600000000001, "Quartile3MutualInformation": 0.02650410754481, "Quartile3SkewnessOfNumericAtts": 1.909444721297511, "Quartile3StdDevOfNumericAtts": 12.058814452756392, "REPTreeDepth1AUC": 0.7002928571428572, "REPTreeDepth1ErrRate": 0.269, "REPTreeDepth1Kappa": 0.22254335260115596, "REPTreeDepth2AUC": 0.7002928571428572, "REPTreeDepth2ErrRate": 0.269, "REPTreeDepth2Kappa": 0.22254335260115596, "REPTreeDepth3AUC": 0.7002928571428572, "REPTreeDepth3ErrRate": 0.269, "REPTreeDepth3Kappa": 0.22254335260115596, 
"RandomTreeDepth1AUC": 0.6580238095238096, "RandomTreeDepth1ErrRate": 0.298, "RandomTreeDepth1Kappa": 0.2781007751937982, "RandomTreeDepth2AUC": 0.6580238095238096, "RandomTreeDepth2ErrRate": 0.298, "RandomTreeDepth2Kappa": 0.2781007751937982, "RandomTreeDepth3AUC": 0.6580238095238096, "RandomTreeDepth3ErrRate": 0.298, "RandomTreeDepth3Kappa": 0.2781007751937982, "StdvNominalAttDistinctValues": 2.0380986614602725, "kNN1NAUC": 0.649047619047619, "kNN1NErrRate": 0.286, "kNN1NKappa": 0.30447470817120614 }, "tags": [ { "tag": "study_14", "uploader": "1" }, { "tag": "study_1", "uploader": "0" }, { "tag": "study_129", "uploader": "0" }, { "tag": "study_234", "uploader": "0" }, { "tag": "study_143", "uploader": "0" } ], "features": [ { "name": "class", "index": "20", "type": "nominal", "distinct": "2", "missing": "0", "target": "1", "distr": [ [ "good", "bad" ], [ [ "700", "0" ], [ "0", "300" ] ] ] }, { "name": "residence_since", "index": "10", "type": "numeric", "distinct": "4", "missing": "0", "min": "1", "max": "4", "mean": "3", "stdev": "1" }, { "name": "foreign_worker", "index": "19", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "yes", "no" ], [ [ "667", "296" ], [ "33", "4" ] ] ] }, { "name": "own_telephone", "index": "18", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "none", "yes" ], [ [ "409", "187" ], [ "291", "113" ] ] ] }, { "name": "num_dependents", "index": "17", "type": "numeric", "distinct": "2", "missing": "0", "min": "1", "max": "2", "mean": "1", "stdev": "0" }, { "name": "job", "index": "16", "type": "nominal", "distinct": "4", "missing": "0", "distr": [ [ "unemp\/unskilled non res", "unskilled resident", "skilled", "high qualif\/self emp\/mgmt" ], [ [ "15", "7" ], [ "144", "56" ], [ "444", "186" ], [ "97", "51" ] ] ] }, { "name": "existing_credits", "index": "15", "type": "numeric", "distinct": "4", "missing": "0", "min": "1", "max": "4", "mean": "1", "stdev": "1" }, { "name": "housing", "index": "14", "type": 
"nominal", "distinct": "3", "missing": "0", "distr": [ [ "rent", "own", "for free" ], [ [ "109", "70" ], [ "527", "186" ], [ "64", "44" ] ] ] }, { "name": "other_payment_plans", "index": "13", "type": "nominal", "distinct": "3", "missing": "0", "distr": [ [ "bank", "stores", "none" ], [ [ "82", "57" ], [ "28", "19" ], [ "590", "224" ] ] ] }, { "name": "age", "index": "12", "type": "numeric", "distinct": "53", "missing": "0", "min": "19", "max": "75", "mean": "36", "stdev": "11" }, { "name": "property_magnitude", "index": "11", "type": "nominal", "distinct": "4", "missing": "0", "distr": [ [ "real estate", "life insurance", "car", "no known property" ], [ [ "222", "60" ], [ "161", "71" ], [ "230", "102" ], [ "87", "67" ] ] ] }, { "name": "checking_status", "index": "0", "type": "nominal", "distinct": "4", "missing": "0", "distr": [ [ "<0", "0<=X<200", ">=200", "no checking" ], [ [ "139", "135" ], [ "164", "105" ], [ "49", "14" ], [ "348", "46" ] ] ] }, { "name": "other_parties", "index": "9", "type": "nominal", "distinct": "3", "missing": "0", "distr": [ [ "none", "co applicant", "guarantor" ], [ [ "635", "272" ], [ "23", "18" ], [ "42", "10" ] ] ] }, { "name": "personal_status", "index": "8", "type": "nominal", "distinct": "4", "missing": "0", "distr": [ [ "male div\/sep", "female div\/dep\/mar", "male single", "male mar\/wid", "female single" ], [ [ "30", "20" ], [ "201", "109" ], [ "402", "146" ], [ "67", "25" ], [ "0", "0" ] ] ] }, { "name": "installment_commitment", "index": "7", "type": "numeric", "distinct": "4", "missing": "0", "min": "1", "max": "4", "mean": "3", "stdev": "1" }, { "name": "employment", "index": "6", "type": "nominal", "distinct": "5", "missing": "0", "distr": [ [ "unemployed", "<1", "1<=X<4", "4<=X<7", ">=7" ], [ [ "39", "23" ], [ "102", "70" ], [ "235", "104" ], [ "135", "39" ], [ "189", "64" ] ] ] }, { "name": "savings_status", "index": "5", "type": "nominal", "distinct": "5", "missing": "0", "distr": [ [ "<100", "100<=X<500", 
"500<=X<1000", ">=1000", "no known savings" ], [ [ "386", "217" ], [ "69", "34" ], [ "52", "11" ], [ "42", "6" ], [ "151", "32" ] ] ] }, { "name": "credit_amount", "index": "4", "type": "numeric", "distinct": "921", "missing": "0", "min": "250", "max": "18424", "mean": "3271", "stdev": "2823" }, { "name": "purpose", "index": "3", "type": "nominal", "distinct": "10", "missing": "0", "distr": [ [ "new car", "used car", "furniture\/equipment", "radio\/tv", "domestic appliance", "repairs", "education", "vacation", "retraining", "business", "other" ], [ [ "145", "89" ], [ "86", "17" ], [ "123", "58" ], [ "218", "62" ], [ "8", "4" ], [ "14", "8" ], [ "28", "22" ], [ "0", "0" ], [ "8", "1" ], [ "63", "34" ], [ "7", "5" ] ] ] }, { "name": "credit_history", "index": "2", "type": "nominal", "distinct": "5", "missing": "0", "distr": [ [ "no credits\/all paid", "all paid", "existing paid", "delayed previously", "critical\/other existing credit" ], [ [ "15", "25" ], [ "21", "28" ], [ "361", "169" ], [ "60", "28" ], [ "243", "50" ] ] ] }, { "name": "duration", "index": "1", "type": "numeric", "distinct": "33", "missing": "0", "min": "4", "max": "72", "mean": "21", "stdev": "12" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }