Test data for testing scoring models
Refunds: 0
Uploaded: 23.07.2010
Content: data_for_research_and_building_scoring_models.rar 2028,25 kB
Product description
The archive contains anonymized test data for testing a variety of statistical scoring models, as well as for research into statistical regularities.
The modeling data (Modeling_Data.txt inside Modeling_Data.zip, located in the main archive) contains 50,000 records with tab-separated fields. Each record holds depersonalized information on 31 parameters (regressors) of a borrower, plus whether the borrower repaid the loan. Although the data is depersonalized, it retains all the regularities of the real domain.
The file in Variables_List.zip, also located in the archive, describes the fields of the modeling data.
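A minimal sketch of reading the modeling data in Python, assuming the main archive has already been unpacked so that Modeling_Data.zip is available. Two things here are assumptions rather than facts from the description: that the first line of Modeling_Data.txt is a header row with the field names, and the text encoding. The function names are hypothetical. Check Variables_List.zip before relying on either assumption.

```python
import csv
import io
import zipfile


def parse_records(text):
    """Parse tab-separated text whose first line is ASSUMED to be a header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def load_modeling_data(zip_path="Modeling_Data.zip",
                       member="Modeling_Data.txt",
                       encoding="latin-1"):
    """Read the tab-separated member straight out of the zip archive.

    The encoding is an assumption; adjust it if the text decodes incorrectly.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            return parse_records(raw.read().decode(encoding))


# Tiny made-up sample in the same layout, for illustration only.
sample = "ID_CLIENT\tSEX\tAGE\tTARGET_LABEL_BAD\n1\tF\t34\t0\n2\tM\t27\t1\n"
records = parse_records(sample)
print(len(records), records[0]["SEX"], records[1]["TARGET_LABEL_BAD"])
```

All values come back as strings, so numeric fields such as AGE need an explicit conversion before modeling.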
Additional information
All data in the archive is in English, so using it requires at least a basic knowledge of the language (or a willingness to figure it out). The data was made publicly available at one of the international data mining competitions.
Some of the fields of the modeling data:
ID_CLIENT - Customer ID (borrower)
ID_SHOP - ID of the store where the credit product was purchased
SEX - Sex (M - male, F - female)
MARITAL_STATUS - Marital status (S - single, C - married, D - divorced, V - widower / widow, O - other)
AGE - Age
QUANT_DEPENDANTS - Number of the borrower's dependents
EDUCATION - Education level (can be specified)
FLAG_RESIDENCIAL_PHONE - Whether the borrower has a permanent phone number (Y - yes, N - no)
AREA_CODE_RESIDENCIAL_PHONE - Modified area code of the borrower's phone number
PAYMENT_DAY - Fixed day of the month on which the regular loan payment is due
SHOP_RANK - Rating of the store selling the credit product, expressed in financial terms
RESIDENCE_TYPE - Type of housing (P - owned, A - rented, C - parents' home, O - other)
MONTHS_IN_RESIDENCE - Time spent at the current residence, in months
FLAG_MOTHERS_NAME - Whether the application form gives the name of the borrower's mother (Y - yes, N - no)
FLAG_FATHERS_NAME - Whether the application form gives the name of the borrower's father (Y - yes, N - no)
and so on until the last field:
TARGET_LABEL_BAD - Whether the borrower ultimately repaid the loan (1 - did not repay, 0 - repaid)
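As a first look at how the fields above relate to the target, the share of records with TARGET_LABEL_BAD equal to 1 can be computed per category. The sketch below assumes the records are held as dictionaries of strings keyed by field name; the helper name bad_rate_by and the demo records are made up for illustration and are not taken from the archive.

```python
from collections import defaultdict


def bad_rate_by(records, field, target="TARGET_LABEL_BAD"):
    """Return {category: (n_records, share with target == 1)} for one field."""
    counts = defaultdict(lambda: [0, 0])  # category -> [total, bad]
    for rec in records:
        bucket = counts[rec[field]]
        bucket[0] += 1
        bucket[1] += int(rec[target])
    return {cat: (total, bad / total) for cat, (total, bad) in counts.items()}


# Made-up records for illustration only; not taken from the archive.
demo = [
    {"RESIDENCE_TYPE": "P", "TARGET_LABEL_BAD": "0"},
    {"RESIDENCE_TYPE": "P", "TARGET_LABEL_BAD": "1"},
    {"RESIDENCE_TYPE": "A", "TARGET_LABEL_BAD": "1"},
    {"RESIDENCE_TYPE": "P", "TARGET_LABEL_BAD": "0"},
]
rates = bad_rate_by(demo, "RESIDENCE_TYPE")
for category, (n, rate) in sorted(rates.items()):
    print(category, n, round(rate, 3))
```

Categories with very few records give unstable rates, so the record count is returned alongside each share.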
Possible areas of applied research, which may be based on these data:
- Scoring.
- Mathematical statistics (including non-classical branches, for example, the statistics of objects of non-numeric nature).
- Neural networks.
In addition, the archive has two more data sets (the files Prediction_Data.zip and LeaderBoard_Data.zip) of 10,000 records each, without any indication of whether the borrower repaid or not. These data sets can be used to verify the statistical models you have built. What makes them especially valuable is that they cover other time periods (one field even differs), which lets you check the robustness (stability) of your mathematical scoring model against the minor, short-term socio-economic changes that take place over time. This helps you build models that genuinely reflect the hidden patterns of the domain, that is, its underlying laws.
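Because the two extra data sets carry no repayment label, only the input fields can be compared across periods. The sketch below shows two simple checks under that constraint: listing the field names that differ between two files, and measuring how much the distribution of one categorical field has shifted. The shift measure is the population stability index (PSI), a technique swapped in here for illustration; it is not mentioned in the description. All names and sample values are hypothetical.

```python
import math
from collections import Counter


def differing_fields(fields_a, fields_b):
    """Return (only in a, only in b) as sorted lists of field names."""
    a, b = set(fields_a), set(fields_b)
    return sorted(a - b), sorted(b - a)


def psi(values_ref, values_new, floor=1e-6):
    """Population stability index of one categorical field between two samples.

    PSI = sum((p_new - p_ref) * ln(p_new / p_ref)) over all categories.
    Shares are floored so that a category missing from one sample
    does not cause a division by zero or a log of zero.
    """
    ref, new = Counter(values_ref), Counter(values_new)
    total = 0.0
    for cat in set(ref) | set(new):
        p_ref = max(ref[cat] / len(values_ref), floor)
        p_new = max(new[cat] / len(values_new), floor)
        total += (p_new - p_ref) * math.log(p_new / p_ref)
    return total


# Made-up header lists and values for illustration only.
only_a, only_b = differing_fields(["ID_CLIENT", "SEX", "FIELD_X"],
                                  ["ID_CLIENT", "SEX", "FIELD_Y"])
same = psi(["P", "P", "A", "O"], ["P", "P", "A", "O"])
shifted = psi(["P", "P", "A", "O"], ["P", "A", "A", "A"])
print(only_a, only_b, same, shifted > same)
```

A PSI of zero means identical category shares; larger values mean a larger shift. A common rule of thumb treats values below roughly 0.1 as stable, but that threshold is a convention, not something derived from these data.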
UPD.
For example, these data make it possible to establish that women being more conscientious payers is not speculation but a statistical fact at practically any confidence level, both 95% and 99%.
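One way to check a claim of this kind is a two-sided z-test for the difference between two proportions, here the share of non-repaid loans among men and among women. The description does not say which test was used, so this is only one reasonable choice. The counts in the example are made up and are not results from the archive.

```python
import math


def two_proportion_z_test(bad_a, n_a, bad_b, n_b):
    """Two-sided z-test for equality of two proportions (pooled variance).

    Returns (z, p_value).
    """
    p_a, p_b = bad_a / n_a, bad_b / n_b
    pooled = (bad_a + bad_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Made-up counts for illustration only; not results from the archive.
# Group a: men, group b: women.
z, p = two_proportion_z_test(bad_a=450, n_a=3000, bad_b=380, n_b=3000)
for level in (0.95, 0.99):
    print(level, "significant" if p < 1 - level else "not significant")
```

With the real data, the counts would come from splitting the records by SEX and summing TARGET_LABEL_BAD in each group.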