Test data for testing scoring models
Refunds: 0
Uploaded: 23.07.2010
Content: data_for_research_and_building_scoring_models.rar 2028,25 kB
Product description
The archive contains anonymized test data for testing a variety of statistical scoring models, as well as for research into statistical regularities.
The modeling data (Modeling_Data.txt inside Modeling_Data.zip, located in the main archive) contain 50,000 records; the fields of each record are separated by tabs. Each record holds depersonalized information on 31 parameters (regressors) of a borrower, together with whether he repaid the loan or not. Although the data are anonymized, they preserve the patterns of the real domain.
The file in Variables_List.zip, also in the archive, describes the fields of the modeling data.
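A minimal sketch of reading such tab-separated records with the Python standard library. It assumes the text file starts with a header row of field names, which the description above does not confirm; the two sample rows are made-up values used only to show the call, not records from the archive.

```python
import csv
import io


def read_records(stream):
    """Yield one dict per record from a tab-separated stream.

    Assumption: the first line is a header row with the field names.
    """
    reader = csv.DictReader(stream, delimiter="\t")
    for row in reader:
        yield row


# Made-up sample in the same layout (not real data from the archive).
sample = io.StringIO("ID_CLIENT\tSEX\tAGE\n1\tF\t30\n2\tM\t45\n")
demo_rows = list(read_records(sample))
print(len(demo_rows), demo_rows[0]["SEX"])
```

For the real file, the same function can be given an open file object, for example `open("Modeling_Data.txt", newline="")`; the encoding may need to be set explicitly.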
Additional information
All data in the archive are in English! Using them therefore requires at least a minimal knowledge of the language (or a willingness to work it out). The data were made publicly available at one of the international data mining competitions.
Some of the fields of the modeling data:
ID_CLIENT - Customer ID (borrower)
ID_SHOP - Identifier of the shop where the credit product was purchased
SEX - Sex (M - male, F - female)
MARITAL_STATUS - Marital status (S - single, On - single, D - divorced, V - widower / widow, O - other)
AGE - Age
QUANT_DEPENDANTS - Number of the borrower's dependents
EDUCATION - Educational level (can be specified)
FLAG_RESIDENCIAL_PHONE - Whether there is a residential (landline) phone (Y - yes, N - no)
AREA_CODE_RESIDENCIAL_PHONE - Modified area code of the borrower's phone
PAYMENT_DAY - Fixed day of the month on which the regular loan payment is due
SHOP_RANK - Rating of the shop selling the loan product, expressed in financial terms
RESIDENCE_TYPE - Type of housing (P - owned, A - rented, C - living with parents, O - other)
MONTHS_IN_RESIDENCE - Time at the current residence, in months
FLAG_MOTHERS_NAME - Whether the application form gives the name of the borrower's mother (Y - yes, N - no)
FLAG_FATHERS_NAME - Whether the application form gives the name of the borrower's father (Y - yes, N - no)
and so on until the last field:
TARGET_LABEL_BAD - Whether the borrower repaid the loan in the end (1 - did not repay, 0 - repaid)
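The coded fields listed above have to be turned into numbers before most models can use them. The following is a rough sketch of one way to do that, assuming each record is already available as a dict of strings keyed by field name and that none of the fields used here is blank. The function name `encode_record` and the output feature names are invented for this illustration.

```python
def encode_record(row):
    """Turn one raw record (dict of strings) into numeric features and a label.

    Assumption: no missing values in the fields used below.
    """
    features = {
        "is_female": 1.0 if row["SEX"] == "F" else 0.0,
        "age": float(row["AGE"]),
        "dependants": float(row["QUANT_DEPENDANTS"]),
        "has_phone": 1.0 if row["FLAG_RESIDENCIAL_PHONE"] == "Y" else 0.0,
        "months_in_residence": float(row["MONTHS_IN_RESIDENCE"]),
    }
    # One-hot encode the housing type using the four codes from the field list.
    for code in ("P", "A", "C", "O"):
        features["residence_" + code] = 1.0 if row["RESIDENCE_TYPE"] == code else 0.0
    # The label is taken as-is from the last field.
    label = int(row["TARGET_LABEL_BAD"])
    return features, label
```

In real use, blank or unexpected codes would need their own handling (for example, an extra "unknown" category) before calling `float` or `int`.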
Possible areas of applied research, which may be based on these data:
- Scoring.
- Mathematical statistics (including non-classical branches, for example, the statistics of non-numeric objects).
- Neural networks.
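As an illustration of the scoring direction, here is a small sketch of logistic regression fitted by batch gradient descent in plain Python. Logistic regression is chosen here only as a common baseline for scoring; nothing in the archive prescribes it. The function names, learning rate and epoch count are arbitrary choices, and the features are assumed to be already numeric and on a similar scale.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Predicted probability that the label is 1; w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w
```

With 50,000 records a vectorized library would be far faster; the loop form is kept only to make each step visible.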
In addition, the archive has two more data sets (the files Prediction_Data.zip and LeaderBoard_Data.zip) of 10,000 records each, without any indication of whether the borrower repaid or not. These data sets can be used to check the statistical models you have built. Of particular value is the fact that the two data sets cover other time periods (one field even does not coincide), which lets you check the robustness (stability) of your mathematical scoring model against minor socio-economic shifts that occur over time. This will help you build models that reflect the genuinely hidden patterns of the domain, that is, its underlying laws.
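One way to look at such differences between periods is sketched below: first list the fields that appear in only one data set, then compare the distribution of a categorical field between two samples with the population stability index (PSI). PSI is a measure commonly used in credit scoring and is brought in here as an assumption; the description itself does not name any particular method. The function names are invented for this sketch.

```python
import math
from collections import Counter


def field_differences(fields_a, fields_b):
    """Return (only_in_a, only_in_b) as sorted lists of field names."""
    a, b = set(fields_a), set(fields_b)
    return sorted(a - b), sorted(b - a)


def psi(values_a, values_b, eps=1e-6):
    """Population stability index between two samples of a categorical field.

    eps replaces zero shares so that the logarithm stays defined.
    """
    count_a, count_b = Counter(values_a), Counter(values_b)
    n_a, n_b = len(values_a), len(values_b)
    total = 0.0
    for category in set(count_a) | set(count_b):
        share_a = max(count_a[category] / n_a, eps)
        share_b = max(count_b[category] / n_b, eps)
        total += (share_a - share_b) * math.log(share_a / share_b)
    return total
```

A PSI of zero means identical shares in both samples; larger values indicate a stronger shift. Numeric fields such as AGE would first have to be binned into categories.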
UPD.
From these data it is possible to establish, for example, that women being more conscientious payers is not speculation but a statistical fact at practically any confidence level, both 95% and 99%.
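A claim of this kind can be checked with a two-proportion z-test on the share of non-payers in each group. The sketch below is one standard way to do it and is not necessarily the test the author used. The four counts are arguments you have to tally from the modeling data yourself; no actual figures from the archive are given here.

```python
import math


def two_proportion_z(bad_a, n_a, bad_b, n_b):
    """z statistic and two-sided p-value for H0: both groups have the same bad rate.

    bad_a, bad_b: number of non-payers in each group; n_a, n_b: group sizes.
    """
    rate_a, rate_b = bad_a / n_a, bad_b / n_b
    pooled = (bad_a + bad_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The difference is significant at the 95% level when the p-value is below 0.05 and at the 99% level when it is below 0.01; a negative z means group A has the lower share of non-payers.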