Purchase_Redemption_Forecasts

Providers: Ant Financial Services
Posted: 2015.08.07
Participants: 354

Data Set Description

Document | Format
ODPS User Manual.pdf | .pdf (3MB)
mfd_bank_shibor.csv | .csv (19KB)
mfd_day_share_interest.csv | .csv (9KB)
user_balance_table.zip | .zip (25MB)
user_profile_table.csv | .csv (728KB)
(example)tc_comp_predict_table.csv | .csv (84B)

Overview
In Data Lab, we provide the data sets and evaluation systems of previous competitions so that people can test the waters of machine learning and data mining.
The following data sets are from Season 1 of the Purchase and Redemption Forecasts Competition. We provide a baseline on the leaderboard (the result of the top team, 酸辣紫菜泡面) and run the evaluation every other day. For walkthroughs and FAQs, please go to the competition Forum.


Introduction
Ant Financial Services Group (AFSG) processes cash inflows and outflows for millions of its members, so predicting future cash flows from historical data is an important part of its business. Participants are challenged to predict future cash flows from users' historical purchase and redemption data, helping AFSG improve its fund management. (Purchases are fund inflows; redemptions are fund outflows.)

Data Description
The competition data consists of four tables, described below: the user profile table, the purchase and redemption table, the Yu'E Bao yield rate table, and the Shibor table.


1. User profile table

We randomly selected 30,000 users in total. Users who first appeared in September 2014 were placed in the test data set, so the training data set contains about 28,000 users. The attributes are shown in Table 1:

Table 1: User profile table: user_profile_table

Attribute | Type | Description | Example
user_id | bigint | User ID | 1234
Sex | bigint | Gender (1: male, 0: female) | 0
City | bigint | City where the user lives | 6081949
Constellation | string | Constellation (zodiac sign) | Sagittarius


2. Purchase and redemption sheet

It contains about 2.8 million records of purchase and redemption behavior from 2013.07.01 to 2014.08.31, together with the breakdown into all subcategories. The data has been desensitized while basically preserving the original trends. Each record includes the operation time and the purchase and redemption details. Amounts are measured in fen (1 fen = CNY 0.01). If consume_amt = 0, category1 to category4 are null.
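Because each submission row in Section 6 holds one day's purchase and redemption prediction, the per-user records are typically aggregated into daily totals first. A minimal standard-library sketch, assuming the CSV header uses the column names from Table 2; the `daily_totals` name and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def daily_totals(rows):
    """Sum total_purchase_amt and total_redeem_amt (in fen) per report_date.

    `rows` is an iterable of dicts keyed by the Table 2 column names,
    for example a csv.DictReader over the user_balance_table data.
    """
    totals = defaultdict(lambda: [0, 0])  # report_date -> [purchase, redeem]
    for row in rows:
        day = totals[row["report_date"]]
        day[0] += int(row["total_purchase_amt"])
        day[1] += int(row["total_redeem_amt"])
    # YYYYMMDD strings sort chronologically, so sorting by key orders the days.
    return [(d, p, r) for d, (p, r) in sorted(totals.items())]


# Hypothetical toy input: only the columns used above are included.
sample = io.StringIO(
    "user_id,report_date,total_purchase_amt,total_redeem_amt\n"
    "1234,20140407,21876,10261\n"
    "5678,20140407,1000,0\n"
    "1234,20140408,0,500\n"
)
for report_date, purchase, redeem in daily_totals(csv.DictReader(sample)):
    print(report_date, purchase, redeem)
```

With the toy rows this prints one line per day (20140407 22876 10261, then 20140408 0 500). The resulting daily series is what a forecasting model would be fitted on.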

Table 2: Purchase and redemption sheet: user_balance_table

Attribute | Type | Description | Example
user_id | bigint | User ID | 1234
report_date | string | Date | 20140407
tBalance | bigint | Today's closing balance | 109004
yBalance | bigint | Yesterday's closing balance | 97389
total_purchase_amt | bigint | Today's total purchase = direct purchase + revenue | 21876
direct_purchase_amt | bigint | Today's direct purchase | 21863
purchase_bal_amt | bigint | Today's purchase from Alipay balance | 0
purchase_bank_amt | bigint | Today's purchase from bank cards | 21863
total_redeem_amt | bigint | Today's total redemption = consumption + transfer amount | 10261
consume_amt | bigint | Today's total consumption | 0
transfer_amt | bigint | Today's total transfer amount | 10261
tftobal_amt | bigint | Today's total transfer amount to Alipay balance | 0
tftocard_amt | bigint | Today's total transfer amount to bank cards | 10261
share_amt | bigint | Today's revenue | 13
category1 | bigint | Today's consumption for category 1 | 0
category2 | bigint | Today's consumption for category 2 | 0
category3 | bigint | Today's consumption for category 3 | 0
category4 | bigint | Today's consumption for category 4 | 0

Note 1: The above data has been desensitized. Revenue has been recalculated using a simplified calculation approach, described in Section 5.

Note 2: The desensitized data guarantees that today's closing balance = yesterday's closing balance + today's total purchase - today's total redemption. There are no negative values.
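Note 2 and the formulas in Table 2 can be checked mechanically, which is a cheap sanity test after loading the data. The sketch below runs them on the Table 2 example row. The two sub-total checks (direct purchase = purchase from Alipay balance + purchase from bank cards, and transfer = transfer to Alipay balance + transfer to bank cards) are not stated in the table; they are assumptions inferred from that example row.

```python
# Example row from Table 2 (amounts in fen).
row = {
    "tBalance": 109004, "yBalance": 97389,
    "total_purchase_amt": 21876, "direct_purchase_amt": 21863,
    "purchase_bal_amt": 0, "purchase_bank_amt": 21863,
    "total_redeem_amt": 10261, "consume_amt": 0,
    "transfer_amt": 10261, "tftobal_amt": 0, "tftocard_amt": 10261,
    "share_amt": 13,
}


def check_row(r):
    """Return the names of the identities that the row violates (empty if none)."""
    checks = {
        # Stated in Table 2.
        "total_purchase = direct_purchase + share":
            r["total_purchase_amt"] == r["direct_purchase_amt"] + r["share_amt"],
        "total_redeem = consume + transfer":
            r["total_redeem_amt"] == r["consume_amt"] + r["transfer_amt"],
        # Assumed from the example row, not stated in the table.
        "direct_purchase = purchase_bal + purchase_bank":
            r["direct_purchase_amt"] == r["purchase_bal_amt"] + r["purchase_bank_amt"],
        "transfer = tftobal + tftocard":
            r["transfer_amt"] == r["tftobal_amt"] + r["tftocard_amt"],
        # Stated in Note 2.
        "tBalance = yBalance + total_purchase - total_redeem":
            r["tBalance"] == r["yBalance"] + r["total_purchase_amt"] - r["total_redeem_amt"],
        "no negative values": all(v >= 0 for v in r.values()),
    }
    return [name for name, ok in checks.items() if not ok]


print(check_row(row))  # → []
```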

3. Yield rate table

The yield rate table contains Yu'E Bao's rates of return over 14 months. Details are shown in Table 3:

Table 3: mfd_day_share_interest

Attribute | Type | Description | Example
mfd_date | string | Date | 20140102
mfd_daily_yield | double | Revenue per 1,000,000 fen | 1.5787
mfd_7daily_yield | double | 7-day annualized yield (%) | 6.307
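Table 3 gives mfd_daily_yield as revenue per 1,000,000 fen. Assuming the value is in CNY (CNY earned per CNY 10,000 held), one day's revenue in fen is balance_fen × mfd_daily_yield / 10,000. This reading is consistent with the Section 5 example, where 1,000,000 fen earns 110 fen in a day (a daily yield of 1.10). The function name is hypothetical, and how revenue is rounded to whole fen is not stated, so no rounding is applied.

```python
def daily_revenue_fen(balance_fen, mfd_daily_yield):
    """One day's revenue in fen for a balance held in Yu'E Bao.

    Assumes mfd_daily_yield is CNY earned per CNY 10,000 (= 1,000,000 fen).
    Converting units: (balance_fen / 1_000_000) * yield CNY * 100 fen/CNY
    simplifies to balance_fen * yield / 10_000.
    """
    return balance_fen * mfd_daily_yield / 10_000


# Table 3 example yield applied to a 1,000,000 fen balance.
print(round(daily_revenue_fen(1_000_000, 1.5787), 2))  # → 157.87
```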


4. Shanghai Interbank Offered Rate (Shibor) table

The Shibor table contains daily interbank lending rates over 14 months (all rates are annualized). Details are shown in Table 4:

Table 4: mfd_bank_shibor

Attribute | Type | Description | Example
mfd_date | string | Date | 20140102
Interest_O_N | double | Overnight Shibor (%) | 2.8
Interest_1_W | double | 1-week Shibor (%) | 4.25
Interest_2_W | double | 2-week Shibor (%) | 4.9
Interest_1_M | double | 1-month Shibor (%) | 5.04
Interest_3_M | double | 3-month Shibor (%) | 4.91
Interest_6_M | double | 6-month Shibor (%) | 4.79
Interest_9_M | double | 9-month Shibor (%) | 4.76
Interest_1_Y | double | 1-year Shibor (%) | 4.78
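Tables 3 and 4 share the mfd_date key, so they can be combined into one set of daily rate features. A minimal standard-library sketch using only the example rows from the two tables; the CSV header spellings and the `join_on_date` helper are assumptions, and the spread printed at the end is just one illustrative feature.

```python
import csv
import io

# Example rows from Table 3 and Table 4. Header spellings are assumptions;
# check them against mfd_day_share_interest.csv and mfd_bank_shibor.csv.
yields_csv = io.StringIO(
    "mfd_date,mfd_daily_yield,mfd_7daily_yield\n"
    "20140102,1.5787,6.307\n"
)
shibor_csv = io.StringIO(
    "mfd_date,Interest_O_N,Interest_1_W,Interest_2_W,Interest_1_M,"
    "Interest_3_M,Interest_6_M,Interest_9_M,Interest_1_Y\n"
    "20140102,2.8,4.25,4.9,5.04,4.91,4.79,4.76,4.78\n"
)


def join_on_date(left, right, key="mfd_date"):
    """Inner-join two CSV readers on `key`, converting non-key fields to float."""
    right_by_date = {rrow[key]: rrow for rrow in right}
    joined = []
    for lrow in left:
        rrow = right_by_date.get(lrow[key])
        if rrow is None:
            continue  # this date has no row in the other table
        merged = {key: lrow[key]}
        merged.update({k: float(v) for k, v in lrow.items() if k != key})
        merged.update({k: float(v) for k, v in rrow.items() if k != key})
        joined.append(merged)
    return joined


features = join_on_date(csv.DictReader(yields_csv), csv.DictReader(shibor_csv))
# Illustrative feature: 7-day annualized yield minus overnight Shibor, both in %.
print(round(features[0]["mfd_7daily_yield"] - features[0]["Interest_O_N"], 3))
```

An inner join keeps only dates present in both tables; a forward fill for missing Shibor dates would be an equally reasonable design choice.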


5. How to calculate Yu'E Bao revenue

In this competition, revenue is calculated mainly on the basis of actual Yu'E Bao returns. For brevity, we made the following simplifications:
1. Revenue is calculated by calendar day instead of fund trading day. That is, purchases and redemptions made before 0:00 are counted toward the previous day, and those made after 0:00 are counted toward the current day.
2. When is revenue credited to the user's account? The rules are shown in Table 5. For example, consider a user who deposits 1,000,000 fen on Monday. The deposit starts generating revenue on Tuesday, but the user's balance is still 1,000,000 fen on Tuesday. Tuesday's revenue is credited on Wednesday, so on Wednesday the user has 1,000,110 fen.

Table 5: The simplified Yu'E Bao revenue sheet

Purchase time | Date when revenue shows in the account for the first time
Monday | Wednesday
Tuesday | Thursday
Wednesday | Friday
Thursday | Saturday
Friday | Next Tuesday
Saturday | Next Wednesday
Sunday | Next Wednesday
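Table 5 can be encoded as a day offset per purchase weekday, which is handy when simulating when a purchase starts to affect share_amt. A sketch with hypothetical names; like Table 5, it looks only at the day of the week, so public holidays are not handled.

```python
from datetime import date, timedelta

# Days from the purchase day to the day revenue first shows up, per Table 5.
# Keys follow date.weekday(): Monday = 0 ... Sunday = 6.
FIRST_REVENUE_OFFSET = {
    0: 2,  # Monday    -> Wednesday
    1: 2,  # Tuesday   -> Thursday
    2: 2,  # Wednesday -> Friday
    3: 2,  # Thursday  -> Saturday
    4: 4,  # Friday    -> next Tuesday
    5: 4,  # Saturday  -> next Wednesday
    6: 3,  # Sunday    -> next Wednesday
}


def first_revenue_date(purchase_day: date) -> date:
    """Date revenue first appears in the account under the simplified Table 5 rules."""
    return purchase_day + timedelta(days=FIRST_REVENUE_OFFSET[purchase_day.weekday()])


print(first_revenue_date(date(2014, 9, 1)))  # 2014-09-01 is a Monday → 2014-09-03
```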


6. Submit the result




Table 6: Results table: tc_comp_predict_table

Attribute | Type | Description | Example
report_date | bigint | Date | 20140901
purchase | bigint | Predicted purchase amount | 40000000
redeem | bigint | Predicted redemption amount | 30000000


Participants should submit their results in the following format:

Put your results into tc_comp_predict_table, using the same format as the example file.

Each row contains one day's date, predicted purchase, and predicted redemption, covering every day of September 2014 (30 rows in total). Purchase and redemption values must be accurate to the fen (CNY 0.01).
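Writing the 30 September rows with the Table 6 columns can be sketched as follows. The function name, the rounding to whole fen (Table 6 types both amounts as bigint), and the absence of a header row are assumptions; check them against (example)tc_comp_predict_table.csv.

```python
import csv
import io
from datetime import date, timedelta


def write_submission(purchase, redeem, out):
    """Write 30 rows (report_date, purchase, redeem) for 2014-09-01..2014-09-30.

    purchase and redeem are 30 predicted daily totals in fen, rounded here to
    whole fen. No header row is written (an assumption; see the example file).
    """
    if len(purchase) != 30 or len(redeem) != 30:
        raise ValueError("expected exactly 30 daily predictions for September 2014")
    writer = csv.writer(out, lineterminator="\n")
    start = date(2014, 9, 1)
    for i in range(30):
        day = start + timedelta(days=i)
        writer.writerow([day.strftime("%Y%m%d"), round(purchase[i]), round(redeem[i])])


# Usage with constant placeholder predictions (the Table 6 example values).
buf = io.StringIO()
write_submission([40000000] * 30, [30000000] * 30, buf)
print(buf.getvalue().splitlines()[0])  # → 20140901,40000000,30000000
```

In practice `out` would be an open file; an in-memory buffer keeps the example self-contained.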


7. Evaluation

We want accurate forecasts of the daily purchase and redemption amounts for the next 30 days; higher accuracy is better. However, different error patterns have to be taken into account. For instance, one participant's result might be accurate on 29 days but inaccurate on one day, while another's might be not quite accurate on all 30 days, with an average error of delta (a small value). Using absolute error as the evaluation metric may give the former a poorer score than the latter, yet in real-world business we prefer the former model. We therefore use the following evaluation metric: the relative errors of purchase and redemption are converted into daily scores by a scoring function, the daily scores are summed, and finally the scores are weighted according to real-world business factors. The details are as follows.

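1) Calculate the relative errors of purchase and redemption on day i:

$$\text{Purchase}_i = \frac{\lvert \hat{P}_i - P_i \rvert}{P_i}, \qquad \text{Redeem}_i = \frac{\lvert \hat{R}_i - R_i \rvert}{R_i}$$

where $\hat{P}_i$ and $\hat{R}_i$ are the predicted purchase and redemption amounts on day i, and $P_i$ and $R_i$ are the actual amounts.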
2) A monotonically decreasing scoring function f(·) converts each error into a daily score: the purchase score on day i depends on the error Purchase_i, and likewise the redemption score depends on Redeem_i. For example, if Purchase_i = 0 on day i, 10 points are obtained for that day; if Purchase_i > 0.3, the score is 0.

3) Finally, the daily scores are summed over the 30 days and weighted according to real-world business factors to give the Total Score.

The scoring function f(·) is not publicly disclosed.
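Because the official f is not public, local validation needs a stand-in. A sketch assuming the usual relative-error definition |predicted - actual| / actual and a hypothetical linear f that only matches the stated properties (10 points at zero error, 0 points above 0.3); its scores will not match the leaderboard, and the unpublished business weights are left out, so purchase and redemption would each be summed separately. It also replays the comparison from the text with toy numbers.

```python
def relative_errors(predicted, actual):
    """Daily relative errors |predicted - actual| / actual (actual assumed > 0)."""
    return [abs(p - a) / a for p, a in zip(predicted, actual)]


def placeholder_score(err):
    """HYPOTHETICAL stand-in for the unpublished scoring function f.

    It only reproduces the stated properties: 10 points at zero error,
    decreasing as the error grows, and 0 points once the error exceeds 0.3.
    """
    return max(0.0, 10.0 * (1.0 - err / 0.3))


def summed_score(predicted, actual):
    """Sum of placeholder daily scores for one series (purchase or redemption)."""
    return sum(placeholder_score(e) for e in relative_errors(predicted, actual))


def mean_absolute_error(predicted, actual):
    """Average absolute error, the metric the text argues against."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy series mirroring the two cases described in the text.
actual = [100] * 30
pred_a = [100] * 29 + [500]  # 29 exact days, one badly wrong day
pred_b = [110] * 30          # every day off by 10 %

print(mean_absolute_error(pred_a, actual), mean_absolute_error(pred_b, actual))
print(summed_score(pred_a, actual), round(summed_score(pred_b, actual), 6))
```

Mean absolute error ranks pred_b ahead (about 13.3 versus 10.0), while the score-based metric ranks pred_a ahead (290 versus about 200), which is the behavior the evaluation design is after.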