  • Taobao_Clothes_Matching

    Providers : Alibaba

    Posted : 2015.12.09

    #Participants : 398

Data Set Description

Document (You can download after you login) | Format
dim_fashion_matchsets(new).txt | .txt (752KB)
dim_items(new).txt | .txt (49MB)
example_result.txt | .txt (129B)
test_items(new).txt | .txt (40KB)
user_bought_history(new).zip | .zip (84MB)
tianchi_fm_img3_1.zip | download
tianchi_fm_img3_2.zip | download
tianchi_fm_img3_3.zip | download
tianchi_fm_img3_4.zip | download

Overview
In Data Lab, we provide the data sets and evaluation systems of previous competitions so that people can test the waters of machine learning and data mining. The following data sets are from the Clothes Matching Challenge on taobao.com. We provide a baseline (the result of the top team) on the leaderboard and run the evaluation every other day. For walkthroughs and FAQs, please go to the competition Forum.

Introduction
Taobao is one of the most famous Chinese websites for online shopping, similar to eBay and Amazon. It facilitates C2C retail by providing a platform for small businesses and individual entrepreneurs to open online stores.

On Taobao, the apparel and accessories industries account for the vast majority of the market share. Clothing matching (e.g., finding appropriate pants and shoes for a shirt) is a very important topic in shopping guidance. Extensions of this technology can be applied to a wide variety of big data marketing scenarios, such as search, recommendation, and advertising.

In this competition, we provide clothing collocation data sets from fashion experts, image data of Taobao items, and user behavior data. Participants are required to train a model that provides personalized, high-quality, professional clothing collocation suggestions.

Data Description
The data consists of three parts: basic item information (text and image), user historical behavior data, and collocation set data.

Collocation set data: dim_fashion_match_sets

Column | Type | Description | Example
coll_id | bigint | Collocation set ID | 1000
item_list | string | Item list, delimited by semicolons. Each semicolon-delimited segment refers to a collocation set and includes one or more goods, delimited by commas. | 1002,1003,1004;439201;1569773,234303;19836
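As a rough illustration of the item_list format described above, the following sketch splits a value first on semicolons and then on commas. The function name parse_item_list is hypothetical and not part of the data set.

```python
def parse_item_list(item_list: str) -> list[list[int]]:
    """Split an item_list value into semicolon-delimited groups of item IDs."""
    groups = []
    for segment in item_list.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing or doubled semicolon
        groups.append([int(item) for item in segment.split(",") if item])
    return groups


example = "1002,1003,1004;439201;1569773,234303;19836"
print(parse_item_list(example))
# [[1002, 1003, 1004], [439201], [1569773, 234303], [19836]]
```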

 

Item information data: dim_items

Column | Type | Description | Example
item_id | bigint | Item ID | 439201
cat_id | bigint | The category ID of the item | 16
terms | string | The result of segmenting the item title into terms. The term order is shuffled. | 5263,2541,2876263
img_data | string | Image of the item (note: in season 1 there is no such column; the image is available for download, and the file name is item_id.jpg) |

  

 

User historical behavior data: user_bought_history

Column | Type | Description | Example
user_id | bigint | User ID | 62378843278
item_id | bigint | Item ID | 439201
create_at | string | Action date (buy) | 20140911
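The create_at example suggests a YYYYMMDD date stored as a string. Assuming that format holds for every record (the description does not state it explicitly), a minimal sketch for converting it to a date object could look like this; parse_create_at is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_create_at(create_at: str) -> date:
    """Parse a create_at string, assumed to be in YYYYMMDD form."""
    return datetime.strptime(create_at.strip(), "%Y%m%d").date()


print(parse_create_at("20140911"))  # 2014-09-11
```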

Note: all user IDs, item IDs, and category IDs in the tables above are desensitized. In season 1, the image data is available for download. In season 2, the image data is stored on the ODPS platform. Players are required not to use external data.

We provide a list of items to be predicted. For each item in this list, players are required to predict the corresponding list of items that collocate with it.

List of items to be predicted: test_items

Column | Type | Description | Example
item_id | bigint | Item ID | 90832747

 

The table to submit: fm_submissions

Column | Type | Description | Example
item_id | bigint | Item ID | 90832747
item_list | string | Predicted item list for the item_id (each item list includes 200 items; if there are more than 200, only the first 200 items count; items should be separated by commas) | 1002,439201,364576,1569773,19836,437474

The file should be named fm_submissions.txt and must not contain repeated records. (Please format the submission file according to the following example.)
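The submission rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column separator between item_id and item_list is not specified in this description, so a single space is assumed here (check example_result.txt for the actual format), and the function name format_submission is hypothetical. Dropping duplicate items within a list is a choice made in this sketch, not a stated requirement.

```python
def format_submission(predictions: dict[int, list[int]], max_items: int = 200) -> list[str]:
    """Build one submission line per item_id, keeping at most max_items predictions."""
    lines = []
    # A dict holds each item_id once, so no record is repeated.
    for item_id, candidates in predictions.items():
        seen = set()
        unique = []
        for candidate in candidates:
            if candidate not in seen:  # sketch-level choice: skip duplicate items
                seen.add(candidate)
                unique.append(candidate)
        top = unique[:max_items]  # only the first 200 items count
        # ASSUMPTION: a single space separates item_id from item_list.
        lines.append(f"{item_id} " + ",".join(str(c) for c in top))
    return lines


lines = format_submission({90832747: [1002, 439201, 1002, 364576]})
print(lines[0])  # 90832747 1002,439201,364576

# Writing the file under the required name:
# with open("fm_submissions.txt", "w") as f:
#     f.write("\n".join(lines) + "\n")
```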

Evaluation

Reference:
http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision
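For orientation, here is a minimal sketch of mean average precision as it is commonly defined in information retrieval. The exact formula used by the evaluation system is not given in this description, so the cutoff k=200 (chosen to match the 200-item limit on submissions), the normalization by the number of relevant items, and the function names are all assumptions. Predicted lists are assumed to contain no duplicate items.

```python
def average_precision(predicted: list[int], relevant: set[int], k: int = 200) -> float:
    """Average precision over the first k predictions (one common definition)."""
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / len(relevant)


def mean_average_precision(
    predictions: dict[int, list[int]],
    ground_truth: dict[int, set[int]],
    k: int = 200,
) -> float:
    """Mean of average precision over all test items in ground_truth."""
    if not ground_truth:
        return 0.0
    total = sum(
        average_precision(predictions.get(item_id, []), relevant, k)
        for item_id, relevant in ground_truth.items()
    )
    return total / len(ground_truth)


# Relevant items found at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision([1, 2, 3], {1, 3}))
```

Other variants normalize by min(k, number of relevant items) instead, so scores from this sketch may differ from the leaderboard.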