  • Sina_Weibo

    Providers : Weibo Corporation

    Posted : 2015.11.17

    #Participants : 656

Data Set Description

Document (available for download after you log in)    Format
weibo_predict_data(new).zip                           .zip (22MB)
weibo_train_data(new).zip                             .zip (159MB)

Overview

In Data Lab, we provide the data sets and evaluation systems of previous competitions so that people can test the waters of machine learning and data mining.

The following data sets are from the Sina Weibo Interaction-prediction Competition. We provide a baseline (the result of the top team) on the leaderboard and run the evaluation every other day. For walkthroughs and FAQs, please go to the competition Forum.

Introduction

Weibo is a Chinese microblogging website. It is one of the most popular sites in China, with a market penetration similar to that of Twitter. “Weibo” is the Chinese word for “microblog”, and an individual post is also called a weibo. User behaviors such as forwarding, commenting and liking are important signals for estimating the quality of a weibo and for implementing recommendation and feed-control strategies. In this competition, participants are required to predict the number of forwards, comments and likes a weibo will receive, based on historical interaction data.

Data Description

1. Data sets for download & Evaluation

1.1 Training data (weibo_train_data).
Users are sampled, and the original weibos posted by each sampled user over half a year (from 20140701 to 20141231) are extracted. User IDs and weibo IDs are encrypted.

Attribute        Description
uid              User ID. Sampled and encrypted
mid              Weibo ID. Sampled and encrypted
time             Post time. Format: YYYYMMDD
forward_count    Number of forwards within one week after posting
comment_count    Number of comments within one week after posting
like_count       Number of likes within one week after posting
content          Weibo content
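As a rough illustration of how records with these seven attributes might be read, the sketch below parses one record per line. The field separator is not stated on this page, so the tab separator is an assumption, as are the names `TrainRecord` and `parse_train_lines`, which are hypothetical helpers and not part of the data set.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class TrainRecord:
    uid: str
    mid: str
    time: str
    forward_count: int
    comment_count: int
    like_count: int
    content: str


def parse_train_lines(lines: Iterable[str], sep: str = "\t") -> Iterator[TrainRecord]:
    """Yield one TrainRecord per non-empty line.

    Assumption: fields are separated by `sep` (tab by default) and appear
    in the column order of the attribute table.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split at most 6 times so separators inside the content stay intact.
        parts = line.split(sep, 6)
        if len(parts) == 6:
            parts.append("")  # tolerate a weibo with empty content
        uid, mid, time, fwd, cmt, like, content = parts
        yield TrainRecord(uid, mid, time, int(fwd), int(cmt), int(like), content)


# Made-up sample line, for illustration only.
sample_lines = ["u1\tm1\t20140701\t3\t1\t0\thello weibo\n"]
records = list(parse_train_lines(sample_lines))
```

To read a real file, the same function can be given an open file handle, for example `parse_train_lines(open(path, encoding="utf-8"))`, where the path and the encoding are likewise assumptions.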


1.2 Predicting data (weibo_predict_data).
The prediction data covers weibos posted from 20150101 to 20150131.

Attribute    Description
uid          User ID. Sampled and encrypted
mid          Weibo ID. Sampled and encrypted
time         Post time. Format: YYYYMMDD
content      Weibo content

 

1.3 Predicting result (example: weibo_result_data)
Participants are required to predict the cumulative number of forwards, comments and likes within one week after posting for each weibo in weibo_predict_data, and to submit this result. The submission format is as follows:

Attribute        Description
uid              User ID. Sampled and encrypted
mid              Weibo ID. Sampled and encrypted
forward_count    Number of forwards within one week after posting
comment_count    Number of comments within one week after posting
like_count       Number of likes within one week after posting
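To make the prediction task concrete, here is a minimal sketch of one naive approach: predict, for every weibo of a user, that user's historical median counts from the training data. This is not the organizers' method or the leaderboard baseline; it is only an illustration, and the helper names `fit_user_medians` and `predict_counts` are hypothetical.

```python
from collections import defaultdict
from statistics import median_low


def fit_user_medians(train_rows):
    """Compute per-user median counts.

    train_rows: iterable of (uid, forward_count, comment_count, like_count).
    Returns {uid: (forward_median, comment_median, like_median)}.
    """
    per_user = defaultdict(lambda: ([], [], []))
    for uid, forward, comment, like in train_rows:
        forwards, comments, likes = per_user[uid]
        forwards.append(forward)
        comments.append(comment)
        likes.append(like)
    # median_low always returns an observed value, so the result stays an integer.
    return {
        uid: (median_low(f), median_low(c), median_low(l))
        for uid, (f, c, l) in per_user.items()
    }


def predict_counts(uid, medians, default=(0, 0, 0)):
    """Return the user's medians, or `default` for a user unseen in training."""
    return medians.get(uid, default)


# Made-up training rows, for illustration only.
train_rows = [("u1", 0, 0, 0), ("u1", 4, 2, 1), ("u1", 10, 3, 5), ("u2", 1, 1, 1)]
medians = fit_user_medians(train_rows)
```

Falling back to all zeros for unseen users is a design choice of this sketch, not something the competition prescribes.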

 

1.4. Evaluation
Participants are required to submit the predicted number of forwards, comments and likes for each weibo. For a given weibo, the deviation of each predicted count from the true count is calculated by:

[Equation: deviation of the forwarding, commenting and liking counts]

where
count_fp - predicted forwarding count
count_fr - true forwarding count
count_cp - predicted commenting count
count_cr - true commenting count
count_lp - predicted liking count
count_lr - true liking count.

The precision for each weibo:

[Equation: precision of the i-th weibo]

The global precision:

[Equation: global precision]

where
sgn(x) is a modified sign function defined as sgn(x) = 1 if x > 0 and sgn(x) = 0 if x <= 0.
count_i is the total number of forwards, comments and likes of the i-th weibo. count_i is capped at 100: it takes the value 100 if the total is larger than 100.
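The formulas themselves are not reproduced in the text above, so the following sketch should be read with care. It assumes the form of the metric commonly reported for this competition: each deviation is the absolute error divided by the true count plus an offset (5 for forwards, 3 for comments and likes), the per-weibo precision is 1 minus a weighted sum of the deviations (weights 0.5, 0.25, 0.25), and a weibo counts as correct when its precision exceeds 0.8, weighted by count_i + 1. All of these constants are assumptions to be checked against the official page; only sgn and the cap of 100 come from the text.

```python
def deviation(predicted, real, offset):
    """Relative error of one count. The offset values used below are assumptions."""
    return abs(predicted - real) / (real + offset)


def weibo_precision(pred, real):
    """Precision of a single weibo.

    pred and real are (forward, comment, like) tuples.
    Weights 0.5 / 0.25 / 0.25 and offsets 5 / 3 / 3 are assumptions.
    """
    fp, cp, lp = pred
    fr, cr, lr = real
    return (
        1
        - 0.5 * deviation(fp, fr, 5)
        - 0.25 * deviation(cp, cr, 3)
        - 0.25 * deviation(lp, lr, 3)
    )


def sgn(x):
    """Modified sign function from the text: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def global_precision(preds, reals, threshold=0.8):
    """Weighted share of weibos whose precision exceeds `threshold` (assumed 0.8)."""
    numerator = 0
    denominator = 0
    for pred, real in zip(preds, reals):
        count_i = min(sum(real), 100)  # cap stated in the text
        weight = count_i + 1           # the "+ 1" is an assumption
        numerator += weight * sgn(weibo_precision(pred, real) - threshold)
        denominator += weight
    return numerator / denominator if denominator else 0.0
```

Under these assumptions, a weibo with many true interactions carries more weight than one with none, up to the cap of 100.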

Participants should submit the prediction results as a txt file (the file name must be within 20 characters) in the following format:
uid mid forward_count,comment_count,like_count
For example:
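As an illustration of this format, the sketch below builds submission lines and checks the file-name limit. The exact separators are not spelled out here, so it assumes a tab between uid, mid and the counts, and commas between the three counts; it also assumes the 20-character limit includes the ".txt" extension. The helper names and the sample values are hypothetical.

```python
import io

MAX_FILENAME_LENGTH = 20  # limit from the text; assumed to include ".txt"


def is_valid_filename(name):
    """Check the txt extension and the assumed 20-character limit."""
    return name.endswith(".txt") and len(name) <= MAX_FILENAME_LENGTH


def format_submission_line(uid, mid, forward, comment, like):
    """Format one result line.

    Assumption: tab between uid, mid and the counts; commas between counts.
    """
    return f"{uid}\t{mid}\t{int(forward)},{int(comment)},{int(like)}"


def write_submission(rows, out):
    """Write (uid, mid, forward, comment, like) rows to a text stream."""
    for uid, mid, forward, comment, like in rows:
        out.write(format_submission_line(uid, mid, forward, comment, like) + "\n")


# Made-up IDs and counts, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_submission([("u1", "m1", 3, 1, 0), ("u2", "m2", 0, 0, 0)], buffer)
submission_text = buffer.getvalue()
```

To produce a real file, pass an open file handle instead of the buffer, after checking the chosen name with `is_valid_filename`.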


2. Full Data (Please click the "Apply" button to apply for the full data sets)

The full data set covers 3 million users.

weibo_blog_data_train (110,541,018 records)

Attribute    Type      Description
uid          string    User ID
mid          string    Weibo ID
blog_time    string    Post time
blog         string    Weibo content

weibo_fans_data_train (183,400,526 records)

Attribute    Type      Description
uid          string    User ID
fans_id      string    Fan ID

weibo_action_data_train (239,455,700 records)

Attribute      Type      Description
action_uid     string    The user who forwarded, commented on or liked the weibo
action_time    string    The time when the action took place
uid            string    The user who posted the weibo
mid            string    Weibo ID
blog_time      string    The post time of the weibo
action_type    string    Action (Forward: 1, Comment: 2, Like: 3)
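The action records can in principle be aggregated into the per-weibo forward, comment and like counts used elsewhere on this page. The sketch below counts actions that fall within one week of the post time. The time fields are only described as strings, so the timestamp layout `%Y-%m-%d %H:%M:%S` is an assumption, and `count_actions_within_week` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta

# Mapping from the action_type table above.
ACTION_NAMES = {"1": "forward", "2": "comment", "3": "like"}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption: the real layout is not documented here
ONE_WEEK = timedelta(days=7)


def count_actions_within_week(action_rows):
    """Count forwards, comments and likes per weibo within one week of posting.

    action_rows: iterable of
        (action_uid, action_time, uid, mid, blog_time, action_type).
    Returns {mid: Counter({"forward": n, "comment": n, "like": n})}.
    """
    counts = {}
    for _action_uid, action_time, _uid, mid, blog_time, action_type in action_rows:
        name = ACTION_NAMES.get(action_type)
        if name is None:
            continue  # skip unknown action types
        posted = datetime.strptime(blog_time, TIME_FORMAT)
        acted = datetime.strptime(action_time, TIME_FORMAT)
        if timedelta(0) <= acted - posted <= ONE_WEEK:
            counts.setdefault(mid, Counter())[name] += 1
    return counts


# Made-up action rows, for illustration only.
sample_actions = [
    ("a1", "2014-07-02 10:00:00", "u1", "m1", "2014-07-01 09:00:00", "1"),
    ("a2", "2014-07-03 10:00:00", "u1", "m1", "2014-07-01 09:00:00", "3"),
    ("a3", "2014-07-20 10:00:00", "u1", "m1", "2014-07-01 09:00:00", "2"),
]
weekly_counts = count_actions_within_week(sample_actions)
```

With roughly 240 million action records, an in-memory dictionary like this may not be practical; the sketch only shows the counting logic.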