Welcome to Tianchi Datasets.

Introduction to Tianchi Datasets


Tianchi Datasets is the open research data platform of Alibaba Group. Its datasets are provided jointly by Alibaba Group's business teams and external research institutes. They cover more than 10 industries, including e-commerce, entertainment, logistics, healthcare, transportation, industry, natural science, and energy, and span classic artificial intelligence fields such as data mining, machine learning, computer vision, natural language processing, and decision intelligence. Tianchi datasets fall into the following types:

  • Recommended Dataset: Official, high-quality datasets maintained by the Tianchi Data Science Team. They come from three sources: datasets from the Tianchi Big Data Competition, datasets from research papers by Alibaba Group and Ant Group, and datasets built jointly by Tianchi with academic associations and universities. Official datasets are currently the category most widely used by AI researchers: as of April 2022, scholars around the world had published more than 2,700 papers based on them. Examples include User Behavior Data from Taobao for Recommendation and 3D-FUTURE: 3D Furniture Shape with Texture.
  • Leaderboard: An official dataset type that provides industry and technology benchmarks for algorithm researchers. In addition to static dataset downloads, it offers leaderboard submission and ranking. Examples include Chinese Biomedical Language Understanding Evaluation (CBLUE) and Multimodal Understanding and Generation Evaluation (MUGE).
  • Awesome List: An official dataset type that lists commonly used public data resources, organized by technology and industry, to help AI researchers find the data they need quickly and conveniently.
  • Public Dataset: Third-party datasets uploaded by developers on the Tianchi platform.


How to Use Tianchi Datasets

You can download datasets to your local machine, or use the computing resources of Tianchi Notebook, and carry out research subject to each dataset's license. If you use a Tianchi dataset in a paper, please cite the papers listed on the dataset page. If the page lists no paper, please cite the dataset in the following format, replacing the title, url, and year with the name, link, and creation year of the dataset you used. The entry key tianchi_dataset is only a placeholder; choose any key you like.

@misc{tianchi_dataset,
  title  = {Spinal Disease Dataset},
  url    = {https://tianchi.aliyun.com/dataset/dataDetail?dataId=79463},
  author = {Tianchi},
  year   = {2020}
}
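
If you cite several Tianchi datasets, the entries can be generated rather than edited by hand. The following is a minimal Python sketch, not an official Tianchi tool: the function name tianchi_bibtex and the citation key are hypothetical, and it only fills the four fields of the format shown above.

```python
def tianchi_bibtex(key: str, title: str, url: str, year: int) -> str:
    """Return a BibTeX @misc entry in the citation format shown above.

    `key` is the citation key used in \\cite{...}; it is chosen by you
    and is not defined by Tianchi.
    """
    fields = [
        ("title", title),
        ("url", url),
        ("author", "Tianchi"),  # fixed author, as in the sample entry
        ("year", str(year)),
    ]
    # One "  name = {value}" line per field, separated by commas.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


# Example using the sample dataset from the citation format above.
entry = tianchi_bibtex(
    key="tianchi_spinal_2020",  # hypothetical key
    title="Spinal Disease Dataset",
    url="https://tianchi.aliyun.com/dataset/dataDetail?dataId=79463",
    year=2020,
)
print(entry)
```

The printed entry can be pasted into a .bib file. Take the title, link, and creation year from the page of the dataset you actually used.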

If your research requires a dataset from a paper authored by Alibaba Group or Ant Group employees and you have difficulty obtaining it, please contact the Tianchi platform at tianchi_open_dataset@alibabacloud.com for help.

Finally, you are welcome to contribute your own datasets to the Tianchi platform and help build a strong research community.


Contact Us


You can contact us by sending an email to tianchi_open_dataset@alibabacloud.com.