Tianchi Dataset

Large-scale Dataset for Prediction of Server Failures due to DRAM Errors

Description

The dataset covers Dynamic Random Access Memory (DRAM) errors and the server failures they cause. It includes DRAM error logs and trouble tickets for DRAM-related failures, collected from more than 250K servers. The dataset is provided by Alibaba.

Data List

  • dramdata.zip (uploaded 2022-06-20, 410.69 MB)

Documentation

Documents

Introduction

DRAM is typically adopted as main memory in modern data centers. However, DRAM errors are prevalent in large-scale production environments and, worse, correlate with server failures. To encourage researchers to explore the characteristics of DRAM errors and their correlation with server failures, we release a dataset that includes more than 70 million DRAM errors, thousands of trouble tickets describing server failures caused by DRAM errors, and hardware configuration inventory logs. The dataset was collected from more than 250K servers and 3 million DIMMs over an eight-month span at Alibaba.

Dataset Description

The dataset has three files:

  • mcelog.tar.gz: includes the DRAM errors collected via mcelog, in 8 columns. The columns are defined as follows:

| Field | Type | Description |
|---|---|---|
| sid | string | The server ID |
| memoryid | integer | The DIMM ID, ranging from 0 to 23; a server attaches at most 24 DIMMs |
| rankid | integer | The rank ID, ranging from 0 to 1; each DIMM has 1 or 2 ranks |
| bankid | integer | The bank ID, ranging from 0 to 15; each rank has 16 banks |
| row | integer | The row ID, ranging from 0 to $2^{17}$ |
| col | integer | The column ID, ranging from 0 to $2^{10}$ |
| error_type | integer | The error type: 1 for read error, 2 for scrubbing error, 3 for write error |
| error_time | string | The time when the error was detected, in the format YYYY-MM-DD hh:mm:ss |

  • inventory.tar.gz: includes the hardware configuration of each server in this dataset, in 4 columns. The columns are defined as follows:

| Field | Type | Description |
|---|---|---|
| sid | string | The server ID |
| server_manufacturter | string | The server manufacturer, in anonymized format |
| DRAM_model | string | The DRAM model, in anonymized format |
| DIMM_num | string | The number of DIMMs attached to the server; one of 8, 12, 16, or 24 |

  • trouble_ticket.tar.gz: includes the trouble tickets for server failures caused by DRAM errors, in 3 columns. The columns are defined as follows:

| Field | Type | Description |
|---|---|---|
| sid | string | The server ID |
| failure_type | integer | The server failure type: 1 for UE-driven failures, 2 for CE-driven failures, 3 for miscellaneous failures |
| failure_time | string | The time when the server failure happened, in the format YYYY-MM-DD hh:mm:ss |
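
The schemas above can be read and joined with a few lines of Python. The following is a minimal sketch, assuming that each archive extracts to a comma-separated file with a header row matching the field names listed above. The file layout, the sample rows, and the helper names `read_rows`, `in_documented_range`, and `errors_before_failure` are illustrative assumptions and are not part of the dataset documentation.

```python
import csv
import io
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Synthetic sample rows for illustration only; they are NOT taken from the dataset.
MCELOG_CSV = """sid,memoryid,rankid,bankid,row,col,error_type,error_time
server_1,3,0,7,1024,15,1,0001-01-02 03:04:05
server_1,3,0,7,1024,16,3,0001-01-05 10:00:00
server_2,11,1,0,99,512,1,0001-02-01 00:00:00
"""

TICKET_CSV = """sid,failure_type,failure_time
server_1,2,0001-01-04 00:00:00
"""

# Inclusive upper bounds taken from the field descriptions (lower bound is 0).
DOCUMENTED_MAX = {"memoryid": 23, "rankid": 1, "bankid": 15}


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts (assumed layout)."""
    return list(csv.DictReader(io.StringIO(text)))


def in_documented_range(row):
    """Check the DIMM, rank, and bank IDs against their documented ranges."""
    return all(0 <= int(row[f]) <= upper for f, upper in DOCUMENTED_MAX.items())


def errors_before_failure(mcelog_rows, ticket_rows):
    """Count, per failed server, the errors logged before its first ticket."""
    first_failure = {}
    for ticket in ticket_rows:
        when = datetime.strptime(ticket["failure_time"], TIME_FORMAT)
        sid = ticket["sid"]
        if sid not in first_failure or when < first_failure[sid]:
            first_failure[sid] = when

    counts = Counter()
    for error in mcelog_rows:
        failed_at = first_failure.get(error["sid"])
        if failed_at is None:
            continue  # server has no trouble ticket
        if datetime.strptime(error["error_time"], TIME_FORMAT) < failed_at:
            counts[error["sid"]] += 1
    return dict(counts)


mcelog_rows = [r for r in read_rows(MCELOG_CSV) if in_documented_range(r)]
ticket_rows = read_rows(TICKET_CSV)
result = errors_before_failure(mcelog_rows, ticket_rows)
print(result)
```

Joining on `sid` is the only link the three files share, so any per-server feature (error counts, affected DIMMs, time to failure) would be built the same way. For the full error log, a streaming reader would likely be preferable to loading every row into memory.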

Note that we have anonymized the exact dates, the server manufacturers, and the DRAM models to prevent sensitive information from being inferred. Specifically, dates start from 0001-01-01 (year 0001, month 01, day 01). For the manufacturers, we use M1, M2, M3, and M4 to represent the four server manufacturers. For the DRAM models, we use A1, A2, B1, B2, B3, C1, and C2 to represent the seven DRAM models, where A, B, and C denote the three main DRAM vendors and the numbers 1, 2, and 3 denote different models from the same vendor.
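
Because the anonymized timestamps begin at year 0001, they fall outside the range of some date libraries (for example, the nanosecond-resolution `Timestamp` in pandas starts in 1677), so it can help to convert them to offsets from the anonymized origin. The sketch below uses only the Python standard library. The helper names `seconds_since_origin` and `split_dram_model` are hypothetical, and the sample values are made up for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# The anonymized dates start from year 0001, month 01, day 01.
ORIGIN = datetime(1, 1, 1)

# The seven anonymized DRAM model labels named in the note above.
KNOWN_DRAM_MODELS = {"A1", "A2", "B1", "B2", "B3", "C1", "C2"}


def seconds_since_origin(timestamp):
    """Convert an anonymized timestamp string to seconds since the origin."""
    parsed = datetime.strptime(timestamp, TIME_FORMAT)
    return int((parsed - ORIGIN).total_seconds())


def split_dram_model(model):
    """Split a label such as 'B3' into (vendor, model_number)."""
    if model not in KNOWN_DRAM_MODELS:
        raise ValueError(f"unexpected DRAM model label: {model!r}")
    return model[0], int(model[1:])


# Made-up sample values, not taken from the dataset.
print(seconds_since_origin("0001-01-02 00:00:10"))
print(split_dram_model("B3"))
```

Working with relative offsets keeps inter-arrival times and time-to-failure intervals intact, which is all the anonymized dates can support in any case.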

Citation

Please cite our paper if you use this dataset.

@inproceedings {cheng2022,
title = {An In-Depth Correlative Study Between DRAM Errors and Server Failures in Production Data Centers},
author = {Cheng, Zhinan and Han, Shujie and Lee, Patrick PC and Li, Xin and Liu, Jiongzhou and Li, Zhan},
booktitle = {41st International Symposium on Reliable Distributed Systems ({SRDS} 2022)},
year = {2022}
}

License

The dataset is distributed under the CC BY-SA 4.0 license.
