Tianchi Datasets

ImageNet

Description

ImageNet is provided by Stanford University.

Data List

  • imagenet_ILSVRC2017_datasets.zip (uploaded 2021-02-28, 7.05MB)
  • imagenet_object_localization_patched2019 (1).tar.gz (uploaded 2021-03-12, 154.61GB)

文档

ImageNet
1.Overview
The ImageNet dataset is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently there are on average over five hundred images per node, for a total of 14,197,122 annotated images. Since 2010, the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection.
Figure 1: example photo

2.Description
The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images; therefore, only thumbnails and URLs of images are provided.

  • Total number of non-empty WordNet synsets: 21,841
  • Total number of images: 14,197,122
  • Number of images with bounding box annotations: 1,034,908
  • Number of synsets with SIFT features: 1,000
  • Number of images with SIFT features: 1.2 million

ILSVRC2017

  • imagenet_object_localization.tar.gz contains the image data and ground truth for the train and validation sets, and the image data for the test set.

    • The image annotations are saved in XML files in PASCAL VOC format. Users can parse the annotations using the PASCAL Development Toolkit.
    • Annotations are organized by synset (for example, "Persian cat", "mountain bike", or "hot dog"), using the synset's WordNet id (wnid) as the folder name; these ids look like n00141669. Each image name corresponds directly to its annotation file name: for example, n02123394/n02123394_28.xml contains the bounding boxes for n02123394_28.JPEG (see the parsing sketch after this list).
    • All bounding boxes for a particular synset can also be downloaded separately.
    • The training images are under the folders with the names of their synsets. The validation images are all in the same folder. The test images are also all in the same folder.
    • The ImageSet folder contains text files specifying lists of images for the main localization task.
  • LOC_sample_submission.csv shows the correct format of the submission file. It contains two columns:

    • ImageId: the id of the test image, for example ILSVRC2012_test_00000001
    • PredictionString: the prediction string is a space-delimited list of 5 fields per prediction: a class label followed by the bounding-box coordinates (x_min, y_min, x_max, y_max). For example, 1000 240 170 260 240 means label 1000 with a bounding box at (240, 170, 260, 240). We accept up to 5 predictions per image: if you submit 862 42 24 170 186 862 292 28 430 198 862 168 24 292 190 862 299 238 443 374 862 160 195 294 357 862 3 214 135 356, which contains 6 bounding boxes, only the first 5 will be taken into consideration. A submission-writing sketch follows this list.
  • LOC_train_solution.csv and LOC_val_solution.csv: this information is already available in imagenet_object_localization.tar.gz, but it is also provided in CSV format for consistency with LOC_sample_submission.csv. Each file contains two columns:

    • ImageId: the id of the train/val image, for example n02017213_7894 or ILSVRC2012_val_00048981
    • PredictionString: the prediction string is a space-delimited list of 5 fields per box: a synset label followed by the bounding-box coordinates (x_min, y_min, x_max, y_max). For example, n01978287 240 170 260 240 means label n01978287 with a bounding box at (240, 170, 260, 240). Repeated groups represent multiple boxes in the same image: n04447861 248 177 417 332 n04447861 171 156 251 175 n04447861 24 133 115 254. See the CSV-parsing sketch after this list.
  • LOC_synset_mapping.txt: the mapping between the 1,000 synset ids and their descriptions. For example, line 1 reads n01440764 tench, Tinca tinca, meaning class 1 has synset id n01440764 and contains images of the fish tench.
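
The annotation XML files can be read with Python's standard library alone, without the PASCAL Development Toolkit. The following is a minimal sketch; the example path is hypothetical and assumes the archive has been extracted locally.

    # Parse one PASCAL VOC style annotation file into the image filename and its boxes.
    import xml.etree.ElementTree as ET

    def parse_voc_annotation(xml_path):
        """Return (filename, [(wnid, xmin, ymin, xmax, ymax), ...]) for one XML file."""
        root = ET.parse(xml_path).getroot()
        filename = root.findtext("filename")
        boxes = []
        for obj in root.findall("object"):
            wnid = obj.findtext("name")  # synset id, e.g. n02123394
            bb = obj.find("bndbox")
            boxes.append((
                wnid,
                int(bb.findtext("xmin")),
                int(bb.findtext("ymin")),
                int(bb.findtext("xmax")),
                int(bb.findtext("ymax")),
            ))
        return filename, boxes

    # Hypothetical extraction path:
    # parse_voc_annotation("Annotations/CLS-LOC/train/n02123394/n02123394_28.xml")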
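
The two solution CSVs and LOC_synset_mapping.txt are plain text and can be loaded with a few lines of Python. This is a sketch under the format described above (ImageId plus a PredictionString made of groups of five space-separated fields); the file paths are assumed to be in the working directory.

    # Load the synset-id -> description mapping and parse ground-truth PredictionStrings.
    import csv

    def load_synset_mapping(path="LOC_synset_mapping.txt"):
        """Map each synset id (e.g. n01440764) to its text description."""
        mapping = {}
        with open(path) as f:
            for line in f:
                wnid, _, description = line.strip().partition(" ")
                mapping[wnid] = description
        return mapping

    def parse_prediction_string(pred):
        """Split 'n01978287 240 170 260 240 ...' into (label, xmin, ymin, xmax, ymax) tuples."""
        fields = pred.split()
        return [(fields[i], *map(int, fields[i + 1:i + 5]))
                for i in range(0, len(fields), 5)]

    def load_solution(path="LOC_val_solution.csv"):
        """Map ImageId to its list of ground-truth boxes."""
        with open(path, newline="") as f:
            return {row["ImageId"]: parse_prediction_string(row["PredictionString"])
                    for row in csv.DictReader(f)}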
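
Likewise, a submission file in the LOC_sample_submission.csv layout can be written with the csv module. The sketch below is only an illustration: the predictions dictionary is hypothetical model output, and the code simply truncates each image to the first 5 boxes as the rules above require.

    # Write a submission CSV with at most 5 predictions per image.
    import csv

    def format_prediction_string(boxes, limit=5):
        """boxes: iterable of (label, xmin, ymin, xmax, ymax); only the first `limit` are kept."""
        kept = list(boxes)[:limit]
        return " ".join(" ".join(str(v) for v in box) for box in kept)

    def write_submission(predictions, path="submission.csv"):
        """predictions: {image_id: [(label, xmin, ymin, xmax, ymax), ...]}."""
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["ImageId", "PredictionString"])
            for image_id, boxes in predictions.items():
                writer.writerow([image_id, format_prediction_string(boxes)])

    # Hypothetical usage:
    # write_submission({"ILSVRC2012_test_00000001": [(862, 42, 24, 170, 186)]})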

3.Citation
If you publish a paper using this dataset, please send the publication URL to tianchi_open_dataset@alibabacloud.com. We will keep statistics on citations and contact you to send a Tianchi gift.

@ARTICLE{Russakovsky2015ImageNet,
        author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
        title = {ImageNet Large Scale Visual Recognition Challenge},
        journal = {IJCV},
        year = {2015}
}

4.License

The dataset is distributed under the CC BY-NC-SA 4.0 license.
