ImageNet is provided by Stanford University.
ImageNet
1.Overview
The ImageNet dataset is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently there are an average of over five hundred images per node. It contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection.
Figure 1: example photo
2.Description
The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.
ILSVRC2017
imagenet_object_localization.tar.gz contains the image data and ground truth for the train and validation sets, and the image data for the test set. The images are organized by wnid (WordNet ID); these ids look like n00141669. Each image's file name corresponds directly with its annotation file name: for example, the bounding-box annotation for n02123394_28.JPEG is n02123394/n02123394_28.xml.
LOC_sample_submission.csv is the correct format of the submission file. It contains two columns:
ImageId
: the id of the test image, for example ILSVRC2012_test_00000001
PredictionString
: the prediction string is a space-delimited list of predictions, each consisting of 5 integers: a class label followed by the bounding-box coordinates (x_min, y_min, x_max, y_max). For example, 1000 240 170 260 240 means label 1000 with a bounding box from (240, 170) to (260, 240). We accept up to 5 predictions; if you submit 862 42 24 170 186 862 292 28 430 198 862 168 24 292 190 862 299 238 443 374 862 160 195 294 357 862 3 214 135 356, which contains 6 bounding boxes, we will only take the first 5 into consideration.
LOC_train_solution.csv and LOC_val_solution.csv: this information is already available in imagenet_object_localization.tar.gz, but it is provided here in csv format to be consistent with LOC_sample_submission.csv. Each file contains two columns:
ImageId
: the id of the train/val image, for example n02017213_7894
or ILSVRC2012_val_00048981
PredictionString
: the prediction string is a space-delimited list of predictions, each consisting of a synset id followed by 4 integers. For example, n01978287 240 170 260 240 means label n01978287 with a bounding box of coordinates (x_min, y_min, x_max, y_max). Repeated entries represent multiple boxes in the same image: n04447861 248 177 417 332 n04447861 171 156 251 175 n04447861 24 133 115 254
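A prediction string in these csv files can be parsed with a few lines of Python. The helper below is an illustrative sketch (the function name and `max_boxes` parameter are ours, not part of the dataset); it also mirrors the challenge rule that only the first 5 predictions of a submission are considered.

```python
def parse_prediction_string(pred, max_boxes=5):
    """Parse a space-delimited PredictionString into (label, box) pairs.

    Each prediction is 5 fields: a label (a synset id such as n01978287
    in the solution files, or an integer class in submissions) followed
    by x_min, y_min, x_max, y_max. Only the first `max_boxes`
    predictions are kept, as in the challenge rules.
    """
    fields = pred.split()
    if len(fields) % 5 != 0:
        raise ValueError("prediction string must be groups of 5 fields")
    boxes = []
    for i in range(0, len(fields), 5):
        label = fields[i]
        x_min, y_min, x_max, y_max = map(int, fields[i + 1:i + 5])
        boxes.append((label, (x_min, y_min, x_max, y_max)))
    return boxes[:max_boxes]

# Example from the description: three boxes of the same class in one image.
boxes = parse_prediction_string(
    "n04447861 248 177 417 332 n04447861 171 156 251 175 n04447861 24 133 115 254"
)
```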
LOC_synset_mapping.txt: the mapping between the 1000 synset ids and their descriptions. For example, line 1 reads n01440764 tench, Tinca tinca, meaning class 1 has the synset id n01440764 and contains the fish tench.
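Since the class index is simply the 1-based line number, the mapping file can be loaded into a dictionary in a few lines. This is a sketch under that assumption (the function name is ours):

```python
def load_synset_mapping(lines):
    """Map each synset id to (class_index, description).

    Each line of LOC_synset_mapping.txt is a synset id followed by its
    description; the 1-based line number is the class index.
    """
    mapping = {}
    for class_index, line in enumerate(lines, start=1):
        synset_id, _, description = line.strip().partition(" ")
        mapping[synset_id] = (class_index, description)
    return mapping

# Example with the first line quoted in the description.
mapping = load_synset_mapping(["n01440764 tench, Tinca tinca"])
```

In practice you would pass an open file object (`load_synset_mapping(open("LOC_synset_mapping.txt"))`) instead of a list of strings.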
3.Citation
If you have published papers using our dataset, please send the publication URL to tianchi_open_dataset@alibabacloud.com. We will compile citation statistics and contact you to send a Tianchi gift.
@article{ILSVRC15,
author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
title = {ImageNet Large Scale Visual Recognition Challenge},
journal = {IJCV},
year = {2015}
}
4.License
The dataset is distributed under the CC BY-NC-SA 4.0 license.