Tianchi Datasets

Synthetic PAN Card Dataset

Description

Contains PAN numbers, names and dates for OCR finetuning/testing

Data list

  • Name | Upload date | Size
  • synthetic-pan-card-dataset.zip | 2021-03-10 | 239.00MB

Documentation

Synthetic PAN Card Dataset

Contains PAN numbers, names and dates for OCR finetuning/testing

Overview

PAN (Permanent Account Number) is a unique ten-character alphanumeric identifier issued by the Income Tax Department of India. It is issued in the form of a laminated plastic card, commonly known as a PAN card.

Data

The dataset is divided into three parts according to image category. Each directory contains images encoded in PNG format, each paired with its ground truth text in a file with the .gt.txt extension. This structure was chosen because it is the format required for training or fine-tuning the popular open-source OCR software Tesseract using tesstrain.
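The image and ground truth pairing described above can be read with a few lines of Python. This is a minimal sketch, assuming each image `name.png` sits next to a text file `name.gt.txt` in the same directory; the function name `iter_pairs` is hypothetical and not part of the dataset or of tesstrain.

```python
from pathlib import Path


def iter_pairs(directory):
    """Yield (image_path, ground_truth_text) for each PNG that has a .gt.txt file."""
    for image_path in sorted(Path(directory).glob("*.png")):
        # Assumed naming: foo.png is paired with foo.gt.txt in the same folder.
        gt_path = image_path.with_name(image_path.stem + ".gt.txt")
        if gt_path.exists():
            text = gt_path.read_text(encoding="utf-8").strip()
            yield image_path, text
```

Images without a matching ground truth file are skipped silently here; a stricter check could collect them and report them instead.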

Directories and files:

PAN numbers: 17,000 .png image files of cropped numbers, with their ground truth text files

PAN dates: 10,954 .png image files of cropped dates, with their ground truth text files

PAN names: 14,968 .png image files of cropped names, with their ground truth text files
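After extracting the archive, the file counts listed above can be compared against what is on disk. The sketch below is illustrative only: the directory names in `EXPECTED` are hypothetical placeholders (the actual folder names inside the archive are not stated here), and `count_report` is a made-up helper name.

```python
from pathlib import Path

# Counts taken from the list above; the directory names are hypothetical
# and should be replaced with the real folder names from the archive.
EXPECTED = {"pan_numbers": 17000, "pan_dates": 10954, "pan_names": 14968}


def count_report(root, expected=EXPECTED):
    """Return {directory: (png_count, gt_count, expected_count)}."""
    report = {}
    for name, want in expected.items():
        folder = Path(root) / name
        png_count = len(list(folder.glob("*.png")))
        gt_count = len(list(folder.glob("*.gt.txt")))
        report[name] = (png_count, gt_count, want)
    return report
```

A directory is consistent with the description when all three numbers in its tuple agree.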
