1st International Workshop on Deep Learning
for Document Analysis and Recognition

ICPR 2018, Beijing, China, August 20th, 14:00-18:00



The technology of document analysis and recognition (DAR) aims to automatically extract information from document images and handwriting by analyzing their structure and textual content. It has many applications, such as the digitization of books and financial notes and information extraction from Web document images. Recognizing text from images, known as Optical Character Recognition (OCR), is the core task of DAR. Recently, OCR has achieved great success in both scientific research and practical applications across different scenes. A traditional OCR system is heavily pipelined, with hand-designed and highly tuned modules: it usually performs line extraction, word detection and letter segmentation, and then applies different techniques to each piece of a character to determine what the character is. We have now entered a new era of big data, which offers both opportunities and challenges to the field of OCR and DAR. We should seek new OCR and DAR methods that adapt to big data, and also push forward new OCR and DAR applications that benefit from big data.

Deep learning, which is considered one of the most significant breakthroughs in recent pattern recognition and computer vision research, has greatly affected these fields and achieved impressive progress in both academia and industry. Currently, deep learning is widely accepted as an effective OCR solution, which first learns to detect text lines or words in images and then recognizes the sequence of characters directly from the extracted text lines or words. The hand-built and highly tuned modules are avoided in a deep learning-based OCR system. It is expected that the development of deep learning theories and applications will further influence the field of OCR and DAR.

Duguang is an OCR cloud product built by the Vision and Beauty team of Alibaba Group. Over the years, we have been integrating frontier technology and industry experience, and developing a technological architecture that can be applied to multiple industrial applications, forming a complete technical system for image detection, recognition and understanding. Building on this system, Duguang plays an important role in picture management, search, and intelligent auditing within Alibaba Group. Duguang cloud products offer high-quality and efficient recognition services for document images, network pictures, form pictures and card pictures. Professional OCR solutions have been provided to the government, justice, finance and other related industries.

We organize this workshop to provide a forum for highlighting current research and discussing future trends in deep learning for OCR and DAR.


1. Deep learning for character and text recognition
2. Deep learning for scene text detection and recognition
3. Deep learning for document image processing and segmentation
4. Deep learning for layout analysis
5. Deep learning for writer identification and signature analysis
6. Deep learning for document retrieval
7. Deep learning for context modeling
8. Deep learning for graphics and symbol recognition
9. Deep learning for other DAR tasks

Simone Marinai, University of Florence

Simone Marinai received the Master's degree in Electronic Engineering in 1992 and the PhD degree in Computer Engineering in 1996, both from the University of Florence, Italy. In 1996 he was a visiting researcher at CENPARMI, Concordia University, Montréal. He is currently Associate Professor of Computer Engineering at the Information Engineering department of the University of Florence. He is also a visiting researcher at Osaka Prefecture University (Japan) in the Institute of Document Analysis and Knowledge Science.
His main research interests are in Artificial Intelligence and Pattern Recognition, with a special focus on applications in Document Engineering and Document Image Analysis.
Prof. Marinai is President of the International Association for Pattern Recognition (IAPR), and editor-in-chief of the International Journal on Document Analysis and Recognition (IJDAR) and of the Electronic Letters on Computer Vision and Image Analysis (ELCVIA) journal. He has been 2nd vice-president of IAPR (2014-16), Chair of the Conferences and Meetings (C&M) committee of IAPR (2008-2014), and chair of the IAPR Technical Committee on Neural Networks and Computational Intelligence (TC3) (2004-2008). He is co-editor of the book "Machine Learning in Document Analysis and Recognition", published by Springer Verlag in 2008. He is the author of more than 60 peer-reviewed publications and editor of four volumes.

Lianwen Jin, South China University of Technology

Lianwen Jin received the B.S. degree from the University of Science and Technology of China, Anhui, China, and the Ph.D. degree from the South China University of Technology, Guangzhou, China, in 1991 and 1996, respectively. He is a professor in the College of Electronic and Information Engineering at the South China University of Technology. His research interests include handwriting analysis and recognition, optical character recognition, scene text detection and recognition, deep learning, and intelligent systems. He has received the New Century Excellent Talent Program of MOE China Award and the Guangdong Pearl River Distinguished Professor Award. He has authored over 100 scientific papers published in peer-reviewed journals such as IEEE TPAMI, IEEE TNNLS, IEEE TCYBS, IEEE TCSVT, TII, IEEE TMM, IEEE TITS, Pattern Recognition, Neurocomputing, Pattern Recognition Letters, and the International Journal on Document Analysis and Recognition, and in mainstream international conferences including ICPR, ICDAR, ICFHR, CVPR, AAAI, and IJCAI.

Yongpan Wang, Alibaba Group

Yongpan Wang is a senior algorithm specialist at Alibaba Group. She joined Alibaba's "Image and Beauty" team in 2010. Founded in 2009, the "Image and Beauty" team is the longest-running image group in Alibaba, with expertise in image search at the scale of hundreds of billions of images, text recognition, and intelligent matching of clothing. Her main research is on text recognition and text understanding. She is now responsible for Alibaba's OCR cloud product "Duguang", which plays an important role in picture management, search, and intelligent auditing in Alibaba Group. She is committed to providing high-quality and efficient recognition services for document images, web images, form images and card license images, and to providing professional OCR solutions for the government, justice, finance, automation and other industries.

C. V. Jawahar, IIIT Hyderabad

C. V. Jawahar is the Amazon Chair Professor at IIIT Hyderabad, India. He received his PhD from IIT Kharagpur and has been with IIIT Hyderabad since 2000. At IIIT Hyderabad, Jawahar leads a group focusing on computer vision, machine learning, document analysis and multimedia systems. In recent years, he has been looking into a set of problems at the intersection of vision, language and text. He is also interested in applications in road safety, assistive technologies, healthcare, education, cultural heritage and entertainment. He has served as a chair for previous editions of ACCV, WACV, IJCAI, ICDAR and ICVGIP. Presently, he is an area editor of CVIU and an associate editor of IEEE PAMI. He is also a program co-chair of ACCV 2018.

Weilin Huang, Malong Technologies

Dr. Weilin Huang is Chief Scientist of Malong Technologies. He previously worked as a postdoctoral researcher with Prof. Andrew Zisserman and Prof. Alison Noble in the Visual Geometry Group (VGG), University of Oxford, and was an Assistant Professor with the Chinese Academy of Sciences. He received his Ph.D. degree from the University of Manchester, U.K. His research interests include scene text detection and recognition, large-scale image classification and medical image analysis. He has served as a PC member or reviewer for major computer vision conferences, including ICCV, CVPR, ECCV, MICCAI and AAAI. His team was the first runner-up in scene recognition at ImageNet 2015, and the winner of the WebVision Challenge at CVPR 2017.

Huasha Zhao, Alibaba Group

Huasha Zhao is a Staff Research Scientist and Tech Lead at Alibaba Group. He started and oversees the research and development of the Alibaba Information Extraction platform, which serves a wide variety of customers both within and outside the company. Before joining Alibaba, he received his MS and PhD in Electrical Engineering and Computer Sciences from the University of California, Berkeley, and his B.Eng. in Electrical Engineering from Tsinghua University. His research interests lie in natural language processing and machine learning in general. He has served as a PC member for top machine learning conferences and journals, and recently helped organize the E-Commerce workshop at SIGIR in both 2017 and 2018.


Yongpan Wang, Alibaba Group, China

Xiang Bai, Huazhong University of Science and Technology, China

Cheng-Lin Liu, Institute of Automation of Chinese Academy of Sciences, China

Program Committee

C. V. Jawahar (IIIT Hyderabad)

Dimosthenis Karatzas (Universitat Autònoma de Barcelona)

Lianwen Jin (South China University of Technology)

Xiang Bai (Huazhong University of Science and Technology)

Shijian Lu (Nanyang Technological University)

Cheng-Lin Liu (Institute of Automation of Chinese Academy of Sciences)

Weilin Huang (Malong Technologies)

1. 14:10-14:30 Presentation by Simone Marinai
2. 14:30-14:50 Presentation by Lianwen Jin
3. 14:50-15:10 Presentation by Yongpan Wang
4. 15:10-15:30 Presentation by C. V. Jawahar
5. 15:30-16:00 Coffee Break
6. 16:00-16:20 Presentation by Weilin Huang
7. 16:20-16:40 Presentation by Huasha Zhao
8. 16:40-17:40 Panel