Task 1: End-to-End Video Text Spotting
The objective of this task is to assess the end-to-end performance of video text spotting systems. Models must localize, track, and recognize words simultaneously. In Task 1, Normalized Edit Distance is the official ranking metric; results for other metrics will be published for reference only.
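To make the ranking metric concrete, the following is a minimal sketch of one common way to compute a Normalized Edit Distance score over matched pairs of ground-truth and predicted transcriptions. It assumes the convention of averaging the Levenshtein distance divided by the longer string length and subtracting that average from 1, so that higher is better. The text above does not spell out the formula or how predicted word tracks are matched to ground-truth tracks, so the function names, the pairing input, and the handling of empty strings here are illustrative assumptions rather than the official evaluation protocol.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (insert, delete, substitute cost 1)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance_score(pairs: List[Tuple[str, str]]) -> float:
    """Assumed convention: 1 - mean(edit_distance / max(len(gt), len(pred))).

    `pairs` is a hypothetical input of (ground_truth, prediction) strings that
    have already been matched; the matching step is outside this sketch.
    """
    if not pairs:
        return 0.0
    total = 0.0
    for gt, pred in pairs:
        longest = max(len(gt), len(pred))
        if longest > 0:  # two empty strings contribute zero distance
            total += edit_distance(gt, pred) / longest
    return 1.0 - total / len(pairs)


if __name__ == "__main__":
    matched = [("exit", "exit"), ("stop", "step")]
    print(normalized_edit_distance_score(matched))
```

Under this convention a perfect transcription contributes 0 to the average, and a completely wrong one contributes 1, so the score stays within [0, 1] regardless of word length.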
Task 2: Video Text Question Answering
This task is the most general and challenging one, since it requires participants to combine video text spotting with video question answering. Submitted methods should answer the given questions correctly by reading, tracking, and comprehending all text instances in the videos.