The ICDAR 2023 BDVT-QA Competition (Competition on Born Digital Video Text Question Answering) is coming. Textual information plays an important role in video understanding, since text instances either directly indicate the scene or provide linguistic cues about the ongoing story. Numerous works have been proposed for related tasks such as video text recognition and image text QA. In this competition, we would like to go one step further and explore the video text QA problem, which requires a holistic, precise, and in-depth understanding of text information across space and time over video frames. Typical video text applications include navigation in advanced driver assistance systems, assistive shopping in live streams, and conversation understanding in dramas. Though widely used, video text has rarely been explored because of challenging factors such as arbitrarily shaped text trajectories, animated text presentation, and long-term text language processing. To stimulate interest in tackling these challenges, we propose a novel task: question answering by reading text in born digital videos, which are widespread on the Internet. We refer to this problem as Born Digital Video Text QA.