Abstract: Understanding human interactions is crucial in various applications, including robotics, automated systems, human-computer interaction, and video surveillance. Many studies have ...
Synaesthesia is a perceptual condition where one sense triggers an experience in another sense. For some people, sounds ...
Abstract: Text-based Visual Question Answering (TextVQA) focuses on answering questions about the scene text in images. Most works in this field use transformer-based models to model the ...