Abstract: Recently, video recognition is emerging with the help of multi-modal learning, which focuses on integrating distinct modalities to improve the performance or robustness of the model.
Abstract: Audio-visual zero-shot learning (ZSL) leverages both video and audio information for model training, aiming to classify new video categories that were not seen during the training. However, ...
- checkpoints/ - audio-cond_animation/ - avsync15_audio-cond_cfg/ - landscapes_audio-cond_cfg/ - thegreatesthits_audio-cond_cfg/ - avsync/ - vggss_sync_contrast ...