Adversarial Prompt Tuning for Vision-Language Models
Original Paper: https://arxiv.org/abs/2311.11261
By: Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang
Abstract: With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities.