Model Maker Visual Laser

Learning Visual Grounding from Generative Vision and Language Model

Abstract: Visual grounding tasks aim to localize image regions based on natural language references. In this work, we explore whether generative VLMs predominantly trained on image-text data could be ...

