Exploring CLIP for Real World, Text-based Image Retrieval

NCJ Number

309744

Date Published

September 2023

Author(s)

Manal Sultan; Lia Jacobs; Abby Stylianou; Robert Pless

Length

6 pages

Annotation

In this paper, researchers explore using CLIP for image retrieval.

Abstract

In this paper, researchers consider the ability of CLIP features to support text-driven image retrieval. They find that there is a sweet spot of detail in the text that gives the best results, and that words describing the "tone" of a scene (such as messy or dingy) are quite important in maximizing text-image similarity. Traditional image-based queries sometimes misalign with user intentions due to their focus on irrelevant image components. To overcome this, the researchers explore the potential of text-based image retrieval, specifically using Contrastive Language-Image Pretraining (CLIP) models. CLIP models, trained on large datasets of image-caption pairs, offer a promising approach by allowing natural language descriptions for more targeted queries. The authors explore the effectiveness of text-driven image retrieval based on CLIP features by evaluating the image similarity for progressively more detailed queries. (Published Abstract Provided)
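
As a rough illustration of the retrieval idea described in the abstract, the sketch below ranks images by cosine similarity between a text embedding and a set of image embeddings. It is not the authors' implementation. The three-dimensional vectors, file names, and the helper names `cosine_similarity` and `rank_images` are all hypothetical placeholders; in practice the embeddings would come from a CLIP text encoder and image encoder and would have several hundred dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return (image name, score) pairs sorted from most to least similar."""
    scores = {
        name: cosine_similarity(text_embedding, embedding)
        for name, embedding in image_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up vectors standing in for CLIP image features (assumption:
# real features would be produced by a CLIP image encoder).
image_embeddings = {
    "room_a.jpg": [0.9, 0.1, 0.0],
    "room_b.jpg": [0.1, 0.9, 0.2],
    "room_c.jpg": [0.5, 0.5, 0.5],
}

# Made-up vectors standing in for CLIP text features of progressively
# more detailed queries; the values are illustrative, not measured.
query_embeddings = {
    "a hotel room": [0.5, 0.5, 0.4],
    "a messy hotel room": [0.2, 0.8, 0.3],
    "a messy, dingy hotel room with an unmade bed": [0.1, 0.9, 0.2],
}

for query, embedding in query_embeddings.items():
    best_name, best_score = rank_images(embedding, image_embeddings)[0]
    print(f"{query!r}: best match {best_name} (similarity {best_score:.3f})")
```

Comparing the top score across the queries is one simple way to see how added detail in the text changes text-image similarity, which is the kind of comparison the abstract describes.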

Date Published: September 1, 2023

Downloads

HTML

Related Datasets

https://github.com/GWUvision/Hotels-50K
