Model name: clip_local

About CLIP

CLIP (Contrastive Language-Image Pre-training) is a model that learns visual concepts from natural language supervision. Because it supports zero-shot transfer, it can be applied to a wide range of vision and language tasks without task-specific training.

Read more about CLIP on OpenAI's website.

Supported aidb operations

  • encode_text
  • encode_text_batch
  • encode_image
  • encode_image_batch
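
As a rough illustration of how these operations might be invoked once a model is registered, the sketch below encodes a text string and an image. The exact signatures of aidb.encode_text and aidb.encode_image are assumptions here, not confirmed by this page; check the aidb function reference for the authoritative forms.

-- Hypothetical usage sketch; the aidb function signatures are assumed, not confirmed.
-- pg_read_binary_file is a standard PostgreSQL function that returns bytea.
SELECT aidb.encode_text('my_clip_model', 'a photo of a cat');
SELECT aidb.encode_image('my_clip_model', pg_read_binary_file('/tmp/cat.png'));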

Supported models

  • openai/clip-vit-base-patch32 (default)

Register the default implementation

SELECT aidb.register_model('my_clip_model', 'clip_local');

Because openai/clip-vit-base-patch32 is the only supported model and is the default, you don't need to specify the model in the configuration. No credentials are required for the CLIP model.

Register another model

There are no other models available for the clip_local model provider.

Model configuration settings

The following configuration settings are available for CLIP models:

  • model - The CLIP model to use. The default, openai/clip-vit-base-patch32, is the only model available.
  • revision - The model revision to use. The default is refs/pr/15. This value refers to a revision (a branch, tag, or commit) in the model's Hugging Face repository and pins the model version that is downloaded; the default points to that branch.
  • image_size - The size of the input images, in pixels. The default is 224.
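
The following sketch shows what registering CLIP with an explicit configuration might look like, using the documented defaults as the values. Passing a JSONB configuration as the third argument to aidb.register_model is assumed here to follow the same pattern as other aidb model providers; verify against the aidb model registration reference before relying on it.

-- Hypothetical registration with explicit configuration; the values shown are
-- the documented defaults, and the JSONB config argument is an assumption.
SELECT aidb.register_model(
    'my_clip_model',
    'clip_local',
    '{"model": "openai/clip-vit-base-patch32", "revision": "refs/pr/15", "image_size": 224}'::JSONB
);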

Model credentials

No credentials are required for the CLIP model.

