
OWL-ViT - Hugging Face
OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.
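As a concrete illustration of the text-query workflow these pages describe, here is a minimal sketch using the Hugging Face transformers classes for OWL-ViT (OwlViTProcessor, OwlViTForObjectDetection). The image URL, query strings, and score threshold are placeholder choices, not values prescribed by the model.

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

# Load a pretrained OWL-ViT checkpoint from the Hugging Face Hub.
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

# Any RGB image works; this COCO URL is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# One or more free-text queries describing the target objects.
texts = [["a photo of a cat", "a photo of a dog"]]
inputs = processor(text=texts, images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes into (score, label, box) triples in image coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.1
)

for score, label, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    print(f"{texts[0][label]}: {score:.2f} at {box.tolist()}")
```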
47. OWLViT: Open-Domain Object Detection - Zhihu - Zhihu Column
Uses a ViT with contrastive pre-training on a large dataset of image-text pairs. The final token pooling layer is removed, and lightweight classification and bbox prediction heads are attached to each transformer output token. The same architecture also supports one-shot detection, querying with image-derived embeddings. Image-conditioned one-shot detection is a powerful extension of text-conditioned detection, because it allows detecting objects that are hard to describe in text (but easy to …
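The per-token head design described in that snippet can be illustrated with a small, self-contained sketch. This is a conceptual illustration only: PerTokenDetectionHeads, its layer sizes, and the (cx, cy, w, h) box parameterization are assumptions made for the example, not the official OWL-ViT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerTokenDetectionHeads(nn.Module):
    """Hypothetical sketch: lightweight heads attached to every ViT output token.

    No token pooling is applied; every image token produces one candidate
    detection, scored against text- or image-derived query embeddings.
    """
    def __init__(self, embed_dim: int, query_dim: int):
        super().__init__()
        # Class head projects each image token into the shared image-text
        # embedding space used by the contrastively pre-trained backbone.
        self.class_proj = nn.Linear(embed_dim, query_dim)
        # Box head regresses a (cx, cy, w, h) box per token.
        self.box_head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, 4)
        )

    def forward(self, image_tokens: torch.Tensor, query_embeds: torch.Tensor):
        # image_tokens: (batch, num_tokens, embed_dim), straight from the ViT.
        # query_embeds: (num_queries, query_dim), text- or image-derived.
        token_embeds = F.normalize(self.class_proj(image_tokens), dim=-1)
        query_embeds = F.normalize(query_embeds, dim=-1)
        # Simplified cosine-similarity scoring of each token against each query.
        logits = torch.einsum("btd,qd->btq", token_embeds, query_embeds)
        boxes = self.box_head(image_tokens).sigmoid()  # (batch, num_tokens, 4)
        return logits, boxes
```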
[2205.06230] Simple Open-Vocabulary Object Detection with …
May 12, 2022 · In this paper, we propose a strong recipe for transferring image-text models to open-vocabulary object detection. We use a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning.
google/owlvit-base-patch32 · Hugging Face
OWL-ViT is a zero-shot text-conditioned object detection model that can be used to query an image with one or multiple text queries. OWL-ViT uses CLIP as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features.
Zero-Shot Object Detection with OWL-ViT and Huggingface
April 12, 2024 · OWL-ViT (Vision Transformer for Open-World Localisation): Pre-trained on a large dataset of image and text pairs, OWL-ViT learns to bridge the gap between language and vision. Instead...
Getting started with Owl-ViT
OWL-ViT is an open-vocabulary object detector. Given an image and one or multiple free-text queries, it finds objects matching the queries in the image. Unlike traditional object...
OWL-ViT inference playground.ipynb - Colab - Google Colab
OWL-ViT is an open-vocabulary object detector. Given a free-text query, it will find objects matching that query. It can also do one-shot object detection, i.e. detect objects based on a...
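For the image-conditioned one-shot mode mentioned above, the following is a hedged sketch assuming the image-guided detection entry points exposed by recent transformers releases (image_guided_detection and post_process_image_guided_detection). Both image URLs and the score threshold are placeholders.

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

# Target image to search in, and a query image showing one example of the object.
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
query_url = "http://images.cocodataset.org/val2017/000000524280.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
query_image = Image.open(requests.get(query_url, stream=True).raw)

# The processor prepares both the target and the query image in one call.
inputs = processor(images=image, query_images=query_image, return_tensors="pt")

with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)

# Post-process into boxes and scores in target-image coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_image_guided_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.6
)

for score, box in zip(results[0]["scores"], results[0]["boxes"]):
    print(f"match with score {score:.2f} at {box.tolist()}")
```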
OWL-ViT - Open-Vocabulary Object Detection | Recent Advances …
January 25, 2025 · OWL-ViT offers a simple and effective way to adapt Vision Transformers for open-vocabulary object detection. By leveraging contrastive pretraining, it generalizes to novel objects using text-based queries, making it useful for real-world applications where a predefined object list is impractical.