
Title: CLIP-KD: An Empirical Study of CLIP Model Distillation
July 24, 2023 · CLIP-KD improves student CLIP models consistently over zero-shot ImageNet classification and cross-modal retrieval benchmarks. When using ViT-L/14 pretrained on Laion …
CVPR-2024 | CLIP-KD: A Comprehensive Exploration of Knowledge Distillation for CLIP Models - Zhihu
CLIP-KD improves the performance of student CLIP models on zero-shot ImageNet classification and cross-modal tasks. When using the teacher CLIP model ViT-L/14 trained on the Laion-400M dataset, CLIP-KD respectively improves the ViT-B/16 and ResNet-50 models …
[Multimodal] CLIP-KD: An Empirical Study of CLIP Model Distillation
July 23, 2024 · CLIP (Contrastive Language-Image Pretraining) is an image-language pretraining model that has demonstrated the ability to learn visual concepts from image-text datasets collected from the web. This paper proposes a …
GitHub - winycg/CLIP-KD: [CVPR-2024] Official implementations of CLIP …
This repository contains the source code of CLIP-KD [CLIP-KD: An Empirical Study of CLIP Model Distillation]. OpenCLIP reads a CSV file with two columns: a path to an image, and a text …
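The CSV-based data pipeline mentioned in the repository description is simple to reproduce. Below is a minimal sketch of building such a two-column file; the column names `filepath` and `title` are assumptions for illustration and should be matched to whatever keys the training script expects.

```python
# Minimal sketch: build the two-column CSV (image path + text caption)
# that an OpenCLIP-style training script reads.
# Column names here are illustrative assumptions, not confirmed values.
import csv

samples = [
    ("images/cat_001.jpg", "a photo of a cat sitting on a windowsill"),
    ("images/dog_002.jpg", "a dog running across a grassy field"),
]

with open("train_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["filepath", "title"])  # assumed column names
    writer.writerows(samples)
```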
CLIP-KD/README.md at main · winycg/CLIP-KD - GitHub
We propose several distillation strategies, including relation, feature, gradient and contrastive paradigms, to examine the effectiveness of CLIP-Knowledge Distillation (KD). We show that a …
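As a rough illustration of one of these paradigms, the sketch below shows a feature-mimicry term in PyTorch: the student embedding is projected into the teacher's space and pulled toward the frozen teacher embedding with a mean-squared error. The dimensions and the linear projection are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a feature-distillation term: align student embeddings with
# teacher embeddings via MSE. Embedding sizes are assumed values.
import torch
import torch.nn as nn

teacher_dim, student_dim = 768, 512          # assumed embedding sizes
proj = nn.Linear(student_dim, teacher_dim)   # maps student space -> teacher space

def feature_distill_loss(student_feat, teacher_feat):
    """MSE between projected student features and (frozen) teacher features."""
    return nn.functional.mse_loss(proj(student_feat), teacher_feat.detach())

# Example usage with random stand-in features for a batch of 8 samples.
s = torch.randn(8, student_dim)
t = torch.randn(8, teacher_dim)
loss = feature_distill_loss(s, t)
```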
[Paper Quick-Read 276] CLIP-KD: An Empirical Study of CLIP Model …
March 22, 2025 · Using knowledge distillation, the paper designs multiple distillation strategies to transfer the knowledge of a large CLIP model into a smaller one. These strategies include: relational distillation (CRD), which aligns the teacher and student model outputs by contrasting distributions …
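A hedged sketch of what such a contrastive relational term can look like is given below: the image-text similarity matrices of teacher and student are turned into row-wise distributions and matched with a KL divergence. This is an illustrative reconstruction from the description above (temperature and normalization choices are assumptions), not the authors' released code.

```python
# Sketch of a CRD-style loss: match the student's image-text similarity
# distribution to the teacher's via KL divergence.
import torch
import torch.nn.functional as F

def crd_loss(img_s, txt_s, img_t, txt_t, tau=0.07):
    # L2-normalize embeddings so dot products are cosine similarities.
    img_s, txt_s = F.normalize(img_s, dim=-1), F.normalize(txt_s, dim=-1)
    img_t, txt_t = F.normalize(img_t, dim=-1), F.normalize(txt_t, dim=-1)

    logits_s = img_s @ txt_s.t() / tau   # student image-to-text similarities
    logits_t = img_t @ txt_t.t() / tau   # teacher image-to-text similarities

    # KL(teacher || student) over each row's softmax distribution.
    return F.kl_div(F.log_softmax(logits_s, dim=-1),
                    F.softmax(logits_t, dim=-1),
                    reduction="batchmean")
```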
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation …
This project extends CLIP for efficient knowledge distillation by utilizing embeddings as teachers. Typical knowledge distillation frameworks require running forward passes through a teacher …
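A minimal sketch of that idea follows: teacher embeddings are computed once and cached, and training then only runs the smaller student network against the cache, so no teacher forward passes are needed during distillation. The `encode_image` interface, matching embedding dimensions, and the caching scheme are assumptions made for illustration.

```python
# Sketch of embedding-based distillation: teacher embeddings are computed
# once up front; training only runs the (smaller) student network.
import torch

@torch.no_grad()
def cache_teacher_embeddings(teacher, dataloader):
    """One-time pass: collect teacher image embeddings for the whole dataset."""
    cache = []
    for images, _ in dataloader:
        cache.append(teacher.encode_image(images).cpu())  # assumed CLIP-like API
    return torch.cat(cache)

def student_step(student, images, cached_teacher_emb, optimizer):
    """One training step: pull student embeddings toward the cached teacher ones.

    `cached_teacher_emb` is the slice of the cache matching this batch;
    student and teacher embedding dimensions are assumed to match.
    """
    emb_s = student.encode_image(images)
    loss = torch.nn.functional.mse_loss(emb_s, cached_teacher_emb.to(emb_s.device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```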
[2404.06170] CLIP-Embed-KD: Computationally Efficient …
April 9, 2024 · Contrastive Language-Image Pre-training (CLIP) has been shown to improve zero-shot generalization capabilities of language and vision models. In this paper, we extend …