
Title: CLIP-KD: An Empirical Study of CLIP Model Distillation
July 24, 2023 · CLIP-KD improves student CLIP models consistently over zero-shot ImageNet classification and cross-modal retrieval benchmarks. When using ViT-L/14 pretrained on Laion-400M as the teacher, CLIP-KD achieves 57.5% and 55.4% zero-shot top-1 ImageNet accuracy over ViT-B/16 and ResNet-50, surpassing the original CLIP without KD by 20.5% and 20.1% ...
CVPR-2024 | CLIP-KD: A Comprehensive Study of CLIP Model Knowledge Distillation - Zhihu
CLIP-KD improves the performance of student CLIP models on zero-shot ImageNet classification and cross-modal tasks. When the teacher is a CLIP ViT-L/14 trained on the Laion-400M dataset, CLIP-KD reaches 57.5% and 55.4% zero-shot top-1 ImageNet classification accuracy with ViT-B/16 and ResNet-50 students, respectively, which compared with the original CLIP model yields …
[Multimodal] CLIP-KD: An Empirical Study of CLIP Model Distillation
July 23, 2024 · CLIP (Contrastive Language-Image Pretraining) is an image-language pre-training model that has demonstrated the ability to learn visual concepts from image-text datasets collected from the web. This article proposes a CLIP4Clip (CLIP For video Clip retrieval) model that transfers the knowledge in a CLIP model to video-language retrieval in an end-to-end fashion.
GitHub - winycg/CLIP-KD: [CVPR-2024] Official implementations of CLIP …
This repository contains the source code of CLIP-KD [CLIP-KD: An Empirical Study of CLIP Model Distillation]. OpenCLIP reads a CSV file with two columns: a path to an image, and a text caption. The names of the columns are passed as an argument to main.py. The script src/data/gather_cc.py will collect the Conceptual Captions 3M images.
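For illustration, here is a minimal sketch of that input format; the column names (filepath, title) and file paths are made-up examples, and whatever names you choose must be passed to main.py as its CSV-column arguments:

```python
# Build a two-column CSV of (image path, caption) pairs in the layout OpenCLIP expects.
# Column names are illustrative; main.py is told which column holds paths and which holds captions.
import csv

pairs = [
    ("images/cc3m/000000001.jpg", "a dog running on the beach"),
    ("images/cc3m/000000002.jpg", "a bowl of fresh fruit on a table"),
]

with open("train_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filepath", "title"])  # header row: image-path column, caption column
    writer.writerows(pairs)
```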
CLIP-KD/README.md at main · winycg/CLIP-KD - GitHub
We propose several distillation strategies, including relation, feature, gradient and contrastive paradigms, to examine the effectiveness of CLIP-Knowledge Distillation (KD). We show that a simple feature mimicry with Mean Squared Error loss works surprisingly well.
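As a rough sketch of that feature-mimicry idea in PyTorch (the encoders and the projection layer below are stand-ins, not the repository's actual modules): the student's embedding is projected to the teacher's width and regressed onto the frozen teacher's embedding with an MSE loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in encoders; in practice these would be the frozen teacher (e.g. ViT-L/14)
# and trainable student (e.g. ViT-B/16 or ResNet-50) CLIP image towers.
teacher_dim, student_dim = 768, 512
teacher_encoder = nn.Linear(3 * 224 * 224, teacher_dim)
student_encoder = nn.Linear(3 * 224 * 224, student_dim)
proj = nn.Linear(student_dim, teacher_dim)  # aligns student features to the teacher's width

def feature_distillation_loss(images: torch.Tensor) -> torch.Tensor:
    """MSE between (projected) student features and frozen teacher features."""
    with torch.no_grad():                # the teacher is not updated during distillation
        t_feat = teacher_encoder(images)
    s_feat = proj(student_encoder(images))
    return F.mse_loss(s_feat, t_feat)

loss = feature_distillation_loss(torch.randn(8, 3 * 224 * 224))  # toy batch of 8 flattened images
```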
[Paper Overview 276] CLIP-KD: An Empirical Study of CLIP Model …
March 22, 2025 · Using knowledge distillation, the paper designs several distillation strategies to transfer the knowledge of a large CLIP model into a smaller one. These strategies include: Relation Distillation (CRD), which aligns the teacher's and student's outputs by contrasting their distributions; Feature Distillation (FD), which directly aligns teacher and student feature embeddings with a Mean Squared Error (MSE) loss; Masked Feature Distillation (MFD), which masks the student's input and uses the teacher's features to reconstruct the masked regions; Gradient Distillation (GD), which aligns teacher and student gradient information via an MSE loss; and Interactive Contrastive Learning (ICL), which works through inter …
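One plausible way to realize the relation/contrastive part of that recipe is sketched below (an assumption-laden illustration, not the repository's exact implementation): the student's image-to-text similarity distribution over a batch is pushed toward the teacher's with a KL divergence.

```python
import torch
import torch.nn.functional as F

def contrastive_relation_kd(s_img, s_txt, t_img, t_txt, tau: float = 0.07):
    """KL divergence between teacher and student image->text similarity distributions.

    s_img, s_txt: student image/text embeddings, shape (N, d_s)
    t_img, t_txt: teacher image/text embeddings, shape (N, d_t)
    """
    s_img, s_txt = F.normalize(s_img, dim=-1), F.normalize(s_txt, dim=-1)
    t_img, t_txt = F.normalize(t_img, dim=-1), F.normalize(t_txt, dim=-1)

    s_logits = s_img @ s_txt.t() / tau   # student similarity matrix, (N, N)
    t_logits = t_img @ t_txt.t() / tau   # teacher similarity matrix, (N, N)

    # Match the student's row-wise (image -> all texts) distribution to the teacher's.
    return F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1),
                    reduction="batchmean")

# Toy usage with random embeddings (batch of 8, student width 512, teacher width 768):
loss = contrastive_relation_kd(torch.randn(8, 512), torch.randn(8, 512),
                               torch.randn(8, 768), torch.randn(8, 768))
```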
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation …
This project extends CLIP for efficient knowledge distillation, by utilizing embeddings as teachers. Typical knowledge distillation frameworks require running forward passes through a teacher model, which is often prohibitive in the case of billion or trillion parameter teachers.
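A minimal sketch of that embeddings-as-teacher idea (all names and the choice of loss are illustrative assumptions, not the project's API): teacher embeddings are computed once offline and cached, so distillation never runs a forward pass through the large teacher.

```python
import torch
import torch.nn.functional as F

# Offline step, done once: cache teacher embeddings for the whole dataset so the
# large teacher never needs to be loaded again at distillation time.
cached_teacher_emb = torch.randn(10_000, 768)  # illustrative precomputed cache (N samples x 768)

def embed_kd_loss(student_emb: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    """Cosine-alignment loss between student embeddings and cached teacher embeddings.

    Assumes the student output has already been projected to the teacher width (768 here).
    """
    t = F.normalize(cached_teacher_emb[indices], dim=-1)
    s = F.normalize(student_emb, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()  # 1 - cosine similarity, averaged over the batch

# Toy usage: a batch of 4 student embeddings matched against cached teacher rows 0, 5, 9, 42.
loss = embed_kd_loss(torch.randn(4, 768), torch.tensor([0, 5, 9, 42]))
```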
[2404.06170] CLIP-Embed-KD: Computationally Efficient …
April 9, 2024 · Contrastive Language-Image Pre-training (CLIP) has been shown to improve zero-shot generalization capabilities of language and vision models. In this paper, we extend CLIP for efficient knowledge distillation, by utilizing embeddings as teachers.