Llaba - Search

github.com
https://github.com › haotian-liu › LLaVA
LLaVA: Large Language and Vision Assistant - GitHub
We release LLaVA Bench for benchmarking open-ended visual chat with results from Bard and Bing-Chat. We also support and verify training with RTX 3090 and RTX A6000. Check out …
zhihu.com
https://zhuanlan.zhihu.com
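LLaVA (Large Language and Vision Assistant) large model - Zhihu
LLaVA (Large Language and Vision Assistant) is a large multimodal model jointly released by researchers at the University of Wisconsin-Madison, Microsoft Research, and Columbia University. The model shows image-text understanding that approaches multimodal GPT-4: it achieves a relative score of 85.1% compared with GPT-4. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 reaches a new SoTA of 92.53% accuracy. Below are trial results from Synced (more results at the end of the article): Humans interact with the world through multiple channels such as vision and language, because different channels, in representing …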
github.com
https://github.com › microsoft › LLaVA-Med
microsoft/LLaVA-Med - GitHub
[June 1, 2023] 🔥 We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical domain large language and vision models with GPT-4 level capabilities. Check out the paper.
github.com
https://github.com › LLaVA-VL › LLaVA-NeXT
GitHub - LLaVA-VL/LLaVA-NeXT
We are excited to release LLaVA-Video-178K, a high-quality synthetic dataset for video instruction tuning. This dataset includes: Along with this, we’re also releasing the LLaVA-Video 7B/72B models, which deliver competitive performance on the latest video benchmarks, including Video-MME, LongVideoBench, and Dream-1K.
llava-vl.github.io
https://llava-vl.github.io
LLaVA
LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.
arxiv.org
https://arxiv.org › abs
LLaVA-Med: Training a Large Language-and-Vision Assistant for ...
June 1, 2023 · LLaVA-Med exhibits excellent multimodal conversational capability and can follow open-ended instruction to assist with inquiries about a biomedical image. On three standard biomedical visual question answering datasets, LLaVA-Med outperforms previous supervised state-of-the-art on certain metrics.
zhihu.com
https://zhuanlan.zhihu.com
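LLaVA-1.5 upgrade: a multimodal large model that needs only one day of training, on 11 benchmarks …
October 8, 2023 · LLaVA (Large Language-and-Vision Assistant) is a model capable of multimodal conversion between vision and language, composed of a vision encoder and a large language model (Vicuna v1.5 13B). Through end-to-end training, it achieves high performance in visual reasoning. 2.2 Challenges of LLaVA. Although LLaVA shows excellent performance in visual reasoning, it performs relatively poorly on some academic benchmarks, especially those that require short-form answers. This challenge stems mainly from LLaVA not having been pretrained on large-scale data. Specifically, LLaVA uses GPT …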
huggingface.co
https://huggingface.co › liuhaotian
liuhaotian/llava-v1.5-13b - Hugging Face
Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, …
csdn.net
https://blog.csdn.net › article › details
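Multimodal large language model LLaVA paper walkthrough: Visual Instruction Tuning
June 27, 2023 · By instruction tuning on such generated data, the authors introduce the Large Language and Vision Assistant (LLaVA), an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding. Background and related methods: large language models (LLMs) have shown that language can play a broader role as a universal interface for a general-purpose assistant, where various task instructions can be expressed explicitly in language and guide the end-to-end trained neural assistant to switch to the task of interest and solve it. For example, the recent success of ChatGPT and GPT-4 has demonstrated …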
zhihu.com
https://zhuanlan.zhihu.com
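[MM-LLM series] Chinese LLaVA: open-source Chinese-English bilingual vision-language multimodal …
August 4, 2023 · Introduces Chinese LLaVA, a project that uses a Chinese fine-tuned LLaMA2 model as its Chinese language model base and adds image understanding. It follows the LLaVA architecture and runs two-stage training on Chinese data: the first stage is pretraining for feature alignment, and the second stage is end-to-end fine-tuning. Dataset preparation. LLaVA proposes methods for constructing instruction-following data for different tasks.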
