Rather than treating each modality separately, LMs map semantically similar inputs, such as translated sentences, close together in representation space, enabling generalization across modalities. If an LM is ...