Image-embodied Knowledge Representation Learning

A paper from Tsinghua University, published at IJCAI 2017. IKRL is arguably the first method to inject image information into knowledge graph embeddings (KGE).

Following the idea of TransE, IKRL learns an additional image embedding for each entity; the image embedding and the original structure-based entity embedding are then both used to evaluate whether a triple holds via \(h+r\approx t\).
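A minimal sketch of this scoring scheme (the embedding sizes, PyTorch layout, and the exact four-term combination of the two spaces below are my reading of the idea, not the paper's code):

```python
import torch

def transe_score(h, r, t):
    """TransE energy: lower means the triple (h, r, t) is more plausible."""
    return torch.norm(h + r - t, p=2, dim=-1)

dim, n_ent, n_rel = 100, 10000, 1000            # illustrative sizes
ent_structure = torch.nn.Embedding(n_ent, dim)  # structure-based entity embeddings
ent_image = torch.nn.Embedding(n_ent, dim)      # image-based entity embeddings
rel = torch.nn.Embedding(n_rel, dim)

h_idx, r_idx, t_idx = torch.tensor([0]), torch.tensor([0]), torch.tensor([1])
h_s, t_s = ent_structure(h_idx), ent_structure(t_idx)
h_i, t_i = ent_image(h_idx), ent_image(t_idx)
r_e = rel(r_idx)

# Score the triple in each space and across the two spaces, so the
# structure-based and image-based embeddings are trained to be
# compatible under h + r ≈ t.
energy = (transe_score(h_s, r_e, t_s) + transe_score(h_i, r_e, t_i) +
          transe_score(h_s, r_e, t_i) + transe_score(h_i, r_e, t_s))
```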

Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.
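The attention-based aggregation over an entity's images can be sketched as follows (the dot-product attention form is an assumption of mine; the paper defines its own attention function over the structure-based embedding):

```python
import torch
import torch.nn.functional as F

def aggregate_images(image_embs: torch.Tensor, entity_emb: torch.Tensor) -> torch.Tensor:
    """Combine the embeddings of an entity's images into one image-based
    representation, weighting each image by its compatibility with the
    entity's structure-based embedding.

    image_embs: (n_images, dim); entity_emb: (dim,)
    """
    logits = image_embs @ entity_emb       # compatibility score per image
    weights = F.softmax(logits, dim=0)     # attention weights over images
    return (weights.unsqueeze(-1) * image_embs).sum(dim=0)
```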


Multi-Modal Knowledge Graph Construction and Application: A Survey

A 2022 survey on MMKGs from the Department of Computer Science at Fudan University. It focuses on MMKGs built from images and text, and covers the field from two angles: construction and application.

Recent years have witnessed the resurgence of knowledge engineering which is featured by the fast growth of knowledge graphs. However, most existing knowledge graphs are represented with pure symbols, which hurts the machine's capability to understand the real world. The multi-modalization of knowledge graphs is an inevitable key step towards the realization of human-level machine intelligence. The results of this endeavor are Multi-modal Knowledge Graphs (MMKGs). In this survey on MMKGs constructed by texts and images, we first give definitions of MMKGs, followed by preliminaries on multi-modal tasks and techniques. We then systematically review the challenges, progress and opportunities in the construction and application of MMKGs respectively, with detailed analyses of the strengths and weaknesses of different solutions. We finalize this survey with open research problems relevant to MMKGs.


MMML Tutorial Challenge 6: Quantification

Definition:

Empirical and theoretical study to better understand heterogeneity, cross-modal interactions, and the multimodal learning process.


MMML Tutorial Challenge 4: Generation

Generation is defined as producing a raw modality, meaning the output should be a modality different from the input modalities:

Learning a generative process to produce raw modalities that reflects cross-modal interactions, structure, and coherence.


MMML Tutorial Challenge 5: Transference

Transference means using an additional modality to assist a primary modality whose resources may be limited. Definition:

Transfer knowledge between modalities, usually to help the primary modality which may be noisy or with limited resources

There are two possible key challenges:


MMML Tutorial Challenge 3: Reasoning

The definition of Reasoning:

Combining knowledge, usually through multiple inferential steps, exploiting multimodal alignment and problem structure.

Reasoning builds on representation and alignment from the earlier challenges; only on that basis can we consider how to combine the right information from different modalities to obtain the desired prediction.


MMML Tutorial Challenge 2: Alignment

The definition of Alignment:

Identifying and modeling cross-modal connections between all elements of multiple modalities, building from the data structure.

There are three possible kinds of connections:

Equivalence means two elements from different modalities are exactly the same; correspondence means two elements carry complementary information about each other, for example an image and a description of its content; dependency means some relation holds between the two elements.
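As a toy illustration of discovering such connections (the cosine-similarity criterion and the threshold are my assumptions, not part of the tutorial):

```python
import torch
import torch.nn.functional as F

def find_connections(text_elems, image_elems, threshold=0.5):
    """Toy connection discovery: treat high cosine similarity between a
    text element and an image element as evidence of a correspondence.

    text_elems: (n, d); image_elems: (m, d)
    """
    sim = F.cosine_similarity(text_elems.unsqueeze(1),
                              image_elems.unsqueeze(0), dim=-1)  # (n, m)
    pairs = (sim > threshold).nonzero(as_tuple=False)
    return [(int(i), int(j), float(sim[i, j])) for i, j in pairs]

text_elems = torch.randn(4, 32)   # 4 text elements (illustrative features)
image_elems = torch.randn(5, 32)  # 5 image elements
print(find_connections(text_elems, image_elems))
```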


MMML Tutorial Challenge 1: Representation

The definition of Challenge 1, Representation:

Learning representations that reflect cross-modal interactions between individual elements, across different modalities.

The representation challenge has three sub-challenges: fusion, coordination, and fission.
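A rough sketch of how the three sub-challenges differ (all module choices below are illustrative assumptions, not the tutorial's definitions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

text = torch.randn(8, 128)   # batch of text features (shapes are illustrative)
image = torch.randn(8, 128)  # batch of image features

# Fusion: map both modalities into one joint representation.
fuse = nn.Linear(128 * 2, 128)
joint = fuse(torch.cat([text, image], dim=-1))

# Coordination: keep separate representations but relate them,
# e.g. through a similarity/contrastive objective.
similarity = F.cosine_similarity(text, image, dim=-1)

# Fission: produce more representations than input modalities,
# e.g. modality-specific parts plus a shared part.
shared_head = nn.Linear(128, 64)
text_head, image_head = nn.Linear(128, 64), nn.Linear(128, 64)
shared = shared_head(text) + shared_head(image)
text_only, image_only = text_head(text), image_head(image)
```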


CMU MMML Tutorial, by Louis-Philippe Morency

MMML Tutorial: Introduction

An introduction to multimodality.

What is multimodal?

Mathematically, multimodal describes a probability distribution with several distinct modes (peaks).
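For instance, a mixture of two Gaussians is multimodal in this statistical sense (a quick numpy sketch):

```python
import numpy as np

# A distribution with two modes: a mixture of two Gaussians.
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(loc=-3.0, scale=1.0, size=5000),  # first mode around -3
    rng.normal(loc=+3.0, scale=1.0, size=5000),  # second mode around +3
])
hist, edges = np.histogram(samples, bins=50)
# The histogram shows two peaks, i.e. the distribution is multimodal.
```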

Today, however, when we say multimodal we usually mean multiple modalities, or more precisely, sensory modalities.


Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion

SIGIR 2022, Zhejiang University (code available).

The authors propose MKGformer, a Transformer-based method that applies to different multimodal knowledge graph prediction tasks. First, by defining a unified format for input and output data across prediction tasks, the model can serve several tasks without any change to its architecture; second, the authors propose a Transformer structure that performs multi-level fusion between the text and image modalities.
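A rough sketch of the unified input-output idea (the sequence templates and function names below are my illustration, not the paper's actual formatting):

```python
# Every task is cast as predicting the token(s) at a [MASK] slot in one
# shared sequence format, so the model architecture never changes.

def format_link_prediction(head_text: str, relation_text: str) -> str:
    # multimodal link prediction: predict the tail entity at [MASK]
    return f"[CLS] {head_text} [SEP] {relation_text} [SEP] [MASK]"

def format_relation_extraction(sentence: str) -> str:
    # multimodal relation extraction: predict the relation label at [MASK]
    return f"[CLS] {sentence} [SEP] The relation is [MASK]."

# Both strings (plus the associated image patches) would go through the
# same hybrid transformer.
print(format_link_prediction("LeBron James", "member of sports team"))
print(format_relation_extraction("LeBron James plays for the Lakers."))
```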

The experiments cover both supervised and low-resource settings on three tasks: multimodal KG completion, multimodal relation extraction, and multimodal named entity recognition.

Multimodal Knowledge Graphs (MKGs), which organize visual-text factual knowledge, have recently been successfully applied to tasks such as information retrieval, question answering, and recommendation system. Since most MKGs are far from complete, extensive knowledge graph completion studies have been proposed focusing on the multimodal entity, relation extraction and link prediction. However, different tasks and modalities require changes to the model architecture, and not all images/objects are relevant to text input, which hinders the applicability to diverse real-world scenarios. In this paper, we propose a hybrid transformer with multi-level fusion to address those issues. Specifically, we leverage a hybrid transformer architecture with unified input-output for diverse multimodal knowledge graph completion tasks. Moreover, we propose multi-level fusion, which integrates visual and text representation via coarse-grained prefix-guided interaction and fine-grained correlation-aware fusion modules. We conduct extensive experiments to validate that our MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
