Multi-Modal Knowledge Graph Construction and Application: A Survey

A survey on MMKGs published in 2022 by the Department of Computer Science at Fudan University. It focuses on MMKGs built from images and text, and covers them from two angles: construction and application.

Recent years have witnessed the resurgence of knowledge engineering, featured by the fast growth of knowledge graphs. However, most existing knowledge graphs are represented with pure symbols, which hurts the machine's capability to understand the real world. The multi-modalization of knowledge graphs is an inevitable key step towards human-level machine intelligence. The results of this endeavor are Multi-modal Knowledge Graphs (MMKGs). In this survey on MMKGs constructed from texts and images, we first give definitions of MMKGs, followed by preliminaries on multi-modal tasks and techniques. We then systematically review the challenges, progress, and opportunities in the construction and application of MMKGs respectively, with detailed analyses of the strengths and weaknesses of different solutions. We conclude this survey with open research problems relevant to MMKGs.

Read more »

MMML Tutorial Challenge 6: Quantification

Definition:

Empirical and theoretical study to better understand heterogeneity, cross-modal interactions, and the multimodal learning process.

Read more »

MMML Tutorial Challenge 4: Generation

Generation is defined as producing raw modalities, i.e., modalities different from the input modalities:

Learning a generative process to produce raw modalities that reflects cross-modal interactions, structure, and coherence.

Read more »

MMML Tutorial Challenge 5: Transference

Transference refers to using another modality to assist a primary modality whose resources may be limited. Definition:

Transfer knowledge between modalities, usually to help the primary modality which may be noisy or with limited resources

Two key challenges may arise:

Read more »

MMML Tutorial Challenge 3: Reasoning

Definition of reasoning:

Combining knowledge, usually through multiple inferential steps, exploiting multimodal alignment and problem structure.

Reasoning builds on the earlier representation and alignment challenges; only on that basis can we consider how to combine the right information from different modalities to obtain the desired predictions.

Read more »

MMML Tutorial Challenge 2: Alignment

Definition of alignment:

Identifying and modeling cross-modal connections between all elements of multiple modalities, building from the data structure.

There are three possible kinds of connections:

Equivalence means two elements from different modalities are exactly the same; correspondence means two elements complement each other's information, such as an image and a description of its content; dependency means there is some relation between the two elements.
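As an illustration of discovering such connections (my own sketch, not code from the tutorial), discrete cross-modal alignment is often modeled by scoring every pair of elements across two modalities with a similarity matrix; all names below are invented:

```python
import numpy as np

def cosine_similarity_matrix(img_elems, txt_elems):
    """Pairwise cosine similarities between image-region and word embeddings."""
    a = img_elems / np.linalg.norm(img_elems, axis=1, keepdims=True)
    b = txt_elems / np.linalg.norm(txt_elems, axis=1, keepdims=True)
    return a @ b.T

# Toy embeddings: 2 image regions, 3 words, dimension 4.
rng = np.random.default_rng(0)
regions = rng.normal(size=(2, 4))
words = rng.normal(size=(3, 4))

sim = cosine_similarity_matrix(regions, words)
# A hard "equivalence"-style alignment: each region picks its best-matching word.
alignment = sim.argmax(axis=1)
print(sim.shape, alignment.shape)  # (2, 3) (2,)
```

Soft correspondences and dependencies would instead keep the whole (normalized) similarity matrix rather than a single argmax per element.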

Read more »

MMML Tutorial Challenge 1: Representation

Challenge 1 Representation:

Learning representations that reflect cross-modal interactions between individual elements, across different modalities.

The representation challenge has three sub-challenges: Fusion, Coordination, and Fission.
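A toy sketch of the three sub-challenges (my own illustration with invented function names, not the tutorial's code): fusion merges modality vectors into one, coordination keeps them separate but comparable, and fission splits a joint vector back into modality-specific parts:

```python
import numpy as np

def fuse(x_img, x_txt):
    """Fusion: combine two modality vectors into a single joint vector."""
    return np.concatenate([x_img, x_txt])

def coordinate(x_img, x_txt):
    """Coordination: keep separate vectors but relate them, e.g. cosine similarity."""
    return x_img @ x_txt / (np.linalg.norm(x_img) * np.linalg.norm(x_txt))

def fission(joint, img_dim):
    """Fission: split a joint vector back into modality-specific parts."""
    return joint[:img_dim], joint[img_dim:]

img, txt = np.ones(3), np.full(2, 2.0)
joint = fuse(img, txt)           # shape (5,)
img2, txt2 = fission(joint, 3)   # recover the two parts
```

In practice each step is a learned module (e.g. attention-based fusion, contrastively coordinated encoders, factorized latent spaces), not simple concatenation and slicing.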

Read more »

CMU MML Tutorial Louis-Philippe Morency

MMML Tutorial: Introduction

An introduction to multimodality

What is multimodal?

Mathematically, we can describe different modalities as having different distributional tendencies in probability.

Nowadays, however, when we speak of multimodal we usually mean multiple modalities, or more precisely, sensory modalities.
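As a toy illustration of this statistical view (my own, with assumed toy distributions): pixel intensities in images are often roughly symmetric around a mean, while word frequencies in text are heavy-tailed, so the two modalities behave very differently in probability:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Image" modality: pixel intensities, roughly symmetric around a mean.
pixels = rng.normal(loc=128, scale=30, size=10_000)

# "Text" modality: word ranks drawn from a heavy-tailed Zipf distribution.
word_ranks = rng.zipf(a=2.0, size=10_000)

# The symmetric modality has mean close to its median; the Zipfian one does not.
print(pixels.mean() - np.median(pixels))          # small gap
print(word_ranks.mean() - np.median(word_ranks))  # large gap
```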

Read more »

Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion

SIGIR 2022, code, Zhejiang University.

The authors propose MKGformer, a Transformer-based method that works across different multimodal knowledge graph prediction tasks. By giving the input and output data the same format for every prediction task, the model can serve different tasks without changing its architecture. In addition, the authors propose a Transformer structure that performs multi-level fusion between the text and image modalities.

The authors run experiments on three tasks, multimodal KG completion, multimodal relation extraction, and multimodal named entity recognition, in both fully supervised and low-resource settings.

Multimodal Knowledge Graphs (MKGs), which organize visual-text factual knowledge, have recently been successfully applied to tasks such as information retrieval, question answering, and recommendation systems. Since most MKGs are far from complete, extensive knowledge graph completion studies have been proposed focusing on multimodal entity and relation extraction and link prediction. However, different tasks and modalities require changes to the model architecture, and not all images/objects are relevant to the text input, which hinders the applicability to diverse real-world scenarios. In this paper, we propose a hybrid transformer with multi-level fusion to address those issues. Specifically, we leverage a hybrid transformer architecture with unified input-output for diverse multimodal knowledge graph completion tasks. Moreover, we propose multi-level fusion, which integrates visual and text representation via coarse-grained prefix-guided interaction and fine-grained correlation-aware fusion modules. We conduct extensive experiments to validate that our MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
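A heavily simplified sketch of the two fusion levels named in the abstract; the actual MKGformer operates inside pretrained ViT and BERT layers, and every name and formula here is my own guess at the general idea, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prefix_guided_attention(txt_q, vis_kv):
    """Coarse level: text queries attend over visual tokens used as a prefix."""
    attn = softmax(txt_q @ vis_kv.T / np.sqrt(txt_q.shape[-1]))
    return attn @ vis_kv

def correlation_aware_fusion(txt_h, vis_ctx):
    """Fine level: gate the visual context by its correlation with each text token."""
    corr = (txt_h * vis_ctx).sum(-1, keepdims=True)  # per-token correlation score
    gate = 1.0 / (1.0 + np.exp(-corr))               # sigmoid gate
    return txt_h + gate * vis_ctx

txt = np.random.default_rng(1).normal(size=(5, 8))   # 5 text tokens
vis = np.random.default_rng(2).normal(size=(4, 8))   # 4 visual tokens
fused = correlation_aware_fusion(txt, prefix_guided_attention(txt, vis))
print(fused.shape)  # (5, 8)
```

The gating step reflects the abstract's point that not all images/objects are relevant to the text: a low correlation score suppresses the visual contribution for that token.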

Read more »

Learning to Walk across Time for Interpretable Temporal Knowledge Graph Completion

T-GAP KDD 2021

https://github.com/sharkmir1/T-GAP

Static knowledge graphs (KGs), despite their wide usage in relational reasoning and downstream tasks, fall short of realistic modeling of knowledge and facts that are only temporarily valid. Compared to static knowledge graphs, temporal knowledge graphs (TKGs) inherently reflect the transient nature of real-world knowledge. Naturally, automatic TKG completion has drawn much research interest for a more realistic modeling of relational reasoning. However, most of the existing models for TKG completion extend static KG embeddings that do not fully exploit TKG structure, thus lacking in 1) accounting for temporally relevant events already residing in the local neighborhood of a query, and 2) path-based inference that facilitates multi-hop reasoning and better interpretability. In this paper, we propose T-GAP, a novel model for TKG completion that maximally utilizes both temporal information and graph structure in its encoder and decoder. T-GAP encodes the query-specific substructure of the TKG by focusing on the temporal displacement between each event and the query timestamp, and performs path-based inference by propagating attention through the graph. Our empirical experiments demonstrate that T-GAP not only achieves superior performance against state-of-the-art baselines, but also competently generalizes to queries with unseen timestamps. Through extensive qualitative analyses, we also show that T-GAP enjoys transparent interpretability, and follows human intuition in its reasoning process.
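The central idea of the abstract, encoding the temporal displacement between each event and the query timestamp, can be sketched roughly as follows (my own simplification with invented names; the real T-GAP encodes displacements inside an attention-propagating graph encoder):

```python
import numpy as np

def displacement_features(event_times, query_time, num_buckets=5):
    """Encode each event by the sign and log-bucketed magnitude of its
    temporal displacement from the query timestamp."""
    dt = np.asarray(event_times) - query_time
    sign = np.sign(dt)  # past (-1), same time (0), future (+1)
    mag = np.minimum(np.log1p(np.abs(dt)).astype(int), num_buckets - 1)
    return np.stack([sign, mag], axis=1)

# Events in 2010, 2014, 2015, 2020; query asks about 2015.
feats = displacement_features([2010, 2014, 2015, 2020], query_time=2015)
print(feats)
```

Separating the sign from the magnitude lets a model treat "before", "at", and "after" the query differently, which is what enables generalization to unseen timestamps: only relative displacement matters, not the absolute date.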

Read more »