Sigmoid, Softmax, and Entropy in Machine Learning

This post aims to summarize and discuss the sigmoid function, the softmax function, and entropy, all of which appear frequently in machine learning.

References:

  1. Entropy, Wikipedia
  2. Derivation of the sigmoid function, Zhihu
  3. A detailed explanation of the Softmax function, Zhihu
  4. Sigmoid function, Wikipedia
  5. "When information entropy is larger, is the amount of information larger or smaller?", Zhihu
  6. What is the relationship between softmax and cross-entropy?

Summary:

  1. Sigmoid can be viewed as softmax applied to a two-dimensional network output \([x, 0]\): softmax yields \([e^x/(e^x+e^0),\ e^0/(e^x+e^0)] = [\sigma(x),\ 1-\sigma(x)]\). The first component is the probability \(p\) assigned to label 1, which is what we actually want; the second component, belonging to the fixed logit 0, needs no further discussion.
  2. Softmax and cross-entropy are almost always used together, because the combination makes both the loss and its derivative simple to compute: the gradient with respect to the logits is just the predicted distribution minus the one-hot label (see the sketch after this list).
  3. We use cross-entropy so often because it is the part of the KL divergence that varies with the model output, and therefore it measures the gap between the predicted distribution and the true distribution.
  4. We use softmax instead of hardmax so that the model receives gradients from the predicted scores of all classes, not only the winning one.
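The sketch below (my own NumPy check, not part of the original post) verifies points 1 and 2 numerically: sigmoid(x) equals the first component of softmax([x, 0]), and the gradient of the softmax + cross-entropy loss with respect to the logits is the predicted distribution minus the one-hot label.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Point 1: sigmoid(x) equals the first component of softmax([x, 0]).
x = 1.7
print(np.isclose(sigmoid(x), softmax(np.array([x, 0.0]))[0]))   # True

# Point 2: for loss = -log softmax(z)[y], the gradient w.r.t. the logits z
# is simply softmax(z) minus the one-hot encoding of the label y.
z = np.array([2.0, -1.0, 0.5])
y = 0
p = softmax(z)
loss = -np.log(p[y])
grad = p.copy()
grad[y] -= 1.0
print(loss, grad)
```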

Image-embodied Knowledge Representation Learning

IKRL, a paper from Tsinghua University published at IJCAI 2017, is arguably the first method to inject image information into knowledge graph embeddings (KGE).

Following the idea of TransE, it learns an additional image embedding for each entity; both the image embedding and the original entity embedding are then used to judge whether a triple holds via \(h+r\approx t\).
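As a rough illustration (my own sketch with made-up dimensions, not the authors' released code), a TransE-style energy \(\|h+r-t\|\) can be evaluated for both the structure-based and the image-based embedding of each entity, which is roughly how this kind of scoring combines the two views:

```python
import numpy as np

def transe_energy(h, r, t):
    # Lower energy means the triple (h, r, t) is more plausible.
    return np.linalg.norm(h + r - t, ord=1)

dim = 50                                      # hypothetical embedding size
rng = np.random.default_rng(0)
r = rng.normal(size=dim)                      # relation embedding
h_struct, h_image = rng.normal(size=dim), rng.normal(size=dim)
t_struct, t_image = rng.normal(size=dim), rng.normal(size=dim)

# Score the triple under every combination of structure-based and
# image-based entity representations; a margin-based ranking loss over
# these energies would drive training.
for h in (h_struct, h_image):
    for t in (t_struct, t_image):
        print(transe_energy(h, r, t))
```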

Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.


Multi-Modal Knowledge Graph Construction and Application: A Survey

A 2022 survey on MMKGs from the Computer Science department of Fudan University. It focuses mainly on MMKGs composed of images and text, and describes them from two angles: construction and application.

Recent years have witnessed the resurgence of knowledge engineering which is featured by the fast growth of knowledge graphs. However, most of existing knowledge graphs are represented with pure symbols, which hurts the machine’s capability to understand the real world. The multi-modalization of knowledge graphs is an inevitable key step towards the realization of human-level machine intelligence. The results of this endeavor are Multi-modal Knowledge Graphs (MMKGs). In this survey on MMKGs constructed by texts and images, we first give definitions of MMKGs, followed with the preliminaries on multi-modal tasks and techniques. We then systematically review the challenges, progresses and opportunities on the construction and application of MMKGs respectively, with detailed analyses of the strength and weakness of different solutions. We finalize this survey with open research problems relevant to MMKGs.


MMML Tutorial Challenge 6: Quantification

Definition:

Empirical and theoretical study to better understand heterogeneity, cross-modal interactions, and the multimodal learning process.


MMML Tutorial Challenge 4: Generation

Generation is defined as producing raw modalities, i.e., modalities different from the input modalities:

Learning a generative process to produce raw modalities that reflects cross-modal interactions, structure, and coherence.


MMML Tutorial Challenge 5: Transference

Transference refers to using other modalities to assist a primary modality that may be noisy or have limited resources. Definition:

Transfer knowledge between modalities, usually to help the primary modality which may be noisy or with limited resources

There are two possible key challenges:


MMML Tutorial Challenge 3: Reasoning

Definition of Reasoning:

Combining knowledge, usually through multiple inferential steps, exploiting multimodal alignment and problem structure.

Reasoning builds on representation and alignment from the previous challenges; only on that basis can we consider how to combine suitable information from different modalities to obtain the desired prediction.


MMML Tutorial Challenge 2: Alignment

Definition of Alignment:

Identifying and modeling cross-modal connections between all elements of multiple modalities, building from the data structure.

There are three possible kinds of connections:

Equivalence means that two elements from different modalities are exactly equivalent; correspondence means that the two elements complement each other's information, for example an image and a textual description of its content; dependency means that some relation exists between the two elements.


MMML Tutorial Challenge 1: Representation

Challenge 1 Representation:

Learning representations that reflect cross-modal interactions between individual elements, across different modalities.

The representation challenge has three sub-challenges: Fusion, Coordination, and Fission.


CMU MML Tutorial Louis-Philippe Morency

MMML Tutorial: Introduction

An introduction to multimodality.

What is multimodal?

Mathematically, "multimodal" describes a probability distribution that has more than one mode, i.e., several distinct peaks.
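As a toy illustration of this statistical sense of the word (my own example, not from the tutorial), the sketch below samples from a two-component Gaussian mixture whose density has two modes:

```python
import numpy as np

# Mixture of two Gaussians: its density has two separate peaks (modes),
# which is what "multimodal" means in the statistical sense.
rng = np.random.default_rng(0)
n = 10_000
component = rng.integers(0, 2, size=n)                 # choose a component per sample
samples = np.where(component == 0,
                   rng.normal(-3.0, 1.0, size=n),      # mode near -3
                   rng.normal(+3.0, 1.0, size=n))      # mode near +3

# A coarse histogram already shows the two peaks.
counts, edges = np.histogram(samples, bins=20)
print(counts)
```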

Today, however, when we talk about multimodality we usually mean multiple modalities, and more precisely, sensory modalities.
