Re-eval-KGC

A Re-evaluation of Knowledge Graph Completion Methods

2019-11-10 ACL 2020

The paper re-examines problems in current KGC methods and proposes a RANDOM evaluation protocol.

1 Introduction

Recently proposed neural network models for Knowledge Graph Completion (KGC) report problematic results:

in ConvKB, there is a 21.8% improvement over ConvE on FB15k-237, but a degradation of 42.3% on WN18RR, which is surprising given the method is claimed to be better than ConvE.

These models achieve strong results on one dataset (FB15k-237) but degrade markedly on another (WN18RR). The paper investigates this discrepancy and traces it to the models' evaluation protocol.

3 Observations

The investigation shows that, at evaluation time, the affected models such as ConvKB and KBAT assign exactly the same score to many negative samples as to the valid triple.

On average, ConvKB and CapsE have 125 and 278 entities with exactly same score as the valid triplet over the entire evaluation dataset of FB15k-237, whereas ConvE has around 0.002 such entities.

Under these conditions, if the valid triple is always placed at the head of the candidate list being evaluated, the reported performance is spuriously inflated.
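To make the effect of score ties concrete, here is a minimal sketch (not the authors' code; the function name and the 278-tie example are illustrative, the latter loosely echoing the CapsE statistic quoted above) of how the rank of the valid triple changes under the paper's tie-breaking protocols, TOP, BOTTOM, and RANDOM:

```python
import random

def rank_with_protocol(valid_score, negative_scores, protocol="RANDOM"):
    """Rank of the valid triple among negatives under different
    tie-breaking protocols (illustrative sketch, not the authors' code)."""
    higher = sum(s > valid_score for s in negative_scores)
    ties = sum(s == valid_score for s in negative_scores)
    if protocol == "TOP":        # valid triple placed before all tied negatives
        return higher + 1
    if protocol == "BOTTOM":     # valid triple placed after all tied negatives
        return higher + ties + 1
    # RANDOM: valid triple placed uniformly at random among the tied negatives
    return higher + random.randint(0, ties) + 1

# Example: 278 negatives tie with the valid triple's score
negatives = [0.9] * 278 + [0.1] * 100
print(rank_with_protocol(0.9, negatives, "TOP"))     # 1   -> inflated metrics
print(rank_with_protocol(0.9, negatives, "BOTTOM"))  # 279
print(rank_with_protocol(0.9, negatives, "RANDOM"))  # somewhere in [1, 279]
```

A model that produces many ties thus looks much better under TOP-style evaluation than under the other two protocols.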

4 Evaluation Method

To address this, the paper proposes an evaluation protocol: RANDOM.

RANDOM:

In this, the correct triplet is placed randomly in \(\mathcal{T}'\).

where \[ \mathcal{T}' = \{ (h, r, t')\ |\ t' \in \mathcal{E} \} \]

RANDOM is the best evaluation technique which is both rigorous and fair to the model.
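A minimal sketch of how the RANDOM protocol could be realized in code (assumptions: `score_fn` stands in for the model's scoring function and `entities` for \(\mathcal{E}\); this is not the authors' implementation):

```python
import random

def random_protocol_rank(score_fn, h, r, t, entities):
    """RANDOM protocol sketch: build T' = {(h, r, t') | t' in E},
    place the correct triple at a random position, then rank by score."""
    candidates = [(h, r, t_prime) for t_prime in entities if t_prime != t]
    insert_at = random.randint(0, len(candidates))
    candidates.insert(insert_at, (h, r, t))   # random placement of the valid triple
    scores = [score_fn(*c) for c in candidates]
    # Sort candidate indices by descending score; Python's sort is stable,
    # so tied candidates keep their (randomized) list order and the model
    # gets no systematic advantage from producing equal scores.
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return order.index(insert_at) + 1
```

The rank returned here can then be aggregated into MR, MRR, or Hits@k as usual; the only change relative to the problematic setup is that ties between the valid triple and negatives are broken at random rather than in the model's favor.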