GenRead

Posted on 2023-09-15, updated on 2023-10-28. Categories: Paper, LLM, QA. Word count: 3.7k. Reading time ≈ 3 min.

Generate rather than Retrieve: Large Language Models are Strong Context Generators

University of Notre Dame and Microsoft, ICLR 2023, code available.

Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach for knowledge-intensive tasks is to employ a retrieve-then-read pipeline that first retrieves a handful of relevant contextual documents from an external corpus such as Wikipedia and then predicts an answer conditioned on the retrieved documents. In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer. Furthermore, we propose a novel clustering-based prompting method that selects distinct prompts, in order to generate diverse documents that cover different perspectives, leading to better recall over acceptable answers. We conduct extensive experiments on three different knowledge-intensive tasks, including open-domain QA, fact checking, and dialogue system. Notably, GenRead achieves 71.6 and 54.4 exact match scores on TriviaQA and WebQ, significantly outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9, without retrieving any documents from any external knowledge source. Lastly, we demonstrate the model performance can be further improved by combining retrieval and generation. Our code and generated documents can be found at https://github.com/wyu97/GenRead.

The authors propose generate-then-read: an LLM generates documents for the question, and these documents then serve as background context for answering it.
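The two stages can be sketched as below. This is only an illustration under stated assumptions: `llm` is a hypothetical callable that maps a prompt string to a completion, the prompt wording is invented for the example, and the clustering-based prompt selection described in the abstract is not reproduced here.

```python
from typing import Callable, List


def generate_then_read(
    question: str,
    llm: Callable[[str], str],  # hypothetical: prompt in, completion out
    num_docs: int = 3,
) -> str:
    """Generate background documents with an LLM, then answer from them."""
    # Stage 1 (generate): ask the LLM for contextual documents.
    docs: List[str] = []
    for i in range(num_docs):
        prompt = (
            "Generate a background document to answer the question.\n"
            f"Question: {question}\n"
            f"Document {i + 1}:"
        )
        docs.append(llm(prompt))

    # Stage 2 (read): answer the question conditioned on the generated documents.
    context = "\n\n".join(docs)
    read_prompt = (
        "Refer to the passages below and answer the question.\n"
        f"Passages:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return llm(read_prompt).strip()
```

No retriever or external corpus appears anywhere in the function, which is the point of the method: the reader sees only text the generator produced.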

Increasing-Diver-Acc-Data-Gen-LLM

Posted on 2023-09-14, updated on 2023-10-28. Categories: Paper, LLM, Data Augment. Word count: 4.6k. Reading time ≈ 4 min.

Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions

University of Michigan and Microsoft, ACL 2023

Large language models (LLMs) can be used to generate text data for training and evaluating other models. However, creating high-quality datasets with LLMs can be challenging. In this work, we explore human-AI partnerships to facilitate high diversity and accuracy in LLM-based text data generation. We first examine two approaches to diversify text generation: 1) logit suppression, which minimizes the generation of languages that have already been frequently generated, and 2) temperature sampling, which flattens the token sampling probability. We found that diversification approaches can increase data diversity but often at the cost of data accuracy (i.e., text and labels being appropriate for the target domain). To address this issue, we examined two human interventions, 1) label replacement (LR), correcting misaligned labels, and 2) out-of-scope filtering (OOSF), removing instances that are out of the user’s domain of interest or to which no considered label applies. With oracle studies, we found that LR increases the absolute accuracy of models trained with diversified datasets by 14.4%. Moreover, we found that some models trained with data generated with LR interventions outperformed LLM-based few-shot classification. In contrast, OOSF was not effective in increasing model accuracy, implying the need for future work in human-in-the-loop text data generation.

The paper uses LLMs to generate training data, weighing both the diversity and the accuracy of the generated data.
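The two diversification approaches named in the abstract can be illustrated on a toy logit vector. This is a minimal sketch, not the paper's implementation: the linear count-based penalty in `suppress_frequent_tokens` and both function names are assumptions made for the example.

```python
import math
from typing import Dict, List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature sampling: a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def suppress_frequent_tokens(
    logits: List[float],
    token_counts: Dict[int, int],
    penalty: float = 1.0,
) -> List[float]:
    """Logit suppression (assumed form): lower the logit of each token id
    in proportion to how often it has already been generated."""
    return [
        logit - penalty * token_counts.get(token_id, 0)
        for token_id, logit in enumerate(logits)
    ]
```

Both functions trade probability mass away from the most likely tokens, which is consistent with the abstract's finding that diversity can come at the cost of accuracy.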

ENTDA

Posted on 2023-09-11, updated on 2023-09-17. Categories: Paper, NER. Word count: 3k. Reading time ≈ 3 min.

Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks

ENTDA, ACL 2023 Findings, Tsinghua University and Alibaba DAMO Academy

Data augmentation techniques have been used to alleviate the problem of scarce labeled data in various NER tasks (flat, nested, and discontinuous NER tasks). Existing augmentation techniques either manipulate the words in the original text that break the semantic coherence of the text, or exploit generative models that ignore preserving entities in the original text, which impedes the use of augmentation techniques on nested and discontinuous NER tasks. In this work, we propose a novel Entity-to-Text based data augmentation technique named ENTDA to add, delete, replace or swap entities in the entity list of the original texts, and adopt these augmented entity lists to generate semantically coherent and entity preserving texts for various NER tasks. Furthermore, we introduce a diversity beam search to increase the diversity during the text generation process. Experiments on thirteen NER datasets across three tasks (flat, nested, and discontinuous NER tasks) and two settings (full data and low resource settings) show that ENTDA could bring more performance improvements compared to the baseline augmentation techniques.

The method generates new data from an entity list.
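The four entity-list edits named in the abstract (add, delete, replace, swap) might look like the following. The function names and the choice of explicit positions are assumptions for illustration; how entities are selected and the entity-to-text generation step itself are not shown.

```python
from typing import List


def add_entity(entities: List[str], new_entity: str, position: int) -> List[str]:
    """Insert a new entity at the given position."""
    return entities[:position] + [new_entity] + entities[position:]


def delete_entity(entities: List[str], position: int) -> List[str]:
    """Remove the entity at the given position."""
    return entities[:position] + entities[position + 1:]


def replace_entity(entities: List[str], position: int, new_entity: str) -> List[str]:
    """Swap out the entity at the given position for a different one."""
    return entities[:position] + [new_entity] + entities[position + 1:]


def swap_entities(entities: List[str], i: int, j: int) -> List[str]:
    """Exchange the entities at positions i and j."""
    result = list(entities)
    result[i], result[j] = result[j], result[i]
    return result
```

Each function returns a new list, so one original entity list can yield several augmented lists, each of which would then be passed to the text generator.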

LLM-reason

Posted on 2023-09-06, updated on 2024-04-29. Categories: Paper, LLM, Reason. Word count: 37k. Reading time ≈ 34 min.

Reasoning methods using LLMs

A summary of papers on reasoning with LLMs.

LLM-too-positive-negative-comm-know

Posted on 2023-09-05. Categories: Paper, LLM, Capacity. Word count: 4.6k. Reading time ≈ 4 min.

Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge

Fudan University, ACL 2023, code available.

Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as “lions don’t live in the ocean”, is also ubiquitous in the world but rarely mentioned explicitly in the text. What do LLMs know about negative knowledge? This work examines the ability of LLMs to handle negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question-answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict.

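The authors discuss how LLMs behave markedly differently on negative commonsense knowledge depending on whether they are judging or generating. LLMs are good at judging whether a piece of knowledge holds, yet they often make mistakes when generating the corresponding negative knowledge cases.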

is-gpt3-good-data-annotator

Posted on 2023-09-05, updated on 2023-10-18. Categories: Paper, LLM, Capacity. Word count: 4.6k. Reading time ≈ 4 min.

Is GPT-3 a Good Data Annotator?

Nanyang Technological University and Alibaba DAMO Academy, ACL 2023, code available.

Data annotation is the process of labeling data that could be used to train machine learning models. Having high-quality annotation is crucial, as it allows the model to learn the relationship between the input data and the desired output. GPT-3, a large-scale language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefore natural to wonder whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. Through this analysis, we aim to provide insight into the potential of GPT-3 as a general-purpose data annotator in NLP.

The authors explore methods for using GPT-3 to generate data for tasks such as sentiment analysis (SA), relation extraction (RE), named entity recognition (NER), and aspect sentiment triplet extraction (ASTE).

linear-algorithm-in-ICL

Posted on 2023-09-04, updated on 2023-09-05. Categories: LLM, ICL. Word count: 5.7k. Reading time ≈ 5 min.

What learning algorithm is in-context learning? Investigations with linear models

ICLR 2023, Google Research and MIT, paper link.

Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples (x, f(x)) presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners’ late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms.
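For reference, two of the standard estimators the abstract compares in-context learners against can be written out for the scalar case. This is a sketch of the estimators only, under the assumption of a one-dimensional model y ≈ w·x with squared loss plus an L2 penalty; it is not the transformer construction from the paper, and the function names are invented.

```python
from typing import List


def ridge_closed_form(xs: List[float], ys: List[float], lam: float) -> float:
    """Minimizer of sum((w*x - y)^2) + lam * w^2 for a scalar weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def ridge_gradient_descent(
    xs: List[float],
    ys: List[float],
    lam: float,
    lr: float = 0.01,
    steps: int = 200,
) -> float:
    """Minimize the same objective by repeated gradient steps from w = 0."""
    w = 0.0
    for _ in range(steps):
        # d/dw [sum((w*x - y)^2) + lam * w^2]
        grad = 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) + 2.0 * lam * w
        w -= lr * grad
    return w
```

With a small enough learning rate the iterative estimate converges to the closed-form one, so the two predictors differ mainly when only a few gradient steps are taken.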

LLM-know-what-they-dont-know

Posted on 2023-09-03. Categories: Paper, LLM. Word count: 2.4k. Reading time ≈ 2 min.

Do Large Language Models Know What They Don’t Know?

Fudan University, ACL 2023 Findings, code available.

Large language models (LLMs) have a wealth of knowledge that allows them to excel in various Natural Language Processing (NLP) tasks. Current research focuses on enhancing their performance within their existing knowledge. Despite their vast knowledge, LLMs are still limited by the amount of information they can accommodate and comprehend. Therefore, the ability to understand their own limitations on the unknowns, referred to as self-knowledge, is of paramount importance. This study aims to evaluate LLMs’ self-knowledge by assessing their ability to identify unanswerable or unknowable questions. We introduce an automated methodology to detect uncertainty in the responses of these models, providing a novel measure of their self-knowledge. We further introduce a unique dataset, SelfAware, consisting of unanswerable questions from five diverse categories and their answerable counterparts. Our extensive analysis, involving 20 LLMs including GPT-3, InstructGPT, and LLaMA, discovers an intrinsic capacity for self-knowledge within these models. Moreover, we demonstrate that in-context learning and instruction tuning can further enhance this self-knowledge. Despite this promising insight, our findings also highlight a considerable gap between the capabilities of these models and human proficiency in recognizing the limits of their knowledge.

This paper examines whether LLMs know if a question has a definite answer, in other words whether LLMs know what they don’t know. The authors call this ability the LLM’s self-knowledge.
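One way to picture the evaluation is a detector that flags responses expressing uncertainty, scored against whether each question is actually answerable. The sketch below is a stand-in only: the phrase list, the substring matching, and the accuracy metric are all assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical phrase list, chosen only for this example.
UNCERTAIN_PHRASES = (
    "i don't know",
    "not known",
    "unanswerable",
    "no definitive answer",
    "uncertain",
    "cannot be determined",
)


def expresses_uncertainty(response: str) -> bool:
    """Return True if the response contains any of the uncertainty phrases."""
    text = response.lower().replace("’", "'")  # normalize curly apostrophes
    return any(phrase in text for phrase in UNCERTAIN_PHRASES)


def self_knowledge_accuracy(responses: Sequence[str], answerable: Sequence[bool]) -> float:
    """Fraction of responses whose stance matches the question type:
    uncertain on unanswerable questions, committed on answerable ones."""
    correct = sum(
        expresses_uncertainty(response) == (not is_answerable)
        for response, is_answerable in zip(responses, answerable)
    )
    return correct / len(responses)
```

A model with good self-knowledge scores high here because it hedges only when the question has no definite answer.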

MoRe

Posted on 2023-09-01, updated on 2023-09-26. Categories: Paper, MIE. Word count: 2.5k. Reading time ≈ 2 min.

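The authors use both the text and the image to retrieve related textual information from Wikipedia, which then assists multi-modal information extraction.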

ShanghaiTech University and Alibaba DAMO Academy, EMNLP 2022, code available.

Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE. Most existing efforts largely focused on directly extracting potentially useful information from images (such as pixel-level features, identified objects, and associated captions). However, such extraction processes may not be knowledge aware, resulting in information that may not be highly relevant. In this paper, we propose a novel Multi-modal Retrieval based framework (MoRe). MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively. Next, the retrieval results are sent to the textual and visual models respectively for predictions. Finally, a Mixture of Experts (MoE) module combines the predictions from the two models to make the final decision. Our experiments show that both our textual model and visual model can achieve state-of-the-art performance on four multi-modal NER datasets and one multi-modal RE dataset. With MoE, the model performance can be further improved and our analysis demonstrates the benefits of integrating both textual and visual cues for such tasks.
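The final combination step can be pictured as a gated mixture of the two models' label distributions. This is a generic mixture-of-experts sketch, assuming each model outputs a probability vector over the same labels and that a gate produces two logits; the actual MoE module in MoRe may be structured differently, and `mix_predictions` is a hypothetical name.

```python
import math
from typing import List, Sequence


def mix_predictions(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    gate_logits: Sequence[float],
) -> List[float]:
    """Combine two label distributions with softmax-normalized gate weights."""
    # Turn the two gate logits into weights that sum to 1.
    peak = max(gate_logits)
    exps = [math.exp(g - peak) for g in gate_logits]
    total = sum(exps)
    w_text, w_image = exps[0] / total, exps[1] / total

    # Weighted average of the textual and visual predictions, label by label.
    return [w_text * t + w_image * v for t, v in zip(text_probs, image_probs)]
```

Because the gate weights sum to 1, the mixed output is again a valid probability distribution whenever both inputs are.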

Head-to-Tail-Knowledgeable-LLM

Posted on 2023-08-28. Categories: Paper, LLM, Knowledge. Word count: 5.7k. Reading time ≈ 5 min.

Head-to-Tail: How Knowledgeable are Large Language Models (LLM)? A.K.A. Will LLMs Replace Knowledge Graphs?

Meta Reality Labs, arXiv 2023-08

Since the recent prosperity of Large Language Models (LLMs), there have been interleaved discussions regarding how to reduce hallucinations from LLM responses, how to increase the factuality of LLMs, and whether Knowledge Graphs (KGs), which store the world knowledge in a symbolic form, will be replaced with LLMs. In this paper, we try to answer these questions from a new angle: How knowledgeable are LLMs?

To answer this question, we constructed Head-to-Tail, a benchmark that consists of 18K question-answer (QA) pairs regarding head, torso, and tail facts in terms of popularity. We designed an automated evaluation method and a set of metrics that closely approximate the knowledge an LLM confidently internalizes. Through a comprehensive evaluation of 14 publicly available LLMs, we show that existing LLMs are still far from being perfect in terms of their grasp of factual knowledge, especially for facts of torso-to-tail entities.

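This is another work that investigates how well LLMs memorize knowledge. Like the earlier PopQA dataset, it analyzes how entity-related knowledge varies with entity popularity. This work covers more open-source LLMs and measures answer accuracy for knowledge of different popularity levels across different domains.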
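Splitting entities into head, torso, and tail groups by popularity could be sketched as below. The rank-based cut-offs (`head_frac`, `torso_frac`) and the function name are assumptions for illustration; the source does not state how the benchmark defines the three groups.

```python
from typing import Dict, List


def bucket_by_popularity(
    popularity: Dict[str, float],
    head_frac: float = 0.1,   # hypothetical share of entities treated as head
    torso_frac: float = 0.3,  # hypothetical share of entities treated as torso
) -> Dict[str, List[str]]:
    """Rank entities by popularity and split them into head, torso, and tail."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    head_end = round(len(ranked) * head_frac)
    torso_end = head_end + round(len(ranked) * torso_frac)
    return {
        "head": ranked[:head_end],
        "torso": ranked[head_end:torso_end],
        "tail": ranked[torso_end:],
    }
```

Answer accuracy can then be reported per bucket, which is what makes a drop from head to tail entities visible.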

Named Entity and Relation Extraction with Multi-Modal Retrieval
