Facebook AI Releases KILT, A New Benchmark for NLP

Facebook AI has launched KILT, a unified benchmark for training, evaluating, and analyzing NLP models on knowledge-intensive language tasks. Let’s look at how it helps developers and where you can actually use it.

In everyday computing we perform many knowledge-intensive tasks, such as fact-checking or question answering, that require access to a wide pool of information. To make such tasks easier, Facebook AI has released a benchmark that lets applications work with a large range of datasets drawn from a single knowledge source, through a universal interface.

This development is driven by the need to efficiently digitize everyday tasks that involve natural language processing. We have already seen many pre-trained models and general-purpose architectures, but these models tackle real-world problems using only local information, which is not always optimal.

For example, fact-checking a claim requires the system to search all available knowledge and gather solid evidence to support or refute the claim. This is what sparks the need for a unified benchmark for knowledge-intensive language tasks.

A huge number of datasets already exist, but they use different formats and pre-processing methodologies, and require different evaluation and analysis tools. This makes it difficult to determine whether the same knowledge representation can be reused when each dataset is linked to a different source.

Additionally, switching between data sources forces many approaches to re-index and re-encode a large number of source documents.

Thus, KILT was introduced to facilitate progress and research on models that need to retrieve particular information from a huge knowledge source.

The chief goal of KILT is to lower the entry barrier for research on knowledge-intensive NLP tasks by providing a common interface and a unified knowledge base: a single Wikipedia snapshot.

The benchmark brings together eleven datasets spanning five different tasks.

So, what can it do, and how does it work?

On analyzing various modern systems, it was observed that a hybrid model combining a neural retriever with a pre-trained sequence-to-sequence model, trained end-to-end, outperforms task-specific solutions on almost all tasks. The benchmark also evaluates whether systems provide provenance to support their output.

To achieve this, systems augment their output with provenance in the form of textual excerpts from supporting Wikipedia pages. In addition to evaluating downstream performance, KILT defines metric variants that award points only when a system also retrieves the correct Wikipedia pages as provenance for its output.
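A minimal sketch of this provenance-conditioned scoring, under a simplified condition (the downstream score counts only when the gold pages appear among the retrieved ones; the paper's exact criterion may differ):

```python
def kilt_score(downstream_score: float, gold_pages, retrieved_pages) -> float:
    """Award the downstream metric only when retrieval found the gold provenance."""
    if set(gold_pages) <= set(retrieved_pages):
        return downstream_score
    return 0.0

# A correct answer backed by the right page keeps its score ...
print(kilt_score(1.0, {"Hamlet"}, {"Hamlet", "William Shakespeare"}))  # 1.0
# ... while the same answer without the supporting page scores zero.
print(kilt_score(1.0, {"Hamlet"}, {"Macbeth"}))                        # 0.0
```

This couples answer quality to evidence quality, so a system cannot score well by guessing answers without grounding them.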

The knowledge source used:

Defining a single knowledge source that contains all the information necessary for every task is difficult.

Although all the tasks use Wikipedia, the datasets were built against different snapshots: Wikipedia pages are continually modified, removed, and added, so the underlying knowledge can differ drastically from one snapshot to the next. KILT therefore adopts the following methodology to build a consistent knowledge base:

  1. Wikipedia representation: 

The KILT knowledge source is represented as a collection of JSON records, where each record is assigned:
i. A unique Wikipedia ID

ii. A unique Wikipedia title

iii. A text field containing a list of strings, one for each paragraph, bulleted list item, and section header

iv. A list of anchor elements, one for each hyperlink in the original text, along with the span reference in the text field and the linked page

v. A category list

vi. A URL pointing to the original page HTML, along with the timestamp of the last page revision before the snapshot was taken
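Put together, a single knowledge-source record might look like the sketch below. The field names and values are illustrative, inferred from the description above rather than taken verbatim from the KILT schema:

```python
import json

# A minimal sketch of one knowledge-source record (field names are
# assumptions based on the fields described above, not the exact schema).
record = {
    "wikipedia_id": "534366",              # i. unique Wikipedia ID
    "wikipedia_title": "Barack Obama",     # ii. unique Wikipedia title
    "text": [                              # iii. one string per paragraph/header/list item
        "Section::::Abstract.",
        "Barack Hussein Obama II is an American politician ...",
    ],
    "anchors": [                           # iv. one entry per hyperlink in the text
        {"text": "politician", "href": "Politician",
         "paragraph_id": 1, "start": 41, "end": 51},
    ],
    "categories": "Presidents of the United States",   # v. category list
    "history": {                           # vi. source URL and snapshot timestamp
        "url": "https://en.wikipedia.org/w/index.php?title=Barack_Obama",
        "timestamp": "2019-08-01T00:00:00Z",
    },
}

# Records like this are stored one JSON object per line (JSON Lines).
line = json.dumps(record)
parsed = json.loads(line)
print(parsed["wikipedia_title"])
```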

  2. Mapping datasets to a fixed snapshot: 

Each task instance maps an input to an output along with provenance: a set of textual spans from Wikipedia pages that support the produced output.

The mapping strategy locates each provenance span within the KILT knowledge source, so that if all the provenance spans for an input-output pair are retrieved, all the knowledge needed to produce the output can be obtained from the snapshot.

To map a dataset, we first match the Wikipedia pages it references to our snapshot using Wikipedia URL redirections.

Then, we look for an appropriate provenance span in the matched page, replace the original provenance for the task’s input-output pair with the new one, and report the BLEU (Bilingual Evaluation Understudy, a metric for evaluating the quality of machine-translated text) score between the two.

Finally, any dev or test examples whose BLEU score falls below a specific threshold for at least one provenance span are removed.
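The filtering step above can be sketched as follows, with a crude unigram-precision score standing in for BLEU (the real pipeline uses proper BLEU; function names and the threshold are illustrative):

```python
def unigram_precision(candidate: str, reference: str) -> float:
    """Crude stand-in for BLEU: fraction of candidate tokens found in the reference."""
    cand = candidate.lower().split()
    ref = set(reference.lower().split())
    if not cand:
        return 0.0
    return sum(tok in ref for tok in cand) / len(cand)

def keep_example(original_spans, remapped_spans, threshold=0.5):
    """Keep a dev/test example only if every remapped span scores above the threshold."""
    return all(
        unigram_precision(new, old) >= threshold
        for old, new in zip(original_spans, remapped_spans)
    )

# High overlap between original and remapped provenance: the example is kept.
print(keep_example(["the cat sat on the mat"], ["the cat sat on a mat"]))
# Low overlap: the remapping likely failed, so the example is dropped.
print(keep_example(["completely unrelated sentence"], ["the cat sat"]))
```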

KILT focuses on five major tasks: fact-checking, slot filling, entity linking, open-domain question answering, and dialogue. Only publicly available Wikipedia-based datasets are considered, so that the knowledge source can be merged and unified.

  1. Fact-checking:

For this task, the system verifies a claim against a knowledge base of evidence. It demands deep knowledge about the claim and reasoning over a collection of documents. The claim is the input and the verdict is the output, where each label is accompanied by provenance supporting the classification label. The dataset used is FEVER, a large dataset for checking claim authenticity that requires retrieving sentence-level evidence to decide whether a claim is supported or refuted. FEVER combines multiple pieces of knowledge to produce a verdict and excludes all claims labeled as lacking enough information, since those instances have no evidence to back an output.

  2. Entity Linking

This task assigns a unique Wikipedia page to each entity mentioned in a text. The input is a text in which a single entity mention is tagged with the tokens [START_ENT] and [END_ENT], marking the start and end of the entity. The output is the title of that entity’s unique Wikipedia page, with provenance pointing to the entire page. Because Wikipedia assigns a unique identifier to each page, producing the correct title is enough to retrieve the desired Wikipedia page. This task uses three datasets: 

  1. AIDA CoNLL-YAGO: augments the CoNLL dataset with Wikipedia URL annotations for all entities, using the YAGO2 system. 
  2. WNED-WIKI: created automatically by sampling documents from the 06/06/2003 Wikipedia dump and then balancing the difficulty of each mention.
  3. WNED-CWEB: created with the same strategy as WNED-WIKI, but sampling from the ClueWeb 2012 corpora annotated with the FACC1 system.
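Constructing the tagged input is straightforward. The special tokens are the ones named above; the helper function and example sentence are illustrative:

```python
START, END = "[START_ENT]", "[END_ENT]"

def tag_mention(text: str, start: int, end: int) -> str:
    """Wrap the entity mention at text[start:end] with the boundary tokens."""
    return f"{text[:start]}{START} {text[start:end]} {END}{text[end:]}"

sentence = "West Indian all-rounder Phil Simmons took four wickets."
tagged = tag_mention(sentence, 24, 36)  # "Phil Simmons" spans characters 24-36
print(tagged)
# -> West Indian all-rounder [START_ENT] Phil Simmons [END_ENT] took four wickets.
```

The expected output for this input would be the title of the entity’s Wikipedia page, e.g. "Phil Simmons".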

3. Slot Filling:

Slot filling collects information about a particular relation of an entity from a large set of natural-language texts. It requires the input entity to be unambiguous, and then acquires relational knowledge for that entity. In KILT, the input is modeled as a structured subject-entity and relation, and the output as a list of valid object-entities, each accompanied by provenance where the subject-relation-object fact holds. Training uses two datasets:

  1. Zero Shot RE: this dataset translates relation extraction into a reading-comprehension problem. For each relation it defines a crowd-sourced template question, and each data point records a Wikipedia sentence expressing the fact, which is then used as provenance.

    To build an open-domain version of this dataset and adapt its input-output format to the KILT interface, the dataset is reformatted by (i) excluding the negative pairs, (ii) grouping the template questions by their subject-relation pairs into a single data point, (iii) randomly splitting the dataset into three disjoint sets (train, dev, test), (iv) using the subject entity from the input, queried against Wikipedia titles, as the mapping strategy, and (v) including all the template questions in the meta field.
  2. T-REx: a large-scale alignment of facts to sentences in Wikipedia abstracts, created through distant supervision. 
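The structured subject-relation input can be serialized into a single query string for a sequence model. The `[SEP]` separator and the helper below are assumptions for illustration, not the exact KILT convention:

```python
def make_query(subject: str, relation: str, sep: str = "[SEP]") -> str:
    """Serialize a structured (subject, relation) pair into one input string."""
    return f"{subject} {sep} {relation}"

query = make_query("Albert Einstein", "educated at")
print(query)  # Albert Einstein [SEP] educated at
# A system answering this query should return object entities
# (e.g. "ETH Zurich"), each with a provenance span from Wikipedia.
```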

4. Open-domain question answering:

   This is the task of producing the appropriate answer to a question without any predefined location for the answer. The question is the input and the answer is the output, with dataset-specific provenance. Results for this task are based on four datasets:

  1. Natural Questions: a corpus of real questions queried on the Google search engine, where each question comes with a Wikipedia page and annotated long and short answers, both of which are considered for provenance. 
  2. HotpotQA: requires multiple hops over various Wikipedia pages to answer each question. For each question-answer pair, a set of supporting sentences is used as provenance. 
  3. TriviaQA: a collection of question-answer-evidence triplets where evidence documents are gathered directly from Wikipedia.
  4. ELI5: also a collection of question-answer-evidence triplets, with complex questions and long, free-form explanatory answers.

5. Dialogue:

The goal here is to develop an engaging chatbot that can discuss a wide range of topics with any user, which often requires grounding in topical, factual knowledge. The conversation history is the input and the next utterance is the output.

Wizard of Wikipedia is the dataset used; it contains a vast pool of conversations grounded in knowledge extracted from Wikipedia. One of the speakers is required to ground their utterances in a specific knowledge sentence from Wikipedia.

The chosen sentence then serves as the provenance for the task. Cases with no provenance are discarded.

Additionally, the benchmark considers an open-domain setting in which no conversation topic is provided, and the model must search over all Wikipedia pages for knowledge at each turn of the dialogue.

How can it help in the advancement of technology?

KILT empowers the research field by allowing researchers to create general-purpose models and evaluate them across a wide variety of tasks, to work with a single knowledge representation without indexing several large-scale corpora or writing new IO routines, and to test hypotheses around task-agnostic memory.

Moreover, the KILT library provides fundamental building blocks to ease research on knowledge-intensive language tasks, pairing various state-of-the-art information-retrieval systems with models that make predictions for different tasks based on text read from the knowledge source. 
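Task data in KILT is distributed as JSON Lines, with each line pairing an input with candidate outputs and their provenance. The snippet below is a sketch of reading such data; the field names follow the descriptions in this article and may not match the released schema exactly:

```python
import io
import json

# An in-memory stand-in for a KILT-style .jsonl task file
# (one JSON object per line; field names are illustrative).
sample_jsonl = io.StringIO(
    '{"id": "0", "input": "Who wrote Hamlet?", '
    '"output": [{"answer": "William Shakespeare", '
    '"provenance": [{"wikipedia_id": "23292", "title": "Hamlet"}]}]}\n'
)

examples = [json.loads(line) for line in sample_jsonl]
for ex in examples:
    answer = ex["output"][0]["answer"]
    pages = [p["title"] for p in ex["output"][0]["provenance"]]
    print(answer, pages)
```

Because every task shares this shape, the same loading and evaluation code can be reused across all five tasks, which is precisely the point of the unified interface.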


Facebook AI’s KILT helps solve a wide variety of problems by creating a unifying benchmark for knowledge-intensive NLP tasks with a common interface. The benchmark can be adapted to many requirements across almost all walks of life. It is now up to developers to create wonders with it.

