Three questions put to Dr Oliver Karras

An interview about artificial intelligence, ChatGPT, knowledge graphs, and answering questions about scholarly knowledge

Dr Oliver Karras is a post-doctoral researcher and data scientist in the Data Science and Digital Libraries Research Group at TIB. He works on the ORKG, the Open Research Knowledge Graph; NFDI4ing, the National Research Data Infrastructure for Engineering Sciences; and FAIR Data Spaces, a shared data space for the science and industry communities. His research focuses on how the ORKG can be applied to engineering problems.

Dr Oliver Karras. Photo: TIB/C. Bierwagen

Everyone is talking about ChatGPT. The language model, based on machine learning technology, is already capable of all sorts of things. It can answer questions, hold conversations with users, and offer solutions to problems. But there are some things that ChatGPT cannot (yet) do. Where does ChatGPT fail at present?

Large language models such as ChatGPT generate answers composed of the most likely sequence of words in relation to the words used in the question. There is much debate at present about the extent to which this differs from the way the human brain thinks. However, because the models do not understand the question or its context, they often produce so-called “hallucinations”.

Hallucinations are texts generated by language models that seem plausible, but are in fact either partially or entirely invented. Hallucinated texts are potentially harmful because they purport to convey knowledge that, in the worst case, does not exist, meaning that the answer is incorrect. Language models are also unable to critically question the knowledge they generate, to compare it with reality, or to support it with references. Nor can they build up a validated knowledge base and use it to answer further questions. And yet these are all requirements that are particularly important in the academic world.

In collaboration with TIB’s Professor Dr Sören Auer, Dr Markus Stocker and Dr Mohamad Yaser Jaradeh, as well as visiting scientists from L3S Research Centre, you have just published an article on this topic in Scientific Reports, a prestigious open access journal published by Nature Portfolio. What is the paper about?

In the article, we address the problem that current question answering systems and language models can produce good answers to questions about general knowledge, but often fail when it comes to questions about scholarly knowledge. For our research, we first developed a dataset consisting essentially of 100 manually written and 2,465 machine-generated questions about scholarly knowledge, together with their answers. Based on this dataset, we examined how well an existing question answering system (JarvisQA) and one of the currently most popular language models (ChatGPT) can answer the 100 manually written questions. JarvisQA was able to answer 52 of the 100 questions, but only 12 of the answers were correct. ChatGPT provided answers for 63 of the 100 questions, but only 14 of them were actually correct. So if we consider only the ratio of correct answers to answers given, both systems were right in a mere 22 to 23 per cent of the answers they gave, and they answered only 12 to 14 per cent of all 100 questions correctly.
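The ratios in this answer can be checked directly from the counts quoted above. The following is a minimal sketch in Python; the function name `answer_rates` and the dictionary keys are illustrative choices and are not taken from the paper or its dataset.

```python
def answer_rates(answered: int, correct: int, total: int = 100) -> dict:
    """Share of correct answers among answers given and among all questions."""
    return {
        "correct_of_answered": correct / answered,
        "correct_of_total": correct / total,
    }


# Counts quoted in the interview for the 100 manually written questions.
systems = {
    "JarvisQA": {"answered": 52, "correct": 12},
    "ChatGPT": {"answered": 63, "correct": 14},
}

for name, counts in systems.items():
    rates = answer_rates(**counts)
    print(
        f"{name}: {rates['correct_of_answered']:.1%} of answers given, "
        f"{rates['correct_of_total']:.0%} of all questions"
    )
# JarvisQA: 23.1% of answers given, 12% of all questions
# ChatGPT: 22.2% of answers given, 14% of all questions
```

Reporting both ratios separates two effects: how often a system attempts an answer at all, and how often an attempted answer is right.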

Our study highlights how challenging it is for current question answering systems and language models to answer questions about scholarly knowledge. We are also making our dataset publicly available, with the aim of enabling the long-term development of question answering systems and language models that can answer such questions correctly.

So ChatGPT performs poorly when answering scholarly questions. Why is that the case, and how can the ORKG help produce better answers?

First of all, knowledge has to be presented to the computer (including question answering systems and language models such as ChatGPT) in a structure it can understand. These structures are comparatively simple in the case of general knowledge, but quickly become very complex when it comes to scholarly knowledge. Contextual understanding is required to interpret these complex structures properly. As I explained at the beginning, language models are currently incapable of contextual understanding, which is why ChatGPT performs poorly when answering scholarly questions.

In this context, the ORKG can help provide better answers because it is designed to capture scholarly knowledge in a structure that the computer can understand. In addition, the scholarly knowledge in the ORKG is freely accessible to all via different interfaces. This accessibility means that the scholarly knowledge can be combined, for example, with metadata on contextual entities from other scholarly infrastructures. Such a combination of scholarly knowledge and contextual entities enables question answering systems and language models to answer questions about scholarly knowledge correctly in the broadest sense, since not only the scholarly knowledge itself is available, but also the context needed to process it correctly.

The SciQA Scientific Question Answering Benchmark for Scholarly Knowledge

Auer, S., Barone, D.A.C., Bartz, C. et al. The SciQA Scientific Question Answering Benchmark for Scholarly Knowledge. Sci Rep 13, 7240 (2023).