Simplifying or Sinister? ChatGPT, AI, and their Use in Research

Introduction

ChatGPT has become something of an academic buzzword, with some touting its helpfulness in research tasks and others skeptical of both its abilities and its risks in our informational world. Libraries have a key stake in this conversation, as our institutions often stand at the intersection of approaches to learning and research. Through research and conversations about ChatGPT, we have sought a clearer picture of its actual capabilities, one that goes beyond posturing about either world-transforming or world-ending effects. Given the rapid pace of technological development, it can be hard to build and maintain a steady understanding of these systems in order to use them well. Here at Gottesman Libraries, we are exploring ChatGPT and other AI through this lens to evaluate their real use in research, library services, and beyond.

ChatGPT is a natural language processing model developed by the company OpenAI. Natural language processing (NLP) is a branch of machine learning that aims to interpret and replicate natural speech patterns by calculating the most probable ordered response in any given situation. ChatGPT does this by ingesting a vast collection of language drawn mostly from the internet: Wikipedia, digitized books, newspaper collections, forums, and more. This is often called its training data, and its specifics are hidden from the public. The model's knowledge and ability are then refined through a process called Reinforcement Learning from Human Feedback (RLHF), in which humans steer the model toward useful and natural-sounding responses by indicating which outputs they prefer (Santhosh, 2023). This complex RLHF process is part of the reason ChatGPT sounds so natural. With its large corpus of training data and RLHF, ChatGPT runs inputs through complex algorithms to generate natural-sounding responses that follow the most probable order of a sentence, paragraph, or other speech form.
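
To make that idea concrete, here is a toy sketch of probabilistic next-word prediction. This is not how GPT models are actually built (they use large neural networks over subword tokens); it is just a minimal bigram model, with an invented corpus, illustrating the same statistical principle: count which words follow which, then generate text by sampling accordingly.

```python
import random
from collections import Counter, defaultdict

# A tiny invented corpus standing in for training data.
corpus = ("to whom it may concern . to whom it may concern . "
          "it may rain today .").split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(word):
    """Sample a next word in proportion to how often it followed `word`."""
    counts = following[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation, one probable word at a time.
word, output = "to", ["to"]
for _ in range(4):
    word = next_word(word)
    output.append(word)
print(" ".join(output))  # e.g. "to whom it may concern"
```

Scaled up by many orders of magnitude, with neural networks in place of simple counts, this is the sense in which ChatGPT follows "the most probable order" of a response.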


Although ChatGPT is high profile now, artificial intelligence has been seeping into our digital world for some time. One common example is predictive text. You may notice this across your emails, texts, and documents: grayed-out auto-suggestions for a probable next word that you can accept into your sentence. If you are writing an email that starts with "to whom it may," you will most likely get the auto-suggestion "concern" as the natural next word in the sequence. Sometimes the suggestions are correct, and sometimes they are completely off base. At a basic level this works the same way ChatGPT does, except that instead of a question-and-answer format, the suggestions arrive more simply, integrated into our current flow of communication.
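
In sketch form, a predictive-text bar is little more than a ranked lookup of what has most often followed the words you typed. The frequency counts below are invented for illustration:

```python
from collections import Counter

# Invented counts of words observed after the phrase "to whom it may".
seen_after_prefix = Counter({"concern": 9412, "be": 37, "apply": 12})

def suggest(counts, k=3):
    """Return the k most frequent completions, like a suggestion bar."""
    return [word for word, _ in counts.most_common(k)]

print(suggest(seen_after_prefix))  # -> ['concern', 'be', 'apply']
```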

Contrary to how it may seem when we observe its output, an LM [language model] is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.

Just as auto-suggestions can be wrong in a given context, so can ChatGPT. A good example of ChatGPT missing the mark appears in a Twitter thread in which a researcher asks the bot, "What is the most cited economics paper of all time?" (Smerdon, 2023). The model produces a response that could very plausibly be the most cited economics paper, but the article it names does not exist. This is because the model operates on patterns of language within a huge corpus of text, not on the actual information contained in those sources. As a group of researchers discussing the dangers of such models put it, "contrary to how it may seem when we observe its output, an LM [language model] is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot" (Bender et al., 2021).

Because of its refined ability to mimic our speech patterns, it is easy to assume that ChatGPT understands our questions and gives meaningful answers, but that is not the case. The authors quoted above point to a key difference between ChatGPT's speech and typical human conversation: while ChatGPT's responses may well seem to interpret our meaning semantically, the model is not performing natural language understanding (NLU), a task that has yet to be achieved in artificial intelligence due to the complexity and variability of human understanding. This is why it sometimes fails to differentiate between fact and fiction, a processing issue that machine learning engineers call hallucination. With the release of GPT-4, the underlying model behind ChatGPT, OpenAI has reduced the model's tendency toward hallucination, but the possibility of incorrect information remains.

In an education context, ChatGPT has made news for passing graduate exams and a variety of standardized tests. These stories have revived long-standing debates about whether standardized testing truly measures knowledge. Rather than demonstrating ChatGPT's ability, such results more accurately point to an ongoing crisis in education over how to measure understanding. ChatGPT does best on tests with clear, delineated answers and struggles when it has to present an argument or draw connections, as shown by its poor performance on the AP English Literature and AP English Language exams (OpenAI, 2023).

Of course, there is more complexity to the model than I, or anyone outside the team at OpenAI, can fully grasp, given that the company keeps most of its proprietary data and algorithms behind closed doors. This is common across AI as the use of black-box algorithms proliferates in the field (Rudin & Radin, 2019). Still, a basic understanding of how NLP works can lead us toward a more informed view of AI, away from the simple dichotomy of simplifying or sinister. Ted Chiang at The New Yorker compares ChatGPT to a blurry JPEG of the web and asks, "what use is there in having something that rephrases the Web?" (Chiang, 2023). Below are a few guesses at that utility in a research context, along with some of ChatGPT's general drawbacks.

Possible Uses of ChatGPT


Common Knowledge Questions

If you need general background in a certain field of study, ChatGPT can usually give an accurate approximation of it, much as reading a Wikipedia article might (a similarity borne of the fact that Wikipedia is a key part of its training data). Here the benefit of ChatGPT comes from its ability to elaborate in response to further prompting: if you don't understand part of its answer, you can ask it to rephrase or expand. A common use is the "explain like I am five years old" trend, in which ChatGPT explains concepts at different levels of complexity.
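
As a sketch of what that back-and-forth looks like programmatically, the snippet below uses the openai Python package as it existed at the time of writing. The model name, API key, and question are placeholders, and responses will vary:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; never hard-code real keys

# First turn: ask for background on a topic.
messages = [{"role": "user",
             "content": "Briefly, what is reinforcement learning?"}]
reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)

# Second turn: keep the prior answer in context and ask for a simpler version.
messages.append({"role": "assistant",
                 "content": reply.choices[0].message.content})
messages.append({"role": "user",
                 "content": "Now explain that like I am five years old."})
reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(reply.choices[0].message.content)
```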


Phrasing and Understanding Language Expression

If you are having trouble drafting an email, text message, or other correspondence and need inspiration, you can use ChatGPT to get an approximation of a natural way to phrase it. Consider the enormous number of web pages ChatGPT is trained on: if you were struggling to draft an email to a colleague about a project, you might google examples of that type of email and read a few. ChatGPT has access to those examples and many, many more, essentially all the variants on the web. It can identify patterns across them, approximate the most common expressions, and synthesize these into a single example in a matter of seconds. This description simplifies the process, but the model's ability to surface common language expressions can be helpful in a variety of contexts.

Brainstorming

Because you can prompt it toward certain styles, expressions, or topics, the model can be an efficient brainstorming tool, especially for generating a research question and search terms. For example, you might ask ChatGPT for alternate terms for your keywords or for different ways to phrase your research question. Brainstorming is also a low-stakes way to experiment with the abilities and limitations of ChatGPT, keeping in mind the context above about the bot's status as an NLP model.
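
A hedged sketch of that kind of prompt, again using the openai package of the time; the system message, temperature value, and topic are illustrative choices, not a prescribed recipe:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# A system message steers tone and role; a higher temperature favors varied
# output over the single most probable response, which suits brainstorming.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=1.0,
    messages=[
        {"role": "system",
         "content": "You help researchers brainstorm database search terms."},
        {"role": "user",
         "content": "Suggest ten alternate keywords for: teacher burnout "
                    "in urban schools."},
    ],
)
print(response.choices[0].message.content)
```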


Drawbacks of ChatGPT


Graduate Level Research Inability

Because it generates responses based on the most probable sequence, ChatGPT is not useful for researching or gaining insight into obscure or highly specific topics whose expression requires nuance, topics that tend to be common in graduate-level research. Even in classes covering well-studied subjects, you will often be asked to demonstrate unique opinions or perspectives. A stochastic parrot like ChatGPT can only mimic, however elaborately; it cannot create genuine insights or demonstrate understanding the way humans can. If a certain combination of topics does not exist in its training data, it will not exist in ChatGPT's output. Yet because the model is tuned to value complete responses, it will still attempt an approximation, as in the case of the fake economics paper above, so it cannot be relied upon to recognize the limits of its own abilities.

References and Citing Sources

Even if you do gain information from ChatGPT on a more general topic, it will be difficult to verify, because ChatGPT cannot cite the sources it draws from. It makes sense that the model cannot provide accurate citations: its algorithms draw on a huge amount of varied data and blend it into one tidy output. Even a simple sentence might derive from so many sources that the imagined citation could not fit on a single page, if it could exist at all. If you are looking for information that goes beyond general knowledge, you would have to verify it by other means to be certain of its veracity, and at that point you may as well begin your search elsewhere.
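
One practical safeguard is to check any reference the model produces against a bibliographic database before trusting it. Below is a minimal sketch using Crossref's public REST API; the function name and placeholder query are illustrative, and since Crossref only indexes DOI-registered works, a missing match is a warning sign rather than proof:

```python
import requests

def crossref_matches(citation, rows=3):
    """Search Crossref's public API for works resembling a citation string."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": rows},
        timeout=10,
    )
    resp.raise_for_status()
    return [
        (item.get("title", ["(untitled)"])[0], item.get("DOI"))
        for item in resp.json()["message"]["items"]
    ]

# If nothing close comes back for a ChatGPT-supplied reference, be suspicious.
for title, doi in crossref_matches("title and authors ChatGPT gave you"):
    print(title, "->", doi)
```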


Societal Cost

Feeding AI systems on the world’s beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy.

Despite OpenAI's efforts to prevent harmful responses, ChatGPT still absorbs the implicit and explicit bias that pervades much of the internet. Asking OpenAI to remove anything harmful from ChatGPT is like asking to remove anything harmful from the internet: "feeding AI systems on the world's beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy" (quoted in Bender et al., 2021). The moderation efforts themselves have been critiqued for poor labor practices and outsourcing ("Exclusive," 2023). Algorithmic bias across technologies has long been a critique from the information sector, and ChatGPT's natural-sounding responses, which may be laced with the internet's biases, point to a continuation of these problems (Noble, 2018). The ability to mimic human conversation without semantic understanding also presents challenges in a society drawn to narratives of sentience. Most recently, an open letter calling for a six-month pause on training AI systems more powerful than GPT-4, so that regulation can catch up, has been signed by prominent leaders in the tech industry ("Pause Giant AI Experiments," n.d.).

Environmental Cost

As with many aspects of modern-day computing, training and running ChatGPT and other large language models carries a significant environmental cost. The CO2 emissions of the cloud computing services behind these systems contribute to the problem, and the inherent utility of a large language model must be weighed against that considerable environmental cost (Bender et al., 2021).

Conclusion

Librarians have been adapting to change ever since information moved from clay tablets to papyrus.

As Harry E. Pence notes in his article on the future of artificial intelligence in libraries, "Librarians have been adapting to change ever since information moved from clay tablets to papyrus" (Pence, 2022). The use of artificial intelligence in society is an important development in our informational landscape, and it is our job as academic librarians to support the ways these changes are affecting research at the university level. We hope to continue these conversations with other Teachers College offices and faculty exploring the influence of AI in our world. If you have questions about ChatGPT and its use in a research context, or feedback or corrections on this post, please contact us on our website through Ask a Librarian.

References

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922

Chiang, T. (2023, February 9). ChatGPT Is a Blurry JPEG of the Web. The New Yorker. https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web

Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer. (2023, January 18). Time. https://time.com/6247678/openai-chatgpt-kenya-workers/

Noble, S. U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. New York University Press. http://ebookcentral.proquest.com/lib/columbia/detail.action?docID=4834260

OpenAI. (2023). GPT-4 technical report (arXiv:2303.08774). arXiv. https://doi.org/10.48550/arXiv.2303.08774

Pause Giant AI Experiments: An Open Letter. (n.d.). Future of Life Institute. Retrieved March 31, 2023, from https://futureoflife.org/open-letter/pause-giant-ai-experiments/

Pence, H. E. (2022). Future of Artificial Intelligence in Libraries. The Reference Librarian, 63(4), 133–143. https://doi.org/10.1080/02763877.2022.2140741

Rudin, C., & Radin, J. (2019). Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson From an Explainable AI Competition. Harvard Data Science Review, 1(2). https://doi.org/10.1162/99608f92.5a8a3a3d

Santhosh, S. (2023, January 15). Reinforcement Learning from Human Feedback (RLHF)—ChatGPT. Medium. https://medium.com/@sthanikamsanthosh1994/reinforcement-learning-from-human-feedback-rlhf-532e014fb4ae

Smerdon, D. [@dsmerdon]. (2023, January 27). [Tweet]. Twitter. https://twitter.com/dsmerdon/status/1618816703923912704


Tags:
  • Learning at the Library
  • Trends