Augmenting an Incident Dataset with ChatGPT

Jon Ricketts

doi:10.56094/jss.v59i1.273

Authors

Jon Ricketts https://orcid.org/0000-0001-9487-9092

Keywords:

Natural Language Processing, incident reporting, semantic search, hazard identification

Abstract

The field of Natural Language Processing (NLP) is evolving rapidly, changing ways of working across multiple industries, including System Safety. One area of NLP is the development of advanced language models, notably ChatGPT, an artificial intelligence chatbot powered by a large language model. This paper takes an incident report dataset and augments it with ChatGPT to improve search capability and provide answers to safety-related queries. It is shown that incident datasets can be further adapted for knowledge retrieval to support safety queries; however, a major limitation to deploying this method elsewhere is the constraint imposed by data protection policies. The underpinning vector database (used to retrieve relevant incident reports) demonstrated a useful semantic search ability, enabling more accurate and meaningful searches of incident datasets. It is considered that, if the outputs provide evidence or sources behind answers and are used for advisory purposes, they can form useful tools for information and knowledge retrieval in System Safety.
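
The retrieval step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the three reports are invented for illustration, the function names (embed, cosine, retrieve, build_prompt) are hypothetical, and a bag-of-words count vector stands in for the learned embeddings and hosted vector database that a real system would use. A toy vector of this kind only matches shared words, so it does not capture meaning the way a genuine semantic search does. The sketch ranks reports by cosine similarity to a query and assembles a prompt that carries the report IDs, so that an answer can point back to its sources.

```python
import math
from collections import Counter

# Hypothetical incident reports, invented for illustration only.
REPORTS = {
    "R1": "Flap asymmetry warning during approach; crew executed a go-around.",
    "R2": "Bird strike on departure caused engine vibration and a return to the airport.",
    "R3": "Hydraulic pressure loss in cruise; crew diverted and landed safely.",
}


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    words = (w.strip(".,;:?!").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, reports, top_k=2):
    """Return the IDs of the top_k reports most similar to the query."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), rid) for rid, text in reports.items()),
        reverse=True,
    )
    # Drop reports that share no words with the query.
    return [rid for score, rid in scored[:top_k] if score > 0]


def build_prompt(query, reports, top_k=2):
    """Assemble a prompt that includes the retrieved reports as cited context."""
    ids = retrieve(query, reports, top_k)
    context = "\n".join(f"[{rid}] {reports[rid]}" for rid in ids)
    return (
        "Answer using only the incident reports below and cite their IDs.\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("What happened after the bird strike on departure?", REPORTS))
```

Keeping the report IDs in the prompt reflects the abstract's point that outputs are most useful when they expose the evidence behind an answer. The assembled prompt would then be passed to a language model, a step omitted here because it depends on the chosen service and on applicable data protection policies.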
