Proposing the Use of Hazard Analysis for Machine Learning Data Sets
DOI: https://doi.org/10.56094/jss.v58i2.253

Keywords: machine learning, data assurance, data governance

Abstract
There is no debating the importance of data for artificial intelligence. The behavior of data-driven machine learning models is determined by the data set, or as the old adage states: “garbage in, garbage out” (GIGO). While the machine learning community is still debating which techniques are necessary and sufficient to assess the adequacy of data sets, it agrees that some such techniques are necessary. In general, most of the techniques under consideration evaluate the volume of each attribute against an anticipated count, without considering the safety concerns associated with that attribute. This paper explores those techniques for identifying instances of too little data and incorrect attributes. Those techniques are important; however, for safety-critical applications, the assurance analyst also needs to understand the safety impact of specific attributes being absent from the machine learning data sets. To provide that information, this paper proposes a new technique the authors call data hazard analysis, which qualitatively analyzes the training data set to reduce the risk associated with GIGO.
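To make the count-based adequacy check concrete, below is a minimal Python sketch of how observed attribute counts might be audited against anticipated counts and then ranked by an assumed hazard severity, in the spirit of the proposed data hazard analysis. The attribute names, thresholds, severity categories, and the EXPECTED table are illustrative assumptions, not the technique as defined in the paper.

```python
from collections import Counter

# Hypothetical illustration only: a count-based data-set adequacy check,
# extended with a qualitative severity label per attribute. All names,
# thresholds, and example data below are assumptions for illustration.

# Anticipated minimum occurrences of each attribute in the training set,
# paired with an assumed hazard severity if that attribute is missing.
EXPECTED = {
    # attribute: (anticipated_count, severity_if_absent)
    "runway_wet": (500, "catastrophic"),
    "runway_dry": (500, "major"),
    "night":      (300, "hazardous"),
}

SEVERITY_ORDER = {"catastrophic": 0, "hazardous": 1, "major": 2, "minor": 3}

def audit(records):
    """Flag attributes whose observed volume falls short of the
    anticipated count, ordered so the most severe gaps surface first."""
    observed = Counter(attr for rec in records for attr in rec["attributes"])
    findings = [
        (severity, attr, observed.get(attr, 0), anticipated)
        for attr, (anticipated, severity) in EXPECTED.items()
        if observed.get(attr, 0) < anticipated
    ]
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f[0]])

if __name__ == "__main__":
    sample = [{"attributes": ["runway_dry"]},
              {"attributes": ["night", "runway_wet"]}]
    for severity, attr, count, anticipated in audit(sample):
        print(f"{severity:>12}: '{attr}' has {count}/{anticipated} examples")
```

The severity-first ordering is the point of the sketch: a purely count-based audit would report all three shortfalls as equivalent, whereas attaching a qualitative hazard label lets the assurance analyst prioritize the gaps whose absence carries the greatest safety impact.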
License
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.