This article reports on a project in which 767 laboratory-generated fire debris samples, each with known ground truth as to whether an ignitable liquid residue was present (class IL) or absent (class SUB), were used to train and evaluate five machine learning classifiers.
Linear and quadratic discriminant analysis (LDA and QDA), k-nearest neighbors (kNN), and support vector machines with radial and linear kernels (SVMr and SVMl) were tested for their ability to correctly classify the fire debris samples as class IL or class SUB. Each classifier was trained and tested/validated on 500 class-balanced data sets, each comprising 400 fire debris samples (200 IL and 200 SUB) bootstrapped from the 767 laboratory-generated samples. Each bootstrapped data set was split into a training subset (75 percent, 300 samples) and a testing/validation subset (25 percent, 100 samples). LDA, SVMr, and SVMl gave satisfactory performance, based on area under the receiver operating characteristic curve (AUC, 0.86–0.92), equal error rates (17–22 percent), and well-calibrated probabilities. These three classifiers were then applied to a set of 129 fire debris samples produced in large-scale test burns, and their classifications were compared with the sample classes assigned by an informed analyst who knew the chromatographic patterns of the ignitable liquids used to start the large-scale fires. The LDA and SVMl models gave results most closely aligned with those of the informed analyst. (publisher abstract modified)
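The resampling and scoring protocol summarized above (class-balanced bootstrap, 75/25 split, repeated 500 times, scored by AUC and equal error rate) can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: features are one-dimensional toy values, the 75/25 split is stratified so both subsets stay class-balanced (the abstract does not say so), and a caller-supplied `fit_and_score` function (a hypothetical name) stands in for LDA, QDA, kNN, or SVM. AUC is computed as the Mann-Whitney probability that an IL sample outscores a SUB sample, and EER is approximated as the mean of the false positive and false negative rates at the threshold where they are closest.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# (feature value, label) with label 1 = IL, 0 = SUB; toy 1-D features (assumption)
Sample = Tuple[float, int]
# Hypothetical interface: train on labeled samples, return a score per test feature
ScoreFn = Callable[[List[Sample], List[float]], List[float]]


def balanced_bootstrap(il: Sequence[float], sub: Sequence[float],
                       n_per_class: int, rng: random.Random) -> List[Sample]:
    """Resample each class with replacement to build one class-balanced data set."""
    data = [(x, 1) for x in rng.choices(il, k=n_per_class)]
    data += [(x, 0) for x in rng.choices(sub, k=n_per_class)]
    return data


def stratified_split(data: List[Sample], train_frac: float,
                     rng: random.Random) -> Tuple[List[Sample], List[Sample]]:
    """Split per class so train and test keep the 50/50 balance (assumed here)."""
    train: List[Sample] = []
    test: List[Sample] = []
    for label in (0, 1):
        group = [s for s in data if s[1] == label]
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train += group[:cut]
        test += group[cut:]
    return train, test


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC via Mann-Whitney: P(IL score > SUB score), ties counted as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def eer(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Approximate EER: mean of FPR and FNR at the threshold where they are closest."""
    best_gap, best_rate = float("inf"), 1.0
    for t in sorted(set(pos) | set(neg)) + [float("inf")]:
        fpr = sum(n >= t for n in neg) / len(neg)  # SUB samples called IL
        fnr = sum(p < t for p in pos) / len(pos)   # IL samples called SUB
        if abs(fpr - fnr) < best_gap:
            best_gap, best_rate = abs(fpr - fnr), (fpr + fnr) / 2
    return best_rate


def evaluate(il: Sequence[float], sub: Sequence[float], fit_and_score: ScoreFn,
             n_reps: int = 500, n_per_class: int = 200,
             train_frac: float = 0.75, seed: int = 0) -> List[Tuple[float, float]]:
    """Repeat bootstrap -> split -> fit/score; return (AUC, EER) per replicate."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        data = balanced_bootstrap(il, sub, n_per_class, rng)
        train, test = stratified_split(data, train_frac, rng)
        scores = fit_and_score(train, [x for x, _ in test])
        pos = [s for s, (_, y) in zip(scores, test) if y == 1]
        neg = [s for s, (_, y) in zip(scores, test) if y == 0]
        results.append((auc(pos, neg), eer(pos, neg)))
    return results


if __name__ == "__main__":
    # Synthetic toy data and an identity "scorer"; NOT the study's data or models.
    gen = random.Random(1)
    il = [gen.gauss(1.0, 1.0) for _ in range(380)]
    sub = [gen.gauss(0.0, 1.0) for _ in range(387)]
    res = evaluate(il, sub, lambda train, xs: xs, n_reps=50)
    print("median AUC:", statistics.median(a for a, _ in res))
    print("median EER:", statistics.median(e for _, e in res))
```

To evaluate a real classifier, one would wrap its training and scoring in the `fit_and_score` callable; summarizing replicates by the median is an illustrative choice, and a report might instead give ranges across the 500 replicates, as the abstract does.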