An official website of the United States government, Department of Justice.

A Feature Mapping Technique for Complex Data Object Generation With Likelihood and Deep Generative Approaches

NCJ Number
309738
Journal
IEEE Access Volume: 11 Dated: January 2023 Pages: 136643-136653
Date Published
2023
Length
11 pages
Annotation

This paper proposes a feature-mapping technique to statistically model Unconventional Data Sets (UDS), such as social and behavioral networks, that consist of complex data objects; it demonstrates the versatility of modeling UDS with real-world datasets; and it discusses the generation of synthetic data from UDS, using an Adversarial Autoencoder (AAE) as the deep generative approach.
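To make the idea of "feature mapping" concrete, the sketch below shows one common way to turn incomplete records into fixed-length numeric vectors: each field gets a value slot plus one indicator flag per reason the value might be absent ("not measured", "not known", "not applicable", the causes named in the abstract). This is an illustrative assumption, not the authors' technique; the field names, the sentinel strings in `MISSING_REASONS`, and the function `map_record` are all hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical sentinels recording *why* a field has no value.
MISSING_REASONS = ("not_measured", "not_known", "not_applicable")

# A field holds either a number or one of the MISSING_REASONS strings.
Value = Union[float, int, str]


def map_record(record: Dict[str, Value], schema: List[str]) -> List[float]:
    """Map a possibly incomplete record onto a fixed-length numeric vector.

    Each schema field becomes 1 + len(MISSING_REASONS) slots:
    [value (0.0 if absent), one 0/1 flag per missing reason].
    A field absent from the record entirely is treated as 'not_known'.
    """
    vector: List[float] = []
    for field in schema:
        raw = record.get(field, "not_known")
        if isinstance(raw, str):
            if raw not in MISSING_REASONS:
                raise ValueError(f"unknown missing reason {raw!r} for field {field!r}")
            vector.append(0.0)  # placeholder value for an absent field
            vector.extend(1.0 if raw == reason else 0.0 for reason in MISSING_REASONS)
        else:
            vector.append(float(raw))
            vector.extend(0.0 for _ in MISSING_REASONS)  # present: no flags set
    return vector


if __name__ == "__main__":
    schema = ["age", "income", "num_children"]
    rec = {"age": 34, "income": "not_known", "num_children": "not_applicable"}
    print(map_record(rec, schema))
```

Keeping a separate flag per missing reason, rather than a single "missing" bit, preserves the distinction between "not applicable" and "not known", so every record ends up with the same length regardless of which fields it lacks.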

Abstract

When a sufficient amount of training data is available, Machine Learning (ML) models show great promise for solving problems involving complex and dynamic patterns. Social and behavioral domains are rich with such challenging problems, with complex object data extracted from documents, surveys, etc., and represented in forms such as graphs and trees. However, many social and behavioral data sets are inherently sparse and incomplete. The same data field may be unavailable in different records of a data set for different reasons, e.g., because it was not measured, not known, or simply not applicable to that particular record. Furthermore, collection challenges, cost, lack of participation, small affected populations, etc., result in very small data sets. The resulting unconventional datasets cannot be used directly with potent approaches such as machine learning. A technique to model and synthesize large sets of such complex data objects while maintaining the statistical and topological characteristics of the original data helps overcome these challenges. The authors propose a novel feature-mapping technique to eliminate data inconsistencies and model data objects from unconventional datasets. The feature-mapped data objects are used to synthesize data using two likelihood approaches, i.e., multivariate Gaussian and regular vine copulas, and one generative adversarial approach using an adversarial autoencoder (AAE). They demonstrate the robustness of the proposed technique with three real-world datasets representing disparate domains and validate the performance of the likelihood and deep-generative approaches with these object synthesis strategies. (Published Abstract Provided)
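The simplest of the likelihood approaches named above, a multivariate Gaussian, can be sketched as: estimate a mean vector and covariance matrix from the feature-mapped vectors, then draw new vectors as mean + L z, where L is the Cholesky factor of the covariance and z is standard normal noise. The sketch below is a minimal stdlib-only illustration under that assumption; it does not cover the regular vine copula or AAE approaches, and the function names (`fit_gaussian`, `cholesky`, `sample_gaussian`) and the small diagonal `jitter` are illustrative choices, not the authors' code.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def fit_gaussian(rows: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Estimate the mean vector and unbiased (n - 1) sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def cholesky(a: Matrix, jitter: float = 1e-9) -> Matrix:
    """Lower-triangular L with L L^T ~= a; jitter on the diagonal aids stability."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(a[i][i] + jitter - s, 0.0))
            else:
                # Guard against a zero pivot from a degenerate (constant) feature.
                L[i][j] = (a[i][j] - s) / L[j][j] if L[j][j] > 0 else 0.0
    return L


def sample_gaussian(mean: List[float], cov: Matrix, n: int,
                    seed: Optional[int] = None) -> List[List[float]]:
    """Draw n synthetic vectors x = mean + L z with z ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(mean)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return samples


if __name__ == "__main__":
    original = [[1.0, 2.0], [2.0, 2.5], [3.0, 4.1], [4.0, 4.4], [5.0, 6.2]]
    mu, sigma = fit_gaussian(original)
    synthetic = sample_gaussian(mu, sigma, n=5, seed=0)
    print(synthetic)
```

Samples drawn this way are continuous, so any 0/1 indicator columns produced by a feature mapping would need post-processing (for example, rounding or thresholding) before being mapped back to records; that step is not shown. A Gaussian also captures only linear (pairwise covariance) dependence, which is one motivation for copula-based alternatives such as the vine copulas the paper evaluates.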

Date Published: January 1, 2023