Comparison of Three Methods to Generate Synthetic Datasets for Social Science
Li-jing Arthur Chang
Researchers often have difficulty finding enough data to test their hypotheses [1][2]. This study explores three methods for creating “synthetic” (i.e., artificial) data that mimic real-world data in statistical traits such as correlations (i.e., relationships between variables). To evaluate these methods, the study compares the patterns in the synthetic data with those of their real-world counterparts and assesses how closely the correlations are preserved. Additionally, the study applies seven machine learning prediction methods to assess the predictive performance of the synthetic data. The findings indicate that two methods more effectively preserve the original correlation structure, while the third method yields better predictive performance.
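The correlation comparison described above can be illustrated with a minimal sketch. It is not one of the three methods examined in the study, which the abstract does not name. As assumptions made purely for illustration, the "real" data are simulated, the generator is a Gaussian fitted to the mean and covariance of those data, and the helper `correlation_gap` is a hypothetical name for one possible similarity measure (the mean absolute difference between off-diagonal correlations).

```python
import numpy as np


def correlation_gap(real, synthetic):
    """Mean absolute difference between off-diagonal correlations.

    Hypothetical helper for illustration; smaller values mean the
    synthetic data track the real correlation structure more closely.
    """
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synthetic, rowvar=False)
    # Ignore the diagonal, which is always 1 in both matrices.
    off_diagonal = ~np.eye(real_corr.shape[0], dtype=bool)
    return float(np.mean(np.abs(real_corr - synth_corr)[off_diagonal]))


rng = np.random.default_rng(0)

# Stand-in for a real dataset: three correlated variables (assumed values).
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])
real = rng.multivariate_normal(np.zeros(3), cov, size=500)

# Assumed generator: sample from a Gaussian fitted to the real data.
synthetic = rng.multivariate_normal(
    real.mean(axis=0), np.cov(real, rowvar=False), size=500
)

# Baseline that destroys correlations: shuffle each column independently.
shuffled = np.column_stack(
    [rng.permutation(real[:, j]) for j in range(real.shape[1])]
)

gap_synthetic = correlation_gap(real, synthetic)
gap_shuffled = correlation_gap(real, shuffled)
print(f"fitted Gaussian gap: {gap_synthetic:.3f}")
print(f"shuffled baseline gap: {gap_shuffled:.3f}")
```

Under these assumptions, the fitted generator should show a much smaller gap than the shuffled baseline, which keeps each variable's distribution but discards the relationships between variables.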