Luigi Lombardi's homepage; Dept. of Psychology and Cognitive Science; University of Trento

Fake Data Analysis Project

A sample generation by replacement (SGR) approach



Introduction

SGR (Sample Generation by Replacement) is a probabilistic resampling procedure that can be used to evaluate the uncertainty of inferences based on possibly faked data and to study the implications of fake data for empirical results. In general, an SGR analysis takes an interpretative perspective: it incorporates into a global model all the available information (empirical or hypothetical) about the faking process and the underlying true model. In particular, SGR is not a method for detecting faking at the individual level but a rational approach to evaluating statistical results computed on data that may be corrupted by faking. In addition, SGR is descriptive in nature and aims to capture the phenomenological effect of faking from an informational, data-oriented perspective based on a data replacement (information replacement) paradigm. This makes SGR related in spirit to other statistical approaches such as uncertainty and sensitivity analysis (Helton et al. 2006) and prospective power analysis (Cohen 1988).

More specifically, SGR uses a two-stage sampling procedure based on two distinct generative models: a model for the process that generates the data before any faking perturbation (the data generation process) and a faking model that perturbs those data (the data replacement process). By repeatedly sampling from the SGR procedure we can generate the so-called fake data sample (FDS) and then study the distribution of relevant statistics computed on this simulated space. In SGR the data generation process is implemented with standard Monte Carlo procedures for ordinal data, whereas the data replacement process is implemented with ad hoc probabilistic faking models.
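The two-stage procedure can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the implementation used in the sgr package (which is written in R). Stage 1 discretizes a bivariate standard normal with arbitrary thresholds into 5-point ordinal scores. Stage 2 applies a deliberately simple fake-good replacement model: with probability p, a score is replaced by a uniform draw from the range running from that score up to the top category. The function names, the thresholds, the faking model and the choice of the Pearson correlation as the statistic of interest are all assumptions made for the example.

```python
import math
import random
import statistics

# Hypothetical cut points that discretize a standard normal latent variable
# into K = 5 ordered categories (scores 1..5).
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]
K = len(THRESHOLDS) + 1


def to_ordinal(z, thresholds=THRESHOLDS):
    """Map a latent value z to an ordinal score 1..K by counting exceeded thresholds."""
    return 1 + sum(z > t for t in thresholds)


def generate_true_data(n, rho, rng):
    """Stage 1 (data generation): n pairs of ordinal scores obtained by
    discretizing a bivariate standard normal with latent correlation rho."""
    data = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        data.append((to_ordinal(z1), to_ordinal(z2)))
    return data


def fake_good(score, p, k, rng):
    """Stage 2 (data replacement, hypothetical fake-good model): with
    probability p, replace the score by a uniform draw from {score, ..., k}."""
    if rng.random() < p:
        return rng.randint(score, k)
    return score


def pearson(xs, ys):
    """Pearson correlation; returns nan if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def fake_data_sample(n, rho, p, replications, seed=0):
    """Repeat both stages to build a fake data sample (FDS) and return
    the correlation computed on each perturbed data set."""
    rng = random.Random(seed)
    stats = []
    for _ in range(replications):
        true_data = generate_true_data(n, rho, rng)
        faked = [(fake_good(a, p, K, rng), fake_good(b, p, K, rng)) for a, b in true_data]
        xs, ys = zip(*faked)
        stats.append(pearson(xs, ys))
    return stats


if __name__ == "__main__":
    # Compare the FDS distribution of r without faking (p = 0) and with p = 0.3.
    for p in (0.0, 0.3):
        fds = fake_data_sample(n=200, rho=0.5, p=p, replications=200)
        cuts = statistics.quantiles(fds, n=20)  # cuts[0] ~ 5th, cuts[-1] ~ 95th percentile
        print(f"p = {p}: mean r = {statistics.mean(fds):.3f}, "
              f"90% range = [{cuts[0]:.3f}, {cuts[-1]:.3f}]")
```

Comparing the two printed distributions shows, for this particular hypothetical scenario, how much a given amount of fake-good replacement shifts and spreads the correlation. Swapping in a different generation model, faking model or statistic only requires changing the corresponding function.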

The sgr package (ver. 1.3)

The sgr package provides R users with a free, open-source tool for performing fake data analysis according to the sample generation by replacement approach. sgr includes functions for making simple inferences about discrete/ordinal fake data and makes it possible to quantify the uncertainty of inferences based on possibly faked data and to study the implications of fake data for empirical results. For example, how sensitive are the results to possible fake data? Are the conclusions still valid under one or more faking scenarios?
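The sensitivity questions above can be framed as a scenario analysis. The sketch below is a hypothetical Python illustration, not the sgr API: it assumes a true response distribution for a single 5-point item, applies a simple fake-good replacement model (with probability p a score is replaced by a uniform draw from itself up to the top category) under several faking probabilities, and reports how often a conclusion ("the mean score is below 3") still holds. All function names, category probabilities and the cutoff are illustrative assumptions.

```python
import random


def simulate_item(n, probs, rng):
    """Stage 1: n true responses on 1..K drawn from hypothetical category probabilities."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=n)


def fake_good(score, p, k, rng):
    """Stage 2 (hypothetical fake-good model): with probability p, replace the
    score by a uniform draw from {score, ..., k}; otherwise keep it."""
    return rng.randint(score, k) if rng.random() < p else score


def scenario_analysis(probs, faking_probs, n=100, replications=500, cutoff=3.0, seed=0):
    """For each faking scenario p, return the share of simulated samples in which
    the conclusion 'sample mean < cutoff' still holds."""
    rng = random.Random(seed)
    k = len(probs)
    results = {}
    for p in faking_probs:
        holds = 0
        for _ in range(replications):
            sample = [fake_good(v, p, k, rng) for v in simulate_item(n, probs, rng)]
            if sum(sample) / n < cutoff:
                holds += 1
        results[p] = holds / replications
    return results


if __name__ == "__main__":
    true_probs = [0.25, 0.30, 0.25, 0.15, 0.05]  # hypothetical; true mean = 2.45
    for p, share in scenario_analysis(true_probs, [0.0, 0.1, 0.2, 0.4]).items():
        print(f"p = {p:.1f}: 'mean < 3' holds in {share:.1%} of replications")
```

Under this particular faking model a score v has expected value v + p(K - v)/2 after replacement, so the expected mean rises from m to m + p(K - m)/2. With the assumed probabilities (m = 2.45, K = 5) and p = 0.4, the expected mean becomes 2.96, just below the cutoff. This is the kind of borderline scenario in which the conclusion becomes fragile.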

Download the sgr package: R package version 1.3 (CRAN site).

Papers about SGR and its application to fake data analysis

copyright notice:
The documents distributed here have been provided as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder. Contributions referring to journal articles are denoted with [A]. Contributions referring to book chapters are denoted with [B].

selected papers

[A] Lombardi L. & Pastore M. (2015). Robust evaluation of fit-indices to fake-good perturbation of ordinal data. Quality & Quantity. Online First, DOI: 10.1007/s11135-015-0282-1. (publisher web site). Author's copy (pdf)

[A] Lombardi L., Pastore M., Nucci M., & Bobbio A. (2015). SGR modeling of correlational effects in fake good self-report measures. Methodology and Computing in Applied Probability, 17, 1037-1055. (publisher web site). Author's copy (pdf)

[A] Lombardi L. & Pastore M. (2014). sgr: A package for simulating conditional fake ordinal data. The R Journal, 6(1), 164-177. (pdf). [This paper introduces the sgr package with several examples.]

[A] Pastore M. & Lombardi L. (2014). The impact of faking on Cronbach's Alpha for dichotomous and ordered rating scores. Quality & Quantity, 48, 1191-1211. (publisher web site). Author's copy (pdf)

[A] Lombardi L. & Pastore M. (2012). Sensitivity of fit indices to fake perturbation of ordinal data: A sample by replacement approach. Multivariate Behavioral Research, 47, 519-546. (publisher web site). Author's copy (pdf). [This is the seminal paper about SGR.]

older contributions

[B] Pastore M., Lombardi L., & Mereu F. (2007). Effects of malingering in self-report measures: A scenario analysis approach. In C. H. Skiadas (Ed.), Recent Advances in Stochastic Modelling and Data Analysis. Singapore: World Scientific Pub. Co. (publisher web site).

[B] Lombardi L., Pastore M., & Nucci M. (2004). Evaluating uncertainty of model acceptability in empirical applications: a replacement approach. In K. van Montfort, H. Oud, & A. Satorra (Eds.), Recent developments in structural equation modeling: theory and applications. Amsterdam: Kluwer Academic Publishers.