

Principal component analysis (PCA) is a standard technique for summarizing the main structures of a data table containing measurements of several quantitative variables for a number of individuals. Here, we study the case where some of the data values are missing and review methods that adapt PCA to missing data. In plant ecology, this statistical challenge relates to the current effort to compile global plant functional trait databases, which produce matrices with a large proportion of missing values. We present several techniques that either take missing values into account or estimate (impute) them in PCA, and we compare them on theoretical grounds. We carried out a simulation study to evaluate the relative merits of the different approaches in various situations (correlation structure, number of variables and individuals, and percentage of missing values) and also applied them to a real data set. Lastly, we discuss the advantages and drawbacks of these approaches, the potential pitfalls, and the challenges that remain to be addressed.
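One family of imputation approaches for PCA with missing values can be illustrated by iterative PCA imputation (sometimes called EM-PCA). The sketch below is one standard version of this idea and is not presented as the implementation used in this study. It alternates two steps: a low-rank SVD reconstruction of the completed table, and refilling of only the missing cells. It assumes NumPy, missing values coded as `NaN`, and a standardized (correlation-matrix) PCA. The helper name `iterative_pca_impute` and its parameters are hypothetical. The small simulation is also a hypothetical design chosen for illustration: two latent factors, low noise, and roughly 20% of cells missing completely at random.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=500, tol=1e-9):
    """Fill NaN cells of X by iterative (EM-type) PCA reconstruction.

    Minimal sketch (hypothetical helper, not a library API): start from
    column means, then repeatedly standardize the completed table, take
    its rank-`n_components` SVD approximation, and overwrite only the
    missing cells with the reconstructed values. Observed cells never change.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: each missing cell gets the mean of its column's observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Standardized PCA: centre and scale the current completed table.
        mu = filled.mean(axis=0)
        sd = filled.std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant columns
        Z = (filled - mu) / sd
        # Low-rank reconstruction from the leading principal components.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        k = n_components
        Z_hat = (U[:, :k] * s[:k]) @ Vt[:k]
        X_hat = Z_hat * sd + mu
        # Update the missing cells only; observed values are kept as they are.
        new_filled = np.where(missing, X_hat, X)
        change = np.sum((new_filled - filled) ** 2)
        filled = new_filled
        if change < tol:
            break
    return filled


# Hypothetical simulation design, chosen only for illustration:
# two latent factors drive six correlated variables, plus small noise.
rng = np.random.default_rng(0)
n, p = 100, 6
scores = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
X_full = scores @ loadings + 0.1 * rng.normal(size=(n, p))

# Delete about 20% of the cells completely at random.
mask = rng.random((n, p)) < 0.2
X_obs = X_full.copy()
X_obs[mask] = np.nan

X_imp = iterative_pca_impute(X_obs, n_components=2)
X_mean = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)  # column-mean baseline


def rmse(X_est):
    """Root-mean-square error computed over the deleted cells only."""
    return np.sqrt(np.mean((X_est[mask] - X_full[mask]) ** 2))


print(f"column-mean imputation RMSE:   {rmse(X_mean):.3f}")
print(f"iterative PCA imputation RMSE: {rmse(X_imp):.3f}")
```

On data with a strong low-rank structure like this, the iterative reconstruction would typically recover the deleted cells more accurately than column-mean imputation. The PCA itself can then be run on the completed table. The unregularized version shown here can overfit when too many components are retained, noise is high, or many values are missing. Regularized variants, such as the default method of `imputePCA` in the R package missMDA, shrink the singular values to reduce this risk.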
