The ‘gold standard’ for freely available Structural Equation Modelling (SEM) software is often taken to be Mx. Unfortunately, Mx is difficult to use with large data sets such as those produced by fMRI, its source code is not available, and it is often much easier to program calculations in Matlab, which is the platform for SPM, the most popular package for analyzing fMRI data. Matlab SEM code for fMRI analysis is therefore needed. In addition to Mx, an important resource for SEM is ‘SEMNET’, the SEM discussion network. Its members include some of the most experienced users of SEM, mostly in areas outside neuroimaging. The particular example used below was discussed on SEMNET some time ago; links to that discussion are provided.

SEM remains the most frequently used method for fMRI measurement of ‘effective connectivity’, although other techniques such as ‘dynamic causal modelling’ (DCM) are becoming increasingly popular. These other techniques are not discussed here. The example below is used to introduce SEM from the beginning: starting with an Mx analysis, then considering the problem from the perspective of a non-numerical algebraic solution, and finally using Matlab in conjunction with Adaptive Simulated Annealing (ASA) via the ASAMIN ‘mex’ gateway function.

There were two motivations for this approach. Firstly, SEM calculations can be complex, and such complexity can obscure mistakes and misconceptions. The objective here was therefore to describe the simplest, most transparent SEM computational method which is at the same time completely general and practical to use with Matlab. Obfuscation is anathema. Matlab code and other files for the example are here. The second motivation relates to the use of SEM in fMRI. There are many pitfalls with SEM, relating to issues such as ‘unidentified models’, ‘empirical under-identification’, poor fitting of the model to the data, ‘local minima’ etc.
These issues are well recognised in non-neuroimaging fields such as economics and psychology, yet it is still rare to find an fMRI publication mentioning them. They are important, though, since ignoring them may render a calculation at best wrong and at worst misleading. An FAQ which clearly discusses many of the pitfalls is available. Given the effort and time that goes into obtaining fMRI data, it is very important to avoid such problems. SEM is often used in fMRI for measuring ‘effective connectivity’: the influence of one neuronal region on another. Accurate measurement of effective connectivity is particularly important for clinical fMRI studies of schizophrenia, such as those testing the ‘disconnection hypothesis’ and associated theories of abnormal corollary discharge due to dysfunctional ‘forward models’.

Example

It is important to note that the following example was chosen to illustrate what happens with a SEM identification problem. It was not difficult to generate this problem.

1/ Path Diagrams & Reticular Action Model (RAM) Matrices

The Mx manual has a very useful Tutorial introduction. This first section is partly based on that Tutorial, but uses a different path diagram and data as an example, and considers SEM from the perspective of fMRI. SEMs are usually represented by path diagrams. There are 4 elements to a path diagram:
Consider the following path diagram (Mx file: ds_example1.mxd). This path diagram represents 4 observed variables F, G, H and J, each with residual variance f, g, h and i respectively. The 4 observed variables could be obtained from a region of interest (ROI) analysis of fMRI data (see below). Two latent non-observed variables X and Y, each with a fixed residual variance of 1.0, are represented, such that X is assumed to cause F and G, quantified by paths a and b, and Y is assumed to cause H and J, quantified by paths c and d. X and Y are assumed to covary by an amount e. The object of the SEM calculation is to estimate all these variables, with a, b, c, d and especially e being of particular interest, using an observed covariance matrix. In the case of an fMRI study it is possible to define ROIs using SPM and extract a ‘time-series’: e.g. in SPM99, the program ‘spm_regions.m’ extracts a representative time-course from voxel data (Y.mad) as the first eigenvariate of the filtered and adjusted data in all suprathreshold voxels within a spherical ROI centred on the voxel nearest to the selected location. Temporal filtering can be specified.

P1 Total Number of Measurements

The number of repeated measurements in an fMRI study can be large – it needs to be, because the SEM technique requires this. One of the two main criticisms of SEM studies is that sample sizes are often inadequate. (The other is that SEM is often used to misrepresent causal relationships when in fact the data are non-experimental – see P6 below.) For example, an fMRI study lasting 22 min (quite a long task) may result in 500 echoplanar volumes being obtained, during which 288 stimuli are presented to a single subject. The output of this process would be four time-series corresponding to the F, G, H and J regions of interest, stored in a matrix (e.g. ts(4,:)) – it is then easy to calculate the covariance matrix ( cov(ts’) ).
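This covariance step can be sketched in Python/NumPy for illustration (the original code is Matlab; the time-series below are simulated random data, not real fMRI eigenvariates):

```python
import numpy as np

# Simulated stand-in for the 4 ROI time-series (4 regions x 500 volumes);
# a real analysis would use the eigenvariates extracted by spm_regions.m
rng = np.random.default_rng(42)
ts = rng.normal(size=(4, 500))

# Equivalent of Matlab's cov(ts'): covariance across time for each pair of
# regions, giving the 4 x 4 observed covariance matrix
obs = np.cov(ts)
print(obs.shape)  # (4, 4)
```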
By this method the following observed covariance matrix ‘obs’ could have been calculated – see Mx data file (ds_example1.dat)
The RAM approach to SEM is very straightforward and completely general (McArdle and McDonald 1984). It is computationally less efficient than other, much more complex techniques, so the latter were often used before fast PCs became readily available. Mx will automatically generate the RAM matrices from the path diagram, but it is necessary to know how to construct them for the later Matlab code. RAM requires the specification of three matrices: ‘F’, ‘A’ and ‘S’. The key to specification is consistent labelling of the rows and columns of the matrices with the latent and observed variables. The order of variables is unimportant. Asymmetric paths with single-headed arrows are represented in a matrix ‘A’. For the path diagram example this is:
Note how the a, b, c and d variables relate to the latent and observed variables. The row and column labels are of course omitted from the actual matrix. Symmetric paths with double-headed arrows are similarly represented in a matrix ‘S’:
A Filter matrix ‘F’, which relates observed to latent variables, is also defined:
A ‘1’ appears wherever a row variable is the same as a column variable. In the following, ‘I’ refers to the identity matrix. The predicted covariance matrix ‘pred’ can be calculated from these matrices as

pred = F * inv(I - A) * S * inv(I - A)' * F'
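The RAM calculation can be sketched numerically, here in Python/NumPy for illustration (the path values a–i below are arbitrary made-up numbers, not estimates from the example data):

```python
import numpy as np

# Hypothetical path values for illustration (paths a-e, residual variances f-i)
a, b, c, d, e = 0.8, 0.7, 0.6, 0.5, 0.3
f, g, h, i = 0.4, 0.5, 0.6, 0.7

# Variable order: X, Y, F, G, H, J (latents first, then observed)
A = np.zeros((6, 6))            # asymmetric (single-headed) paths: A[row, col] = path col -> row
A[2, 0] = a                     # X -> F
A[3, 0] = b                     # X -> G
A[4, 1] = c                     # Y -> H
A[5, 1] = d                     # Y -> J

S = np.zeros((6, 6))            # symmetric (double-headed) paths
S[0, 0] = S[1, 1] = 1.0         # fixed latent residual variances
S[0, 1] = S[1, 0] = e           # latent covariance
S[2, 2], S[3, 3], S[4, 4], S[5, 5] = f, g, h, i  # observed residual variances

F = np.zeros((4, 6))            # filter: picks out the 4 observed variables
F[0, 2] = F[1, 3] = F[2, 4] = F[3, 5] = 1.0

I = np.eye(6)
B = np.linalg.inv(I - A)
pred = F @ B @ S @ B.T @ F.T    # RAM predicted covariance matrix
```

Because pred depends only on (F, A, S), any path diagram translated into these three matrices can be evaluated the same way.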
and this matrix is then compared with the actual observed matrix ‘obs’ using a variety of fitting functions, e.g. the ‘maximum likelihood’ function (‘ML’ below). The FAQ discusses such fitting equations in detail.
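The maximum likelihood discrepancy takes the standard SEM form F_ML = ln|pred| − ln|obs| + tr(obs·pred⁻¹) − p, with ChiSq = (Nobs − 1)·F_ML. A sketch in Python (the function name ml_fit is mine, not from SEM_func.m):

```python
import numpy as np

def ml_fit(obs, pred, nobs):
    """Standard SEM maximum-likelihood discrepancy and its chi-square."""
    p = obs.shape[0]
    f_ml = (np.log(np.linalg.det(pred)) - np.log(np.linalg.det(obs))
            + np.trace(obs @ np.linalg.inv(pred)) - p)
    return f_ml, (nobs - 1) * f_ml

# Perfect-fit check: identical observed and predicted matrices give zero
f_ml, chisq = ml_fit(np.eye(4), np.eye(4), 250)
print(f_ml)  # 0.0
```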
The ‘numerical approach’ to SEM involves specifying path values and calculating how well the predicted covariance matrix matches the observed matrix. The objective is to vary the path values such that the discrepancy quantified by the fit function is minimised.
P2 Local Minima

One of the problems that can occur with fitting is the existence of ‘local minima’. Each point along the ‘path values’ axis corresponds to a group of path values. Unfortunately, simple ‘gradient descent’ methods for minimisation have a tendency to get ‘stuck’ in local minima and so fail to identify the best-fit path values. SEM fitting is recognised as often having problems with local minima, and Mx (plus various other commercial SEM programs) uses a gradient descent method. One approach is to restart the minimisation procedure with different sets of initial parameters, since one set will often be arbitrarily better than another at finding the global minimum. Starting with path values which have been roughly estimated by another method, and are therefore close to the expected final optimal values, has also been recommended. Instead of gradient descent, the Matlab code presented later uses a different method, ‘simulated annealing’, which can be far more robust to local minima.

P3 Poor fit between Model and Data

A model which fits the data well indicates that the proposed network provides an appropriate explanation of the observed data. Unfortunately, even at the global minimum, many models may fit the observed covariance matrix poorly. In fMRI, the path model is often based on reasonably well known anatomical connections, and some have suggested that it is acceptable simply to ignore a poor fit. However, in non-neuroimaging fields it is generally considered important to use a model that fits the data well before applying any statistical tests. The goodness of fit can be estimated with a variety of fit indices. Mx uses functions including the Root Mean Squared Error of Approximation (RMSEA) and Akaike’s Information Criterion (AIC) – see the Mx manual. Early on, I attempted to ignore a poor fit of the model to the data.
However, I found that when I attempted to do statistical tests on the path coefficients, the results could be very unstable, depending on whether an arbitrary change was made to the path model. This instability seemed much less with well fitting models. Running Mx on the above path model and data gives the output
AIC < 0 and RMSEA < 0.05 indicate a reasonably good fit of the model to the data, which might seem OK at first glance. However, adding Options SError to the Mx script results in these comments:

Approximate numerical estimates of standard errors and CI's.
*** Not as good as likelihood-based or bootstrap ***
Likely to be wrong if error distribution is not symmetric
*** WARNING! ***
Eigenvalue check indicates ill-conditioning
Check your model for under-identification
Most likely culprits are parameters specified as

Please check your model for under-identification and note that this message may be caused by parameters hitting actual or effective bounds. More complete diagnostic data may be found in the
Mx suggests parameters ‘e’ & ‘a’ may be suspect, and does this by computing the eigenvalues and eigenvectors of the ‘Hessian matrix’ (see below), using the information to assess for ‘under-identification’ – another of the pitfalls of SEM. It is easy to specify an under-identified model, yet this may not be obvious. In an under-identified model at least one of the path values cannot be determined, although a SEM program will still output a (misleading) numerical value for an unidentified path.

P4 Model Under-Identification and Empirical Under-Identification

A statistical model is ‘identified’ if the known information available implies that there is one best value for each parameter in the model whose value is not known. For example, the following is not identified
since there are an infinite number of pairs of x and y values which satisfy the above equation. However, if a further equation is also specified
there is only one pair of values which satisfies the two simultaneous equations, and the system is ‘identified’. As the SEMNET FAQ says, there are a number of approaches for testing identification. Mx tests the Hessian matrix, where
and the latter is the matrix of second-order derivatives of the fit or discrepancy function with respect to all the free parameters of the model. If the model's parameters are all identified, then the rank of the information matrix will be equal to the number of free parameters in the model – equivalently, the matrix will be ‘positive definite’; if not identified, then the rank will be deficient. This is analogous to checking for multi-collinearity in regression by evaluating the rank of the covariance matrix of the predictors. As noted in the FAQ, this method has various limitations. Other techniques – which also have limitations – include heuristic methods and a technique using the augmented Jacobian matrix. Under-identification refers to a failure of the model, independent of the observed data. Unfortunately, even if a model is identified, the particular data being used with the model may still result in ‘empirical under-identification’ – see FAQ. Empirically, the easiest way to test for any form of under-identification is to re-run the fitting procedure. Provided the identical optimum fit value is obtained on each occasion (indicating that the minimisation has not failed by getting stuck in a local minimum), then if each of the path values is also identical, the model plus data are likely to be identified. A path which is not identified would be indicated by non-identical path values for the same fit value on repeated re-runs. A further method, which can be combined with multiple re-runs, is to estimate the standard error ‘se’ associated with each path value
This is not an accurate way to estimate the standard error. The value of the method lies in the fact that it is simple to implement, and an unidentified path tends to be associated with a larger estimated standard error. The only general theoretical way to determine whether a given model is identified is to attempt a full algebraic solution: solving the covariance structure equations for the unknown parameters, and expressing the parameters as independent functions of the elements of the covariance matrix. This is generally avoided as ‘tedious and error-prone’, because a great deal of algebra may be required. However, it can be very enlightening to attempt a full solution – though empirical under-identification can still occur. An automated algebraic technique for SEM is therefore required. This is the subject of the next section.

Principle

The ‘algebraic approach’ considers the parameters of a SEM model identified if it is possible to solve the covariance structure equations for the unknown parameters – specifically, to ‘express parameters as independent functions of the elements of the covariance matrix’. Suppose, then, that we have a SEM model with various causal and covariance unknowns (a, b, c, d) which we want to estimate. Using RAM (or other) theory, we can derive algebraic expressions for a predicted covariance matrix ‘P’, say
where the elements of P are:
A SEM program would minimise the difference between P and the observed covariance matrix ‘O’ say, by varying the estimates of a-d according to a fit function where
However, the model is algebraically identified if it is possible to determine:
A detailed discussion of this method can be found elsewhere.

Implementation

To implement this algebraic manipulation automatically, it is possible to use the computer algebra program Macsyma. Macsyma can translate simple Matlab code into Macsyma code, meaning that a simple Matlab program can be used to calculate the algebraic form of the predicted covariance matrix. Whilst Matlab has a symbolic manipulation package based on Maple, it does not seem to have a function equivalent to Macsyma’s SOLVER (see below). By this method, the predicted covariance matrix for the example is

The following is then defined

so, for example
and 10 simultaneous equations are defined. The simultaneous equations can be solved in an automated manner using the Macsyma SOLVER package. This program tries to find solutions of arbitrary systems of linear or non-linear parametric equations by employing heuristic search and valuation strategies in combination with the Macsyma SOLVE and LINSOLVE commands. If the equations cannot be solved explicitly, either because no analytic solution exists or because the symbolic solution is too complex, SOLVER attempts to solve a subset and returns the non-linear kernel that could not be solved. Problems encountered during SOLVER action result in notifications and, if necessary, prompts for choosing alternative actions. By this method an automated algebraic solution of the example was attempted, and produced:
however SOLVER reported a linear dependency
equation y9 was not used, and 4 possible solutions were found
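A similar automated solve can be sketched with Python's SymPy in place of Macsyma's SOLVER (the numeric right-hand sides below are illustrative values I generated from a = 4/5, b = 7/10, c = 3/5, d = 1/2, e = 3/10, not the example's actual covariances). Solving just the five off-diagonal covariance structure equations reproduces a four-fold sign indeterminacy:

```python
from sympy import symbols, Rational, solve

a, b, c, d, e = symbols('a b c d e')

# Off-diagonal elements of the predicted covariance matrix for the path
# diagram: cov(F,G)=ab, cov(H,J)=cd, cov(F,H)=ace, cov(F,J)=ade, cov(G,H)=bce,
# equated to illustrative numeric values
eqs = [
    a*b - Rational(14, 25),
    c*d - Rational(3, 10),
    a*c*e - Rational(18, 125),
    a*d*e - Rational(3, 25),
    b*c*e - Rational(63, 500),
]
sols = solve(eqs, [a, b, c, d, e], dict=True)
print(len(sols))
```

Each solution flips the signs of (a, b) and/or (c, d), with e changing sign to compensate; the covariance matrix cannot distinguish among them, so multiple distinct solutions for the same data signal under-identification.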
This indicates an identification problem, consistent with the Mx warning, despite the model being apparently identified from the perspective of ‘rules of thumb’ – a structure should be identified if there are “two or more factors… only two indicators for a factor… no correlated errors, each indicator loads on only one factor, and none of the variances or covariances is equal to zero”. A discussion of the identification problems with the example can be found elsewhere. These are partly due to a subtle issue arising from the way the residual variance has been specified. It is reasonably clear that:
It is possible to compare the Mx path values with the algebraic solution:
Therefore, Mx does not seem to be consistent with a full algebraic solution (though Mx did report a problem).

Initial check

The function called by asamin (SEM_func) calculates a variety of fitting functions for given input path parameters. In SEM_func(), set Nobs=250 and make sure the following is selected for fitting:
Then, at the Matlab prompt, define a variable corresponding to the Mx-derived optimum path values:
Set:
Calling SEM_func directly should now return the value of ML ChiSq obtained by Mx:
which it does.

Matlab ‘grid search’

If only a few paths are of interest, it is possible to use SEM_func() within a simple ‘grid search’ using multiple loops, e.g.
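The grid-search idea can be sketched as follows (Python for illustration; the two-parameter quadratic cost is a stand-in for SEM_func, and the grid ranges are arbitrary):

```python
import numpy as np

def cost(a, e):
    # Stand-in for a SEM fit function of two free paths (not the real SEM_func):
    # minimum at a = 0.6, e = 0.3
    return (a - 0.6) ** 2 + (e - 0.3) ** 2

best = (np.inf, None, None)
for a in np.arange(-1.0, 1.0001, 0.05):      # loop over trial values of path 'a'
    for e in np.arange(-1.0, 1.0001, 0.05):  # loop over trial values of path 'e'
        fit = cost(a, e)
        if fit < best[0]:
            best = (fit, a, e)
print(best)  # minimum found near a = 0.6, e = 0.3
```

With one nested loop per free parameter, the number of cost evaluations grows exponentially, which is why this approach becomes impractical beyond a few paths.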
This method may sometimes be practical. However, with 9 variables to determine here, such a calculation would take too long, and a better way is required to find an optimal solution. One possible method is Adaptive Simulated Annealing (ASA), C code written by Lester Ingber and developed to find the best global fit of a constrained non-convex cost function over a D-dimensional space. Simulated annealing is a generalisation of a Monte Carlo method for examining the equations of state and frozen states of n-body systems. The annealing concept derives from the way liquids freeze or metals recrystallize during annealing. A melt, initially at high temperature and disordered, is cooled so that the system at any time is approximately in thermodynamic equilibrium. As cooling proceeds, the system becomes more ordered and approaches a ‘frozen’ ground state – an adiabatic approach to the lowest energy state. If the initial temperature is too low, or cooling is done insufficiently slowly, the system may become ‘quenched’, forming defects or freezing out in metastable states – a local minimum state. ASA is a particularly efficient version of simulated annealing with many user-defined parameters. ASAMIN is a Matlab gateway function to ASA written by Shinichi Sakata. ASAMIN and ASA have been compiled with Microsoft Visual C++ 6.0 to form the Matlab-callable ‘asamin.dll’ routine. A particularly useful feature of ASAMIN is that the ‘cost function’ for minimisation by ASA (SEM_func()) can be written in Matlab rather than C.

Matlab-ASA

Run ‘sem.m’, which fits the model 4 times with randomised starting values (first row is estimated path values, second is estimated standard errors). The SRMR (Tabachnick and Fidell 1996, p752) has been used for estimating the goodness of fit of the model to the data.
SRMR < 0.05 indicates a good fit. Note, however, that use of the SRMR is strictly invalid here, since this equation should be used with standardised data (correlation matrices); a non-standardised version (RMR) is also available.

Run 1
Minimum function value = 0.018526
ASA exit state = 0 1, SRMR = 0.011672

Run 2
Minimum function value = 0.019854
ASA exit state = 0 1, SRMR = 0.012083

Run 3
Minimum function value = 0.018742
ASA exit state = 0 1, SRMR = 0.011739

Run 4
Minimum function value = 0.019555
ASA exit state = 0 1, SRMR = 0.011993

Consistent with discussion on SEMNET:
The above calculation did not use the maximum likelihood formula for fitting, since there is another problem - run ‘sem’ with this equation selected and observe the output. ML assumes a ‘positive definite’ predicted matrix, which can be tested for by attempting a Cholesky decomposition with the following code inserted into SEM_func()
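The same check can be sketched in Python/NumPy (Matlab's chol likewise errors on a non-positive definite input):

```python
import numpy as np

def is_positive_definite(M):
    """Return True if M is positive definite, via an attempted Cholesky decomposition."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

ok = np.array([[2.0, 0.5], [0.5, 1.0]])    # a valid covariance matrix
bad = np.array([[1.0, 2.0], [2.0, 1.0]])   # 'covariance' implying an impossible correlation
print(is_positive_definite(ok), is_positive_definite(bad))  # True False
```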
This indicates that non-positive definite predicted matrices are being generated during attempted function minimisation, and so the ML formula cannot be used. Assuming instead that a properly identified model is used, and that consistent minimum function values are obtained with associated unique path values, a set of optimum paths can be derived for each subject taking part in a study. In the case of a clinical fMRI study, these can be considered summary statistics from a first-level analysis, and entered into a second-level analysis using e.g. t-tests to compare a group of patients with a group of controls. The result would comprise a random effects analysis.

Summary
P5 Miscellaneous issues

There are a number of other issues which may be relevant to a given study, such as ‘equivalent models’ (see below), model comparison strategies, statistical power estimation, non-positive definite matrix problems (as above), covariance versus correlation matrix use, etc. Information on these can also be found in the SEMNET FAQ. From the perspective of the general linear model (GLM), analysis of variance (ANOVA) and analysis of covariance (ANCOVA) are special cases of multiple regression and correlation (Keppel and Zedeck 1989). ‘Dummy variables’ can be used as the independent variables in a regression analysis to implement a t-test or ANOVA; ANOVA itself is just a generalisation of the t-test to more than two groups. Other techniques such as Pearson’s correlation are also specific cases of the GLM. Multiple regression analysis has an unspecified number of independent variables but only one dependent variable. The generalisation to an unspecified number of dependent variables is termed canonical correlation analysis (CCA). CCA has been referred to as a general parametric significance-testing system, since simple correlation, the t-test for independent samples, ANOVA, factorial ANOVA, ANCOVA, the t-test for correlated samples, discriminant analysis and even a chi-squared test of independence are special cases of CCA (Knapp 1978). Multivariate analysis of variance (MANOVA) is also a special case of CCA, using dummy variables as the independent variables, analogous to implementing ANOVA with linear regression (Zinkgraf 1983). Multivariate analysis of covariance (MANCOVA) is similarly analogous to ANCOVA from the perspective of the multivariate GLM. However, CCA is itself a special case of SEM (Bagozzi, Fornell et al. 1981), meaning that SEM can be used to implement all the above special-case tests involving two groups of variables (independent & dependent).
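The dummy-variable point can be illustrated with a small simulation (Python/NumPy sketch; group sizes and means are arbitrary): regressing the data on a 0/1 group indicator yields a slope t statistic identical to the pooled two-sample t-test.

```python
import numpy as np

rng = np.random.default_rng(0)
g1 = rng.normal(0.0, 1.0, 20)   # e.g. controls
g2 = rng.normal(0.5, 1.0, 20)   # e.g. patients

y = np.concatenate([g1, g2])
dummy = np.concatenate([np.zeros(20), np.ones(20)])  # group membership as a dummy variable
X = np.column_stack([np.ones(40), dummy])            # intercept + dummy

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (40 - 2)                        # residual variance
cov_beta = s2 * np.linalg.inv(X.T @ X)
t_reg = beta[1] / np.sqrt(cov_beta[1, 1])            # t statistic for the dummy's slope

# Classical pooled two-sample t statistic, for comparison
sp2 = (g1.var(ddof=1) * 19 + g2.var(ddof=1) * 19) / 38
t_classic = (g2.mean() - g1.mean()) / np.sqrt(sp2 * (1/20 + 1/20))
```

The two t values agree exactly; the same construction with several dummy columns implements ANOVA.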
SEM is also used to implement methods involving one group of variables, such as factor analysis (Lawley and Maxwell 1971). Furthermore, there are ways to incorporate modulatory interactions into SEM, extending the technique to non-linear analyses (Friston 2002). The above discussion only considers variables conforming to univariate and multivariate normal distributions; the use of ‘resampling’ techniques (Good 1994) with SEM extends it to non-parametric statistical tests. Consequently, SEM is an extremely general technique. This comes at a price: it is important to have a far better understanding of SEM than is needed for the more robust special cases (e.g. correlation, t-test) in order to avoid misuse and incorrect or misleading results. Even then, there are no guarantees. Investigating fMRI ‘effective connectivity’, in contrast to ‘functional connectivity’, has sometimes been argued to require SEM. Different definitions of effective and functional connectivity exist, resulting in confusion; however, ‘effective connectivity’ is linked with the notion of ‘causality’ (the ‘effect’ of one neuronal system on another), in contrast to a merely ‘functional’ analysis. Thus justified, SEM is applied in an attempt to investigate hypothesised abnormal effective connectivity, but not necessarily with evidence of consideration of the many pitfalls. However, since linear regression is a special case of SEM and incorporates the same notion of causality (the ‘arrow’ goes from the independent to the dependent variable), a priori specification of a few planned regression analyses (using SPM with ROIs, without SEM) would be an easier and more robust test of effective connectivity. SEM is sometimes referred to as ‘causal modelling’; however, a well-fitting SEM model does not in itself imply causal relationships, and a poorly fitting model (P3 above) does not imply anything. In many cases a well-fitting model can only be interpreted as consistent with the hypothesised causal relationships.
On the basis of goodness-of-fit measures, SEM is unable to distinguish between different types of model involving: variable B dependent on variable A, A dependent on B, or A and B covarying – i.e. different causation or no causation. This is the problem of ‘equivalent models’. Consequently, in the search for causality, the most important issue is not the statistical test but the experimental design. A randomised experiment is the ‘only scientifically proven method of testing causal relations from data, and to this day, the one and only causal concept permitted in mainstream statistics’ (Pearl 2000). There is nothing in the notion of effective connectivity which requires a SEM approach. The study of causal inference is a subject in itself (Pearl 2000). If, despite such caveats, SEM is still of interest, it is best to keep the model as simple as possible, since this will often make it easier to find a model that fits (explains) the data well. Additionally, simpler models are easier to solve algebraically.

I am interested to know if you disagree about anything, find errors in the above discussion or Matlab code (sem.m, SEM_func.m, asamin.dll), and if you find the code & discussion useful. Please cite [Steele J.D., et al (2004) NeuroImage, 23, 269-280] if you use the code.

Postscript

The method of image analysis described by Dr Meyer-Lindenberg (Stein et al, 2007, NeuroImage, 36, 736-45) was developed using the above code as a starting point.

References

Bagozzi, R. P., C. Fornell, et al. (1981). "Canonical correlation analysis as a special case of a structural relations model." Multivariate Behavioral Research 16: 437-454.
Friston, K. (2002). "Beyond Phrenology: What Can Neuroimaging Tell Us About Distributed Circuitry?" Annu Rev Neurosci 25: 221-50.
Good, P. (1994). Permutation Tests. London, Springer-Verlag.
Keppel, G. and S. Zedeck (1989). Data Analysis for Research Designs. Berkeley, Freeman.
Knapp, T. R. (1978).
"Canonical correlation analysis: a general parametric significance-testing system." Psychological Bulletin 85: 410-416.
Lawley, D. N. and A. E. Maxwell (1971). Factor Analysis as a Statistical Method. London, Butterworths.
McArdle, J. J. and R. P. McDonald (1984). "Some algebraic properties of the Reticular Action Model for moment structures." British Journal of Mathematical and Statistical Psychology 37: 234-251.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge, Cambridge University Press.
Tabachnick, B. G. and L. S. Fidell (1996). Using Multivariate Statistics. New York, HarperCollins.
Zinkgraf, S. (1983). "Performing factorial multivariate analysis of variance using canonical correlation analysis." Educational and Psychological Measurement 43: 63-68.