
PEFA is a partially exploratory approach to factor analysis that can incorporate partial knowledge together with an unknown number of factors, using bi-level Bayesian regularization. When partial knowledge is not needed, it reduces to fully exploratory factor analysis (FEFA; Chen, 2021). A large number of factors can be imposed for selection, and true factors are then identified against spurious ones. The loading vector is reparameterized to tackle model sparsity at the factor and loading levels with multivariate spike-and-slab priors. Parameters are obtained by sampling from the posterior distributions with Markov chain Monte Carlo (MCMC) techniques. The estimation results can be summarized with summary.lawbl, and the trace or density of the posterior can be plotted with plot_lawbl.

Usage

pefa(
  dat,
  Q = NULL,
  K = 8,
  mjf = 3,
  PPMC = FALSE,
  burn = 5000,
  iter = 5000,
  missing = NA,
  eig_eps = 1,
  sign_eps = 0,
  rfit = TRUE,
  rs = FALSE,
  update = 1000,
  rseed = 12345,
  verbose = FALSE,
  auto_stop = FALSE,
  max_conv = 10,
  digits = 4
)

Arguments

dat

A \(N \times J\) data matrix or data.frame consisting of the responses of \(N\) individuals to \(J\) items.

Q

A \(J \times K\) design matrix for the loading pattern with \(K\) factors and \(J\) items for PEFA. Elements are 1, -1, and 0 for specified, unspecified, and zero-fixed loadings, respectively. It is not needed for FEFA, which is the default (Q = NULL). See Examples.
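
The coding scheme above can be illustrated with a small base R sketch. The item and factor assignments here are made up for illustration, and check_q is a hypothetical helper that is not part of the package; it only verifies that a candidate matrix uses the three documented codes.

```r
J <- 18  # number of items (assumed for illustration)
K <- 5   # number of factors (assumed for illustration)

Q <- matrix(-1, nrow = J, ncol = K)  # start with every loading unspecified
Q[1:2, 1] <- 1                       # items 1-2 specified on factor 1
Q[7:8, 2] <- 1                       # items 7-8 specified on factor 2
Q[13:18, 1] <- 0                     # illustrative: items 13-18 fixed to zero on factor 1

# Hypothetical sanity check (not a package function): confirm the shape
# and the allowed codes, then tabulate how often each code is used.
check_q <- function(Q, J) {
  stopifnot(is.matrix(Q), nrow(Q) == J, all(Q %in% c(-1, 0, 1)))
  table(factor(Q, levels = c(-1, 0, 1)), dnn = "code")
}

check_q(Q, J)
```

A matrix built this way would then be passed as the Q argument, as in Example 2.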

K

Maximum number of factors for selection under FEFA. Not used for PEFA.

mjf

Minimum number of items per factor.

PPMC

logical; TRUE for conducting posterior predictive model checking.

burn

Number of burn-in iterations before posterior sampling.

iter

Number of formal iterations for posterior sampling (> 0).

missing

Value for missing data (default is NA).

eig_eps

Minimum eigenvalue for factor extraction.

sign_eps

Minimum value for switching the sign of a loading vector.

rfit

logical; TRUE for providing relative fit (DIC, BIC, AIC).

rs

logical; TRUE for enabling recommendation system.

update

Number of iterations to update the sampling information.

rseed

An integer for the random seed.

verbose

logical; TRUE to display the sampling information every update iterations. The displayed information includes:

  • Feigen: Eigenvalue for each factor.

  • NLA_lg0: Number of loading magnitudes > 0 for each factor.

  • iShrink: Inverted shrinkage parameter for each factor.

  • Tru Fac: Whether each factor is identified as true (1) or not (0).

  • EPSR & NCONV: EPSR for each factor and the number of convergences.

  • ROW: LA overflow, sign switch, bk=0, <eig_eps: Loading overflow, sign switch, vector bk=0, and eigenvalue < eig_eps.

auto_stop

logical; TRUE for enabling auto stop based on EPSR.
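
As background for the EPSR criterion, the following is a minimal base R sketch of the generic Gelman-Rubin potential scale reduction factor for a single parameter. It is not the package's internal code: the function name psrf, the simulated chains, and the use of separate chains are assumptions made for illustration.

```r
# Generic potential scale reduction factor for one parameter.
# `chains` is an n x m matrix: n draws (rows) from each of m chains (columns).
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(1)
mixed <- matrix(rnorm(2000), ncol = 2)           # two chains sampling the same target
stuck <- cbind(rnorm(1000, 0), rnorm(1000, 3))   # two chains stuck in different regions

psrf(mixed)  # close to 1: the chains agree
psrf(stuck)  # well above 1: the chains disagree
```

Values near 1 suggest that the chains have mixed, which is the kind of evidence an EPSR-based stopping rule relies on.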

max_conv

Maximum number of consecutive convergences for auto stop.
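
One way to picture a consecutive-convergence rule is the sketch below. It is only an illustration under stated assumptions: stop_check is a hypothetical helper, the 1.1 threshold is an assumed cutoff, and resetting the counter after a failed check is an assumption rather than documented package behavior.

```r
# Hypothetical helper (not a package function).
# epsr_max: the largest EPSR observed at each successive check.
# Returns the index of the check at which sampling would stop, or NA if never.
stop_check <- function(epsr_max, max_conv = 10, threshold = 1.1) {
  n_conv <- 0L
  for (i in seq_along(epsr_max)) {
    if (epsr_max[i] < threshold) {
      n_conv <- n_conv + 1L   # one more consecutive converged check
    } else {
      n_conv <- 0L            # assumed: a failed check resets the count
    }
    if (n_conv >= max_conv) return(i)
  }
  NA_integer_
}

# The run below is interrupted once at the third check, so the rule
# fires only after three converged checks in a row.
stop_check(c(1.30, 1.04, 1.20, 1.05, 1.02, 1.01), max_conv = 3)  # returns 6
```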

digits

Number of significant digits to print when printing numeric values.

Value

pefa returns an object of class lawbl without item intercepts. It contains detailed information about the posteriors, which can be summarized using summary.lawbl.

References

Chen, J. (2021). A Bayesian regularized approach to exploratory factor analysis in one step. Structural Equation Modeling: A Multidisciplinary Journal, 28(4), 518-528. DOI: 10.1080/10705511.2020.1854763.

Chen, J. (In Press). Fully and partially exploratory factor analysis with bi-level Bayesian regularization. Behavior Research Methods.

Examples

# \donttest{
#####################################################
#  Example 1: Fully EFA                             #
#####################################################

dat <- sim18cfa0$dat

m0 <- pefa(dat = dat, K = 5, burn = 2000, iter = 2000, verbose = TRUE)
#> 
#> Tot. Iter = 1000
#>           [,1]      [,2]    [,3]    [,4]      [,5]
#> Feigen  3.0751 0.0000000  2.9399  3.1796 0.000e+00
#> NLA_lg0 9.0000 0.0000000 12.0000 10.0000 0.000e+00
#> iShrink 0.2006 0.0003665  0.2937  0.3649 2.847e-16
#> [1] "ROW: LA overflow, sign switch, bk=0, <eig_eps"
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0  195    0    0  240
#> [3,]    0  432    0    0  502
#> [4,]    1  544    3    8  498
#> 
#> Tot. Iter = 2000
#>           [,1]      [,2]  [,3]    [,4]      [,5]
#> Feigen  2.8513 0.0000000 2.853  3.0671 0.000e+00
#> NLA_lg0 9.0000 0.0000000 9.000 10.0000 0.000e+00
#> iShrink 0.2843 0.0005151 1.029  0.3692 1.657e-20
#> [1] "ROW: LA overflow, sign switch, bk=0, <eig_eps"
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0  366    0    0  502
#> [3,]    0  921    0    0  975
#> [4,]    1 1055    3    8 1025
#> 
#> Tot. Iter = 3000
#>            [,1]   [,2]   [,3]   [,4]      [,5]
#> Feigen   3.3503 0.0000 2.6914 3.1388 0.000e+00
#> NLA_lg0 11.0000 0.0000 9.0000 9.0000 0.000e+00
#> iShrink  0.3073 0.0459 0.4774 0.2367 7.354e-23
#> Tru Fac 1 0 1 1 0
#> EPSR & NCONV 1.01 1.022 1.012 1
#> [1] "ROW: LA overflow, sign switch, bk=0, <eig_eps"
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0  519    0    0  764
#> [3,]    0 1505    0    0 1488
#> [4,]    1 1471    3    8 1512
#> 
#> Tot. Iter = 4000
#>            [,1]    [,2]   [,3]  [,4]      [,5]
#> Feigen   3.2592 0.00000 3.1686 3.221 0.000e+00
#> NLA_lg0 11.0000 0.00000 8.0000 8.000 0.000e+00
#> iShrink  0.8602 0.04228 0.3323 1.006 7.724e-30
#> Tru Fac 1 0 1 1 0
#> EPSR & NCONV 1.011 1.005 1.006 2
#> [1] "ROW: LA overflow, sign switch, bk=0, <eig_eps"
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0  519    0    0 1009
#> [3,]    0 2501    0    0 2022
#> [4,]    1 1475    3    8 1978
#>    user  system elapsed 
#>   14.52    0.02   14.57 
summary(m0) # summarize basic information
#> $NJK
#> [1] 1000   18    5
#> 
#> $`Miss%`
#> [1] 0
#> 
#> $`LD Allowed`
#> [1] FALSE
#> 
#> $`Burn in`
#> [1] 2000
#> 
#> $Iteration
#> [1] 2000
#> 
#> $`No. of sig lambda`
#> [1] 24
#> 
#> $Selected
#> [1]  TRUE FALSE  TRUE  TRUE FALSE
#> 
#> $`Auto, NCONV, MCONV`
#> [1]  0  2 10
#> 
#> $EPSR
#>      Point est. Upper C.I.
#> [1,]     1.0111     1.0306
#> [2,]     1.0046     1.0065
#> [3,]     1.0061     1.0313
#> 
#> $`DIC, BIC, AIC`
#> [1] 4911.460 2892.593 2313.478
#> 
#> $Time
#>    user  system elapsed 
#>   14.52    0.02   14.57 
#> 
summary(m0, what = 'qlambda') # summarize significant loadings in pattern/Q-matrix format
#>          1      3      4
#> I1  0.0000 0.0000 0.6798
#> I2  0.0000 0.0000 0.6883
#> I3  0.0000 0.0000 0.7043
#> I4  0.0000 0.0000 0.7103
#> I5  0.3235 0.0000 0.6952
#> I6  0.3406 0.0000 0.6749
#> I7  0.7229 0.0000 0.0000
#> I8  0.7178 0.0000 0.0000
#> I9  0.7231 0.0000 0.0000
#> I10 0.6968 0.0000 0.0000
#> I11 0.6961 0.2924 0.0000
#> I12 0.6972 0.2853 0.0000
#> I13 0.0000 0.6785 0.0000
#> I14 0.0000 0.6744 0.0000
#> I15 0.0000 0.7082 0.0000
#> I16 0.0000 0.6772 0.0000
#> I17 0.0000 0.6895 0.3005
#> I18 0.0000 0.6853 0.2872
summary(m0, what = 'phi') # summarize factor correlations
#>      row col    est     sd  lower  upper sig
#> [1,]   3   1 0.2881 0.0358 0.2196 0.3587   1
#> [2,]   4   1 0.2688 0.0361 0.2025 0.3421   1
#> [3,]   4   3 0.3154 0.0358 0.2523 0.3897   1
summary(m0, what = 'eigen') # summarize factor eigenvalues
#>       est     sd  lower  upper sig
#> F1 3.2443 0.1612 2.9169 3.5517   1
#> F3 2.9944 0.1521 2.7198 3.3141   1
#> F4 3.0561 0.1577 2.7607 3.3788   1

##########################################################
#  Example 2: PEFA with two factors partially specified  #
##########################################################

J <- ncol(dat)
K <- 5
Q <- matrix(-1, J, K)             # all loadings unspecified
Q[1:2, 1] <- Q[7:8, 2] <- 1       # items 1-2 on factor 1, items 7-8 on factor 2
Q
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]    1   -1   -1   -1   -1
#>  [2,]    1   -1   -1   -1   -1
#>  [3,]   -1   -1   -1   -1   -1
#>  [4,]   -1   -1   -1   -1   -1
#>  [5,]   -1   -1   -1   -1   -1
#>  [6,]   -1   -1   -1   -1   -1
#>  [7,]   -1    1   -1   -1   -1
#>  [8,]   -1    1   -1   -1   -1
#>  [9,]   -1   -1   -1   -1   -1
#> [10,]   -1   -1   -1   -1   -1
#> [11,]   -1   -1   -1   -1   -1
#> [12,]   -1   -1   -1   -1   -1
#> [13,]   -1   -1   -1   -1   -1
#> [14,]   -1   -1   -1   -1   -1
#> [15,]   -1   -1   -1   -1   -1
#> [16,]   -1   -1   -1   -1   -1
#> [17,]   -1   -1   -1   -1   -1
#> [18,]   -1   -1   -1   -1   -1

m1 <- pefa(dat = dat, Q = Q, burn = 2000, iter = 2000, verbose = TRUE)
#> 
#> Tot. Iter = 1000
#>           [,1]   [,2]    [,3]     [,4]     [,5]
#> Feigen  3.0879 3.3755  3.1763 0.000000 0.00e+00
#> NLA_lg0 9.0000 8.0000 10.0000 0.000000 0.00e+00
#> iShrink 0.8762 0.7396  0.5269 0.005856 3.38e-06
#> [1] "ROW: LA overflow, sign switch, bk=0, <eig_eps"
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0    0    0  160  219
#> [3,]    0    0    0  485  490
#> [4,]    0    0    1  515  510
#> 
#> Tot. Iter = 2000
#>          [,1]  [,2]   [,3]     [,4]      [,5]
#> Feigen  3.139 3.046  2.818 0.000000 0.000e+00
#> NLA_lg0 8.000 8.000 10.000 0.000000 0.000e+00
#> iShrink 1.218 0.937  1.071 0.002086 4.385e-07
#> [1] "ROW: LA overflow, sign switch, bk=0, <eig_eps"
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0    0    0  348  469
#> [3,]    0    0    0  972  964
#> [4,]    0    0    1 1028 1036
#> 
#> Tot. Iter = 3000
#>           [,1]   [,2]   [,3]      [,4]      [,5]
#> Feigen  3.1556 3.5876 2.8366 0.000e+00 0.000e+00
#> NLA_lg0 9.0000 9.0000 8.0000 0.000e+00 0.000e+00
#> iShrink 0.4709 0.8438 0.7828 4.889e-09 1.519e-10
#> Tru Fac 1 1 1 0 0
#> EPSR & NCONV 1.044 1.001 1.092 1
#> [1] "ROW: LA overflow, sign switch, bk=0, <eig_eps"
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0    0    0  587  740
#> [3,]    0    0    0 1468 1463
#> [4,]    0    0    1 1532 1537
#> 
#> Tot. Iter = 4000
#>           [,1]  [,2]   [,3]      [,4]      [,5]
#> Feigen  2.9765 2.998  3.223 0.000e+00 0.000e+00
#> NLA_lg0 9.0000 8.000 11.000 0.000e+00 0.000e+00
#> iShrink 0.5865 1.365  1.191 8.653e-15 1.355e-18
#> Tru Fac 1 1 1 0 0
#> EPSR & NCONV 1.004 1.008 1.013 2
#> [1] "ROW: LA overflow, sign switch, bk=0, <eig_eps"
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0    0    0  856 1009
#> [3,]    0    0    0 1944 1942
#> [4,]    0    0    1 2056 2058
#>    user  system elapsed 
#>   15.13    0.00   15.14 
summary(m1)
#> $NJK
#> [1] 1000   18    5
#> 
#> $`Miss%`
#> [1] 0
#> 
#> $`LD Allowed`
#> [1] FALSE
#> 
#> $`Burn in`
#> [1] 2000
#> 
#> $Iteration
#> [1] 2000
#> 
#> $`No. of sig lambda`
#> [1] 24
#> 
#> $Selected
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
#> 
#> $`Auto, NCONV, MCONV`
#> [1]  0  2 10
#> 
#> $EPSR
#>      Point est. Upper C.I.
#> [1,]     1.0041     1.0220
#> [2,]     1.0078     1.0302
#> [3,]     1.0125     1.0507
#> 
#> $`DIC, BIC, AIC`
#> [1] 4906.446 2886.731 2307.616
#> 
#> $Time
#>    user  system elapsed 
#>   15.13    0.00   15.14 
#> 
summary(m1, what = 'qlambda')
#>          1      2      3
#> I1  0.6834 0.0000 0.0000
#> I2  0.6924 0.0000 0.0000
#> I3  0.7067 0.0000 0.0000
#> I4  0.7122 0.0000 0.0000
#> I5  0.6984 0.3245 0.0000
#> I6  0.6767 0.3411 0.0000
#> I7  0.0000 0.7283 0.0000
#> I8  0.0000 0.7231 0.0000
#> I9  0.0000 0.7268 0.0000
#> I10 0.0000 0.7001 0.0000
#> I11 0.0000 0.7017 0.2905
#> I12 0.0000 0.6995 0.2850
#> I13 0.0000 0.0000 0.6784
#> I14 0.0000 0.0000 0.6741
#> I15 0.0000 0.0000 0.7090
#> I16 0.0000 0.0000 0.6776
#> I17 0.3029 0.0000 0.6876
#> I18 0.2867 0.0000 0.6870
summary(m1, what = 'phi')
#>      row col    est     sd  lower  upper sig
#> [1,]   2   1 0.2711 0.0371 0.1995 0.3424   1
#> [2,]   3   1 0.3188 0.0363 0.2507 0.3863   1
#> [3,]   3   2 0.2879 0.0352 0.2213 0.3620   1
summary(m1, what = 'eigen')
#>       est     sd  lower  upper sig
#> F1 3.0802 0.1591 2.8060 3.4049   1
#> F2 3.2822 0.1694 2.9720 3.6228   1
#> F3 2.9940 0.1530 2.7138 3.3128   1
# }