PCFA
is a partially confirmatory approach covering a wide range of
the exploratory-confirmatory continuum in factor analytic models (Chen, Guo, Zhang, & Pan, 2021).
PCFA applies only to continuous data, while the generalized PCFA (GPCFA; Chen, 2021)
covers both continuous and categorical data.
There are two major model variants with different constraints for identification. One assumes local
independence (LI) and has a more exploratory tendency; it can also be called the E-step.
The other allows local dependence (LD) and has a more confirmatory tendency; it can also
be called the C-step. Parameters are estimated by sampling from their posterior distributions with
Markov chain Monte Carlo (MCMC) techniques. Different Bayesian Lasso methods are used to
regularize the loading pattern and the LD terms. The estimation results can be summarized with summary.lawbl,
and the factor eigenvalues can be plotted with plot_lawbl.
Usage
pcfa(
dat,
Q,
LD = TRUE,
cati = NULL,
cand_thd = 0.2,
PPMC = FALSE,
burn = 5000,
iter = 5000,
update = 1000,
missing = NA,
rfit = TRUE,
sign_check = FALSE,
sign_eps = -0.5,
rs = FALSE,
auto_stop = FALSE,
max_conv = 10,
rseed = 12345,
digits = 4,
alas = FALSE,
verbose = FALSE
)
Arguments
- dat
A \(N \times J\) data matrix or data.frame consisting of the responses of \(N\) individuals to \(J\) items.
- Q
A \(J \times K\) design matrix for the loading pattern with \(K\) factors and \(J\) items. Elements are 1, -1, and 0 for specified, unspecified, and zero-fixed loadings, respectively. For models with LI or the E-step, one can specify a few (e.g., 2) loadings per factor. For models with LD or the C-step, the sufficient condition of one specified loading per item is suggested, although there can be a few items without any specified loading. See Examples.
- LD
logical; TRUE for allowing LD (model with LD or the C-step).
- cati
The set of categorical (polytomous) items in sequence number (i.e., 1 to \(J\)); NULL for none and -1 for all items (default is NULL).
- cand_thd
Candidate parameter for sampling the thresholds with the MH algorithm.
- PPMC
logical; TRUE for conducting posterior predictive model checking.
- burn
Number of burn-in iterations before posterior sampling.
- iter
Number of formal iterations for posterior sampling (> 0).
- update
Number of iterations between updates of the displayed sampling information.
- missing
Value for missing data (default is NA).
- rfit
logical; TRUE for providing relative fit indices (DIC, BIC, AIC).
- sign_check
logical; TRUE for checking sign switches of the loading vector.
- sign_eps
Minimum value for a sign switch of the loading vector (if sign_check = TRUE).
- rs
logical; TRUE for enabling the recommendation system.
- auto_stop
logical; TRUE for enabling auto stop based on EPSR < 1.1.
- max_conv
Maximum consecutive number of convergences required for auto stop.
- rseed
An integer for the random seed.
- digits
Number of significant digits to print when printing numeric values.
- alas
logical; TRUE for adaptive Lasso (default is FALSE).
- verbose
logical; TRUE to display the sampling information every update iterations, including:
Feigen: Eigenvalue for each factor.
NLA_le3: Number of loading estimates >= .3 for each factor.
Shrink: Shrinkage (or average shrinkage for each factor for adaptive Lasso).
EPSR & NCOV: EPSR for each factor & number of convergences.
Ave. Thd: Average thresholds for polytomous items.
Acc Rate: Acceptance rate of thresholds (MH algorithm).
LD>.2 >.1: Number of LD terms larger than .2 and .1, and the LD shrinkage parameter.
#Sign_sw: Number of sign switches for each factor.
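The design matrix Q drives identification, so it is worth sanity-checking before estimation. The following base-R sketch (an illustration only, independent of LAWBL) builds a Q matrix for 18 items and 3 factors in the style of the Examples and checks the identification conditions described above: at least one specified loading per item for the C-step, or a few specified loadings per factor for the E-step.

```r
# Hypothetical base-R sketch (not part of LAWBL): build and check a
# J x K design matrix Q, where 1 = specified, -1 = unspecified, 0 = fixed at zero.
J <- 18
K <- 3
Q <- matrix(-1, J, K)                        # start fully unspecified (exploratory)
Q[1:6, 1] <- Q[7:12, 2] <- Q[13:18, 3] <- 1  # one specified loading per item

# C-step (LD = TRUE): sufficient condition of one specified loading per item
items_specified <- rowSums(Q == 1)
all(items_specified >= 1)                    # TRUE here

# E-step (LD = FALSE): a few specified loadings per factor suffice
loads_per_factor <- colSums(Q == 1)
loads_per_factor                             # 6 6 6
```

A design that fails these checks may still run, but the loading pattern can be poorly identified; checking up front saves a long MCMC run.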
Value
pcfa
returns an object of class lawbl
without item intercepts. It contains extensive information about
the posteriors, which can be summarized using summary.lawbl.
References
Chen, J., Guo, Z., Zhang, L., & Pan, J. (2021). A partially confirmatory approach to scale development with the Bayesian Lasso. Psychological Methods, 26(2), 210–235. DOI: 10.1037/met0000293.
Chen, J. (2021). A generalized partially confirmatory factor analysis framework with mixed Bayesian Lasso methods. Multivariate Behavioral Research. DOI: 10.1080/00273171.2021.1925520.
Examples
# \donttest{
#####################################################
# Example 1: Estimation with continuous data & LD #
#####################################################
dat <- sim18cfa1$dat
J <- ncol(dat)
K <- 3
Q <- matrix(-1, J, K)
Q[1:6, 1] <- Q[7:12, 2] <- Q[13:18, 3] <- 1
m0 <- pcfa(dat = dat, Q = Q, LD = TRUE, burn = 2000, iter = 2000)
summary(m0) # summarize basic information
#> $NJK
#> [1] 1000 18 3
#>
#> $`Miss%`
#> [1] 0
#>
#> $`LD Allowed`
#> [1] TRUE
#>
#> $`Burn in`
#> [1] 2000
#>
#> $Iteration
#> [1] 2000
#>
#> $`No. of sig lambda`
#> [1] 23
#>
#> $Selected
#> [1] TRUE TRUE TRUE
#>
#> $`Auto, NCONV, MCONV`
#> [1] 0 0 10
#>
#> $EPSR
#> Point est. Upper C.I.
#> [1,] 1.7125 3.5603
#> [2,] 1.0459 1.1902
#> [3,] 1.1321 1.4495
#>
#> $`No. of sig LD terms`
#> [1] 6
#>
#> $`DIC, BIC, AIC`
#> [1] 3597.172 2497.062 1378.094
#>
#> $Time
#> user system elapsed
#> 35.08 0.02 35.18
#>
summary(m0, what = 'qlambda') #summarize significant loadings in pattern/Q-matrix format
#> 1 2 3
#> I1 0.7233 0.0000 0.0000
#> I2 0.6469 0.0000 0.0000
#> I3 0.7660 0.0000 0.0000
#> I4 0.7660 0.0000 0.0000
#> I5 0.7658 0.1835 0.0000
#> I6 0.7425 0.0000 0.0000
#> I7 0.0000 0.7646 0.0000
#> I8 0.0000 0.7091 0.0000
#> I9 0.0000 0.7420 0.0000
#> I10 0.0000 0.7220 0.0000
#> I11 0.0000 0.7087 0.2341
#> I12 0.0000 0.7163 0.2194
#> I13 0.0000 0.0000 0.7084
#> I14 0.0000 0.0000 0.6745
#> I15 0.0000 0.0000 0.7426
#> I16 0.0000 0.0000 0.7343
#> I17 0.2472 0.0000 0.7259
#> I18 0.2328 0.0000 0.7234
summary(m0, what = 'offpsx') #summarize significant LD terms
#> row col est sd lower upper sig
#> [1,] 14 1 0.2876 0.0416 0.2042 0.3671 1
#> [2,] 7 2 0.2495 0.0608 0.1408 0.3637 1
#> [3,] 4 3 0.2692 0.0516 0.1744 0.3658 1
#> [4,] 13 8 0.2654 0.0614 0.1418 0.3899 1
#> [5,] 10 9 0.3080 0.0608 0.2008 0.4382 1
#> [6,] 16 15 0.2631 0.0635 0.1374 0.3843 1
######################################################
# Example 2: Estimation with categorical data & LI #
######################################################
dat <- sim18ccfa40$dat
J <- ncol(dat)
K <- 3
Q <- matrix(-1, J, K)
Q[1:2, 1] <- Q[7:8, 2] <- Q[13:14, 3] <- 1
m1 <- pcfa(dat = dat, Q = Q, LD = FALSE, cati = -1, burn = 2000, iter = 2000)
summary(m1) # summarize basic information
#> $NJK
#> [1] 1000 18 3
#>
#> $`Miss%`
#> [1] 9.888889
#>
#> $`LD Allowed`
#> [1] FALSE
#>
#> $`Burn in`
#> [1] 2000
#>
#> $Iteration
#> [1] 2000
#>
#> $`No. of sig lambda`
#> [1] 24
#>
#> $Selected
#> [1] TRUE TRUE TRUE
#>
#> $`Auto, NCONV, MCONV`
#> [1] 0 0 10
#>
#> $EPSR
#> Point est. Upper C.I.
#> [1,] 1.2016 1.6828
#> [2,] 1.1469 1.4762
#> [3,] 1.0670 1.2669
#>
#> $`Cat Items`
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#>
#> $`max No. of categories`
#> [1] 4
#>
#> $`DIC, BIC, AIC`
#> [1] 4133.164 3834.688 3466.606
#>
#> $Time
#> user system elapsed
#> 65.82 0.11 66.26
#>
summary(m1, what = 'qlambda') #summarize significant loadings in pattern/Q-matrix format
#> 1 2 3
#> I1 0.7423 0.0000 0.0000
#> I2 0.7406 0.0000 0.0000
#> I3 0.7442 0.0000 0.0000
#> I4 0.7620 0.0000 0.0000
#> I5 0.7688 0.2287 0.0000
#> I6 0.7184 0.2275 0.0000
#> I7 0.0000 0.7350 0.0000
#> I8 0.0000 0.7322 0.0000
#> I9 0.0000 0.7314 0.0000
#> I10 0.0000 0.6976 0.0000
#> I11 0.0000 0.7014 0.2681
#> I12 0.0000 0.6945 0.2641
#> I13 0.0000 0.0000 0.7506
#> I14 0.0000 0.0000 0.7243
#> I15 0.0000 0.0000 0.7331
#> I16 0.0000 0.0000 0.7115
#> I17 0.2784 0.0000 0.7283
#> I18 0.3019 0.0000 0.6667
summary(m1, what = 'offpsx') #summarize significant LD terms
#> NULL
summary(m1,what='thd') #thresholds for categorical items
#> [,1] [,2] [,3]
#> [1,] -1.3566 0.0754 1.6245
#> [2,] -1.4324 0.0571 1.4929
#> [3,] -1.4491 0.0539 1.4929
#> [4,] -1.3762 0.0598 1.5975
#> [5,] -1.4274 0.0147 1.4727
#> [6,] -1.4862 0.0849 1.5689
#> [7,] -1.4943 0.0361 1.6604
#> [8,] -1.5413 -0.0550 1.4096
#> [9,] -1.4773 -0.0132 1.5135
#> [10,] -1.5041 0.0126 1.6268
#> [11,] -1.4495 0.0218 1.5745
#> [12,] -1.5340 0.0205 1.6335
#> [13,] -1.5586 -0.0522 1.4086
#> [14,] -1.4153 0.0509 1.5145
#> [15,] -1.5415 -0.0238 1.4803
#> [16,] -1.4598 -0.0093 1.3759
#> [17,] -1.4275 0.0653 1.4807
#> [18,] -1.5253 0.0380 1.5541
# }
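Example 2 reports about 9.9% missing responses via `$`Miss%``. The missing argument lets the data carry a user-defined missing-data code instead of NA. The following base-R sketch (the -999 code and the small simulated matrix are assumptions for illustration, not part of LAWBL) shows the two equivalent options: pass the code to pcfa directly, or recode to the default NA beforehand.

```r
# Hypothetical base-R sketch: responses coded with -999 for missing values.
set.seed(1)
dat <- matrix(sample(1:4, 20, replace = TRUE), nrow = 5)
dat[c(2, 9)] <- -999                 # inject two missing codes

# Option 1: pass the code to pcfa directly (assuming LAWBL is loaded):
# m <- pcfa(dat = dat, Q = Q, missing = -999)

# Option 2: recode to the default NA before calling pcfa:
dat[dat == -999] <- NA
mean(is.na(dat)) * 100               # percent missing, cf. $`Miss%` in summary()
```

Either way, the proportion of missing responses is reported by summary() as `Miss%`, so a quick check like the one above confirms the code was recognized.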