prolong()
takes an \(n \times t\) outcome matrix and an
\(n \times p \times t\) array of covariates and automatically runs the
entire processing and model-fitting pipeline, returning a list of coefficients
with corresponding variable names.
Usage
prolong(
  x,
  y,
  lambda1 = NULL,
  lambda2 = NULL,
  lambdar = NULL,
  groups = TRUE,
  foldids = NULL,
  optimvals = c(1, 0.01)
)
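A minimal sketch of a call on simulated data of the documented dimensions (the data and dimensions here are illustrative, not from the package):

library(prolong)

# Illustrative sizes: n subjects, p covariates, tt time points
n <- 20; p <- 50; tt <- 4

# n x p x tt covariate array and n x tt outcome matrix
x <- array(rnorm(n * p * tt), dim = c(n, p, tt))
y <- matrix(rnorm(n * tt), nrow = n, ncol = tt)

# Fit with defaults: group lasso with p groups, lambda1 chosen via
# subject-wise cross-validation, lambda2/lambdar chosen via MLE
fit <- prolong(x, y)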
Arguments
- x: Input covariate array, with n rows, p columns, and t slices.
- y: Input response matrix, with n rows and t columns.
- lambda1: Lasso/group lasso parameter. If left NULL, this parameter will be chosen via a cross-validation that keeps each subject's observations across time points together. It is recommended to save the lambda2 and lambdar values from the first run of prolong for future runs, since the optimization step takes the longest (see the sketch after this list).
- lambda2: Laplacian penalty parameter. If left NULL, it will be chosen along with lambdar via MLE; see the supplementary material in (Steiner et al. 2022).
- lambdar: Nuisance parameter. If left NULL, it will be chosen along with lambda2 via MLE. It is added to the diagonal elements of the Laplacian matrix to obtain the invertibility required for the MLE.
- groups: Optional pre-specified groups. If NULL or FALSE, lasso will be used. If left as TRUE, there will be p groups, each containing a variable's observations across time points.
- foldids: Optional pre-specified fold IDs for the cross-validation. Should be of length n. If left NULL, subjects will be automatically split into 5 folds.
- optimvals: Initial values of lambda2 and lambdar to be used by optim() for the MLE of lambda2 and lambdar.
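As recommended under lambda1, the MLE hyperparameters from a first fit can be reused to skip the slow optim() step on later runs (a sketch; fit2, fit3, and my_folds are illustrative names):

# First run: lambda2 and lambdar are found via MLE (the slow step)
fit <- prolong(x, y)

# Later runs: pass the saved values to skip the optimization
fit2 <- prolong(x, y, lambda2 = fit$lambda2, lambdar = fit$lambdar)

# Optionally supply your own length-n fold assignments for the CV
my_folds <- sample(rep(1:5, length.out = nrow(y)))
fit3 <- prolong(x, y, foldids = my_folds)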
Value
A list object with S3 class prolong:
- beta: p*length(lambda1) matrix of coefficients. Currently, lambda1 is chosen via cross-validation and is just a single value.
- selected: the names of the variables with at least one non-zero coefficient.
- df: number of non-zero coefficients (out of p*t(t-1)/2, not p).
- dim: dimension of the full coefficient matrix over lambda1 values. Currently, lambda1 is chosen via cross-validation and is just a single value.
- lambda1: sequence of lambda1 values used in the final gglasso/glmnet call.
- lambda2: lambda2 value either passed by the user or chosen via MLE; the parameter of interest for the network penalty.
- lambdar: lambdar value either passed by the user or chosen via MLE; a nuisance parameter needed to estimate lambda2 via MLE.
- npasses: total number of iterations summed over lambda1 values for the final gglasso/glmnet call.
- jerr: error flag for the final gglasso/glmnet call; 0 if no error.
- group: vector of consecutive integers describing the grouping of the coefficients.
- call: the gglasso/glmnet call that produced this object.
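A brief sketch of inspecting the returned object (continuing the illustrative fit from above):

# Names of variables with at least one non-zero coefficient
fit$selected

# Number of non-zero coefficients and the fitted penalty parameters
fit$df
fit$lambda2
fit$lambdar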
Details
First, the data are reshaped into the first-differenced, vectorized \(n(t-1)\)-length Y and the first-differenced, matricized \(n(t-1) \times pt(t-1)/2\) X, with a corresponding dependence matrix. Next, the hyperparameters for the graph-Laplacian-based network penalty are found via MLE, and the parameter for the lasso/group lasso is found via a careful implementation of cross-validation. Lastly, a group lasso + Laplacian or lasso + Laplacian model is fit, and its bias-adjusted coefficients are returned.
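As an illustration of the first step only, the first-differenced, vectorized outcome can be formed as below. This is a hedged sketch of the reshaping described above, not the package's internal code, and the construction of the \(n(t-1) \times pt(t-1)/2\) design matrix is omitted:

# y is the n x t outcome matrix; differencing consecutive time
# points gives an n x (t-1) matrix of first differences ...
dy <- y[, -1, drop = FALSE] - y[, -ncol(y), drop = FALSE]

# ... which is then stacked (column-wise here, as one possible
# ordering) into the length-n(t-1) response vector Y
Y <- as.vector(dy)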
References
Broll S, Basu S, Lee MH, Wells MT (2023). “PROLONG: Penalized Regression for Outcome guided Longitudinal Omics analysis with Network and Group constraints.” bioRxiv. doi:10.1101/2023.11.06.565845, https://www.biorxiv.org/content/early/2023/11/07/2023.11.06.565845.

Steiner A, Abbas K, Brzyski D, Pączek K, Randolph TW, Goñi J, Harezlak J (2022). “Incorporation of spatial- and connectivity-based cortical brain region information in regularized regression: Application to Human Connectome Project data.” Frontiers in Neuroscience, 16. ISSN 1662-453X, doi:10.3389/fnins.2022.957282, https://www.frontiersin.org/articles/10.3389/fnins.2022.957282.

Yang Y, Zou H (2015). “A fast unified algorithm for solving group-lasso penalize learning problems.” Statistics and Computing, 25(6), 1129--1141. ISSN 1573-1375, doi:10.1007/s11222-014-9498-5.

Yuan M, Lin Y (2005). “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1), 49--67. ISSN 1369-7412, doi:10.1111/j.1467-9868.2005.00532.x.

Friedman J, Tibshirani R, Hastie T (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1--22. doi:10.18637/jss.v033.i01.

Tibshirani R (1996). “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267--288. ISSN 00359246, http://www.jstor.org/stable/2346178.