prolong() takes an \(n \times t\) outcome matrix and an \(n \times p \times t\) array of covariates, automatically runs the entire processing and model-fitting pipeline, and returns a list of coefficients with corresponding variable names.

Usage

prolong(
  x,
  y,
  lambda1 = NULL,
  lambda2 = NULL,
  lambdar = NULL,
  groups = TRUE,
  foldids = NULL,
  optimvals = c(1, 0.01)
)

Arguments

x

Input covariate array, with n rows, p columns, and t slices

y

Input response matrix, with n rows and t columns

lambda1

Lasso/group lasso penalty parameter. If left NULL, this parameter is chosen via a cross-validation scheme that keeps each subject's observations across time points together. It is recommended to save the lambda2 and lambdar values from the first run of prolong() for future runs, since the MLE optimization step takes the longest

lambda2

Laplacian penalty parameter. If left NULL, it is chosen along with lambdar via MLE; see the supplementary material in Steiner et al. (2022)

lambdar

Nuisance parameter. If left NULL, it is chosen along with lambda2 via MLE. It is added to the diagonal elements of the Laplacian matrix to provide the invertibility required for the MLE

groups

Optional pre-specified groups. If NULL or FALSE, the lasso is used. If left as TRUE, there are p groups, each containing one variable's observations across time points

foldids

Optional pre-specified fold ids for the cross-validation; should be of length n. If left NULL, subjects are automatically split into 5 folds

optimvals

Initial values of lambda2 and lambdar passed to optim() for the MLE of lambda2 and lambdar

Value

A list object with S3 class prolong

beta

p*length(lambda1) matrix of coefficients. Currently, lambda1 is chosen via cross-validation and is just a single value

selected

the names of the variables with at least one non-zero coefficient

df

number of non-zero coefficients (out of p*t(t-1)/2, not p)

dim

dimension of the full coefficient matrix over lambda1 values. Currently, lambda1 is chosen via cross-validation and is just a single value

lambda1

sequence of lambda1 values used in the final gglasso/glmnet call

lambda2

lambda2 value either passed by the user or chosen via MLE; the parameter of interest for the network penalty

lambdar

lambdar value either passed by the user or chosen via MLE; a nuisance parameter needed to estimate lambda2 via MLE

npasses

total number of iterations summed over lambda1 values for the final gglasso/glmnet call

jerr

error flag for the final gglasso/glmnet call; 0 if no error

group

vector of consecutive integers describing the grouping of the coefficients

call

the gglasso/glmnet call that produced this object

Details

First, the data are reshaped into the first-differenced vectorized \(n(t-1)\)-length Y and the first-differenced matricized \(n(t-1) \times pt(t-1)/2\) X, with a corresponding dependence matrix. Next, the hyperparameters for the graph-Laplacian-based network penalty are found via MLE, and the lasso/group lasso parameter is found via a careful implementation of cross-validation. Lastly, a group lasso + Laplacian or lasso + Laplacian model is fit, and its bias-adjusted coefficients are returned.
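The first-differencing of the outcome can be illustrated in a few lines. This is a minimal sketch of the idea, not the package's internal code; the variable names here are hypothetical:

```r
# Sketch of the first-differencing step: an n x t outcome matrix
# becomes a vector of within-subject differences of length n*(t-1).
set.seed(1)
n <- 5; t <- 4
Y <- matrix(rnorm(n * t), nrow = n, ncol = t)

dY <- Y[, -1] - Y[, -t]   # n x (t-1) matrix of first differences over time
y_vec <- as.vector(dY)    # vectorized outcome, length n*(t-1)
length(y_vec)             # 5 * (4 - 1) = 15
```

The covariate array is matricized analogously, which is where the \(pt(t-1)/2\) column dimension of X arises.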

References

Broll S, Basu S, Lee MH, Wells MT (2023). “PROLONG: Penalized Regression for Outcome guided Longitudinal Omics analysis with Network and Group constraints.” bioRxiv. doi:10.1101/2023.11.06.565845 , https://www.biorxiv.org/content/early/2023/11/07/2023.11.06.565845.full.pdf, https://www.biorxiv.org/content/early/2023/11/07/2023.11.06.565845. Steiner A, Abbas K, Brzyski D, Pączek K, Randolph TW, Goñi J, Harezlak J (2022). “Incorporation of spatial- and connectivity-based cortical brain region information in regularized regression: Application to Human Connectome Project data.” Frontiers in Neuroscience, 16. ISSN 1662-453X, doi:10.3389/fnins.2022.957282 , https://www.frontiersin.org/articles/10.3389/fnins.2022.957282. Yang Y, Zou H (2015). “A fast unified algorithm for solving group-lasso penalize learning problems.” Statistics and Computing, 25(6), 1129--1141. ISSN 1573-1375, doi:10.1007/s11222-014-9498-5 , 2022-09-06. Yuan M, Lin Y (2005). “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1), 49-67. ISSN 1369-7412, doi:10.1111/j.1467-9868.2005.00532.x . Friedman J, Tibshirani R, Hastie T (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1--22. doi:10.18637/jss.v033.i01 . Tibshirani R (1996). “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267--288. ISSN 00359246, http://www.jstor.org/stable/2346178.

Examples

if (FALSE) {
# First run: lambda2 and lambdar are chosen via MLE, lambda1 via cross-validation
promod <- prolong(Xarray, Ymatrix)
promod$beta
promod$selected

# Later runs: reuse saved lambda2/lambdar values to skip the slow optimization
# step; groups = FALSE fits the lasso instead of the group lasso
promod <- prolong(Xarray, Ymatrix, lambda2 = .001, lambdar = 10, groups = FALSE)
promod$beta
promod$selected
}
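
When fold assignments need to be reproducible across runs, they can be passed via foldids. A minimal sketch, assuming the n, Xarray, and Ymatrix objects used above (the prolong() call is commented because the data objects are hypothetical):

```r
# Assign each of n subjects to one of 5 cross-validation folds.
# foldids has length n (one id per subject, not per observation), so all
# of a subject's time points stay in the same fold.
set.seed(2)
n <- 40
foldids <- sample(rep(1:5, length.out = n))
table(foldids)  # roughly balanced fold sizes

# promod <- prolong(Xarray, Ymatrix, foldids = foldids)
```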