prolong()
takes an \(n \times t\) outcome matrix and an
\(n \times p \times t\) array of covariates and automatically runs the
entire processing and model-fitting pipeline, returning a list of coefficients
with corresponding variable names.
Usage
prolong(
  x,
  y,
  lambda1 = NULL,
  lambda2 = NULL,
  lambdar = NULL,
  groups = TRUE,
  foldids = NULL,
  optimvals = c(1, 0.01)
)
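A minimal sketch of a call on simulated data of the documented dimensions (the data and dimensions here are illustrative, not from the package):

library(prolong)

# Illustrative sizes: n subjects, p covariates, tt time points
n <- 20; p <- 50; tt <- 4

# n x p x tt covariate array and n x tt outcome matrix
x <- array(rnorm(n * p * tt), dim = c(n, p, tt))
y <- matrix(rnorm(n * tt), nrow = n, ncol = tt)

# Fit with defaults: group lasso with p groups, lambda1 chosen via
# subject-wise cross-validation, lambda2/lambdar chosen via MLE
fit <- prolong(x, y)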
Arguments
- x: Input covariate array, with n rows, p columns, and t slices.
- y: Input response matrix, with n rows and t columns.
- lambda1: Lasso/group lasso parameter. If left NULL, this parameter will be chosen via a cross-validation that keeps each subject's observations across time points together. It is recommended to save the lambda2 and lambdar values from the first run of prolong for future runs, since the optimization step takes the longest (see the sketch after this list).
- lambda2: Laplacian penalty parameter. If left NULL, it will be chosen along with lambdar via MLE; see the supplementary material in (Steiner et al. 2022).
- lambdar: Nuisance parameter. If left NULL, it will be chosen along with lambda2 via MLE. It is added to the diagonal elements of the Laplacian matrix to obtain the invertibility required for the MLE.
- groups: Optional pre-specified groups. If NULL or FALSE, lasso will be used. If left as TRUE, there will be p groups, each containing a variable's observations across time points.
- foldids: Optional pre-specified fold IDs for the cross-validation. Should be of length n. If left NULL, subjects will be automatically split into 5 folds.
- optimvals: Initial values of lambda2 and lambdar to be used by optim() for the MLE of lambda2 and lambdar.
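As recommended under lambda1, the MLE hyperparameters from a first fit can be reused to skip the slow optim() step on later runs (a sketch; fit2, fit3, and my_folds are illustrative names):

# First run: lambda2 and lambdar are found via MLE (the slow step)
fit <- prolong(x, y)

# Later runs: pass the saved values to skip the optimization
fit2 <- prolong(x, y, lambda2 = fit$lambda2, lambdar = fit$lambdar)

# Optionally supply your own length-n fold assignments for the CV
my_folds <- sample(rep(1:5, length.out = nrow(y)))
fit3 <- prolong(x, y, foldids = my_folds)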
Value
A list object with S3 class prolong:
- beta: p*length(lambda1) matrix of coefficients. Currently, lambda1 is chosen via cross-validation and is just a single value.
- selected: the names of the variables with at least one non-zero coefficient.
- df: number of non-zero coefficients (out of p*t(t-1)/2, not p).
- dim: dimension of the full coefficient matrix over lambda1 values. Currently, lambda1 is chosen via cross-validation and is just a single value.
- lambda1: sequence of lambda1 values used in the final gglasso/glmnet call.
- lambda2: lambda2 value either passed by the user or chosen via MLE; the parameter of interest for the network penalty.
- lambdar: lambdar value either passed by the user or chosen via MLE; a nuisance parameter needed to estimate lambda2 via MLE.
- npasses: total number of iterations summed over lambda1 values for the final gglasso/glmnet call.
- jerr: error flag for the final gglasso/glmnet call; 0 if no error.
- group: vector of consecutive integers describing the grouping of the coefficients.
- call: the gglasso/glmnet call that produced this object.
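A brief sketch of inspecting the returned object (continuing the illustrative fit from above):

# Names of variables with at least one non-zero coefficient
fit$selected

# Number of non-zero coefficients and the fitted penalty parameters
fit$df
fit$lambda2
fit$lambdar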
Details
First, the data are reshaped into the first-differenced, vectorized \(n(t-1)\)-length Y and the first-differenced, matricized \(n(t-1) \times pt(t-1)/2\) X, with a corresponding dependence matrix. Next, the hyperparameters for the graph-Laplacian-based network penalty are found via MLE, and the parameter for the lasso/group lasso is found via a careful implementation of cross-validation. Lastly, a group lasso + Laplacian or lasso + Laplacian model is fit, and its bias-adjusted coefficients are returned.
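As an illustration of the first step only, the first-differenced, vectorized outcome can be formed as below. This is a hedged sketch of the reshaping described above, not the package's internal code, and the construction of the \(n(t-1) \times pt(t-1)/2\) design matrix is omitted:

# y is the n x t outcome matrix; differencing consecutive time
# points gives an n x (t-1) matrix of first differences ...
dy <- y[, -1, drop = FALSE] - y[, -ncol(y), drop = FALSE]

# ... which is then stacked (column-wise here, as one possible
# ordering) into the length-n(t-1) response vector Y
Y <- as.vector(dy)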
References
Broll S, Basu S, Lee MH, Wells MT (2023). “PROLONG: Penalized Regression for Outcome guided Longitudinal Omics analysis with Network and Group constraints.” bioRxiv. doi:10.1101/2023.11.06.565845, https://www.biorxiv.org/content/early/2023/11/07/2023.11.06.565845.

Steiner A, Abbas K, Brzyski D, Pączek K, Randolph TW, Goñi J, Harezlak J (2022). “Incorporation of spatial- and connectivity-based cortical brain region information in regularized regression: Application to Human Connectome Project data.” Frontiers in Neuroscience, 16. ISSN 1662-453X, doi:10.3389/fnins.2022.957282, https://www.frontiersin.org/articles/10.3389/fnins.2022.957282.

Yang Y, Zou H (2015). “A fast unified algorithm for solving group-lasso penalize learning problems.” Statistics and Computing, 25(6), 1129--1141. ISSN 1573-1375, doi:10.1007/s11222-014-9498-5.

Yuan M, Lin Y (2005). “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1), 49--67. ISSN 1369-7412, doi:10.1111/j.1467-9868.2005.00532.x.

Friedman J, Tibshirani R, Hastie T (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1--22. doi:10.18637/jss.v033.i01.

Tibshirani R (1996). “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267--288. ISSN 00359246, http://www.jstor.org/stable/2346178.