Title: | Ridge Partial Correlation |
---|---|
Description: | Computes the ridge partial correlation coefficients in a high or ultra-high dimensional linear regression problem. An extended Bayesian information criterion is also implemented for variable selection. Users provide the matrix of covariates as a usual dense matrix or a sparse matrix stored in a compressed sparse column format. Detail of the method is given in the manual. |
Authors: | Somak Dutta [aut, cre, cph], An Nguyen [aut, ctb], Run Wang [ctb], Vivekananda Roy [ctb] |
Maintainer: | Somak Dutta <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0.3 |
Built: | 2025-03-23 05:53:56 UTC |
Source: | https://github.com/somakd/rpc |
This function performs model selection using extended BIC and the ridge partial correlation coefficients
eBIC(rpc.obj)
eBIC(rpc.obj)
rpc.obj |
an object of class |
BIC.PATH vector of eBIC of each model, starting with the intercept only model.
model.best sorted indices (in increased order) of the model with the smallest EBIC
An Nguyen
Somak Dutta
Maintainer: Somak Dutta <[email protected]>
n <- 50; p <- 400; trueidx <- 1:3 truebeta <- c(4,5,6) X <- matrix(rnorm(n*p), n, p) # n x p covariate matrix y <- 0.5 + X[,trueidx] %*% truebeta + rnorm(n) res <- rpc(X,y, lambda = 0.1, ncores = 1) eBIC(res) # model.best: model selected by eBIC
n <- 50; p <- 400; trueidx <- 1:3 truebeta <- c(4,5,6) X <- matrix(rnorm(n*p), n, p) # n x p covariate matrix y <- 0.5 + X[,trueidx] %*% truebeta + rnorm(n) res <- rpc(X,y, lambda = 0.1, ncores = 1) eBIC(res) # model.best: model selected by eBIC
The sample ridge partial correlations (RPC) can be used for a simple, efficient, and scalable variables screening method in ultra-high dimensional linear regression. These coefficients incorporate both the ridge regularized estimates of the regression coefficients and the ridge regularized partial variances of the predictor variables, providing the screening method with sure screening property without strong assumptions on the marginal correlations. For more details, please see Wang et al. (2025).
This function computes the ridge partial correlations for each variable in the covariate matrix.
rpc(X, y, lambda = 1, XXt = NULL, ncores = 1, RAM = 6)
rpc(X, y, lambda = 1, XXt = NULL, ncores = 1, RAM = 6)
X |
The input matrix of type ‘matrix’ or ‘dgCMatrix’. No need to center or scale X as that would be done implicitly. |
y |
vector containing the response variable. |
lambda |
the regularization parameter. |
XXt |
the matrix UU' where U = scale(X), will be computed
if not provided. When the matrix X is large, recomputing XXt can be costly,
for example if the rpc's for a new lambda value is required. It is thus
wise to pass XXt (see function |
ncores |
the number of cores to be used. Default is 1. |
RAM |
The algorithm will use a maximum of this much additional RAM. Default is 6GB. Increasing this number, within the limit of the machine, will allow the algorithm to run faster. |
Consider the linear regression model:
where is the
design matrix,
is a p-dimensional vector,
and
is the n-dimensional vector of iid residual errors with mean 0 and variance
,
is independent of
.
Assuming the design matrix is centered and scaled, and letting
,
the (sample) partial correlation between the
predictor and
given the remaining predictors is
, where
is the
joint precision matrix of the response and the covariates.
In cases when
,
is not invertible.
Hence, we consider the ridge regularized version of
,
given by
, where:
The ridge partial correlations between and
is then defined in terms of elements of
as follows:
where is the
th entry in
.
For variable screening, one may choose the variables with largest absolute RPC values,
for some
(e.g.,
or
). Alternatively, the extended BIC
criteria may be used. See
eBIC
function in this package.
N.B. Further hyper-threading speed-up can be obtained by
loading the MatrixExtra
package.
an object containing the following items:
- rpc: vector of rpc coefficients.
- X: the original matrix of covariates.
- XXt: the matrix UU', where U is the centered-scaled X.
- y: vector of centered and scaled response variable.
- lambda: the original lambda used.
Somak Dutta
An Nguyen
Maintainer: Somak Dutta <[email protected]>
[eBIC()] for model selecting, [XXt.compute()] for computing crossproduct.
## Toy example: n <- 50; p <- 400; trueidx <- 1:3 ## First three variables are important truebeta <- c(4,5,6) X <- matrix(rnorm(n*p), n, p) ## n x p covariate matrix y <- 0.5 + X[,trueidx] %*% truebeta + rnorm(n) ## Response res <- rpc(X,y, lambda = 0.1, ncores = 1) order(abs(res$rpc),decreasing = TRUE)[1:10] # Top 10 variables ## Run another case with the same X and y, but pass XXt to make it faster res2 <- rpc(X,y, lambda = 0.001,XXt = res$XXt , ncores = 1) order(abs(res2$rpc),decreasing = TRUE)[1:10] # Top 10 variables
## Toy example: n <- 50; p <- 400; trueidx <- 1:3 ## First three variables are important truebeta <- c(4,5,6) X <- matrix(rnorm(n*p), n, p) ## n x p covariate matrix y <- 0.5 + X[,trueidx] %*% truebeta + rnorm(n) ## Response res <- rpc(X,y, lambda = 0.1, ncores = 1) order(abs(res$rpc),decreasing = TRUE)[1:10] # Top 10 variables ## Run another case with the same X and y, but pass XXt to make it faster res2 <- rpc(X,y, lambda = 0.001,XXt = res$XXt , ncores = 1) order(abs(res2$rpc),decreasing = TRUE)[1:10] # Top 10 variables
For an input matrix X, This function computes UU' = tcrossprod(U) where U = scale(X) in a memory efficient way.
XXt.compute( X, meanX = NULL, varX = NULL, ncores = 4, check = TRUE, chunksize = 1000 )
XXt.compute( X, meanX = NULL, varX = NULL, ncores = 4, check = TRUE, chunksize = 1000 )
X |
The input matrix of type ‘matrix’ or ‘dgCMatrix’. No need to center or scale X as that would be done implicitly. |
meanX |
Vector of column means of X. Will be computed if not provided. |
varX |
Vector of column variances of X. Will be computed if not provided. |
ncores |
the number of cores to be used. Default is 1. |
check |
check for zero variances. |
chunksize |
When using openmp parallelization, this would be the chunk size under a dynamic scheduling. |
a nrow(X) x nrow(X) dense symmetric matrix.
Somak Dutta
Maintainer: Somak Dutta <[email protected]>
require("Matrix") set.seed(123) x <- rsparsematrix(100,100,density = 0.3) norm(XXt.compute(x) - tcrossprod(scale(x))) # (4.783951e-13 on my machine)
require("Matrix") set.seed(123) x <- rsparsematrix(100,100,density = 0.3) norm(XXt.compute(x) - tcrossprod(scale(x))) # (4.783951e-13 on my machine)