| Title: | High-Performance Functions for Martingale Difference Correlation and Hypothesis Testing in Mixture Cure Models |
|---|---|
| Description: | Provides efficient tools for computing martingale difference correlation, martingale difference divergence, and their partial extensions to assess conditional mean dependence. Separately, the package introduces a novel hypothesis test for detecting covariate effects on the cure rate in mixture cure models, using MDC-based statistics. These methods enable both general dependence analysis and targeted inference in survival models with a cure fraction. |
| Authors: | Blanca Monroy-Castillo [aut, cre], Amalia Jácome [aut], Ricardo Cao [aut], Ingrid Van Keilegom [aut], Ursula Müller [aut] |
| Maintainer: | Blanca Monroy-Castillo <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-05-20 08:36:56 UTC |
| Source: | https://github.com/castlemon/mdccure |
The aim of this function is to test whether the cure rate , as a function of the covariates, satisfies a certain parametric model.
goft( x, time, delta, model = c("logit", "probit", "cloglog"), theta0 = NULL, nsimb = 499, h = NULL )goft( x, time, delta, model = c("logit", "probit", "cloglog"), theta0 = NULL, nsimb = 499, h = NULL )
x |
A numeric vector representing the covariate of interest. |
time |
A numeric vector of observed survival times. |
delta |
A numeric vector indicating censoring status (1 = event occurred, 0 = censored). |
model |
A character string specifying the parametric model for the incidence part. Can be |
theta0 |
Optional numeric vector with initial values for the model parameters. Default is |
nsimb |
An integer indicating the number of bootstrap replicates.Default is |
h |
Optional bandwidth value used for nonparametric estimation of the cure rate. Default is |
We want to test wether the cure rate , as a function of covariates, satisfies a certain parametric model, such as, logistic, probit or cloglog model.
The hypothesis are:
where is a finite-dimensional parameter space and is a known function up to the parameter vector .
The test statistic is based on a weighted distance between a nonparametric estimator and a parametric estimator under :
where is a known weighting function, often chosen as the covariate density .
A practical empirical version of the statistic is given by:
where the integral is replaced by a sample average.
A list with the following components:
Numeric value of the test statistic.
Numeric value of the bootstrap p-value for testing the null hypothesis.
The bandwidth used.
Müller, U.U, & Van Keilegom, I. (2019). Goodness-of-fit tests for the cure rate in a mixture cure model. Biometrika, 106, 211-227. doi:10.1093/biomet/asy058
## Some artificial data set.seed(123) n <- 50 x <- runif(n, -2, 2) ## Covariate values y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes c <- rexp(n) ## Censoring values p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible u <- runif(n) t <- ifelse(u < p, pmin(y, c), c) ## Observed times d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator data <- data.frame(x = x, t = t, d = d) goft(x, t, d, model = "logit")## Some artificial data set.seed(123) n <- 50 x <- runif(n, -2, 2) ## Covariate values y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes c <- rexp(n) ## Censoring values p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible u <- runif(n) t <- ifelse(u < p, pmin(y, c), c) ## Observed times d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator data <- data.frame(x = x, t = t, d = d) goft(x, t, d, model = "logit")
mdc computes the squared martingale difference correlation between a response variable Y
and explanatory variable(s) X, measuring conditional mean dependence.
X can be either univariate or multivariate.
mdc(X, Y, center = "U")mdc(X, Y, center = "U")
X |
A vector or matrix where rows represent samples and columns represent variables. |
Y |
A vector or matrix where rows represent samples and columns represent variables. |
center |
Character string indicating the centering method to use. One of:
|
Returns the squared martingale difference correlation of Y given X.
Shao, X., and Zhang, J. (2014). Martingale difference correlation and its use in high-dimensional variable screening. Journal of the American Statistical Association, 109(507), 1302-1318. doi:10.1080/01621459.2014.887012.
# Generate example data set.seed(123) n <- 50 x <- matrix(rnorm(n * 5), nrow = n) # multivariate data with 5 variables y <- rbinom(n, 1, 0.5) # binary covariate # Compute MDC with U-centering mdc(x, y, center = "U") # Compute MDC with double-centering mdc(x, y, center = "D")# Generate example data set.seed(123) n <- 50 x <- matrix(rnorm(n * 5), nrow = n) # multivariate data with 5 variables y <- rbinom(n, 1, 0.5) # binary covariate # Compute MDC with U-centering mdc(x, y, center = "U") # Compute MDC with double-centering mdc(x, y, center = "D")
Computes dependence between a multivariate dataset x and a univariate covariate y
using different variants of the MDC (martingale difference correlation) test.
mdc_test(x, y, method, permutations = 999, parallel = TRUE, ncores = -1)mdc_test(x, y, method, permutations = 999, parallel = TRUE, ncores = -1)
x |
Vector or matrix where rows represent samples, and columns represent variables. |
y |
Covariate vector. |
method |
Character string indicating the test to perform. One of:
|
permutations |
Number of permutations. Defaults to 999. |
parallel |
Logical. Whether to use parallel computing. Defaults to |
ncores |
Number of threads for parallel computing (used only if |
A list containing the test results and p-values.
Shao, X., and Zhang, J. (2014). Martingale difference correlation...
set.seed(123) x <- matrix(rnorm(50 * 5), nrow = 50) y <- rbinom(50, 1, 0.5) mdc_test(x, y, method = "FMDCU")set.seed(123) x <- matrix(rnorm(50 * 5), nrow = 50) y <- rbinom(50, 1, 0.5) mdc_test(x, y, method = "FMDCU")
mdd computes the squared martingale difference divergence (MDD) between response variable(s) Y
and explanatory variable(s) X, measuring conditional mean dependence.
mdd(X, Y, center = "U")mdd(X, Y, center = "U")
X |
A vector or matrix where rows represent samples and columns represent variables. |
Y |
A vector or matrix where rows represent samples and columns represent variables. |
center |
Character string indicating the centering method to use. One of:
Default is |
Returns the squared Martingale Difference Divergence of Y given X.
Shao, X., and Zhang, J. (2014). Martingale difference correlation and its use in high-dimensional variable screening. Journal of the American Statistical Association, 109(507), 1302-1318. doi:10.1080/01621459.2014.887012.
# Generate example data set.seed(123) n <- 50 x <- matrix(rnorm(n * 5), nrow = n) # multivariate explanatory variables y_vec <- rbinom(n, 1, 0.5) # univariate response y_mat <- matrix(rnorm(n * 2), nrow = n) # multivariate response # Compute MDD with vector Y and U-centering mdd(x, y_vec, center = "U") # Compute MDD with matrix Y and double-centering mdd(x, y_mat, center = "D")# Generate example data set.seed(123) n <- 50 x <- matrix(rnorm(n * 5), nrow = n) # multivariate explanatory variables y_vec <- rbinom(n, 1, 0.5) # univariate response y_mat <- matrix(rnorm(n * 2), nrow = n) # multivariate response # Compute MDD with vector Y and U-centering mdd(x, y_vec, center = "U") # Compute MDD with matrix Y and double-centering mdd(x, y_mat, center = "D")
This function generates a plot comparing nonparametric and parametric estimations of cure probability in a univariate setting. The nonparametric estimate is displayed with 95% confidence bands, while the parametric estimate is based on a logit, probit or complementary log-log link. An optional covariate density curve can be added as a secondary axis.
plotCure( x, time, delta, main.title = NULL, title.x = NULL, model = "logit", theta = NULL, legend.pos = "bottom", density = TRUE, hsmooth = 10, npoints = 100 )plotCure( x, time, delta, main.title = NULL, title.x = NULL, model = "logit", theta = NULL, legend.pos = "bottom", density = TRUE, hsmooth = 10, npoints = 100 )
x |
A numeric vector containing the covariate values. |
time |
A numeric vector representing the observed survival times. |
delta |
A binary vector indicating the event status (1 = event, 0 = censored). |
main.title |
Character string for the main title of the plot. If |
title.x |
Character string for the x-axis label. If |
model |
A character string indicating the assumed model. Options include |
theta |
A numeric vector of length 2, specifying the coefficients for the logistic model to generate the parametric estimate. |
legend.pos |
A character string indicating the position of the legend. Options include |
density |
Logical; if |
hsmooth |
Numeric. Smoothing bandwidth parameter (h) for the cure probability estimator. |
npoints |
Integer. Number of points at which the estimator is evaluated over the covariate range. |
The function estimates the cure probability nonparametrically using the probcure function
and overlays it with a parametric estimate obtained from a logistic regression model.
Confidence intervals (95%) are included for the nonparametric estimate. Optionally,
the density of the covariate can be shown as a shaded area with a secondary y-axis.
A ggplot object representing the cure probability plot.
## Not run: set.seed(123) x <- runif(200) time <- rexp(200) delta <- rbinom(200, 1, 0.7) theta <- c(0.5, -1) plot.cure.uni(x = x, time = time, delta = delta, main.title = "Estimated Cure Probability", theta = theta, legend.pos = "bottom", density = FALSE) ## End(Not run)## Not run: set.seed(123) x <- runif(200) time <- rexp(200) delta <- rbinom(200, 1, 0.7) theta <- c(0.5, -1) plot.cure.uni(x = x, time = time, delta = delta, main.title = "Estimated Cure Probability", theta = theta, legend.pos = "bottom", density = FALSE) ## End(Not run)
pmdd measures conditional mean dependence of Y given X, adjusting for the dependence on Z.
pmdc(X, Y, Z)pmdc(X, Y, Z)
X |
A vector or matrix where rows represent samples and columns represent variables. |
Y |
A vector or matrix where rows represent samples and columns represent variables. |
Z |
A vector or matrix where rows represent samples and columns represent variables. |
Returns the squared partial martingale difference correlation of Y given X, adjusting for the dependence on Z.
Park, T., Shao, X., and Yao, S. (2015). Partial martingale difference correlation. Electronic Journal of Statistics, 9(1), 1492-1517. doi:10.1214/15-EJS1047.
# Generate example data set.seed(123) n <- 50 x <- matrix(rnorm(n * 5), nrow = n) # explanatory variables y <- matrix(rnorm(n), nrow = n) # response variable z <- matrix(rnorm(n * 2), nrow = n) # conditioning variables # Compute partial MDD pmdd(x, y, z)# Generate example data set.seed(123) n <- 50 x <- matrix(rnorm(n * 5), nrow = n) # explanatory variables y <- matrix(rnorm(n), nrow = n) # response variable z <- matrix(rnorm(n * 2), nrow = n) # conditioning variables # Compute partial MDD pmdd(x, y, z)
pmdd measures conditional mean dependence of Y given X, adjusting for the dependence on Z.
pmdd(X, Y, Z)pmdd(X, Y, Z)
X |
A vector or matrix where rows represent samples and columns represent variables. |
Y |
A vector or matrix where rows represent samples and columns represent variables. |
Z |
A vector or matrix where rows represent samples and columns represent variables. |
Returns the squared partial martingale difference divergence of Y given X, adjusting for the dependence on Z.
Park, T., Shao, X., and Yao, S. (2015). Partial martingale difference correlation. Electronic Journal of Statistics, 9(1), 1492-1517. doi:10.1214/15-EJS1047.
# Generate example data set.seed(123) n <- 50 x <- matrix(rnorm(n * 5), nrow = n) # explanatory variables y <- matrix(rnorm(n), nrow = n) # response variable z <- matrix(rnorm(n * 2), nrow = n) # conditioning variables # Compute partial MDD pmdd(x, y, z)# Generate example data set.seed(123) n <- 50 x <- matrix(rnorm(n * 5), nrow = n) # explanatory variables y <- matrix(rnorm(n), nrow = n) # response variable z <- matrix(rnorm(n * 2), nrow = n) # conditioning variables # Compute partial MDD pmdd(x, y, z)
Performs nonparametric hypothesis tests to evaluate the association between a covariate and the cure probability in mixture cure models. Several test statistics are supported, including martingale difference correlation (MDC)-based tests and an alternative GOFT test.
testcov( x, time, delta, h = NULL, method = "FMDCU", P = 999, parallel = TRUE, ncores = -1 )testcov( x, time, delta, h = NULL, method = "FMDCU", P = 999, parallel = TRUE, ncores = -1 )
x |
A numeric vector representing the covariate of interest. |
time |
A numeric vector of observed survival times. |
delta |
A binary vector indicating censoring status: |
h |
Bandwidth parameter for kernel smoothing. Either a positive numeric value, |
method |
Character string specifying the test to perform. One of:
Default is |
P |
Integer. Number of permutations or bootstrap replications used to compute the null distribution of the test statistic.
For methods |
parallel |
Logical. If |
ncores |
Integer. Number of cores to use for parallel computing. If |
The function computes a statistic, based on the methodology proposed by Monroy-Castillo et al.,
to test whether a covariate has an effect on the cure probability.
The main problem is that the response variable (cure indicator ) is partially observed due to censoring.
This is addressed by estimating the cure indicator using the methodology of Amico et al. (2021).
We define , with .
We assume and that follow-up is long enough so that for all .
Therefore, individuals with censored observed times greater than are considered cured ().
Four tests are proposed: three are based on the martingale difference correlation (MDC). For the MDCU and MDCV tests, the null distribution is approximated via a permutation procedure. To provide a faster alternative, a chi-squared approximation is implemented for the MDCU test statistic (FMDCU). Additionally, a modified version of the goodness-of-fit test proposed by Müller and Van Keilegom (2019) is included (GOFT). The test statistic is given by:
where denotes the nonparametric estimator of the cure probability under the alternative hypothesis,
and denotes the nonparametric estimator of the cure probability under the null hypothesis.
The approximation of the critical value for the test is done using the bootstrap procedure given in Section 3 of Müller and Van Keilegom (2019).
A list containing:
test_results: A list with the results (e.g., test statistics and p-values) of the selected test(s).
nu_hat: A numeric vector of estimated cure probabilities.
Amico, M, Van Keilegom, I. & Han, B. (2021). Assessing cure status prediction from survival data using receiver operating characteristic curves. Biometrika, 108(3), 727–740. doi:10.1093/biomet/asaa080
López-Cheda, A., Cao, R., Jácome, M. A., & Van Keilegom, I. (2016). Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models. Computational Statistics & Data Analysis, 100, 490–502. doi:10.1016/j.csda.2016.04.006
Müller, U.U, & Van Keilegom, I. (2019). Goodness-of-fit tests for the cure rate in a mixture cure model. Biometrika, 106, 211-227. doi:10.1093/biomet/asy058
Shao, X., & Zhang, J. (2014). Martingale difference correlation and its use in high-dimensional variable screening. Journal of the American Statistical Association, 105, 144-165. doi:10.1080/01621459.2014.887012
## Some artificial data set.seed(123) n <- 50 x <- runif(n, -2, 2) ## Covariate values y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes c <- rexp(n) ## Censoring values p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible u <- runif(n) t <- ifelse(u < p, pmin(y, c), c) ## Observed times d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator data <- data.frame(x = x, t = t, d = d) testcov(x, t, d)## Some artificial data set.seed(123) n <- 50 x <- runif(n, -2, 2) ## Covariate values y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes c <- rexp(n) ## Censoring values p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible u <- runif(n) t <- ifelse(u < p, pmin(y, c), c) ## Observed times d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator data <- data.frame(x = x, t = t, d = d) testcov(x, t, d)
Performs a permutation-based test assessing the association between a primary covariate (x) and the cure indicator, while adjusting for a secondary covariate (z).
The test calculates the p-value via permutation using the partial martingale difference correlation.
testcov2(x, time, z, delta, P = 999, H = NULL)testcov2(x, time, z, delta, P = 999, H = NULL)
x |
Numeric vector. The primary covariate whose association with the latent cure indicator is tested. |
time |
Numeric vector. Observed survival or censoring times. |
z |
Numeric vector. Secondary covariate for adjustment. |
delta |
Numeric vector. Censoring indicator (1 indicates event occurred, 0 indicates censored). |
P |
Integer. Number of permutations used to compute the permutation p-value. Default is 999. |
H |
Optional numeric. Bandwidth parameter (currently unused, reserved for future extensions). |
In order to test if the cure rate depends on the covariate given it depends on the covariate . The hypotheses are
The proxy of the cure rate under the null hypothesis is obtained by:
The statistic for testing the covariate hypothesis is based on partial martingale difference correlation and it is given by:
The null distribution is approximated using a permutation test.
List with components:
Numeric. The test statistic value.
Numeric. The permutation p-value assessing the null hypothesis of no association between x and the latent cure indicator, adjusting for z.
Park, T., Saho, X. & Yao, S. (2015). Partial martingale difference correlation. Electronic Journal of Statistics, 9, 1492–1517. doi:10.1214/15-EJS1047
pmdc for the partial martingale difference correlation, pmdd for the partial martingale difference divergence,
testcov for the test for one covariate.