Title: | Bagging Bandwidth Selection in Kernel Density and Regression Estimation |
---|---|
Description: | Bagging bandwidth selection methods for the Parzen-Rosenblatt and Nadaraya-Watson estimators. These bandwidth selectors can achieve greater statistical precision than their non-bagged counterparts while being computationally fast. See Barreiro-Ures et al. (2020) <doi:10.1093/biomet/asaa092> and Barreiro-Ures et al. (2021) <doi:10.48550/arXiv.2105.04134>. |
Authors: | Daniel Barreiro-Ures [aut], Ruben Fernandez-Casal [aut, cre], Jeffrey Hart [aut], Ricardo Cao [aut], Mario Francisco-Fernandez [aut] |
Maintainer: | Ruben Fernandez-Casal <[email protected]> |
License: | GPL-3 |
Version: | 1.1 |
Built: | 2025-02-22 04:55:12 UTC |
Source: | https://github.com/rubenfcasal/baggingbwsel |
This package implements bagging bandwidth selection methods for the Parzen-Rosenblatt kernel density estimator, and for the Nadaraya-Watson and local polynomial kernel regression estimators. These bandwidth selectors can achieve greater statistical precision than their non-bagged counterparts while being computationally fast. See Barreiro-Ures et al. (2021a) and Barreiro-Ures et al. (2021b).
Maintainer: Ruben Fernandez-Casal [email protected]
Authors:
Daniel Barreiro-Ures [email protected]
Jeffrey Hart
Ricardo Cao
Mario Francisco-Fernandez
References:
Barreiro-Ures, D., Cao, R., Francisco-Fernández, M., & Hart, J. D. (2021a). Bagging cross-validated bandwidths with application to big data. Biometrika, 108(4), 981-988, doi:10.1093/biomet/asaa092.
Barreiro-Ures, D., Cao, R., & Francisco-Fernández, M. (2021b). Bagging cross-validated bandwidth selection in nonparametric regression estimation with applications to large-sized samples. arXiv preprint, doi:10.48550/arXiv.2105.04134.
Useful links:
Report bugs at https://github.com/rubenfcasal/baggingbwsel/issues/
Bagged CV bandwidth selector for the Parzen-Rosenblatt estimator
bagcv(x, r, s, h0, h1, nb = r, ncores = parallel::detectCores())
x | Vector. Sample. |
r | Positive integer. Size of the subsamples. |
s | Positive integer. Number of subsamples. |
h0 | Positive real number. Range over which to minimize, left bound. |
h1 | Positive real number. Range over which to minimize, right bound. |
nb | Positive integer. Number of bins. |
ncores | Positive integer. Number of cores with which to parallelize the computations. |
Bagged cross-validation bandwidth selector for the Parzen-Rosenblatt estimator.
Bagged CV bandwidth.
set.seed(1)
x <- rnorm(10^6)
bagcv(x, 5000, 100, 0.01, 1, 1000, 2)
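The idea behind bagged CV bandwidths, as described in the cited papers, can be sketched in base R: compute an ordinary cross-validation bandwidth on each of `s` subsamples of size `r`, average them, and rescale the average to the full sample size `n`. The sketch below is only an illustration under stated assumptions and is not the package's implementation: `bagged_ucv` is a hypothetical helper, `stats::bw.ucv` (Gaussian kernel) stands in for the package's binned CV routine, subsamples are assumed to be drawn without replacement, and the rescaling factor `(r / n)^(1 / 5)` is assumed from the usual rate of the optimal bandwidth.

```r
# Illustrative sketch only: base R, not the package's binned, parallel code.
# `bagged_ucv` is a hypothetical helper name.
bagged_ucv <- function(x, r, s) {
  n <- length(x)
  # CV bandwidth on each of s subsamples of size r (drawn without replacement)
  h_sub <- vapply(seq_len(s), function(i) {
    stats::bw.ucv(sample(x, r))
  }, numeric(1))
  # Average the subsample bandwidths and rescale from size r to size n
  mean(h_sub) * (r / n)^(1 / 5)
}

set.seed(1)
x <- rnorm(1e4)
h <- bagged_ucv(x, r = 500, s = 20)

# The resulting bandwidth can be passed to a Gaussian kernel density estimate
fit <- density(x, bw = h)
```

Each subsample costs far less than a CV search on all `n` observations, which is where the speed advantage of this approach comes from when `r` is much smaller than `n`.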
Bagged CV bandwidth selector for local polynomial kernel regression.
bagreg(x, y, r, s, h0, h1, nb = r, ncores = parallel::detectCores(), poly.index = 0)
x | Covariate vector. |
y | Response vector. |
r | Positive integer. Size of the subsamples. |
s | Positive integer. Number of subsamples. |
h0 | Positive real number. Range over which to minimize, left bound. |
h1 | Positive real number. Range over which to minimize, right bound. |
nb | Positive integer. Number of bins to use in cross-validation. |
ncores | Positive integer. Number of cores with which to parallelize the computations. |
poly.index | Non-negative integer defining local constant (0) or local linear (1) smoothing. Default value: 0 (Nadaraya-Watson estimator). |
Bagged cross-validation bandwidth selector for local polynomial kernel regression.
Bagged CV bandwidth.
set.seed(1)
x <- rnorm(10^5)
y <- 2 * x + rnorm(10^5, 0, 0.5)
bagreg(x, y, 1000, 10, 0.01, 1, 1000, 2)
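For the regression case, the same subsample, average, and rescale idea can be sketched for the local constant (Nadaraya-Watson) estimator. This is a minimal base R illustration, not the package's code: `nw_loo_cv` and `bagged_nw_cv` are hypothetical helpers, the kernel is assumed Gaussian, the CV criterion is an exact (unbinned) leave-one-out error, `h0` and `h1` are assumed to bound the search on each subsample, and the `(r / n)^(1 / 5)` rescaling is an assumption based on the cited papers.

```r
# Leave-one-out CV criterion for the Nadaraya-Watson estimator (Gaussian kernel).
# Hypothetical helper, written for clarity rather than speed: O(r^2) per call.
nw_loo_cv <- function(h, x, y) {
  K <- dnorm(outer(x, x, "-") / h)        # kernel weights between all pairs
  diag(K) <- 0                            # leave each point out of its own fit
  fit <- as.vector(K %*% y) / rowSums(K)  # leave-one-out fitted values
  mean((y - fit)^2, na.rm = TRUE)         # drop points with no neighbours in range
}

# Hypothetical helper: bagged CV bandwidth from s subsamples of size r.
bagged_nw_cv <- function(x, y, r, s, h0, h1) {
  n <- length(x)
  h_sub <- vapply(seq_len(s), function(i) {
    idx <- sample.int(n, r)               # subsample without replacement
    optimize(nw_loo_cv, interval = c(h0, h1), x = x[idx], y = y[idx])$minimum
  }, numeric(1))
  mean(h_sub) * (r / n)^(1 / 5)           # average, then rescale to size n
}

set.seed(1)
x <- rnorm(5000)
y <- 2 * x + rnorm(5000, 0, 0.5)
h <- bagged_nw_cv(x, y, r = 200, s = 10, h0 = 0.01, h1 = 1)
```

The pairwise weight matrix makes each subsample search quadratic in `r`, so the sketch is only practical for small subsamples; the package's `nb` argument suggests it bins the data to avoid that cost.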
Bagging bootstrap bandwidth selector for the Parzen-Rosenblatt estimator
hboot_bag(x, m = n, N = 1, nb = 1000L, g, lower, upper, ncores = parallel::detectCores(logical = FALSE))
x | Vector. Sample. |
m | Positive integer. Size of the subsamples. |
N | Positive integer. Number of subsamples. |
nb | Positive integer. Number of bins. |
g | Positive real number. Pilot bandwidth. |
lower | Positive real number. Range over which to minimize, left bound. |
upper | Positive real number. Range over which to minimize, right bound. |
ncores | Positive integer. Number of cores with which to parallelize the computations. |
Bagging bootstrap bandwidth selector for the Parzen-Rosenblatt estimator.
Bagged bootstrap bandwidth.
set.seed(1)
x <- rnorm(10^5)
hboot_bag(x, 5000, 10, 1000, lower = 0.001, upper = 1, ncores = 2)
Generalized bagging CV bandwidth selector for the Parzen-Rosenblatt estimator
hsss_dens(x, r, s, nb = r, h0, h1, ncores = parallel::detectCores())
x | Vector. Sample. |
r | Positive integer. Size of the subsamples. |
s | Positive integer. Number of subsamples. |
nb | Positive integer. Number of bins. |
h0 | Positive real number. Range over which to minimize, left bound. |
h1 | Positive real number. Range over which to minimize, right bound. |
ncores | Positive integer. Number of cores with which to parallelize the computations. |
Generalized bagging cross-validation bandwidth selector for the Parzen-Rosenblatt estimator.
Bagged CV bandwidth.
set.seed(1)
x <- rnorm(10^5)
hsss_dens(x, 5000, 100, 1000, 0.001, 1, 2)
Estimation of the optimal subsample size for the bagged CV bandwidth for the Parzen-Rosenblatt estimator
mopt(x, N, r = 1000, s = 100, ncores = parallel::detectCores())
x | Vector. Sample. |
N | Positive integer. Number of subsamples for the bagged bandwidth. |
r | Positive integer. Size of the subsamples. |
s | Positive integer. Number of subsamples. |
ncores | Positive integer. Number of cores with which to parallelize the computations. |
Estimates the optimal size of the subsamples for the bagged CV bandwidth selector for the Parzen-Rosenblatt estimator.
Estimate of the optimal subsample size.
set.seed(1)
x <- rt(10^5, 5)
mopt(x, 500, 500, 10, 2)
Second order bagging CV bandwidth selector for the Parzen-Rosenblatt estimator
tss_dens(x, r, s, h0, h1, nb = 1000, ncores = 1)
x | Vector. Sample. |
r | Vector. The two subsample sizes. |
s | Positive integer. Number of subsamples. |
h0 | Positive real number. Range over which to minimize, left bound. |
h1 | Positive real number. Range over which to minimize, right bound. |
nb | Positive integer. Number of bins. |
ncores | Positive integer. Number of cores with which to parallelize the computations. |
Second order bagging cross-validation bandwidth selector for the Parzen-Rosenblatt estimator.
Second order bagging CV bandwidth.
set.seed(1)
x <- rnorm(10^5)
tss_dens(x, 5000, 10, 0.01, 1, 1000, 2)
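Every selector in this manual takes an `ncores` argument, and most default to `parallel::detectCores()`. Since `detectCores()` can return `NA` on some platforms and otherwise reports all available cores, it may be safer to pass an explicit value. The helper below is a hypothetical convenience, not part of the package: it counts physical cores, falls back to 1 when the count is unknown, and leaves one core free by default.

```r
# Hypothetical helper (not part of baggingbwsel): choose a conservative
# value for the `ncores` argument of the selectors.
safe_ncores <- function(reserve = 1L) {
  n <- parallel::detectCores(logical = FALSE)
  if (is.na(n)) n <- 1L                      # core count unknown on this platform
  max(1L, as.integer(n) - as.integer(reserve))  # always use at least one core
}

ncores <- safe_ncores()
```

The result can then be supplied explicitly, for example as the `ncores` argument in the calls shown in the examples above.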