Package 'hdbcp'

Title: Bayesian Change Point Detection for High-Dimensional Data
Description: Functions implementing change point detection methods using the maximum pairwise Bayes factor approach. Additionally, the package includes tools for generating simulated datasets for comparing and evaluating change point detection techniques.
Authors: JaeHoon Kim [aut, cre], KyoungJae Lee [aut, ths]
Maintainer: JaeHoon Kim <[email protected]>
License: GPL-3
Version: 1.0.0
Built: 2024-11-20 03:00:46 UTC
Source: https://github.com/jaehoonkim98/hdbcp

Help Index


Generate Simulated Datasets with Change Points in Covariance Matrix

Description

This function generates simulated datasets that include change points in the covariance matrix for change point detection. Users can specify parameters that control the dataset size, dimension, signal size, and change point locations. The output contains datasets both with and without change points, allowing for comparisons in simulation studies.

Usage

generate_cov_datasets(
  n,
  p,
  signal_size,
  sparse = TRUE,
  single_point = round(n/2),
  multiple_points = c(round(n/4), round(2 * n/4), round(3 * n/4)),
  type = c(1, 2, 3, 4, 5)
)

Arguments

n

Number of observations to generate.

p

Number of features or dimensions for each observation.

signal_size

Magnitude of the signal applied at change points.

sparse

Determines if a sparse covariance structure is used (default is TRUE).

single_point

Location of a single change point in the dataset (default is round(n/2)).

multiple_points

Locations of multiple change points within the dataset (default is the rounded quartiles of n: round(n/4), round(2 * n/4), round(3 * n/4)).

type

Integer vector specifying the type of dataset to return. Options are as follows:
- 1: No change points (H0 data)
- 2: Single change point with rare signals
- 3: Single change point with many signals
- 4: Multiple change points with rare signals
- 5: Multiple change points with many signals

Value

A 3D array containing the generated datasets. Each slice represents a different dataset type.

Examples

# Generate a default dataset
datasets <- generate_cov_datasets(100, 50, 1)

null_data <- datasets[,,1]
single_many_data <- datasets[,,3]
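
To make the idea of a covariance change point concrete, the following base R sketch builds one such dataset by hand. It is not the package's generator: the post-change covariance (a single correlated pair of variables), the helper name `draw`, and the convention that row `change_point` belongs to the pre-change segment are all assumptions chosen for illustration.

```r
# Base R only; an illustration, not the code behind generate_cov_datasets().
set.seed(1)
n <- 200
p <- 5
change_point <- 100

# Pre-change covariance: identity. Post-change: one off-diagonal pair
# becomes correlated (few entries change, loosely a "rare signal").
sigma_pre <- diag(p)
sigma_post <- diag(p)
sigma_post[1, 2] <- sigma_post[2, 1] <- 0.8

# Hypothetical helper: draw m rows from N(0, sigma) as Z %*% chol(sigma),
# where Z has iid N(0, 1) entries and t(chol(sigma)) %*% chol(sigma) == sigma.
draw <- function(m, sigma) {
  matrix(rnorm(m * ncol(sigma)), m) %*% chol(sigma)
}

x <- rbind(draw(change_point, sigma_pre),
           draw(n - change_point, sigma_post))

# Sample correlation of variables 1 and 2 before and after the change.
cor_pre <- cor(x[1:change_point, 1], x[1:change_point, 2])
cor_post <- cor(x[(change_point + 1):n, 1], x[(change_point + 1):n, 2])
```

Comparing `cor_pre` with `cor_post` shows the kind of structural break that a covariance change point detector is meant to locate.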

Generate Simulated Datasets with Change Points in Mean Vector

Description

This function generates simulated datasets that include change points in the mean vector for change point detection. Users can specify parameters that control the dataset size, dimension, signal size, and change point locations. The output contains datasets both with and without change points, allowing for comparisons in simulation studies.

Usage

generate_mean_datasets(
  n = 500,
  p = 200,
  signal_size = 1,
  pre_proportion = 0.4,
  pre_value = 0.3,
  single_point = round(n/2),
  multiple_points = c(round(n/4), round(2 * n/4), round(3 * n/4)),
  type = c(1, 2, 3, 4, 5)
)

Arguments

n

Number of observations to generate.

p

Number of features or dimensions for each observation.

signal_size

Magnitude of the signal to apply at change points.

pre_proportion

Proportion of the covariance matrix's off-diagonal elements to be set to a pre-defined value (default is 0.4).

pre_value

Value assigned to selected off-diagonal elements of the covariance matrix (default is 0.3).

single_point

Location of a single change point in the dataset (default is round(n/2)).

multiple_points

Locations of multiple change points within the dataset (default is the rounded quartiles of n: round(n/4), round(2 * n/4), round(3 * n/4)).

type

Integer vector specifying the type of dataset to return (default is all five types). Options are as follows:
- 1: No change points (H0 data)
- 2: Single change point with rare signals
- 3: Single change point with many signals
- 4: Multiple change points with rare signals
- 5: Multiple change points with many signals

Value

A 3D array containing the generated datasets. Each slice represents a different dataset type.

Examples

# Generate a default dataset
datasets <- generate_mean_datasets(100, 50, 1)

null_data <- datasets[,,1]
single_many_data <- datasets[,,3]
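
The distinction between "rare" and "many" signals can be sketched in base R as a mean shift that affects few or many coordinates. This is only an illustration under stated assumptions: the helper names `make_mean_shift` and `count_shifted`, the numbers of shifted coordinates (5 and 25), and the identity covariance are hypothetical and do not reproduce what generate_mean_datasets() does with pre_proportion and pre_value.

```r
# Base R only; an illustration, not the code behind generate_mean_datasets().
set.seed(2)
n <- 400
p <- 50
change_point <- 200
signal_size <- 1

# Hypothetical helper: shift the mean of the first n_signal coordinates
# by signal_size for every row after change_point.
make_mean_shift <- function(n_signal) {
  x <- matrix(rnorm(n * p), n, p)  # identity covariance, for simplicity
  shift <- c(rep(signal_size, n_signal), rep(0, p - n_signal))
  after <- (change_point + 1):n
  x[after, ] <- sweep(x[after, , drop = FALSE], 2, shift, "+")
  x
}

rare <- make_mean_shift(n_signal = 5)   # few coordinates change (assumed count)
many <- make_mean_shift(n_signal = 25)  # many coordinates change (assumed count)

# Count coordinates whose post-change sample mean exceeds the
# pre-change sample mean by more than half the signal size.
count_shifted <- function(x) {
  d <- colMeans(x[(change_point + 1):n, ]) - colMeans(x[1:change_point, ])
  sum(d > signal_size / 2)
}

count_shifted(rare)
count_shifted(many)
```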

Majority Rule for Multiscale Approach using mxPBF Results

Description

This function implements a majority rule-based post-processing approach to identify common change points across multiple window sizes from mxPBF results.

Usage

majority_rule_mxPBF(res_mxPBF)

Arguments

res_mxPBF

A list of results from mxPBF_mean() or mxPBF_cov().

Value

A vector of final detected change points that are common across multiple windows based on majority rule.

Examples

n <- 500
p <- 200
signal_size <- 1
pre_value <- 0.3
pre_proportion <- 0.4
given_data <- generate_mean_datasets(n, p, signal_size, pre_proportion, pre_value,
                                     single_point = 250,
                                     multiple_points = c(150, 300, 350), type = 5)
nws <- c(25, 60, 100)
alps <- seq(1,10,0.05)
res_mxPBF <- mxPBF_mean(given_data, nws, alps)
majority_rule_mxPBF(res_mxPBF)
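
The manual does not spell out the exact voting rule, so the following base R sketch shows only one plausible form of majority voting across window sizes. The function `majority_vote`, its `tol` parameter, and the clustering by gaps are assumptions, not the package's algorithm. The mock input mirrors the documented result structure (one list element per window size, with Change_points and Window_size).

```r
# Mock output shaped like the documented return value of mxPBF_mean() or
# mxPBF_cov(); the numbers are invented for illustration.
res_mock <- list(
  list(Change_points = c(148, 301, 420), Window_size = 25),
  list(Change_points = c(150, 299),      Window_size = 60),
  list(Change_points = c(152, 303),      Window_size = 100)
)

# Hypothetical majority vote: sort all detections, start a new cluster
# whenever the gap to the previous detection exceeds tol, keep clusters
# supported by more than half of the window sizes, and report the rounded
# median of each kept cluster.
majority_vote <- function(res, tol = 10) {
  n_pts <- vapply(res, function(r) length(r$Change_points), integer(1))
  pts <- unlist(lapply(res, function(r) r$Change_points))
  if (length(pts) == 0) return(numeric(0))
  src <- rep(seq_along(res), n_pts)  # which window size found each point
  ord <- order(pts)
  pts <- pts[ord]
  src <- src[ord]
  cluster <- cumsum(c(TRUE, diff(pts) > tol))
  support <- tapply(src, cluster, function(s) length(unique(s)))
  keep <- support > length(res) / 2
  as.numeric(round(tapply(pts, cluster, median)[keep]))
}

majority_vote(res_mock)
```

In this mock example the detection at 420 is reported by only one of the three window sizes, so it is dropped, while the clusters around 150 and 301 are kept.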

Multivariate Normal Random Number Generator

Description

Generates random numbers from a multivariate normal distribution with specified mean and covariance matrix using a C++ implementation.

Usage

mvrnorm_cpp(n = 1, mu, Sigma)

Arguments

n

The number of random samples to generate. Defaults to 1.

mu

The mean vector of the distribution.

Sigma

The covariance matrix of the distribution.

Value

A numeric matrix where each row is a random sample from the multivariate normal distribution.

Examples

# Example usage
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
mvrnorm_cpp(5, mu, Sigma)
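
For readers who want to see what such a sampler does, here is a minimal base R counterpart based on the Cholesky factorisation. The function name `rmvnorm_chol` is hypothetical, and the package's C++ routine may use a different factorisation. The sketch only illustrates the standard construction and checks it empirically.

```r
# Hypothetical base R sampler; not the implementation of mvrnorm_cpp().
rmvnorm_chol <- function(n, mu, Sigma) {
  p <- length(mu)
  z <- matrix(rnorm(n * p), n, p)  # n x p matrix of iid N(0, 1) draws
  r <- chol(Sigma)                 # upper triangular, t(r) %*% r == Sigma
  sweep(z %*% r, 2, mu, "+")       # rows have covariance Sigma and mean mu
}

set.seed(42)
mu <- c(1, -1)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
x <- rmvnorm_chol(10000, mu, Sigma)

# With many samples, the empirical moments should be close to mu and Sigma.
round(colMeans(x), 1)
round(cov(x), 1)
```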

Change Point Detection in Mean and Covariance Structure using Maximum Pairwise Bayes Factor (mxPBF)

Description

This function detects change points in both the mean and covariance structure of multivariate Gaussian data using the Maximum Pairwise Bayes Factor (mxPBF). The function selects alpha that controls the empirical False Positive Rate (FPR), as suggested in the paper. The function conducts a multiscale approach using the function majority_rule_mxPBF().

Usage

mxPBF_combined(
  given_data,
  nws,
  alps,
  a0 = 0.01,
  b0 = 0.01,
  FPR_want = 0.05,
  n_sample = 300,
  n_cores = 1
)

Arguments

given_data

An (n × p) data matrix representing n observations and p variables.

nws

A set of window sizes for change point detection.

alps

A grid of alpha values used in the empirical False Positive Rate (FPR) method.

a0

A hyperparameter a_0 used in the mxPBF (default: 0.01).

b0

A hyperparameter b_0 used in the mxPBF (default: 0.01).

FPR_want

Desired False Positive Rate for selecting alpha, used in the empirical FPR method (default: 0.05).

n_sample

Number of simulated samples to estimate the empirical FPR, used in the empirical FPR method (default: 300).

n_cores

Number of threads for parallel execution via OpenMP (default: 1).

Value

A list containing the following elements:

Result_cov

A list result from the mxPBF_cov() function.

Result_mean

A list result from the mxPBF_mean() function applied to each data segment.

Change_points_cov

Locations of detected change points identified by mxPBF_cov() function.

Change_points_mean

Locations of detected change points identified by mxPBF_mean() function.

Examples

nws <- c(25, 60, 100)
alps <- seq(1,10,0.05)
## H0 data
mu1 <- rep(0,10)
sigma1 <- diag(10)
X <- mvrnorm_cpp(500, mu1, sigma1)
res1 <- mxPBF_combined(X, nws, alps)

## H1 data
mu2 <- rep(1,10)
sigma2 <- diag(10)
for (i in 1:10) {
  for (j in i:10) {
    if (i == j) {
      next
    } else {
      cov_value <- rnorm(1, 1, 1)
      sigma2[i, j] <- cov_value
      sigma2[j, i] <- cov_value
    }
  }
}
# Shift the eigenvalues so that sigma2 is positive definite
sigma2 <- sigma2 + (abs(min(eigen(sigma2)$values)) + 0.1) * diag(10)
Y1 <- mvrnorm_cpp(150, mu1, sigma1)
Y2 <- mvrnorm_cpp(150, mu2, sigma1)
Y3 <- mvrnorm_cpp(200, mu2, sigma2)
Y <- rbind(Y1, Y2, Y3)
res2 <- mxPBF_combined(Y, nws, alps)
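
The Value section states that mxPBF_mean() is applied to each data segment. A plausible reading is that the data are split at the covariance change points first, and the base R sketch below shows such a split. The helper `split_at_change_points` is hypothetical, and the convention that a change point marks the last row of a segment is an assumption; the package may place the boundary differently.

```r
# Hypothetical helper: split the rows of x into consecutive segments.
# Assumed convention: row cp is the last row of the segment that ends
# at change point cp.
split_at_change_points <- function(x, change_points) {
  bounds <- unique(c(0, sort(change_points), nrow(x)))
  lapply(seq_len(length(bounds) - 1), function(k) {
    x[(bounds[k] + 1):bounds[k + 1], , drop = FALSE]
  })
}

x <- matrix(seq_len(500 * 3), nrow = 500)
segments <- split_at_change_points(x, c(150, 300))

# Number of rows in each segment; stacking the segments recovers x.
vapply(segments, nrow, integer(1))
```

Each element of `segments` is an ordinary matrix, so a mean change point detector could then be run on it separately.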

Change Point Detection in Covariance Structure using Maximum Pairwise Bayes Factor (mxPBF)

Description

This function detects change points in the covariance structure of multivariate Gaussian data using the Maximum Pairwise Bayes Factor (mxPBF). The function selects alpha that controls the empirical False Positive Rate (FPR), as suggested in the paper. One can conduct a multiscale approach using the function majority_rule_mxPBF().

Usage

mxPBF_cov(
  given_data,
  nws,
  alps,
  a0 = 0.01,
  b0 = 0.01,
  FPR_want = 0.05,
  n_sample = 300,
  n_cores = 1
)

Arguments

given_data

An (n × p) data matrix representing n observations and p variables.

nws

A set of window sizes for change point detection.

alps

A grid of alpha values used in the empirical False Positive Rate (FPR) method.

a0

A hyperparameter a_0 used in the mxPBF (default: 0.01).

b0

A hyperparameter b_0 used in the mxPBF (default: 0.01).

FPR_want

Desired False Positive Rate for selecting alpha, used in the empirical FPR method (default: 0.05).

n_sample

Number of simulated samples to estimate the empirical FPR, used in the empirical FPR method (default: 300).

n_cores

Number of threads for parallel execution via OpenMP (default: 1).

Value

A list of length equal to the number of window sizes provided. Each element in the list contains:

Change_points

Locations of detected change points.

Bayes_Factors

Vector of calculated Bayes Factors, one for each middle point.

Selected_alpha

Optimal alpha value selected based on the method that controls the empirical FPR.

Window_size

Window size used for change point detection.

Examples

nws <- c(25, 60, 100)
alps <- seq(1,10,0.05)
## H0 data
mu <- rep(0,10)
sigma1 <- diag(10)
X <- mvrnorm_cpp(500, mu, sigma1)
res1 <- mxPBF_cov(X, nws, alps)

## H1 data
mu <- rep(0,10)
sigma2 <- diag(10)
for (i in 1:10) {
  for (j in i:10) {
    if (i == j) {
      next
    } else {
      cov_value <- rnorm(1, 1, 1)
      sigma2[i, j] <- cov_value
      sigma2[j, i] <- cov_value
    }
  }
}
# Shift the eigenvalues so that sigma2 is positive definite
sigma2 <- sigma2 + (abs(min(eigen(sigma2)$values)) + 0.1) * diag(10)
Y1 <- mvrnorm_cpp(250, mu, sigma1)
Y2 <- mvrnorm_cpp(250, mu, sigma2)
Y <- rbind(Y1, Y2)
res2 <- mxPBF_cov(Y, nws, alps)
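
Because the result is a list with one element per window size, it is often convenient to tabulate it. The base R sketch below does this on a mock object that follows the documented element names (Change_points, Bayes_Factors, Selected_alpha, Window_size). All numbers in `res_mock` are invented for illustration and are not output from the package.

```r
# Mock result shaped like the documented return value; values are made up.
res_mock <- list(
  list(Change_points = 250, Bayes_Factors = c(0.3, 12.4, 0.5),
       Selected_alpha = 2.50, Window_size = 25),
  list(Change_points = 251, Bayes_Factors = c(0.2, 30.1, 0.4),
       Selected_alpha = 3.05, Window_size = 60),
  list(Change_points = numeric(0), Bayes_Factors = c(0.1, 0.6, 0.2),
       Selected_alpha = 4.00, Window_size = 100)
)

# One row per window size: selected alpha, number of detected change points,
# and the largest Bayes factor along the scan.
summary_tbl <- data.frame(
  Window_size = vapply(res_mock, function(r) r$Window_size, numeric(1)),
  Selected_alpha = vapply(res_mock, function(r) r$Selected_alpha, numeric(1)),
  n_change_points = vapply(res_mock, function(r) length(r$Change_points), integer(1)),
  max_bayes_factor = vapply(res_mock, function(r) max(r$Bayes_Factors), numeric(1))
)
print(summary_tbl)
```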

Change Point Detection in Mean Structure using Maximum Pairwise Bayes Factor (mxPBF)

Description

This function detects change points in the mean structure of multivariate Gaussian data using the Maximum Pairwise Bayes Factor (mxPBF). The function selects alpha that controls the empirical False Positive Rate (FPR), as suggested in the paper. One can conduct a multiscale approach using the function majority_rule_mxPBF().

Usage

mxPBF_mean(given_data, nws, alps, FPR_want = 0.05, n_sample = 300, n_cores = 1)

Arguments

given_data

An (n × p) data matrix representing n observations and p variables.

nws

A set of window sizes for change point detection.

alps

A grid of alpha values used in the empirical False Positive Rate (FPR) method.

FPR_want

Desired False Positive Rate for selecting alpha, used in the empirical FPR method (default: 0.05).

n_sample

Number of simulated samples to estimate the empirical FPR, used in the empirical FPR method (default: 300).

n_cores

Number of threads for parallel execution via OpenMP (default: 1).

Value

A list of length equal to the number of window sizes provided. Each element in the list contains:

Change_points

Locations of detected change points.

Bayes_Factors

Vector of calculated Bayes Factors, one for each middle point.

Selected_alpha

Optimal alpha value selected based on the method that controls the empirical FPR.

Window_size

Window size used for change point detection.

Examples

nws <- c(25, 60, 100)
alps <- seq(1,10,0.05)
## H0 data
mu1 <- rep(0,10)
sigma <- diag(10)
X <- mvrnorm_cpp(500, mu1, sigma)
res1 <- mxPBF_mean(X, nws, alps)

## H1 data
mu2 <- rep(1,10)
sigma <- diag(10)
Y <- rbind(mvrnorm_cpp(250,mu1,sigma), mvrnorm_cpp(250,mu2,sigma))
res2 <- mxPBF_mean(Y, nws, alps)
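
The roles of the window size and the "middle points" can be illustrated with a generic sliding-window scan in base R. Note that this sketch swaps in a plain standardised two-sample mean difference, maximised over variables, in place of the Bayes factor. It is not the mxPBF, and the range of middle points used here is an assumption.

```r
# Generic sliding-window scan; a stand-in statistic, not the mxPBF.
set.seed(7)
n <- 300
p <- 10
nw <- 30  # window size

# Mean shifts from 0 to 2 in every variable after row 150.
x <- rbind(matrix(rnorm(150 * p), 150, p),
           matrix(rnorm(150 * p, mean = 2), 150, p))

# For each middle point, compare the nw rows ending there with the nw rows
# that follow, and take the largest standardised mean difference over the
# p variables.
middle_points <- nw:(n - nw)
scan_stat <- vapply(middle_points, function(mid) {
  left <- x[(mid - nw + 1):mid, , drop = FALSE]
  right <- x[(mid + 1):(mid + nw), , drop = FALSE]
  pooled_sd <- sqrt((apply(left, 2, var) + apply(right, 2, var)) / 2)
  max(abs(colMeans(right) - colMeans(left)) / (pooled_sd * sqrt(2 / nw)))
}, numeric(1))

# The scan should peak near the true change location.
estimated_cp <- middle_points[which.max(scan_stat)]
estimated_cp
```

Smaller windows localise a change more sharply but give noisier statistics, which is one motivation for combining several window sizes.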