Title: | Sample Size Analysis for Psychological Networks and More |
---|---|
Description: | An implementation of the sample size computation method for network models proposed by Constantin et al. (2021) <doi:10.31234/osf.io/j5v7u>. The implementation takes the form of a three-step recursive algorithm designed to find an optimal sample size given a model specification and a performance measure of interest. It starts with a Monte Carlo simulation step for computing the performance measure and a statistic at various sample sizes selected from an initial sample size range. It continues with a monotone curve-fitting step for interpolating the statistic across the entire sample size range. The final step employs stratified bootstrapping to quantify the uncertainty around the fitted curve. |
Authors: | Mihai Constantin [aut, cre] |
Maintainer: | Mihai Constantin |
License: | MIT + file LICENSE |
Version: | 1.8.6 |
Built: | 2024-11-02 04:14:13 UTC |
Source: | https://github.com/mihaiconstantin/powerly |
Generate matrices of true model parameters for the supported true models. These matrices are intended to be passed to the model_matrix argument of powerly().
generate_model(type, ...)
type |
Character string representing the type of true model. Possible
values are |
... |
Required arguments used for the generation of the true model. See
the True Models section of |
A matrix containing the model parameters.
This function plots the results for each step of the method.
## S3 method for class 'Method'
plot(x, step = 3, last = TRUE, save = FALSE, path = NULL, width = 14, height = 10, ...)
x |
An object instance of class |
step |
A single positive integer representing the method step that
should be plotted. Possible values are |
last |
A logical value indicating whether the last iteration of the
method should be plotted. The default is |
save |
A logical value indicating whether the plot should be saved to a file on disk. |
path |
A character string representing the path (i.e., including the
filename and extension) where the plot should be saved on disk. If |
width |
A single numerical value representing the desired plot width.
The default unit is inches (i.e., set by |
height |
A single numerical value representing the desired plot height.
The default unit is inches (i.e., set by |
... |
Optional arguments to be passed to |
A ggplot2::ggplot object containing the plot for the requested step of the method. The returned plot object can be modified further and also carries the patchwork class.
[Figure: example plots for each step of the method. Step 1: Monte Carlo Replications; Step 2: Curve Fitting; Step 3: Bootstrapping]
This function plots the results of a sample size analysis validation.
## S3 method for class 'Validation'
plot(x, save = FALSE, path = NULL, width = 14, height = 10, bins = 20, ...)
x |
An object instance of class |
save |
A logical value indicating whether the plot should be saved to a file on disk. |
path |
A character string representing the path (i.e., including the
filename and extension) where the plot should be saved on disk. If |
width |
A single numerical value representing the desired plot width.
The default unit is inches (i.e., set by |
height |
A single numerical value representing the desired plot height.
The default unit is inches (i.e., set by |
bins |
A single positive integer passed to |
... |
Optional arguments to be passed to |
A ggplot2::ggplot object containing the plot for the validation procedure. The returned plot object can be modified further and also carries the patchwork class.
[Figure: example of a validation plot]
summary.Validation(), validate()
Run an iterative three-step Monte Carlo method and return the sample sizes required to obtain a certain value for a performance measure of interest (e.g., sensitivity) given a hypothesized network structure.
powerly(
    range_lower,
    range_upper,
    samples = 30,
    replications = 30,
    model = "ggm",
    ...,
    model_matrix = NULL,
    measure = "sen",
    statistic = "power",
    measure_value = 0.6,
    statistic_value = 0.8,
    monotone = TRUE,
    increasing = TRUE,
    spline_df = NULL,
    solver_type = "quadprog",
    boots = 10000,
    lower_ci = 0.025,
    upper_ci = 0.975,
    tolerance = 50,
    iterations = 10,
    cores = NULL,
    backend_type = NULL,
    save_memory = FALSE,
    verbose = TRUE
)
range_lower |
A single positive integer representing the lower bound of the candidate sample size range. |
range_upper |
A single positive integer representing the upper bound of the candidate sample size range. |
samples |
A single positive integer representing the number of sample sizes to select from the candidate sample size range. |
replications |
A single positive integer representing the number of Monte Carlo replications to perform for each sample size selected from the candidate range. |
model |
A character string representing the type of true model to find a
sample size for. Possible values are |
... |
Required arguments used for the generation of the true model. See the True Models section for the arguments required for each true model. |
model_matrix |
A square matrix representing the true model. See the True Models section for what this matrix should look like depending on the true model selected. |
measure |
A character string representing the type of performance
measure of interest. Possible values are |
statistic |
A character string representing the type of statistic to be
computed on the values obtained for the performance measures. Possible values
are |
measure_value |
A single numerical value representing the desired value
for the performance measure of interest. The default is |
statistic_value |
A single numerical value representing the desired
value for the statistic of interest. The default is |
monotone |
A logical value indicating whether a monotonicity assumption
should be placed on the values of the performance measure. The default is
|
increasing |
A logical value indicating whether the performance measure
is assumed to follow a non-increasing or non-decreasing trend. |
spline_df |
A vector of positive integers representing the degrees of
freedom considered for constructing the spline basis, or |
solver_type |
A character string representing the type of the quadratic
solver used for estimating the spline coefficients. Currently only
|
boots |
A positive integer representing the number of bootstrap runs to
perform on the matrix of performance measures in order to obtain
bootstrapped values for the statistic of interest. The default is |
lower_ci |
A single numerical value indicating the lower bound for the
confidence interval to be computed on the bootstrapped statistics. The
default is |
upper_ci |
A single numerical value indicating the upper bound for the
confidence interval to be computed on the bootstrapped statistics. The default is
|
tolerance |
A single positive integer representing the width of the
candidate sample size range at which the algorithm is considered to have
converged. The default is |
iterations |
A single positive integer representing the number of
iterations the algorithm is allowed to run. The default is |
cores |
A single positive integer representing the number of
cores to use for running the algorithm in parallel, or |
backend_type |
A character string indicating the type of cluster to
create for running the algorithm in parallel, or |
save_memory |
A logical value indicating whether to save memory by only
storing the results for the last iteration of the method. The default |
verbose |
A logical value indicating whether information about the
status of the algorithm should be printed while running. The default is
|
This function represents the implementation of the method introduced by Constantin et al. (2021; see doi:10.31234/osf.io/j5v7u) for performing a priori sample size analysis in the context of network models. The method takes the form of a three-step recursive algorithm designed to find an optimal sample size value given a model specification and an outcome measure of interest (e.g., sensitivity). It starts with a Monte Carlo simulation step for computing the outcome of interest at various sample sizes. It continues with a monotone non-decreasing curve-fitting step for interpolating the outcome. The final step employs a stratified bootstrapping scheme to account for the uncertainty around the recommendation provided. The method runs the three steps recursively until the candidate sample size range used for the search shrinks below a specified value.
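The first two steps described above can be illustrated with a minimal base R sketch. It is a toy under stated assumptions, not the package implementation: `simulate_measure()` is a hypothetical stand-in that draws sensitivity values from a Beta distribution instead of estimating networks, and isotonic regression (`stats::isoreg()`) is swapped in for the monotone spline fit used by the method.

```r
# Toy stand-in for a performance measure: sensitivity modeled as a Beta draw
# whose mean grows with the sample size. This is an assumption made for
# illustration and is NOT how powerly simulates network estimation.
simulate_measure <- function(n, replications) {
  mean_sensitivity <- n / (n + 300)
  rbeta(
    replications,
    shape1 = 20 * mean_sensitivity,
    shape2 = 20 * (1 - mean_sensitivity)
  )
}

set.seed(2021)

# Step 1: Monte Carlo replications at sample sizes spread over the range.
sample_sizes <- round(seq(300, 1000, length.out = 30))
measures <- sapply(sample_sizes, simulate_measure, replications = 30)

# Statistic "power": proportion of replications (rows) that reach the target
# measure value, computed per sample size (columns).
measure_value <- 0.6
power <- colMeans(measures >= measure_value)

# Step 2: monotone non-decreasing fit (isotonic regression as a stand-in for
# monotone splines), interpolated over every sample size in the range.
fit <- isoreg(sample_sizes, power)
grid <- seq(min(sample_sizes), max(sample_sizes))
fitted_power <- approx(sample_sizes, fit$yf, xout = grid)$y

# Smallest sample size whose fitted power reaches the target statistic value
# (NA if the target is never reached within the range).
statistic_value <- 0.8
recommendation <- grid[which(fitted_power >= statistic_value)[1]]
recommendation
```

In the method itself, the candidate range would then be narrowed around such a recommendation and the steps repeated until the range width falls below `tolerance` or `iterations` is exhausted.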
An R6::R6Class() instance of Method class that contains the results for each step of the method for the last and previous iterations.
Main fields:
$duration: The time in seconds elapsed during the method run.
$iteration: The number of iterations performed.
$converged: Whether the method converged.
$previous: The results during the previous iteration.
$range: The candidate sample size range.
$step_1: The results for Step 1.
$step_2: The results for Step 2.
$step_3: The results for Step 3.
$recommendation: The sample size recommendation(s).
The plot method can be called on the return value to visualize the results. See plot.Method() for more information on how to plot the method results.
for Step 1: plot(results, step = 1)
for Step 2: plot(results, step = 2)
for Step 3: plot(results, step = 3)
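To give a sense of what the Step 3 results convey, the following base R sketch runs a stratified bootstrap on a toy matrix of performance measures. It rests on assumptions: the measures are simulated Beta draws rather than Step 1 output, the strata are taken to be the sample sizes, and the curve refit that the method performs is omitted for brevity.

```r
set.seed(123)

# Hypothetical Step 1 output: a replications-by-sample-sizes matrix of
# performance measures (toy Beta draws, not produced by powerly).
sample_sizes <- c(300, 500, 700, 900)
replications <- 30
measures <- sapply(sample_sizes, function(n) {
  rbeta(replications, shape1 = 20 * n / (n + 300), shape2 = 20 * 300 / (n + 300))
})

measure_value <- 0.6
boots <- 1000  # smaller than the 10000 in the usage signature, to stay fast

# Stratified bootstrap: resample replications with replacement separately
# within each sample size (the strata), then recompute the power statistic.
boot_power <- replicate(boots, {
  apply(measures, 2, function(column) {
    mean(sample(column, replace = TRUE) >= measure_value)
  })
})

# Percentile confidence bounds per sample size, using 0.025 and 0.975 as in
# the `lower_ci` and `upper_ci` arguments.
ci <- apply(boot_power, 1, quantile, probs = c(0.025, 0.975))
colnames(ci) <- sample_sizes
ci
```

Each column of `ci` holds the lower and upper bound of the bootstrapped power at one sample size; wide intervals suggest that more replications are needed at that sample size.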
Gaussian Graphical Model
type: cross-sectional
symbol: ggm
... arguments for generating true models:
nodes: A single positive integer representing the number of nodes in the network (e.g., 10).
density: A single numerical value indicating the density of the network (e.g., 0.4).
positive: A single numerical value representing the proportion of positive edges in the network (e.g., 0.9 for 90% positive edges).
range: A length-two numerical vector indicating the uniform interval from which to sample values for the partial correlation coefficients (e.g., c(0.5, 1)).
constant: A single numerical value representing the constant described by Yin and Li (2011).
for more information on the arguments see:
the function bootnet::genGGM()
Yin, J., and Li, H. (2011). A sparse conditional Gaussian graphical model for analysis of genetical genomics data. The Annals of Applied Statistics, 5(4), 2630.
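How the arguments above could combine into a matrix of partial correlations can be sketched in base R. The helper `sketch_ggm()` is hypothetical and is neither part of powerly nor a copy of bootnet::genGGM(). It assumes a random graph with a fixed number of edges and uses `constant` to build a diagonally dominant precision matrix, which is one common way to guarantee positive definiteness; the actual generators may differ in their details.

```r
# Hypothetical helper (not part of powerly or bootnet): generate a matrix of
# partial correlations for a random graph with a given size and density.
sketch_ggm <- function(nodes, density, positive = 0.9, range = c(0.5, 1),
                       constant = 1.5) {
  pairs <- nodes * (nodes - 1) / 2

  # Mark exactly round(density * pairs) of the possible edges as present.
  present <- sample(pairs) <= round(density * pairs)

  # Draw absolute weights uniformly from `range` and assign signs so that
  # roughly a proportion `positive` of the edges is positive.
  weights <- runif(pairs, min(range), max(range))
  signs <- sample(c(1, -1), pairs, replace = TRUE,
                  prob = c(positive, 1 - positive))

  # Precision matrix: a positive partial correlation corresponds to a
  # negative off-diagonal precision entry.
  kappa <- matrix(0, nodes, nodes)
  kappa[upper.tri(kappa)] <- -signs * weights * present
  kappa <- kappa + t(kappa)

  # A diagonal of `constant` times the absolute row sums makes the matrix
  # strictly diagonally dominant (hence positive definite) for constant > 1.
  diag(kappa) <- constant * rowSums(abs(kappa))
  diag(kappa)[diag(kappa) == 0] <- 1  # isolated nodes

  # Standardize the precision matrix into partial correlations.
  pcor <- -cov2cor(kappa)
  diag(pcor) <- 0
  pcor
}

set.seed(1)
true_model <- sketch_ggm(nodes = 10, density = 0.4)
isSymmetric(true_model)
sum(true_model[upper.tri(true_model)] != 0)  # number of edges
```

A matrix of this shape (square, symmetric, zero diagonal) is the kind of object the `model_matrix` argument is described as accepting for the ggm model.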
supported performance measures: sen, spe, mcc, rho
Performance Measure | Symbol | Lower | Upper |
---|---|---|---|
Sensitivity | sen | 0.00 | 1.00 |
Specificity | spe | 0.00 | 1.00 |
Matthews correlation | mcc | -1.00 | 1.00 |
Pearson correlation | rho | -1.00 | 1.00 |

Statistics | Symbol | Lower | Upper |
---|---|---|---|
Power | power | 0.00 | 1.00 |
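The performance measures listed above can be made concrete with a small base R sketch that compares an estimated edge-weight matrix against the true one. The helper `sketch_measures()` is hypothetical, and the definitions used (edge presence as non-zero weight, rho as the Pearson correlation between true and estimated weights) are the standard textbook ones, assumed here rather than taken from the package source.

```r
# Hypothetical helper (not the powerly implementation): compare the edges of
# an estimated network against those of the true network.
sketch_measures <- function(true, estimated) {
  # Work on the unique edges only (upper triangle of the weight matrices).
  true_weights <- true[upper.tri(true)]
  estimated_weights <- estimated[upper.tri(estimated)]

  # Confusion counts based on edge presence (non-zero weights).
  tp <- as.numeric(sum(true_weights != 0 & estimated_weights != 0))
  fp <- as.numeric(sum(true_weights == 0 & estimated_weights != 0))
  tn <- as.numeric(sum(true_weights == 0 & estimated_weights == 0))
  fn <- as.numeric(sum(true_weights != 0 & estimated_weights == 0))

  c(
    sen = tp / (tp + fn),
    spe = tn / (tn + fp),
    mcc = (tp * tn - fp * fn) /
      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    rho = cor(true_weights, estimated_weights)
  )
}

# Toy four-node example: three true edges, of which two are recovered, plus
# one false positive edge.
true <- matrix(c(0,  .3,  0,   0,
                 .3, 0,   .2,  0,
                 0,  .2,  0,  -.4,
                 0,  0,  -.4,  0), nrow = 4, byrow = TRUE)
estimated <- matrix(c(0,   .25, 0,    .1,
                      .25, 0,   0,    0,
                      0,   0,   0,   -.35,
                      .1,  0,  -.35,  0), nrow = 4, byrow = TRUE)

result <- sketch_measures(true, estimated)
result
```

In this toy example, two of the three true edges are recovered and two of the three absent edges are correctly left out, so sensitivity and specificity are both 2/3.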
If you would like to support a new model, performance measure, or statistic, please open a pull request on GitHub at github.com/mihaiconstantin/powerly/pulls.
To request a new model, performance measure, or statistic, please submit your request at github.com/mihaiconstantin/powerly/issues. If possible, please also include references discussing the topics you are requesting.
Alternatively, you can get in touch at mihai at mihaiconstantin dot com.
Constantin, M. A., Schuurman, N. K., & Vermunt, J. (2021). A General Monte Carlo Method for Sample Size Analysis in the Context of Network Models. PsyArXiv. doi:10.31234/osf.io/j5v7u
plot.Method(), summary.Method(), validate(), generate_model()
# Suppose we want to find the sample size for observing a sensitivity of `0.6`
# with a probability of `0.8`, for a GGM true model consisting of `10` nodes
# with a density of `0.4`.

# We can run the method for an arbitrarily generated true model that matches
# those characteristics (i.e., number of nodes and density).
results <- powerly(
    range_lower = 300,
    range_upper = 1000,
    samples = 30,
    replications = 30,
    measure = "sen",
    statistic = "power",
    measure_value = .6,
    statistic_value = .8,
    model = "ggm",
    nodes = 10,
    density = .4,
    cores = 2,
    verbose = TRUE
)

# Or we omit the `nodes` and `density` arguments and specify directly the edge
# weights matrix via the `model_matrix` argument.

# To get a matrix of edge weights we can use the `generate_model()` function.
true_model <- generate_model(type = "ggm", nodes = 10, density = .4)

# Then, supply the true model to the algorithm directly.
results <- powerly(
    range_lower = 300,
    range_upper = 1000,
    samples = 30,
    replications = 30,
    measure = "sen",
    statistic = "power",
    measure_value = .6,
    statistic_value = .8,
    model = "ggm",
    model_matrix = true_model,
    cores = 2,
    verbose = TRUE
)

# To visualize the results, we can use the `plot` S3 method and indicate the
# step that should be plotted.
plot(results, step = 1)
plot(results, step = 2)
plot(results, step = 3)

# To see a summary of the results, we can use the `summary` S3 method.
summary(results)
This function summarizes the objects of class Method and provides information about the method run and the sample size recommendation.
## S3 method for class 'Method'
summary(object, ...)
object |
An object instance of class |
... |
Other optional arguments currently not in use. |
This function summarizes the objects of class Validation and provides information about the validation procedure and results.
## S3 method for class 'Validation'
summary(object, ...)
object |
An object instance of class |
... |
Other optional arguments currently not in use. |
This function can be used to validate the recommendation obtained from a sample size analysis.
validate( method, replications = 3000, cores = NULL, backend_type = NULL, verbose = TRUE )
method |
An object of class |
replications |
A single positive integer representing the number of
Monte Carlo simulations to perform for the recommended sample size. The
default is |
cores |
A single positive integer representing the number of
cores to use for running the validation in parallel, or |
backend_type |
A character string indicating the type of cluster to
create for running the validation in parallel, or |
verbose |
A logical value indicating whether information about the
status of the validation should be printed while running. The default is
|
The sample size used during the validation procedure is automatically extracted from the method argument.
An R6::R6Class() instance of Validation class that contains the results of the validation.
Main fields:
$sample: The sample size used for the validation.
$measures: The performance measures observed during validation.
$statistic: The statistic computed on the performance measures.
$percentile_value: The performance measure value at the desired percentile.
$validator: An R6::R6Class() instance of StepOne class.
The plot S3 method can be called on the return value to visualize the validation results (i.e., see plot.Validation()).
plot(validation)
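The relation between the validation fields can be illustrated with a short base R sketch. It is built on assumptions: the measures are toy Beta draws rather than results from network estimation, and `$percentile_value` is interpreted here as the `1 - statistic_value` quantile of the observed measures, which is a reading of the field description and not something confirmed from the package source.

```r
set.seed(42)

# Hypothetical validation output: performance measures observed over many
# replications at the recommended sample size (toy Beta draws).
measures <- rbeta(3000, shape1 = 14, shape2 = 6)

measure_value <- 0.6    # target value for the performance measure
statistic_value <- 0.8  # target value for the statistic (power)

# Statistic: the proportion of replications reaching the target measure value.
statistic <- mean(measures >= measure_value)

# If the recommendation holds, about `statistic_value` of the measures lie at
# or above `measure_value`, so the (1 - statistic_value) quantile is the value
# to compare against the target (assumed interpretation).
percentile_value <- quantile(measures, probs = 1 - statistic_value, names = FALSE)

c(statistic = statistic, percentile_value = percentile_value)
```

Under this reading, a recommendation is supported when the statistic is at least `statistic_value`, or equivalently when the percentile value is at least `measure_value`.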
plot.Validation(), summary.Validation(), powerly(), generate_model()
# Perform a sample size analysis.
results <- powerly(
    range_lower = 300,
    range_upper = 1000,
    samples = 30,
    replications = 30,
    measure = "sen",
    statistic = "power",
    measure_value = .6,
    statistic_value = .8,
    model = "ggm",
    nodes = 10,
    density = .4,
    cores = 2,
    verbose = TRUE
)

# Validate the recommendation obtained during the analysis.
validation <- validate(results, cores = 2)

# Plot the validation results.
plot(validation)

# To see a summary of the validation procedure, we can use the `summary` S3 method.
summary(validation)