Introduction
This vignette focuses on how to conduct simulations using the metaROC package. One of the major reasons for the development of this package was to tailor it specifically to this purpose. Therefore, there is plenty to talk about.
First of all, we load the package and then set a random seed. This is specifically important to reproduce simulation results for result consistency.
The package provides two types of simulations, following the phases of methodological research in Biostatistics described by Heinze et al. (2024). Phase II simulations offer an initial look at how a method behaves in practice, focusing on a small number of carefully chosen, controlled scenarios. Phase III simulations extend this by exploring robustness across a wider range of settings that more closely resemble the diversity and complexity encountered in real applications.
It is important to note that STM can only be used for estimation on Phase II simulation data if the diagnostic threshold is known in each study, for which the STM should estimate the model. This is due to the fact that data on multiple thresholds may be simulated in each study, but the STM are only able to consider a single one of them.
In the following, we first demonstrate how to generate data from both
simulation phases separately using the simulate_metaROC()
function. We then conduct our own simulation studies using the main
function of the package, metaROC(), which allows data to be
simulated and models to be estimated or evaluated.
Phase II simulation
In a way, Phase II simulations serve as an early reality check. They are usually designed around a limited set of scenarios that highlight key properties of the method or situations where it is expected to perform well. The emphasis is less on exhaustiveness and more on understanding basic behavior, potential advantages, and obvious failure modes, often through focused comparisons with existing approaches.
For this phase, we can specify the particular model from which data
should be simulated. The available options are
"stoye2024cloglog", "stoye2024logit",
"steinhauser2016logitlmm",
"hoyer2018weibullaft", and
"stoye2024copula".
In general, the simulate_metaROC() function first
requires the type of simulation, which in this case is set via
type = "phaseII". The number of simulations is controlled
by the argument n_sims, which can be set to any desired
integer \(\ge 1\).
Depending on the chosen model, the structure of the model-specific
parameter (model_pars) list within control_sim
differs. For the mentioned models for Phase II simulated data, preset
configurations are also available to target specific true AUC
values.
These parameters control the structure and range of the simulated
data. n_studies_min and n_studies_max set the
minimum and maximum number of studies per dataset, while
n_subj_min and n_subj_max define the range for
the number of subjects in each study. Similarly,
n_thresholds_min and n_thresholds_max
determine how many thresholds each study reports, with
thresholds_lower_bound and
thresholds_upper_bound specifying the allowable range of
threshold values. Finally, true_preval_min and
true_preval_max set the range of true prevalence values in
the simulated datasets.
Before running simulations and looking at the output, we will first
look at how to set up control_sim for each model and how to
make use of the available presets.
Discrete GLMMs
For the discrete GLMMs (Stoye et al.
2024), we need to specify several parameters.
beta11, beta12, and beta13 are
the intercept, linear, and quadratic parameters that describe the true
dependence of categorical covariates in the linear term of the GLMM for
the non-diseased population. beta21 and beta22
are the intercept and linear parameter for the corresponding dependence
in the diseased population. This means that we assume a true underlying
quadratic structure for the discrete hazard of diseased and a linear
structure for non-diseased individuals. Finally, var_rand_H
and var_rand_D define the true variances of the bivariate
random effects for the non-diseased (H) and diseased (D) populations.
The correlation parameter in the random effect is fixed to \(\rho=0.86\).
control_sim_cloglog = list(n_studies_min = 5, n_studies_max = 10,
n_subj_min = 20, n_subj_max = 500,
n_thresholds_min = 1, n_thresholds_max = 4,
thresholds_lower_bound = 5, thresholds_upper_bound = 7,
true_preval_min = 0.1, true_preval_max = 0.3,
model = "stoye2024cloglog",
model_pars = list(beta11 = -38.2, beta12 = 10.26,
beta13 = -0.669,
beta21 = 3.516, beta22 = -0.306,
var_rand_H = 0.6561, var_rand_D = 0.9216))Now to check if this works correctly:
sim_stoye_cloglog <- simulate_metaROC(type = "phaseII", n_sims = 5,
control_sim = control_sim_cloglog)Now that we have defined control_sim and generated our
simulated dataset, we can try using a preset with the
"stoye2024logit" model. Presets are only available for true
AUC values of "65", "75", or "85"
(corresponding to 0.65, 0.75, 0.85), and the selected preset will ensure
that the true AUC matches the desired value. Internally, the preset is
computed using metaROC:::simulation_preset(). Here, we will
try a preset with a true AUC value of "85":
control_sim_logit = list(n_studies_min = 5, n_studies_max = 10,
n_subj_min = 20, n_subj_max = 500,
n_thresholds_min = 1, n_thresholds_max = 4,
thresholds_lower_bound = 5, thresholds_upper_bound = 7,
true_preval_min = 0.1, true_preval_max = 0.3,
model = "stoye2024cloglog",
model_pars = list(preset = TRUE, AUC = "85"))We can then check again to confirm that the preset has been applied correctly:
sim_stoye_logit <- simulate_metaROC(type = "phaseII", n_sims = 5,
control_sim = control_sim_logit)Using a preset has the clear advantage of making the
control_sim list simpler to specify. It is also useful if
you have a specific true AUC value in mind, as manually calibrating the
model parameters can be tedious.
Weibull AFT model
For the "hoyer2018weibullaft" model (Hoyer et al. 2018), beta00 and
lambda0 correspond to the location and scale of the true
Weibull distribution for test values in the non-diseased population.
beta01 and lambda1 are the corresponding
parameters for the diseased population. These parameters determine where
the distributions are centered and how spread out they are.
var_rand_H and var_rand_D represent the
true variances of the bivariate random effects for the non-diseased (H)
and diseased (D) populations, respectively. They account for
between-study variability in the distributions of test values:
control_sim_weibaft = list(n_studies_min = 5, n_studies_max = 10,
n_subj_min = 20, n_subj_max = 500,
n_thresholds_min = 1, n_thresholds_max = 4,
thresholds_lower_bound = 5, thresholds_upper_bound = 7,
true_preval_min = 0.1, true_preval_max = 0.3,
model = "hoyer2018weibullaft",
model_pars = list(beta00 = -3, lambda0 = 5,
beta01 = -5, lambda1 = 6,
var_rand_H = 0.6561, var_rand_D = 0.9216))Now to check if this has worked correctly:
sim_weibaft <- simulate_metaROC(type = "phaseII", n_sims = 5,
control_sim = control_sim_weibaft)Now that we have defined control_sim and generated our
simulated dataset, we can try using a preset with the
"hoyer2018weibullaft" model. Here, we will try a preset
with a true AUC value of "75":
control_sim_weibaft_pres = list(n_studies_min = 5, n_studies_max = 10,
n_subj_min = 20, n_subj_max = 500,
n_thresholds_min = 1, n_thresholds_max = 4,
thresholds_lower_bound = 5, thresholds_upper_bound = 7,
true_preval_min = 0.1, true_preval_max = 0.3,
model = "hoyer2018weibullaft",
model_pars = list(preset = TRUE,AUC = "75"))We can then check again to confirm that the preset has been applied correctly:
sim_weibaft_pres <- simulate_metaROC(type = "phaseII", n_sims = 5,
control_sim = control_sim_weibaft_pres)Logit LMM
Note: The parameterization of this model in our implementation has been changed. Using the parameters presented here does not reflect the new parameterization, hence the warnings. Do not use this model for data generation in the current version of this package!
For the "steinhauser2016logitlmm" model (Steinhauser et al. 2016), beta11
and beta12 are the true mean and variance of the normal
distribution for test values in the non‑diseased population.
beta21 and beta22 are the true mean and
variance of the normal distribution for test values in the diseased
population.
var_rand_H and var_rand_D represent the
true variances of the bivariate random effects for the non‑diseased (H)
and diseased (D) populations, respectively, capturing between‑study
variability:
control_sim_steinh = list(n_studies_min = 5, n_studies_max = 10,
n_subj_min = 20, n_subj_max = 500,
n_thresholds_min = 1, n_thresholds_max = 4,
thresholds_lower_bound = 5, thresholds_upper_bound = 7,
true_preval_min = 0.1, true_preval_max = 0.3,
model = "steinhauser2016logitlmm",
model_pars = list(beta11 = 2, beta12 = 0.5,
beta21 = 3, beta22 = 0.8,
var_rand_H = 0.6561, var_rand_D =0.9216))Now to check if this has worked correctly:
sim_steinh <- simulate_metaROC(type = "phaseII", n_sims = 5,
control_sim = control_sim_steinh)
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!Now that we have defined control_sim and generated our
simulated dataset, we can try using a preset with the
"steinhauser2016logitlmm" model. Here, we will try a preset
with a true AUC value of "65":
control_sim_steinh_pres = list(n_studies_min = 5, n_studies_max = 10,
n_subj_min = 20, n_subj_max = 500,
n_thresholds_min = 1, n_thresholds_max = 4,
thresholds_lower_bound = 5, thresholds_upper_bound = 7,
true_preval_min = 0.1, true_preval_max = 0.3,
model = "steinhauser2016logitlmm",
model_pars = list(preset = TRUE, AUC = "65"))We can then check again to confirm that the preset has been applied correctly:
sim_steinh_pres <- simulate_metaROC(type = "phaseII", n_sims = 5,
control_sim = control_sim_steinh_pres)
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!
#> Warning in simulate_metaROC_from_model(n_studies_min =
#> control_sim$n_studies_min, : logitLMM data generation does not work with
#> changed parameterization!Copula MTM
For the "stoye2024copula" model, beta00 and
lambda0 are the parameters of the true distribution of test
values for the non-diseased population, while beta01 and
lambda1 are the corresponding parameters for the diseased
population (see Hoyer et al. (2018) for
details).
copula_kind_sim specifies the copula family to use,
either "clayton" or "joe".
cop_params are the true copula parameters: a single
positive value for the Clayton copula, or a vector of three values for
the Joe copula (first positive, second and third between 0 and 1).
norm_approx_sim is a Boolean indicating whether a normal
approximation should be used instead of the exact binomial distribution
in the copula marginals.
marginal_kind_sim specifies the distribution for the
test values in each population and can be "weibull",
"lognormal", "loglogistic", or
"gompertz".
control_sim_cop <- list(n_studies_min = 5, n_studies_max = 10,
n_subj_min = 50, n_subj_max = 200,
n_thresholds_min = 2, n_thresholds_max = 5,
thresholds_lower_bound = 0.1, thresholds_upper_bound = 0.9,
true_preval_min = 0.1, true_preval_max = 0.5,
model = "stoye2024copula",
model_pars = list(beta00 = 0.5, lambda0 = 1.2,
beta01 = -0.3, lambda1 = 0.8,
copula_kind_sim = "clayton",
cop_params = 2,
norm_approx_sim = TRUE,
marginal_kind_sim = "lognormal"
)
)Now to check if this has worked correctly:
sim_cop <- simulate_metaROC(type = "phaseII", n_sims = 5,
control_sim = control_sim_cop)Now that we have defined control_sim and generated our
simulated dataset, we can try using a preset with the
"hoyer2018weibullaft" model. Here, we will try a preset
with a true AUC value of "75". Note that when using a
preset, there is always marginal_kind_sim = "weibull".
control_sim_cop_pres = list(n_studies_min = 5, n_studies_max = 10,
n_subj_min = 50, n_subj_max = 200,
n_thresholds_min = 2, n_thresholds_max = 5,
thresholds_lower_bound = 0.1, thresholds_upper_bound = 0.9,
true_preval_min = 0.1, true_preval_max = 0.5,
model = "stoye2024copula",
model_pars = list(preset = TRUE, AUC = "75")
)We can then check again to confirm that the preset has been applied correctly: Preset does not work for copulas.
#sim_cop_pres <- simulate_metaROC(type = "phaseII", n_sims = 5,
# control_sim = control_sim_cop_pres)Simulation output
Now we will take a look at what simulate_metaROC()
returns. We will use the dataset we first simulated from the
"stoye2024cloglog" model to explore the output. The overall
structure of the returned object is the same for all the models we have
discussed earlier.
The returned object by simulate_metaROC() is a list with
2 entries. First, data stores the specific dataset in a
tibble:
head(sim_stoye_cloglog$data)
#> # A tibble: 6 × 7
#> replicate study threshold TP FN FP TN
#> <int> <fct> <fct> <int> <int> <int> <int>
#> 1 1 1 5.6 19 20 10 153
#> 2 1 1 6.5 0 39 0 163
#> 3 1 1 6.8 0 39 0 163
#> 4 1 2 6 0 25 0 220
#> 5 1 2 6.8 0 25 0 220
#> 6 1 3 5.7 10 7 16 65Importantly, the structure of the simulated datasets is identical to
that of real-world datasets, with one minor exception. As multiple
datasets can be simulated from the same underlying data-generating
process, an additional column replicate is included. This
column indicates the identifier of each simulated dataset.
Additionally, data_info stores all the information on
the underyling data generating process. This is particularly important
for evaluating a model fitted to simulated data.
sim_stoye_cloglog$data_info
#> $model_id
#> [1] "stoye2024cloglog"
#>
#> $true_sens
#> [1] 9.737029e-01 9.375570e-01 8.891743e-01 8.263907e-01 7.478580e-01
#> [6] 6.538353e-01 5.470146e-01 4.330258e-01 3.201362e-01 2.177847e-01
#> [11] 1.341150e-01 7.345648e-02 3.513149e-02 1.440224e-02 4.971278e-03
#> [16] 1.421436e-03 3.320754e-04 6.274557e-05 9.535363e-06 1.165283e-06
#> [21] 1.151997e-07
#>
#> $true_spec
#> [1] 0.1764855 0.3696620 0.5616079 0.7305545 0.8584556 0.9388991 0.9792862
#> [8] 0.9947594 0.9990644 0.9998889 0.9999918 0.9999996 1.0000000 1.0000000
#> [15] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
#>
#> $true_AUC
#> [1] 0.8697526
#>
#> $model_pars
#> $model_pars$beta11
#> [1] -38.2
#>
#> $model_pars$beta12
#> [1] 10.26
#>
#> $model_pars$beta13
#> [1] -0.669
#>
#> $model_pars$beta21
#> [1] 3.516
#>
#> $model_pars$beta22
#> [1] -0.306
#>
#> $model_pars$var_rand_H
#> [1] 0.6561
#>
#> $model_pars$var_rand_D
#> [1] 0.9216
#>
#> $model_pars$betas1
#> [1] -3.62500 -3.27469 -2.93776 -2.61421 -2.30404 -2.00725 -1.72384 -1.45381
#> [9] -1.19716 -0.95389 -0.72400 -0.50749 -0.30436 -0.11461 0.06176 0.22475
#> [17] 0.37436 0.51059 0.63344 0.74291 0.83900
#>
#> $model_pars$betas2
#> [1] 1.9860 1.9554 1.9248 1.8942 1.8636 1.8330 1.8024 1.7718 1.7412 1.7106
#> [11] 1.6800 1.6494 1.6188 1.5882 1.5576 1.5270 1.4964 1.4658 1.4352 1.4046
#> [21] 1.3740
#>
#> $model_pars$all_available_thresholds
#> [1] 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8
#> [20] 6.9 7.0Not only does simulate_metaROC() return the input
parameters from model_pars, but it also provides the true
ROC curve through the true sensitivity and specificity values, along
with the corresponding AUC. Additionally,
$model_pars$all_available_thresholds stores all the
available thresholds.
This contains all the necessary information for simulating Phase II type data. We will now move on to Phase III type data.
Phase III simulation
Phase III simulations move beyond targeted checks to a more systematic exploration of performance. Here, the method is evaluated across a broader and more varied set of data-generating mechanisms, including changes in sample size, true diagnostic accuracy or underlying assumptions in the data-generating mechanisms. The goal is to understand how consistent the method performs, where its strengths and limitations lie, and how it compares to alternatives under conditions that are closer to those encountered in practice.
In our implementation of Phase III simulations, first the type has to
be specified (type = "phaseIII"). Then, several control
parameters have to be set. n_studies specifies the range of
number of studies per dataset, and n_subj specifies the
range of individuals per study. n_thresholds gives the
range of thresholds reported per study. outcome_type
indicates the type of test values sampled, either
"continuous" or "ordinal".
true_preval defines the range of disease prevalence per
study, and true_AUC sets the diagnostic test quality as
area under the ROC curve. Either 0.75 or 0.9 can be chosen.
heterogeneity specifies the heterogeneity between studies,
either "low" or "high".
weight_standard_threshold gives an additional weight for
including a ‘standard’ threshold in each study, either 0 or 0.7.
Finally, round_threshold_to_xth_decimal specifies the
number of decimal places to round thresholds to in continuous settings
(not required for ordinal settings).
sim_phase_3 <- simulate_metaROC(type = "phaseIII", n_sims = 5,
control_sim = list(n_studies = list(min=5,max=10),
n_subj = list(min=20,max=500),
n_thresholds = list(min=1,max=4),
outcome_type = "continuous",
true_preval = list(min=0.1,max=0.3),
true_AUC = 0.9,
heterogeneity = "low",
weight_standard_threshold = 0,
round_threshold_to_xth_decimal = 1)
)The returned object by simulate_metaROC() is a list with
2 entries. First, data stores the specific dataset in a
tibble:
head(sim_phase_3$data) |> print(width=Inf)
#> # A tibble: 6 × 12
#> replicate study threshold N prevalence D H TP FN FP TN
#> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <int>
#> 1 1 1 6.4 51 0.290 15 36 12 3 4 32
#> 2 1 2 5.1 407 0.137 56 351 55 1 144 207
#> 3 1 2 6.4 407 0.137 56 351 48 8 37 314
#> 4 1 2 6.9 407 0.137 56 351 41 15 23 328
#> 5 1 3 7.4 87 0.284 25 62 11 14 2 60
#> 6 1 4 7.5 273 0.108 30 243 18 12 6 237
#> threshold_stm
#> <dbl>
#> 1 1
#> 2 0
#> 3 1
#> 4 0
#> 5 1
#> 6 1Each row in sim_phase_3$data corresponds to a specific
threshold within a study and simulation replicate. The column
replicate identifies the simulation run, while
study indexes the individual studies. The applied
diagnostic threshold is stored in threshold.
The columns N and prevalence give the total
number of subjects and the disease prevalence in each study. The
variables D and H denote the numbers of
diseased and non-diseased individuals, respectively. Diagnostic outcomes
are summarized by the cell counts TP, FN,
FP, and TN.
Finally, threshold_stm indicates which threshold from
each study would be selected for use in an STM, if fitted.
Additionally, data_info stores all the information on
the underlying data generating process in control_sim. This
is particularly important for evaluating a fitted model to simulated
data.
head(sim_phase_3$data_info)
#> $n_replicates
#> [1] 5
#>
#> $n_studies
#> $n_studies$min
#> [1] 5
#>
#> $n_studies$max
#> [1] 10
#>
#>
#> $n_subj
#> $n_subj$min
#> [1] 20
#>
#> $n_subj$max
#> [1] 500
#>
#>
#> $n_thresholds
#> $n_thresholds$min
#> [1] 1
#>
#> $n_thresholds$max
#> [1] 4
#>
#>
#> $outcome_type
#> [1] "continuous"
#>
#> $true_preval
#> $true_preval$min
#> [1] 0.1
#>
#> $true_preval$max
#> [1] 0.3Simulation Studies
As mentioned earlier, the package’s main function is
metaROC(). This function can be used for simulation, model
fitting, and evaluation, particularly in the context of simulation
studies. Internally, depending on the action specified—either
simulate, estimate, or
evaluate—metaROC() calls the corresponding
specialized functions. As a result, the inputs and outputs of
metaROC() largely mirror those of the underlying
subfunctions.
In this section, we demonstrate the workflow of conducting a simulation study. First, we generate multiple datasets using the same data-generating process. Next, we fit several models to these datasets and, finally, evaluate their performance.
Data Generation
First, we simulate five datasets from phase III by specifying
action = "simulate" and the simulation type, as in the
simulate_metaROC() function. Apart from this,
control_sim is defined as described in the section on Phase
III simulations.
control_sim_study <- list(
n_studies = list(min = 5, max = 10),
n_subj = list(min = 50, max = 300),
n_thresholds = list(min = 3, max = 6),
outcome_type = "continuous",
true_preval = list(min = 0.05, max = 0.30),
true_AUC = 0.75,
heterogeneity = "high",
weight_standard_threshold = 0.7,
round_threshold_to_xth_decimal = 1,
sens_weight = 0.6
)
sim_study_data = metaROC(action="simulate", type = "phaseIII", n_sims = 5,
control_sim = control_sim_study)
#> Hello and welcome to metaROC!
#> Chosen action: simulate
#> Successfully generated 5 datasets.
head(sim_study_data$data) |> print(width=Inf)
#> # A tibble: 6 × 12
#> replicate study threshold N prevalence D H TP FN FP TN
#> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <int>
#> 1 1 1 5.2 196 0.0790 15 181 13 2 126 55
#> 2 1 1 5.8 196 0.0790 15 181 11 4 75 106
#> 3 1 1 6.1 196 0.0790 15 181 11 4 50 131
#> 4 1 1 7 196 0.0790 15 181 6 9 32 149
#> 5 1 2 6.5 132 0.274 36 96 2 34 2 94
#> 6 1 3 5.5 277 0.135 37 240 29 8 138 102
#> threshold_stm
#> <dbl>
#> 1 0
#> 2 0
#> 3 1
#> 4 0
#> 5 1
#> 6 0Again the returned object mimics the structure from the base simulation function. Now that we have all our simulation data, we can continue with fitting models to the data.
Model Estimation
To fit models to the simulated data, we simply have to set
action = "estimate" and again the base function will be
called internally. However, in contrast to the base function, now either
one model or a vector of models can be fitted. When fitting several
models these have to be input as a vector of model identifiers. This is
especially important when comparing model performances on a specific
dataset. For the model specific parameters now the input parameter
control_fit is used, which is a list of model specifics
that are required to fit some models. For details on which parameters
are required for each model, please refer to the documentation or the
model-specific guides, as well as the application guide, where all
possible input parameters are illustrated. In practice, these parameters
simply need to be supplied in the control_fit list, as
shown in the model estimation step below.
In the following analysis, the Phase III dataset is used for both
model fitting and evaluation. We fit two MTM, namely
"stoye2024cloglog" and
"steinhauser2016logitlmm", followed by an STM,
"chu2006glmm", and a nonparametric MTM,
"martinez2017np". Since "martinez2017np"
requires a variant to be specified, this information is provided via the
control_fit list. It is important to note that when specifying a list of
model-specific parameters to be added, the list must contain one entry
for each model. Even if no input parameters are to be modified for a
given model, an empty list entry (list()) must be provided.
To see which input parameters can be modified for each model, please
refer to the model specific vignettes or the documentation of the
metaROC method.
sim_study_models = metaROC(action="estimate", data = sim_study_data$data,
data_info = sim_study_data$data_info,
model_fit = c("stoye2024cloglog",
"steinhauser2016logitlmm",
"martinez2017np",
"chu2006glmm"),
control_fit = list(
list(),
list(),
list(variant = "fixed-effects"),
list())
)
#> Hello and welcome to metaROC!
#> Chosen action: estimate
#> Hello and welcome to metaROC!
#> Requested model: stoye2024cloglog
#> This is a GLMM for the discrete Hazard for multiple thresholds.
#> See https://doi.org/10.1002/jrsm.1753 for more details.
#> boundary (singular) fit: see help('isSingular')
#> Hello and welcome to metaROC!
#> Requested model: steinhauser2016logitlmm
#> This is an LMM for multiple thresholds.
#> See https://doi.org/10.1186/s12874-016-0196-1 for more details.
#> boundary (singular) fit: see help('isSingular')
#> Hello and welcome to metaROC!
#> Requested model: martinez2017np
#> This is a non-parametric model for multiple thresholds.
#> See https://doi.org/10.1177/0962280214537047 for more details.
#> Number of papers included in meta-analysis: 7
#> Model considered: fixed-effects
#> The area under the summary ROC curve (AUC) is 0.653.
#> The optimal specificity and sensitivity (in the Youden index sense) for summary ROC curve are 0.706 and 0.574, respectively.
#> Hello and welcome to metaROC!
#> Requested model: chu2006glmm
#> This is a GLMM for a single threshold per study.
#> See https://doi.org/10.1177/0272989X09353452 for more details.
#> boundary (singular) fit: see help('isSingular')
#> Hello and welcome to metaROC!
#> Requested model: stoye2024cloglog
#> This is a GLMM for the discrete Hazard for multiple thresholds.
#> See https://doi.org/10.1002/jrsm.1753 for more details.
#> Hello and welcome to metaROC!
#> Requested model: steinhauser2016logitlmm
#> This is an LMM for multiple thresholds.
#> See https://doi.org/10.1186/s12874-016-0196-1 for more details.
#> Hello and welcome to metaROC!
#> Requested model: martinez2017np
#> This is a non-parametric model for multiple thresholds.
#> See https://doi.org/10.1177/0962280214537047 for more details.
#> Number of papers included in meta-analysis: 6
#> Model considered: fixed-effects
#> The area under the summary ROC curve (AUC) is 0.704.
#> The optimal specificity and sensitivity (in the Youden index sense) for summary ROC curve are 0.635 and 0.712, respectively.
#> Hello and welcome to metaROC!
#> Requested model: chu2006glmm
#> This is a GLMM for a single threshold per study.
#> See https://doi.org/10.1177/0272989X09353452 for more details.
#> Hello and welcome to metaROC!
#> Requested model: stoye2024cloglog
#> This is a GLMM for the discrete Hazard for multiple thresholds.
#> See https://doi.org/10.1002/jrsm.1753 for more details.
#> Hello and welcome to metaROC!
#> Requested model: steinhauser2016logitlmm
#> This is an LMM for multiple thresholds.
#> See https://doi.org/10.1186/s12874-016-0196-1 for more details.
#> boundary (singular) fit: see help('isSingular')
#> Hello and welcome to metaROC!
#> Requested model: martinez2017np
#> This is a non-parametric model for multiple thresholds.
#> See https://doi.org/10.1177/0962280214537047 for more details.
#> Number of papers included in meta-analysis: 9
#> Model considered: fixed-effects
#> The area under the summary ROC curve (AUC) is 0.721.
#> The optimal specificity and sensitivity (in the Youden index sense) for summary ROC curve are 0.757 and 0.658, respectively.
#> Hello and welcome to metaROC!
#> Requested model: chu2006glmm
#> This is a GLMM for a single threshold per study.
#> See https://doi.org/10.1177/0272989X09353452 for more details.
#> Hello and welcome to metaROC!
#> Requested model: stoye2024cloglog
#> This is a GLMM for the discrete Hazard for multiple thresholds.
#> See https://doi.org/10.1002/jrsm.1753 for more details.
#> Hello and welcome to metaROC!
#> Requested model: steinhauser2016logitlmm
#> This is an LMM for multiple thresholds.
#> See https://doi.org/10.1186/s12874-016-0196-1 for more details.
#> boundary (singular) fit: see help('isSingular')
#> Hello and welcome to metaROC!
#> Requested model: martinez2017np
#> This is a non-parametric model for multiple thresholds.
#> See https://doi.org/10.1177/0962280214537047 for more details.
#> Number of papers included in meta-analysis: 8
#> Model considered: fixed-effects
#> The area under the summary ROC curve (AUC) is 0.659.
#> The optimal specificity and sensitivity (in the Youden index sense) for summary ROC curve are 0.758 and 0.531, respectively.
#> Hello and welcome to metaROC!
#> Requested model: chu2006glmm
#> This is a GLMM for a single threshold per study.
#> See https://doi.org/10.1177/0272989X09353452 for more details.
#> Hello and welcome to metaROC!
#> Requested model: stoye2024cloglog
#> This is a GLMM for the discrete Hazard for multiple thresholds.
#> See https://doi.org/10.1002/jrsm.1753 for more details.
#> Hello and welcome to metaROC!
#> Requested model: steinhauser2016logitlmm
#> This is an LMM for multiple thresholds.
#> See https://doi.org/10.1186/s12874-016-0196-1 for more details.
#> Hello and welcome to metaROC!
#> Requested model: martinez2017np
#> This is a non-parametric model for multiple thresholds.
#> See https://doi.org/10.1177/0962280214537047 for more details.
#> Number of papers included in meta-analysis: 9
#> Model considered: fixed-effects
#> The area under the summary ROC curve (AUC) is 0.783.
#> The optimal specificity and sensitivity (in the Youden index sense) for summary ROC curve are 0.791 and 0.704, respectively.
#> Hello and welcome to metaROC!
#> Requested model: chu2006glmm
#> This is a GLMM for a single threshold per study.
#> See https://doi.org/10.1177/0272989X09353452 for more details.As can be seen for each model fitting process a printout happens at
the beginning of each fitting process. Additionaly,
"martineznp2017" also directly reports back the estimated
summary AUC. Finally, we can identify warnings and directly relate them
to the model fitted. For instance we can gather for the first dataset
there were some issues during the fitting process of the
"steinhauser2016logitlmm" model. Due to the nested
structure to access the specific result for a replicate and model one
can use:
sim_study_models[[1]][["steinhauser2016logitlmm"]]
#> >> Fitted meta-analysis model of class metaROC <<
#> >> Selected model: steinhauser2016logitlmm <<
#> >> Model estimation is converged <<
#> >> See summary() for more details <<Therefore, the first level of the nested structure corresponds to the
index of the dataset, in this case, the first simulated dataset. The
second level contains the models fitted to that dataset, which, when
accessed, have the same structure as a single model fitted using
fit_metaROC().
Now one can identify that, whilst there were some issues, the model
estimation has still converged. If one particular model is of interest,
one can still call the summary.metaROC() method to gain an
overview of it:
summary(sim_study_models[[1]][["steinhauser2016logitlmm"]])
#>
#> *** Results of Multiple threshold method (MTM) ***
#>
#> Model: LMM for multiple thresholds
#> Type of model: DICS
#>
#> Total number of studies: 7
#> Total number of thresholds: 20
#> Number of different thresholds: 14
#>
#> Results with Wald confidence intervals:
#>
#> Youden index (sensitivity weight = 0.5): 0.2797
#> Optimal threshold value: 6.2674
#>
#> Estimated Sensitivity and Specificity [95% CI]:
#> Sens: 0.4594 [0.2421; 0.6933]
#> Spec: 0.8202 [0.6815; 0.9068]
#>
#> AUC: 0.6535To evaluate these models, however, and to put the results into context, we will need to assess the fitted models.
Model Evaluation
Continuing with evaluation, one simply has to select
action = "evaluate" and supply the data as
well as the data_info from the simulated dataset together
with the fitted models. We select summarize = TRUE to get
results in long format.
sim_study_evaluated = metaROC(action= "evaluate", data = sim_study_data$data,
data_info = sim_study_data$data_info,
model_fit = sim_study_models,summarize = TRUE)
#> Hello and welcome to metaROC!
#> Chosen action: evaluateAgain we can gain model specific results the same way as for the
evaluate_metaROC() function:
sim_study_evaluated$model_specific[[2]][["stoye2024cloglog"]]
#> $AUC
#> [1] 0.7209067
#>
#> $roc_curve
#> $roc_curve$sroc_AUC
#> [1] 0.7209067
#>
#> $roc_curve$sroc_df
#> threshold sens_lo sensitivity sens_up spec_lo specificity
#> 1 4.7 7.906652e-01 0.9365626022 0.98187883 0.02584899 0.2245701
#> 2 4.8 9.487730e-02 0.8136092107 0.97274932 0.02721873 0.4201549
#> 3 4.9 4.672766e-02 0.6421462292 0.89878522 0.12448786 0.7123377
#> 4 5.0 1.844222e-02 0.5114678515 0.85010389 0.15030493 0.8118798
#> 5 5.4 5.322015e-03 0.3327827733 0.73268980 0.24775313 0.9105585
#> 6 5.7 1.170531e-03 0.2134267051 0.64318498 0.33187525 0.9698491
#> 7 5.8 2.141370e-04 0.1167174339 0.51904276 0.47903199 0.9927083
#> 8 5.9 4.663312e-05 0.0718368005 0.44471848 0.56406957 0.9980392
#> 9 6.5 5.284551e-06 0.0310369890 0.32182479 0.72568210 0.9997580
#> 10 6.7 1.503297e-07 0.0095640390 0.21805042 0.79771457 0.9999700
#> 11 7.1 6.702085e-09 0.0031305937 0.14602034 0.87621514 0.9999982
#> 12 7.4 1.590135e-10 0.0013828489 0.12215563 0.89796562 0.9999999
#> 13 7.5 2.318431e-12 0.0002865313 0.06798595 0.94405270 1.0000000
#> spec_up youden_index
#> 1 0.9154182 0.161132662
#> 2 1.0000000 0.233764115
#> 3 1.0000000 0.354483970
#> 4 1.0000000 0.323347658
#> 5 1.0000000 0.243341307
#> 6 1.0000000 0.183275765
#> 7 1.0000000 0.109425724
#> 8 1.0000000 0.069876037
#> 9 1.0000000 0.030794941
#> 10 1.0000000 0.009534067
#> 11 1.0000000 0.003128766
#> 12 1.0000000 0.001382781
#> 13 1.0000000 0.000286528
#>
#> $roc_curve$opt_threshold
#> [1] 4.9
#>
#>
#> $Bias_AUC
#> [1] -0.02911416
#>
#> $Bias_sens
#> [1] -0.09842252
#>
#> $Bias_spec
#> [1] 0.04006801
#>
#> $Bias_threshold
#> [1] -0.7346346
#>
#> $coverage_sens
#> [1] TRUE
#>
#> $coverage_spec
#> [1] TRUE
#>
#> $se_sensspec
#> $se_sensspec$sens
#> [1] 0.3037906
#>
#> $se_sensspec$spec
#> [1] 0.2999289
#>
#>
#> attr(,"class")
#> [1] "evaluated"As we are conducting a simulation study, the results from individual
datasets are relatively uninformative and much more difficult to analyze
collectively. For this reason, the package includes the option to return
a long-format data frame containing all simulation results, which
facilitates summarization and further analysis. Because we have set
summarize = TRUE, we can access this dataframe:
sim_study_evaluated$results_long
#> replicate model AUC bias sensitivity bias
#> 1 1 stoye2024cloglog -0.202104957 -0.41799628
#> 2 2 stoye2024cloglog -0.029114155 -0.09842252
#> 3 3 stoye2024cloglog 0.073938144 -0.11196518
#> 4 4 stoye2024cloglog -0.129163296 -0.17964456
#> 5 5 stoye2024cloglog 0.095267009 -0.09219639
#> 6 1 steinhauser2016logitlmm -0.096553704 -0.28116248
#> 7 2 steinhauser2016logitlmm -0.030282498 -0.15414706
#> 8 3 steinhauser2016logitlmm -0.007196195 -0.07876221
#> 9 4 steinhauser2016logitlmm -0.038767005 -0.14462837
#> 10 5 steinhauser2016logitlmm 0.041890163 -0.03567362
#> 11 1 martinez2017np -0.096813050 -0.16612795
#> 12 2 martinez2017np -0.045588746 -0.02821463
#> 13 3 martinez2017np -0.029469009 -0.08250013
#> 14 4 martinez2017np -0.091164962 -0.20981869
#> 15 5 martinez2017np 0.032489081 -0.03667480
#> 16 1 chu2006glmm 0.001450475 -0.28211246
#> 17 2 chu2006glmm -0.044110947 -0.00212891
#> 18 3 chu2006glmm -0.075189688 -0.20841320
#> 19 4 chu2006glmm -0.028114255 -0.11779018
#> 20 5 chu2006glmm 0.024338004 -0.23151802
#> sensitivity standard error specificity bias specificity standard error
#> 1 0.16379336 0.23569344 0.40358752
#> 2 0.30379056 0.04006801 0.29992892
#> 3 0.18868487 0.21592289 0.31912842
#> 4 0.28619107 0.05305764 0.25971032
#> 5 0.33080831 0.28867850 0.36660565
#> 6 0.11086731 0.14797922 0.07081595
#> 7 0.11416443 0.09407843 0.06275362
#> 8 0.06921813 0.04286651 0.07516836
#> 9 0.12144356 0.06433741 0.08261464
#> 10 0.08312426 0.08267823 0.05190741
#> 11 0.03100705 0.03343598 NA
#> 12 0.03027109 -0.03763509 NA
#> 13 0.02594408 0.08448703 NA
#> 14 0.02978395 0.08548803 NA
#> 15 0.02650891 0.11852106 NA
#> 16 0.10427995 0.17326852 0.06775102
#> 17 0.11646579 -0.10855802 0.11364157
#> 18 0.07811071 0.10200882 0.05902669
#> 19 0.17414527 0.04735612 0.18737761
#> 20 0.08076864 0.22242016 0.03175467
#> threshold bias sensitivity coverage specificity coverage convergence
#> 1 -0.33463463 TRUE TRUE TRUE
#> 2 -0.73463463 TRUE TRUE TRUE
#> 3 -0.53463463 TRUE TRUE TRUE
#> 4 -0.43463463 TRUE TRUE TRUE
#> 5 -0.33463463 TRUE TRUE TRUE
#> 6 0.63281281 FALSE FALSE TRUE
#> 7 0.52689690 TRUE TRUE TRUE
#> 8 0.07620621 TRUE TRUE TRUE
#> 9 0.44914915 TRUE TRUE TRUE
#> 10 0.08863864 TRUE TRUE TRUE
#> 11 NA FALSE NA TRUE
#> 12 NA TRUE NA TRUE
#> 13 NA FALSE NA TRUE
#> 14 NA FALSE NA TRUE
#> 15 NA TRUE NA TRUE
#> 16 NA FALSE FALSE TRUE
#> 17 NA TRUE TRUE TRUE
#> 18 NA FALSE TRUE TRUE
#> 19 NA TRUE TRUE TRUE
#> 20 NA FALSE FALSE TRUEThe summary output can now be used to easily compare models. For
example, one could observe that, for the simulated data and models used,
"steinhauser2016logitlmm" consistently yields the lowest
bias in AUC.
However, this does not yet constitute a quantitative analysis of the
simulation results. Fortunately, with this long-format data frame of
simulation results, it is fairly straightforward to calculate summary
measures for each evaluation criterion.
These summarized data can then be readily processed for creating tables
or plotting simulation results.
For example, we can calculate summary measures of any column by grouping
by the model column of the data frame. Again, be aware that
this is not a complete simulation study, but it serves as a conceptual
illustration of how such an analysis could be conducted. We will start
by comparing the convergence probability for each model:
tapply(sim_study_evaluated$results_long$`convergence`,
sim_study_evaluated$results_long$model, mean, na.rm = TRUE)
#> stoye2024cloglog steinhauser2016logitlmm martinez2017np
#> 1 1 1
#> chu2006glmm
#> 1As we can see, all models demonstrated perfect convergence across the simulated datasets, with no failed fitting processes. Therefore, at least from a computational standpoint, we do not need to worry about any of these models.
Next, let’s look at some coverage measures. As a reminder, the empirical coverage represents the proportion of times the estimated confidence interval includes the true parameter value across simulations. In case of an MTM, we compute the empirical coverage for the estimated optimal sensitivity and specificity at the threshold identified as optimal using the unweighted Youden-index and compare it to the true optimal threshold from the data-generating process that was also computed using the unweighted Youden-index. For the STM, we use the estimated summary pair of sensitivity and specificity.
print("Sensitivity coverage:")
#> [1] "Sensitivity coverage:"
print(tapply(sim_study_evaluated$results_long$`sensitivity coverage`,
sim_study_evaluated$results_long$model,
mean, na.rm = TRUE))
#> stoye2024cloglog steinhauser2016logitlmm martinez2017np
#> 1.0 0.8 0.4
#> chu2006glmm
#> 0.4
print("Specificity coverage:")
#> [1] "Specificity coverage:"
print(tapply(sim_study_evaluated$results_long$`specificity coverage`,
sim_study_evaluated$results_long$model,
mean, na.rm = TRUE))
#> stoye2024cloglog steinhauser2016logitlmm martinez2017np
#> 1.0 0.8 NaN
#> chu2006glmm
#> 0.6Looking at the empirical coverages, the stoye2024cloglog
model achieved perfect coverage for both sensitivity and specificity.
The steinhauser2016logitlmm model followed closely, with a
mean coverage of 80% for both metrics. Since we only simulated five
datasets, this implies that coverage failed in just one case for this
model. Then the STM chu2006glmm had a worse mean coverage
for specificity and sensitivity being 0.6 anf 0.4 respectively. The
non-parametric martinez2017np model, however, had a mean
coverage of 0 for sensitivity and did not allow the computation of
coverage for the specificity.
Next, let’s compare the mean bias in the estimated AUC across the models. This allows us to assess how accurately each model estimates the overall discriminative ability, with smaller absolute bias indicating better performance:
tapply(sim_study_evaluated$results_long$`AUC bias`,
sim_study_evaluated$results_long$model, mean, na.rm = TRUE)
#> stoye2024cloglog steinhauser2016logitlmm martinez2017np
#> -0.03823545 -0.02618185 -0.04610934
#> chu2006glmm
#> -0.02432528For all models, we observe a negative bias in the estimated AUC,
indicating that each model slightly underestimates the true AUC. In
order from smallest to largest absolute bias, the
chu2006glmm model has the smallest underestimation (-0.024)
closely followed by the steinhauser2016logitlmm. Then
followstoye2024cloglog (-0.038), and the
martinez2017np model (-0.053).
As one of the primary aims of a meta-analysis using an MTM is to estimate the optimal threshold for a diagnostic test, we next compare the mean bias of the estimated optimal threshold across models. This allows us to assess how accurately each model identifies the threshold that maximizes diagnostic performance (again, using the unweighted Youden index as the criterion to define the optimal diagnostic threshold).
tapply(sim_study_evaluated$results_long$`threshold bias`,
sim_study_evaluated$results_long$model, mean, na.rm = TRUE)
#> stoye2024cloglog steinhauser2016logitlmm martinez2017np
#> -0.4746346 0.3547407 NaN
#> chu2006glmm
#> NaNComparing the mean bias of the estimated optimal threshold, the
stoye2024cloglog model underestimates the threshold
(-0.475), while the steinhauser2016logitlmm model
overestimates it (0.355). The non-parametric martinez2017np
model as well as the STM chu2066glmm do not estimate an
optimal threshold, as this quantity is not considered in its model
specification.
Lastly, we return to the estimated sensitivity and specificity to examine their mean biases across the models.
tapply(sim_study_evaluated$results_long$`sensitivity bias`,
sim_study_evaluated$results_long$model, mean, na.rm = TRUE)
#> stoye2024cloglog steinhauser2016logitlmm martinez2017np
#> -0.1800450 -0.1388747 -0.1046672
#> chu2006glmm
#> -0.1683926Looking at the mean biases of the estimated optimal sensitivity
across models, we see that all four models slightly underestimate
sensitivity. The steinhauser2016logitlmm model has the
smallest negative bias (-0.139), followed by the
martinez2017np model (-0.148), the chu2006glmm
model (-0.168) and then the stoye2024cloglog model
(-0.180).
tapply(sim_study_evaluated$results_long$`specificity bias`,
sim_study_evaluated$results_long$model, mean, na.rm = TRUE)
#> stoye2024cloglog steinhauser2016logitlmm martinez2017np
#> 0.16668410 0.08638796 0.05685940
#> chu2006glmm
#> 0.08729912For specificity, all models slightly overestimate the values. The
martinez2017np model has the smallest positive bias
(0.073), followed by the steinhauser2016logitlmm (0.086)
and chu2006glmmmodels (0.087), and then the
stoye2024cloglog model (0.167).
With this, we have analyzed all the relevant information returned by
action = "evaluate". At this point, one can draw
conclusions from the results, although these conclusions will depend on
the specific focus of the analysis. For instance, including the
martinez2017np and chu2006glmm models would
not have been advisable if our primary interest had been in comparing
models based on their performance in estimating the optimal threshold,
given that these models do not consider the threshold in their
specifications.
However, the simulation results suggest that all models are
computationally stable, with perfect convergence and no failed fittings.
In terms of empirical coverage, the stoye2024cloglog model
consistently achieved perfect coverage for both sensitivity and
specificity, while the steinhauser2016logitlmm model showed
slightly lower coverage, and the non-parametric
martinez2017np model was limited as it cannot compute the
empirical coverage for specificity. The chu2006glmm
however, did provide coverages for both values but they were
considerably worse. The mean bias in estimated AUC was small and
negative for all models, indicating a minor underestimation, with
steinhauser2016logitlmm and chu2006glmm being
closest to the true value. For the optimal threshold, the two parametric
models showed moderate biases in opposite directions, while the
non-parametric model does not estimate this quantity. Finally, the
estimated sensitivity is underestimated across all models, and
specificity overestimated, but the magnitude of these biases is small.
Therefore, one could conclude that the stoye2024cloglog and
steinhauser2016logitlmm models performed comparably in our
small toy simulation study, with each model showing slightly better
performance in different aspects.
This concludes our guide on simulating metaROC data and conducting
simulation studies with the package. By now, you should be able to
simulate multiple types of data, perform simulation studies using any of
the models included in the package, and analyze the resulting outputs.
It is also useful to note that the long-format results returned by
action = "evaluate" are structured in a way that is fully
compatible with visualization and summarization tools for results of
simulation studies, such as for example provided in the
rsimsum package (Gasparini
2018).
A final guide will cover plotting, as the package provides basic functions to visualize the data, display ROC curves from model estimation, and generate threshold versus sensitivity/specificity plots to help identify the optimal threshold. These plotting functions are particularly useful when a single model is of interest, such as in an applied setting.