Introduction
This vignette provides an introduction to fitting the models
implemented in the metaROC package to real-world data. We
will walk through each class of models, including single-threshold
methods (STM) and multiple-threshold methods (MTM). For each model, we
highlight the estimation procedure, along with the required input
parameters. Afterwards, the summary.metaROC() method can be
used to retrieve information such as the area under the ROC curve (AUC)
and the estimated sensitivity and specificity. If the model provides an
estimated optimal diagnostic threshold, these estimates correspond to
that threshold; for models that do not, the sensitivity and specificity
summarize the overall test performance across reported thresholds.
Importantly, this vignette serves as an overview of the methods used
so that each possible input parameter of the fit_metaROC()
function is showcased. It is not meant to be an in-depth guide for each
model, but rather provides a broad introduction to their functionality
and output. For a more in depth guide on a specific model, check out the
model-specific vignettes.
In this guide, we focus on the hba1c dataset included in
the metaROC package. This dataset, published in Hoyer et al. (2018), examines the diagnostic
accuracy of the biomarker HbA1c (measured as the percentage of
hemoglobin in the blood that is glycated) for detecting type 2 diabetes.
The reference standard used is the 2-hour oral glucose tolerance test
(2h-OGTT).
The dataset comprises 38 primary studies, some of which report results for multiple diagnostic thresholds of HbA1c. To ensure reproducibility of the results, we set a random seed before loading the dataset:
## study threshold TP TN FP FN D H originally_published
## 1 1 5.9 574 1389 682 262 836 2071 1
## 2 2 5.0 617 1005 7735 18 635 8740 0
## 3 2 5.1 607 1617 7123 28 635 8740 0
## 4 2 5.2 600 2438 6302 35 635 8740 0
## 5 2 5.3 581 3409 5331 54 635 8740 0
## 6 2 5.4 563 4422 4318 72 635 8740 0
First, we investigate the structure of the dataset. As we can see, it
contains a study identifier (study), which is a unique
integer used to distinguish between individual studies. Next, the
dataset includes the diagnostic threshold (threshold). It
is important to note that single-threshold methods (STM) only
incorporate one row per study and therefore ignore the threshold
entirely. Consequently, if one is only interested in fitting an STM,
this column is not required. To fit an MTM, several studies ideally need
to report results for more than one diagnostic threshold (although most
MTM can also be applied if all studies report only results on a single
threshold). Having multiple thresholds per study provides the necessary
information to model the full ROC curve and to properly account for the
within-study correlation between thresholds.
The dataset further reports the numbers of true positives
(TP), true negatives (TN), false positives
(FP), and false negatives (FN). Finally, it
includes the number of non-diseased (H) and the number of
diseased individuals (D).
It is not necessary to explicitly provide H and
D, these quantities can be derived from the previously
mentioned values. Specifically, the number of non-diseased individuals
can be calculated as H = TN + FP, while the number of
diseased individuals can be calculated as D = FN + TP.
The dataset also inlcudes a column named
orginally_published. This column indicates which single
threshold was originally published for meta analysis for this dataset
using only STM. Therefore, for applying our STM we will filter the
dataset such that only those entries are included (indicated by the
entry 1).
stm_hba1c <- hba1c[hba1c$originally_published == 1,]Before performing any analysis, it is useful to visualize the data.
Plotting the dataset provides a first overview of the reported study
results and can help with interpreting and understanding the subsequent
model fitting and evaluation results. Fortunately, the package provides
the plot_data() function, which visualizes the individual
ROC curves for each study.
plot_data(hba1c)
## Study specific ROC curves plotted
We can now confirm that the total number of studies is 38. In addition, we observe several ROC curves, indicating that multiple studies report results for at least two diagnostic thresholds. Therefore, the full HbA1c data are suitable for fitting MTM without any issues.
We can observe that the ROC curves start in the bottom-left corner of the plot, where both sensitivity and 1-specificity are low, and move toward the top-right corner as the diagnostic threshold changes. Along the way, the curve typically bends toward the top-left corner, which represents the ideal region of high sensitivity and high specificity. This pattern is consistent with typical ROC curves because a good diagnostic test should correctly identify most diseased individuals (high sensitivity) while also correctly classifying most non-diseased individuals (high specificity). The closer the curve is to the top-left corner, the better the test is at discriminating between diseased and non-diseased subjects, whereas curves near the diagonal line indicate poor discriminative ability. However, between studies there exists substantial variation in estimates.
Having now obtained a comprehensive overview of the dataset, we can proceed to fitting and evaluating models. The following sections are divided into two parts: one focusing on some single threshold methods (STM) and the other on some multiple threshold methods (MTM).
Single treshold methods (STM)
STM are used when each study reports only one diagnostic threshold, or when the focus is on a specific clinically relevant threshold. They are simpler to fit and interpret than multiple threshold methods, as they require only one pair of sensitivity and specificity per study. In this section, we will examine three single threshold models: a linear mixed-effects model (LMM) as proposed by Reitsma et al. (2005), a binomial generalized linear mixed-effects model (GLMM) as described by Chu and Cole (2006) and Chu et al. (2010), and a mixed model that jointly models the bivariate distribution of sensitivities and specificities using copulas (Nikoloulopoulos 2015).
The LMM assumes linearity in the logit-transformed sensitivities and specificities and accounts for between-study variability with random effects. In contrast, the GLMM directly models the counts of true positives and true negatives using binomial distributions, which allows for more accurate modeling of studies with small sample sizes or extreme proportions.
LMM for a single threshold per study
In this section, we focus on the LMM by Reitsma et al. (2005). The model operates on the logit-transformed sensitivities and specificities, using a normal distribution for the random effects to capture between-study variability. By jointly modeling sensitivity and specificity, it accounts for the correlation between these measures across studies, which allows us to estimate a pair of summary sensitivity and specificity, as well as an SROC curve with additional assumptions (Rücker and Schumacher 2010).
To fit this model, no additional parameters need to be specified.
Simply specifying the corresponding model ID
(reitsma2005lmm) and using the data with only a single
threshold per study (stm_hba1c) suffices.
mod_reitsma <- fit_metaROC(stm_hba1c, model = "reitsma2005lmm")## Hello and welcome to metaROC!
## Requested model: reitsma2005lmm
## This is an LMM for a single threshold per study.
## See https://doi.org/10.1016/j.jclinepi.2005.02.022 for more details.
As we can see, the function provides a brief output message that
confirms which model has been fitted. It also supplies the corresponding
DOI for a literature reference, so that users can refer to the original
source for more detailed information. Internally, the function calls the
R-package mada (Doebler and
Sousa-Pinto 2022).
To gain an overview of the most important results, we use the
summary.metaROC() method, passing the fitted model as the
input parameter. Additionally, we specify a weight of 0.5 for
sensitivity in the Youden index computation. We also request Wald
confidence intervals (CIs) for the estimated sensitivity and
specificity.
If we had wanted a CI for the AUC, we would need to specify
ci_type = "bootstrap" along with the number of bootstrap
samples n_boot.
summary(mod_reitsma, sens_weight = 0.5, ci_type = "wald")##
## *** Results of Single threshold method (STM) ***
##
## Model: LMM for single threshold
##
## Total number of studies: 38
## Total number of thresholds: 38
## Number of different thresholds: 13
##
## Results with Wald confidence intervals:
##
## Youden index (sensitivity weight = 0.5): 0.5310
## Estimated Sensitivity and Specificity [95% CI]:
## Sens: 0.7211 [0.6690; 0.7678]
## Spec: 0.8093 [0.7630; 0.8483]
##
## AUC: 0.8312
At the beginning of the output, we are reminded that these results correspond to a single-threshold method. This means that no threshold-specific information is provided. Afterwards, the specific model is stated again. Next, some formal details about the meta-analysis are presented: in total, 38 studies were included, reporting on 13 distinct thresholds.
We are also informed that the CIs, indicated in brackets after the estimates, are Wald CIs. The maximum Youden index at the specified weight is also reported. For STM, the maximum Youden index is generally less relevant, as these methods do not allow threshold-specific inference.
Following this, the estimated sensitivity and specificity are presented along with their 95% CIs. The estimated sensitivity is 0.7211 [0.6690; 0.7678], which means that the model estimates the probability of diseased individuals to be correctly tested positive to be 72.11%. The estimated specificity is 0.8093 [0.7630; 0.8483], i.e., 80.93% is the probability that non-diseased individuals are tested negative.
Of particular interest is the estimated AUC of 0.8312, which provides a summary measure of the diagnostic accuracy across all studies, indicating good overall diagnostic accuracy of HbA1c. However, for the diagnosis of type 2 diabetes, there may be superior diagnostic tests, such as fasting plasma glucose (Baechle et al. 2026).
All these numerical results are informative, but visualizations can
often improve understanding. We will now plot the fitted model, choosing
to display the individual study results from the data in the background
by setting studies = TRUE. This will show each study’s
estimated pair of 1-specificity and sensitivity, providing context for
the estimated SROC curve.
plot(mod_reitsma, studies = TRUE)
## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...
## you provided a model that does not model the threshold directly, therefore no threshold plots can be computed.
The plot reminds us that the data we have used are real-world data. This means that information about the true data-generating process is unavailable, so elements such as the true ROC curve cannot be displayed because there are unknown in reality.
The first plot displays the observed data points for each study. Since the analysis is based on originally published data, each study contributes a single sensitivity–specificity pair.
The second plot shows the estimated SROC curve together with the
greyed-out study-specific data points. As the Reitsma model is an STM,
it does not provide a direct pointwise confidence band for the SROC
curve. Instead, uncertainty is represented through a confidence region
around the estimated mean sensitivity–specificity operating point. This
region is visualized as an ellipse in ROC space. When
pred_reg = TRUE, the between-study variance is additionally
incorporated, resulting in a predictive ellipse which can be interpreted
as enclosing point estimates of future studies with 95% probability.
GLMM for a single threshold per study
The GLMM proposed by Chu and Cole (2006) and Chu et al. (2010) takes a slightly different approach compared to the LMM by modeling the counts of true positives and true negatives directly using binomial distributions. This framework allows the model to better handle studies with small sample sizes or extreme proportions of sensitivity and specificity. Like the LMM, it accounts for between-study variability through random effects, but by operating on the raw counts, it can provide more accurate estimates when the standard normal approximations used in the LMM might be less reliable.
To fit the GLMM, we specify the model ID chu2006glmm.
Additionally one could change the link used within the GLMM
framework, but we will simply work with the default logit
link:
mod_chu <- fit_metaROC(stm_hba1c, model = "chu2006glmm")## Hello and welcome to metaROC!
## Requested model: chu2006glmm
## This is a GLMM for a single threshold per study.
## See https://doi.org/10.1177/0272989X09353452 for more details.
As we can see, the function provides a brief output message that confirms which model has been fitted. It also supplies the corresponding DOI to Chu et al. (2010), so that users can refer to the original source for more detailed information.
To gain an overview of the most important results, we use the
summary.metaROC() method, passing the fitted model as the
input parameter. Additionally, we specify a weight of 0.5 for
sensitivity in the Youden index computation. We also request Wald CIs
for the estimated sensitivity and specificity.
If we had wanted a CI for the AUC, we would need to specify
ci_type = "bootstrap" along with the number of bootstrap
samples n_boot.
summary(mod_chu)##
## *** Results of Single threshold method (STM) ***
##
## Model: GLMM for single threshold
## Link: logit
##
## Total number of studies: 38
## Total number of thresholds: 38
## Number of different thresholds: 13
##
## Results with Wald confidence intervals:
##
## Youden index (sensitivity weight = 0.5): 0.5528
## Estimated Sensitivity and Specificity [95% CI]:
## Sens: 0.7276 [0.6739; 0.7754]
## Spec: 0.8107 [0.7650; 0.8493]
##
## AUC: 0.8134
The output of the summary method is analogous to the LMM. Following this, the estimated sensitivity and specificity are presented along with their 95% CIs. The estimated sensitivity is 0.7276 [0.6739; 0.7754]. The estimated specificity is 0.8107 [0.7650; 0.8493]. Of particular interest is the estimated AUC of 0.8134 which is slightly lower than the estimated AUC from the LMM.
As the GLMM is an STM, we again only see the single data points for each study. Additionally, it also does not provide a confidence band for the SROC curve, but as for the LMM, we can provide a confidence/prediction ellipse around the estimated mean sensitivity and specificity point.
plot(mod_chu, studies = TRUE, pred_reg = TRUE)
## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...
## you provided a model that does not model the threshold directly, therefore no threshold plots can be computed.
Copula mixed model for bivariate meta-analysis (Nikoloulopoulos 2015)
Nikoloulopoulos (2015) introduces a copula mixed-effects model for bivariate meta-analysis of diagnostic test accuracy studies, which generalizes the standard GLMM by allowing greater flexibility in the joint distribution of study-specific sensitivity and specificity. Unlike the GLMM, which typically assumes a bivariate normal distribution for the random effects, the copula model uses a copula to represent the dependence between sensitivity and specificity as well as between-study variability. The model operates on the original scale of sensitivity and specificity and includes the GLMM as a special case (by assuming a Gaussian copula). SROC curves can be derived from the fitted model, and the AUC can be estimated. However, the model does not report an optimal threshold, as it focuses on summarizing sensitivity and specificity across all reported thresholds rather than identifying a single point of maximal test performance. This flexible dependence structure allows for a more accurate fit to data compared to standard parametric models while retaining a parametric formulation.
To fit the model, we first specify the model identifier
nikoloulopoulos2015bivariate. The copula_kind
parameter determines the type of copula used to model the dependence
between study-specific sensitivity and specificity. Copulas are
functions that allow us to describe the dependence structure between two
random variables separately from their marginal distributions. In this
analysis, we use the clayton copula, which is able to
capture positive dependence with stronger association in the lower
tail.
mod_nikol <- fit_metaROC(stm_hba1c, model = "nikoloulopoulos2015bivariate")## Hello and welcome to metaROC!
## Requested model: nikoloulopoulos2015bivariate
## This is a copula mixed model for bivariate meta-analysis.
## See https://doi.org/10.1002/sim.6595 for more details.
## Registered S3 method overwritten by 'car':
## method from
## na.action.merMod lme4
The fitted model is reported as a copula mixed model for bivariate meta-analysis, and the corresponding DOI is provided for reference. Notably, this fitting procedure took slightly longer than the other models because the copula mixed-effects model estimates both the marginal distributions and the dependence structure between sensitivity and specificity, which increases computational complexity and requires more intensive likelihood optimization.
To gain an overview of the most important results, we use the
summary.metaROC() method, passing the fitted model as the
input parameter. Additionally, a sensitivity weight of 0.5 is used in
the Youden index computation. We also request Wald confidence intervals
for the estimated sensitivity and specificity.
If we had wanted a CI for the AUC, we would need to specify
ci_type = "bootstrap" along with the number of bootstrap
samples n_boot.
summary(mod_nikol)##
## *** Results of Single threshold method (STM) ***
##
## Model: Copula model for single threshold
##
## Total number of studies: 38
## Total number of thresholds: 38
## Number of different thresholds: 13
##
## Results with Wald confidence intervals:
##
## Youden index (sensitivity weight = 0.5): 0.9046
## Estimated Sensitivity and Specificity [95% CI]:
## Sens: 0.7240 [0.7084; 0.7396]
## Spec: 0.7791 [0.7749; 0.7833]
##
## AUC: 0.8590
The model estimates sensitivity to be 0.7240 [0.7084; 0.7396], specificity to be 0.7791 [0.7749; 0.7833], and an AUC of 0.8590. While the estimated specificity is lower than the estimates from LMM and GLMM, the AUC is slightly higher. Plotting the SROC curve yields a similar estimated structure to the two preceding models, however for this STM no confidence/predictive ellipse is implemented:
plot(mod_nikol, studies = TRUE)
## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...
## you provided a model that does not model the threshold directly, therefore no threshold plots can be computed.
Multiple threshold methods (MTM)
MTM are used when studies report results for one or more diagnostic thresholds. Unlike STM, MTM take advantage of all available threshold information within each study to model the full ROC curve. This allows for a more comprehensive assessment of diagnostic performance across the entire range of possible thresholds.
MTM are particularly useful when one wants to estimate summary ROC
curves, account for within-study correlation between thresholds, or
identify optimal thresholds using measures such as the Youden index. By
considering multiple thresholds and their values per study, (most) MTM
implemented in metaROC allow inference on the optimal
diagnostic threshold.
For the MTM, we consider three models in this vignette: The LMM
proposed by Steinhauser et al. (2016) (ID
steinhauser2016logitlmm) models logit-transformed
sensitivities and specificities across multiple thresholds. The discrete
GLMM proposed by Stoye et al. (2024) (ID
stoye2024cloglog) models the discrete hazard to have an
unobserved test value in an interval between diagnostic thresholds,
which accounts for varying thresholds using a complementary log-log
link. The non-parametric approach proposed by Martinez-Camblor (2017) (ID
martinez2017np) estimates the SROC curve without assuming a
specific parametric form.
Each of these MTM has particular strengths. The logit LMM and discrete GLMM leverage mixed-effects modeling to account for between-study variability while incorporating information from multiple thresholds. The non-parametric model avoids strong parametric assumptions, providing flexibility when the true ROC shape is unknown.
Compared to the STM, MTM allow us to explore the diagnostic performance across multiple thresholds and potentially identify an optimal threshold for HbA1c. This enables comparison of test accuracies at different thresholds. However, in case of the non-parametric model, only the ROC curve itself is modeled, so we will not be able to extract threshold-specific information from this approach.
LMM for multiple thresholds
The logit LMM (Steinhauser et al. 2016) represents diagnostic test accuracy using a hierarchical regression framework that incorporates information from multiple decision thresholds reported within individual studies. Threshold-specific sensitivities and specificities are treated as correlated observations and are linked to the diagnostic threshold via a linear term. Between-study heterogeneity in overall test performance is accommodated via random effects, allowing study-specific deviations from the average accuracy–threshold relationship.
To fit the model we simply have to supply the corresponding model
identifier steinhauser2016logitlmm to the fitting
function.
mod_steinhauser <- fit_metaROC(hba1c, model = "steinhauser2016logitlmm")## Hello and welcome to metaROC!
## Requested model: steinhauser2016logitlmm
## This is an LMM for multiple thresholds.
## See https://doi.org/10.1186/s12874-016-0196-1 for more details.
The fitted model is reported as an MTM, and the corresponding DOI is
provided for reference. Internally, the function calls the R-package
diagmeta (Rücker et al.
2022).
To gain an overview of the most important results, we use the
summary.metaROC() method, passing the fitted model as the
input parameter. Additionally, we specify a weight of 0.6 for
sensitivity in the Youden index computation. We also request Wald CIs
for the estimated sensitivity and specificity.
If we had wanted a CI for the AUC, we would need to specify
ci_type = "bootstrap" along with the number of bootstrap
samples n_boot.
summary(mod_steinhauser, sens_weight = 0.6)##
## *** Results of Multiple threshold method (MTM) ***
##
## Model: LMM for multiple thresholds
## Type of model: DICS
##
## Total number of studies: 38
## Total number of thresholds: 124
## Number of different thresholds: 26
##
## Results with Wald confidence intervals:
##
## Youden index (sensitivity weight = 0.6): 0.4792
## Optimal threshold value: 5.7141
##
## Estimated Sensitivity and Specificity [95% CI]:
## Sens: 0.7264 [0.6454; 0.7948]
## Spec: 0.7594 [0.6806; 0.8238]
##
## AUC: 0.7992
At the beginning of the output, we are reminded that these results correspond to a MTM. Since this a parametric model, this means that now threshold-specific information is provided. The maximum Youden index, calculated using a sensitivity weight of 0.6, is 0.4792. This optimal value is attained at a threshold of 5.7141.
Following this, the estimated sensitivity and specificity are presented along with their 95% CIs. The estimated sensitivity is 0.7264 [0.6454; 0.7948], the estimated specificity is 0.7594 [0.6806; 0.8238], and the estimated AUC is 0.7992, which is lower than the estimated AUC from the STM. We cannot directly compare the estimated sensitivity and specificity to those of the models above, as we varied the weight of sensitivity in the Youden-index here.
Plotting the model results now gives us additional information from the plotting method as the MTM now allows inference on the diagnostic threshold.
plot(mod_steinhauser, studies = TRUE, youd_weight = 0.6)
## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...

## Model sens/spec vs threshold plotted. Press Enter to continue...

## Youden index plotted.
This plot illustrates another advantage of the MTM approach. In addition to the ROC curve, a 95% Wald CI is provided, which reflects the uncertainty around the summary curve. As observed, the interval is relatively wide in the middle of the curve, reflecting the variability across study-specific ROC curves.
Two additional plots are also shown. The first displays sensitivities and specificities (with their 95% CIs) against the threshold, providing insight into how changes in the threshold affect test performance. This can inform decisions about setting the optimal threshold, depending on whether sensitivity or specificity is prioritized. The second plot shows the Youden index as a function of the threshold, calculated using the specified sensitivity weight, highlighting the threshold at which overall test performance is maximized.
GLMM for discrete hazard
The GLMM proposed by Stoye et al. (2024) extends meta‑analytic methodology for diagnostic test accuracy by adapting the time-to-event approach first proposed by Hoyer et al. (2018) to discrete hazards. Instead of modeling (transformations of) sensitivity and specificity, it aims to estimate probabilities of “surviving” (that is being diagnosed as test-positive) a diagnostic threshold. More concretely, the GLMM models the discrete hazard of test values to fall in an interval between subsequent thresholds, simultaneously for diseased and non-diseased individuals. Between-study heterogeneity is accounted for with a bivariate random effect and summary sensitivity (1-specificity) is estimated from the discrete survival functions for diseased (non-diseased) populations.
To fit the model, we have to supply the corresponding model
identifier stoye2024cloglog to the fitting function. Since
this is a GLMM, we have the option to choose an optimization algorithm.
By default, the nAGQ parameter is set to 0, which uses the
penalized iteratively reweighted least squares (PIRLS) algorithm.
Alternatively, we can set it to 1 to use an Laplace approximation, or to
any value greater than 1 to use Gauss-Hermite quadrature. PIRLS and
Laplace approximation are implemented through
lme4::glmer(), while Gauss-Hermite quadrature is computed
using GLMMadaptive::mixed_model(). Choosing the algorithm
can affect computational speed and accuracy, particularly for small
studies or when the data are sparse, but in most cases the default
setting provides reliable estimates.
For this analysis, PIRLS is used. Although this is the default setting and does not need to be explicitly defined, it is included here in the function call for demonstration purposes.
mod_stoye <- fit_metaROC(hba1c, model = "stoye2024cloglog", nAGQ = 0)## Hello and welcome to metaROC!
## Requested model: stoye2024cloglog
## This is a GLMM for the discrete Hazard for multiple thresholds.
## See https://doi.org/10.1002/jrsm.1753 for more details.
## Warning in fit_metaROC_stoye2024dGLMM(data = data, model = "cloglog", nAGQ =
## nAGQ): NAs introduced by coercion
The fitted model is reported as an MTM, and the corresponding DOI is provided for reference.
A warning is displayed during model fitting (“NAs produced by coercion”), but this is not a cause for concern. It typically occurs when a single iteration encounters a value that cannot be transformed, such as during a cloglog or logit transformation. The warning does not indicate a failure of the model, which still converges and produces valid estimates. It serves only as a signal that one iteration required internal handling of a non-numeric or out-of-bounds value.
To gain an overview of the most important results, we use the
summary.metaROC() method, passing the fitted model as the
input parameter. Additionally, a sensitivity weight of 0.5 is used in
the Youden index computation. We also request Wald CIs for the estimated
sensitivity and specificity.
If we had wanted a CI for the AUC, we would need to specify
ci_type = "bootstrap" along with the number of bootstrap
samples n_boot.
summary(mod_stoye)##
## *** Results of Multiple threshold method (MTM) ***
##
## Model: GLMM for multiple thresholds using the discrete hazard
## Link: cloglog
##
## Total number of studies: 38
## Total number of thresholds: 124
## Number of different thresholds: 26
##
## Results with Wald confidence intervals:
##
## Youden index (sensitivity weight = 0.5): 0.5610
## Optimal threshold value: 5.5000
##
## Estimated Sensitivity and Specificity [95% CI]:
## Sens: 0.6457 [0.4917; 0.7627]
## Spec: 0.9152 [0.6727; 0.9966]
##
## AUC: 0.8495
At the beginning of the output, we are reminded that these results correspond to a MTM and the fitted model is a GLMM for discrete hazards with a complementary log-log link. The maximum Youden index, calculated using a sensitivity weight of 0.5, is 0.5610. This optimal value is obtained at a threshold of 5.5. Compared to the logit LMM the maximum Youden index is substantially higher and the optimal threshold lower.
Following this, the estimated sensitivity and specificity are presented along with their 95% CIs. The estimated sensitivity is 0.6457 [0.4917; 0.7627], while the estimated specificity is 0.9152 [0.6727; 0.9966]. This result is quite interesting. Based on this model, specificity is much higher than sensitivity, indicating that healthy individuals are almost always correctly classified. In contrast, sensitivity is around two-thirds, meaning that only about two-thirds of diseased individuals are correctly identified by the test.
Of particular interest is the estimated AUC of 0.8495, which provides a summary measure of diagnostic accuracy across the included studies. The AUC can also serve to compare models, with higher values reflecting better discriminative performance. An AUC of 0.8495 indicates high diagnostic accuracy, showing that the test distinguishes well between diseased and non-diseased individuals.
All these numerical results are informative, but visualizations can
often improve understanding. We will now plot the fitted model, choosing
to display the individual study curves from the data in the background
by setting studies = TRUE. This will show each study’s ROC
curve in grey, providing context for the estimated summary ROC curve. As
the threshold is incorporated in the model, we receive additional
information from the plotting method.
plot(mod_stoye, studies = TRUE)
## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...

## Model sens/spec vs threshold plotted. Press Enter to continue...

## Youden index plotted.
The plot reminds us that the data we have used are real-world data. This means that information about the true data-generating process is unavailable, so elements such as the true ROC curve cannot be displayed.
Examining the ROC plot, we can see this time that the Estimated ROC-Curve lies towards the upper third of the individual study ROC curves.
This plot illustrates another advantage of the MTM approach. In addition to the ROC curve, a 95% Wald CI is provided, which reflects the uncertainty around the summary curve. As observed, the interval is relatively wide in the middle of the curve, reflecting the variability across study-specific ROC curves. Compared to the Steinhauser model, the CIs appear wider, which is consistent with our earlier observation that the estimated sensitivities and specificities exhibit substantial variability across studies.
Two additional plots are also shown. The first displays sensitivities and specificities (with their 95% CIs) against the threshold, providing insight into how changes in the threshold affect test performance. This can inform decisions about setting the optimal threshold, depending on whether sensitivity or specificity is prioritized. In this plot, the CIs remain wide. Notably, the range of thresholds where sensitivity and specificity are both meaningfully above zero is narrower than in the Steinhauser model. This suggests that the discrete GLMM identifies a relatively constrained threshold range where the test performs reliably, with performance dropping sharply outside this range.
The last plot displays the Youden index as a function of the threshold, calculated using the specified sensitivity weight, highlighting the threshold at which overall test performance is maximized. In this case, the peak of the curve resembles a plateau, spanning thresholds from approximately 5.3 to 5.6. Additionally, the range of threshold values where the Youden index is close to zero is much smaller than in the previous model.
Non-parametric model
Martinez-Camblor (2017) proposes a fully non-parametric approach for estimating an SROC curve in meta-analyses of diagnostic studies with multiple thresholds. Unlike parametric or semiparametric hierarchical models, this method constructs the SROC curve directly from the observed sensitivity–specificity pairs across all reported thresholds without assuming a specific functional form. The approach can incorporate random effects to account for variability between studies while using all reported thresholds rather than just one per study. Fixed-effects versions are also possible when between-study heterogeneity is assumed to be minimal.
To fit the model, the corresponding model identifier
martinez2017np must be supplied to the fitting function.
This model can incorporate either fixed or random effects, controlled
via the variant parameter, which can be set to
"fixed-effects" or "random-effects" (the
latter being the default).
mod_martinez <- fit_metaROC(hba1c, model = "martinez2017np", variant = "random-effects")## Hello and welcome to metaROC!
## Requested model: martinez2017np
## This is a non-parametric model for multiple thresholds.
## See https://doi.org/10.1177/0962280214537047 for more details.
## Number of papers included in meta-analysis: 38
## Model considered: random-effects
## The area under the summary ROC curve (AUC) is 0.773.
## The optimal specificity and sensitivity (in the Youden index sense) for summary ROC curve are 0.797 and 0.645, respectively.
The fitted model is reported as non-parametric MTM, and the corresponding DOI is provided for reference.
The fitting process also returns some basic information that would
normally be provided by the summary.metaROC() method.
Specifically, we are informed that 38 studies were included in the
meta-analysis and that a fixed-effects model was fitted. Additionally,
the output provides the estimated AUC and the optimal sensitivity and
specificity. We will discuss these in more detail when applying the
summary.metaROC method.
We assign the default sensitivity weight of 0.5 and request Wald CIs as before.
summary(mod_martinez)##
## *** Results of Multiple threshold method (MTM) ***
##
## Model: Non-parametric model
## Variant: random-effects
##
## Total number of studies: 38
## Total number of thresholds: 124
## Number of different thresholds: 26
##
## Results with Wald confidence intervals:
##
## Youden index (sensitivity weight = 0.5): 0.4417
## Estimated Sensitivity and Specificity [95% CI]:
## Sens: 0.6449 [0.6287; 0.6610]
## Spec: 0.7968
##
## AUC: 0.7731
At very beginning we are again reminded that we fitted an MTM. However, this method does not report an optimal threshold, as it focuses on jointly summarizing sensitivity and specificity across studies rather than identifying a single point of maximal test performance. Still for estimated sensitivity and specificity we are provided with CIs. Afterwards, we are reminded which model was fitted together with model specific information, such as in this case the variant. Here, we used random effects for model fitting. Next, some formal details about the meta-analysis are presented: in total, 38 studies were included, reporting 124 thresholds, of which 26 were distinct. Also the CIs are wald CIs as requested. The maximum Youden index, calculated using a sensitivity weight of 0.5, is 0.4422. However, the model cannot report on the optimal threshold so this is only of limited interest to us.
Following this, the estimated sensitivity and specificity are presented. We only get a CI for the sensitivity. The estimated sensitivity is 0.6544 [0.6443; 0.6646], indicating that on average about 65.44% of diseased individuals are correctly identified across the included studies. The estimated specificity is 0.7878, meaning that on average about 78.78% of non-diseased individuals are correctly classified.
Of particular interest is the estimated AUC of 0.76845, which provides a summary measure of diagnostic accuracy across the included studies. The AUC can also serve to compare models, with higher values reflecting better discriminative performance. An AUC of 0.76845 indicates reasonable diagnostic accuracy. However, this is the lowest AUC observed among all models computed so far. Additionally, the model does not provide information on the optimal threshold.
As usual we will continue with visualizations as these could provide
more context. We will now plot the fitted model, choosing to display the
individual study curves from the data in the background by setting
studies = TRUE. This will show each study’s ROC curve in
grey, providing context for the estimated summary ROC curve.
plot(mod_martinez, studies = TRUE)
## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...
## you provided a model that does not model the threshold directly, therefore no threshold plots can be computed.
Again the plot reminds us that the data we have used are real-world data. This means that information about the true data-generating process is unavailable, so elements such as the true ROC curve cannot be displayed. Also since we fitted a non-parametric model, we do not receive any threshold related plots.
Copula model
stoye2024copula(reference to be added) propose a copula
model for multiple thresholds. They substitute random effects with
copulas, leading to the ability to directly model the dependence between
sensitivity and specificity and an increased control over the estimation
procedure.
This model can be fitted with the corresponding model identifier
stoye2024copula. The argument copula_kind
specifies the type of copula used to model the dependence between
sensitivity and specificity. We choose a Clayton copula, allowing for
asymmetric dependence. The marginal distributions of the diagnostic
processes are defined via marginal_kind, with the Weibull
distribution providing a flexible parametric form for modeling threshold
effects. The option norm_approx controls whether the
binomial marginal distributions are approximated by a normal
distribution during estimation, which can improve numerical stability
and reduce computational burden in larger datasets.
mod_stoy_cop <- fit_metaROC(hba1c, model = "stoye2024copula", copula_kind = "clayton",
norm_approx = TRUE, marginal_kind = "weibull")## Hello and welcome to metaROC!
## Requested model: stoye2024copula
## This is a copula model for multiple thresholds.
## See XX for more details.
The fitted model is reported as a copula model for multiple thresholds, as the paper is not published yet the DOI is not yet included.
To gain an overview of the most important results, we use the
summary.metaROC() method, passing the fitted model as the
input parameter.
summary(mod_stoy_cop)##
## *** Results of Multiple threshold method (MTM) ***
##
## Model: Copula model for multiple thresholds
## Copula kind: clayton
## Marignal distribution: weibull
## Normal approximation: TRUE
## Fixed copula parameters during estimation: FALSE
##
## Total number of studies: 38
## Total number of thresholds: 124
## Number of different thresholds: 26
##
## Results with Wald confidence intervals:
##
## Youden index (sensitivity weight = 0.5): 0.5873
## Optimal threshold value: 5.8549
##
## Estimated Sensitivity and Specificity [95% CI]:
## Sens: 0.6939 [0.6838; 0.7037]
## Spec: 0.8934 [0.8870; 0.8997]
##
## AUC: 0.8559
At very beginning we are again reminded that we fitted an MTM. Afterwards, again we are reminded which model was fitted together with model specific information, such as in this the copula information such as the copula kind used, the marginal distribution, whether a normal approximation was used or the copula parameters were fixed during estimation. Next, some formal details about the meta-analysis are presented: in total, 38 studies were included, reporting 124 thresholds, of which 26 were distinct.
Then we are told the CIs are wald CIs as requested. The maximum Youden index, calculated using a sensitivity weight of 0.5, is 0.5545. Additionally this Youden Index is achieved at a threshold of 5.8063.
Following this, the estimated sensitivity and specificity are presented. The estimated sensitivity is 0.6593 [0.6313; 0.6847], indicating that on average about 65.93% of diseased individuals are correctly identified across the included studies. The estimated specificity is Spec: 0.8952 [0.8901; 0.9002], meaning that on average about 89.52% of non-diseased individuals are correctly classified. Of particular interest is the estimated AUC of 0.8559, estimating good diagnostic accuracy of HbA1c.
We will continue with visualizations as these could provide more
context. We will now plot the fitted model, choosing to display the
individual study curves from the data in the background by setting
studies = TRUE. This will show each study’s ROC curve in
grey, providing context for the estimated summary ROC curve. As the
threshold is incorporated in the model, we receive additional
information from the plotting method.
plot(mod_stoy_cop, studies = TRUE)
## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...

## Model sens/spec vs threshold plotted. Press Enter to continue...

## Youden index plotted.
Again the plot reminds us that the data we have used are real-world data. This means that information about the true data-generating process is unavailable, so elements such as the true ROC curve cannot be displayed. As the model incorporates the threshold, we also receive both a sensitivity and specificity against threshold and a youden index against threshold plot. The confidence bands in the sensitivity and specificity–threshold plots are very narrow, and the curves appear reflectionally symmetric. Finally, the Youden index plot exhibits a bell-shaped curve, with the optimal threshold being clearly defined.