Skip to contents

Introduction

This vignette provides an introduction to fitting the models implemented in the metaROC package to real-world data. We will walk through each class of models, including single-threshold methods (STM) and multiple-threshold methods (MTM). For each model, we highlight the estimation procedure, along with the required input parameters. Afterwards, the summary.metaROC() method can be used to retrieve information such as the area under the ROC curve (AUC) and the estimated sensitivity and specificity. If the model provides an estimated optimal diagnostic threshold, these estimates correspond to that threshold; for models that do not, the sensitivity and specificity summarize the overall test performance across reported thresholds.

Importantly, this vignette serves as an overview of the methods used so that each possible input parameter of the fit_metaROC() function is showcased. It is not meant to be an in-depth guide for each model, but rather provides a broad introduction to their functionality and output. For a more in depth guide on a specific model, check out the model-specific vignettes.

In this guide, we focus on the hba1c dataset included in the metaROC package. This dataset, published in Hoyer et al. (2018), examines the diagnostic accuracy of the biomarker HbA1c (measured as the percentage of hemoglobin in the blood that is glycated) for detecting type 2 diabetes. The reference standard used is the 2-hour oral glucose tolerance test (2h-OGTT).

The dataset comprises 38 primary studies, some of which report results for multiple diagnostic thresholds of HbA1c. To ensure reproducibility of the results, we set a random seed before loading the dataset:

library(metaROC)
set.seed(7)

data(hba1c)
head(hba1c)
##   study threshold  TP   TN   FP  FN   D    H originally_published
## 1     1       5.9 574 1389  682 262 836 2071                    1
## 2     2       5.0 617 1005 7735  18 635 8740                    0
## 3     2       5.1 607 1617 7123  28 635 8740                    0
## 4     2       5.2 600 2438 6302  35 635 8740                    0
## 5     2       5.3 581 3409 5331  54 635 8740                    0
## 6     2       5.4 563 4422 4318  72 635 8740                    0

First, we investigate the structure of the dataset. As we can see, it contains a study identifier (study), which is a unique integer used to distinguish between individual studies. Next, the dataset includes the diagnostic threshold (threshold). It is important to note that single-threshold methods (STM) only incorporate one row per study and therefore ignore the threshold entirely. Consequently, if one is only interested in fitting an STM, this column is not required. To fit an MTM, several studies ideally need to report results for more than one diagnostic threshold (although most MTM can also be applied if all studies report only results on a single threshold). Having multiple thresholds per study provides the necessary information to model the full ROC curve and to properly account for the within-study correlation between thresholds.

The dataset further reports the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Finally, it includes the number of non-diseased (H) and the number of diseased individuals (D).

It is not necessary to explicitly provide H and D, these quantities can be derived from the previously mentioned values. Specifically, the number of non-diseased individuals can be calculated as H = TN + FP, while the number of diseased individuals can be calculated as D = FN + TP.

The dataset also inlcudes a column named orginally_published. This column indicates which single threshold was originally published for meta analysis for this dataset using only STM. Therefore, for applying our STM we will filter the dataset such that only those entries are included (indicated by the entry 1).

stm_hba1c <- hba1c[hba1c$originally_published == 1,]

Before performing any analysis, it is useful to visualize the data. Plotting the dataset provides a first overview of the reported study results and can help with interpreting and understanding the subsequent model fitting and evaluation results. Fortunately, the package provides the plot_data() function, which visualizes the individual ROC curves for each study.

plot_data(hba1c)

## Study specific ROC curves plotted

We can now confirm that the total number of studies is 38. In addition, we observe several ROC curves, indicating that multiple studies report results for at least two diagnostic thresholds. Therefore, the full HbA1c data are suitable for fitting MTM without any issues.

We can observe that the ROC curves start in the bottom-left corner of the plot, where both sensitivity and 1-specificity are low, and move toward the top-right corner as the diagnostic threshold changes. Along the way, the curve typically bends toward the top-left corner, which represents the ideal region of high sensitivity and high specificity. This pattern is consistent with typical ROC curves because a good diagnostic test should correctly identify most diseased individuals (high sensitivity) while also correctly classifying most non-diseased individuals (high specificity). The closer the curve is to the top-left corner, the better the test is at discriminating between diseased and non-diseased subjects, whereas curves near the diagonal line indicate poor discriminative ability. However, between studies there exists substantial variation in estimates.

Having now obtained a comprehensive overview of the dataset, we can proceed to fitting and evaluating models. The following sections are divided into two parts: one focusing on some single threshold methods (STM) and the other on some multiple threshold methods (MTM).

Single treshold methods (STM)

STM are used when each study reports only one diagnostic threshold, or when the focus is on a specific clinically relevant threshold. They are simpler to fit and interpret than multiple threshold methods, as they require only one pair of sensitivity and specificity per study. In this section, we will examine three single threshold models: a linear mixed-effects model (LMM) as proposed by Reitsma et al. (2005), a binomial generalized linear mixed-effects model (GLMM) as described by Chu and Cole (2006) and Chu et al. (2010), and a mixed model that jointly models the bivariate distribution of sensitivities and specificities using copulas (Nikoloulopoulos 2015).

The LMM assumes linearity in the logit-transformed sensitivities and specificities and accounts for between-study variability with random effects. In contrast, the GLMM directly models the counts of true positives and true negatives using binomial distributions, which allows for more accurate modeling of studies with small sample sizes or extreme proportions.

LMM for a single threshold per study

In this section, we focus on the LMM by Reitsma et al. (2005). The model operates on the logit-transformed sensitivities and specificities, using a normal distribution for the random effects to capture between-study variability. By jointly modeling sensitivity and specificity, it accounts for the correlation between these measures across studies, which allows us to estimate a pair of summary sensitivity and specificity, as well as an SROC curve with additional assumptions (Rücker and Schumacher 2010).

To fit this model, no additional parameters need to be specified. Simply specifying the corresponding model ID (reitsma2005lmm) and using the data with only a single threshold per study (stm_hba1c) suffices.

mod_reitsma <- fit_metaROC(stm_hba1c, model = "reitsma2005lmm")
## Hello and welcome to metaROC!
## Requested model: reitsma2005lmm 
## This is an LMM for a single threshold per study.
##  See https://doi.org/10.1016/j.jclinepi.2005.02.022 for more details.

As we can see, the function provides a brief output message that confirms which model has been fitted. It also supplies the corresponding DOI for a literature reference, so that users can refer to the original source for more detailed information. Internally, the function calls the R-package mada (Doebler and Sousa-Pinto 2022).

To gain an overview of the most important results, we use the summary.metaROC() method, passing the fitted model as the input parameter. Additionally, we specify a weight of 0.5 for sensitivity in the Youden index computation. We also request Wald confidence intervals (CIs) for the estimated sensitivity and specificity.

If we had wanted a CI for the AUC, we would need to specify ci_type = "bootstrap" along with the number of bootstrap samples n_boot.

summary(mod_reitsma, sens_weight = 0.5, ci_type = "wald")
## 
## *** Results of Single threshold method (STM) ***
## 
## Model: LMM for single threshold 
## 
## Total number of studies: 38 
## Total number of thresholds: 38 
## Number of different thresholds: 13 
## 
## Results with Wald confidence intervals: 
## 
## Youden index (sensitivity weight = 0.5): 0.5310
## Estimated Sensitivity and Specificity [95% CI]:
##  Sens: 0.7211 [0.6690; 0.7678]
##  Spec: 0.8093 [0.7630; 0.8483]
## 
## AUC: 0.8312

At the beginning of the output, we are reminded that these results correspond to a single-threshold method. This means that no threshold-specific information is provided. Afterwards, the specific model is stated again. Next, some formal details about the meta-analysis are presented: in total, 38 studies were included, reporting on 13 distinct thresholds.

We are also informed that the CIs, indicated in brackets after the estimates, are Wald CIs. The maximum Youden index at the specified weight is also reported. For STM, the maximum Youden index is generally less relevant, as these methods do not allow threshold-specific inference.

Following this, the estimated sensitivity and specificity are presented along with their 95% CIs. The estimated sensitivity is 0.7211 [0.6690; 0.7678], which means that the model estimates the probability of diseased individuals to be correctly tested positive to be 72.11%. The estimated specificity is 0.8093 [0.7630; 0.8483], i.e., 80.93% is the probability that non-diseased individuals are tested negative.

Of particular interest is the estimated AUC of 0.8312, which provides a summary measure of the diagnostic accuracy across all studies, indicating good overall diagnostic accuracy of HbA1c. However, for the diagnosis of type 2 diabetes, there may be superior diagnostic tests, such as fasting plasma glucose (Baechle et al. 2026).

All these numerical results are informative, but visualizations can often improve understanding. We will now plot the fitted model, choosing to display the individual study results from the data in the background by setting studies = TRUE. This will show each study’s estimated pair of 1-specificity and sensitivity, providing context for the estimated SROC curve.

plot(mod_reitsma, studies = TRUE)

## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...
## you provided a model that does not model the threshold directly, therefore no threshold plots can be computed.

The plot reminds us that the data we have used are real-world data. This means that information about the true data-generating process is unavailable, so elements such as the true ROC curve cannot be displayed because there are unknown in reality.

The first plot displays the observed data points for each study. Since the analysis is based on originally published data, each study contributes a single sensitivity–specificity pair.

The second plot shows the estimated SROC curve together with the greyed-out study-specific data points. As the Reitsma model is an STM, it does not provide a direct pointwise confidence band for the SROC curve. Instead, uncertainty is represented through a confidence region around the estimated mean sensitivity–specificity operating point. This region is visualized as an ellipse in ROC space. When pred_reg = TRUE, the between-study variance is additionally incorporated, resulting in a predictive ellipse which can be interpreted as enclosing point estimates of future studies with 95% probability.

GLMM for a single threshold per study

The GLMM proposed by Chu and Cole (2006) and Chu et al. (2010) takes a slightly different approach compared to the LMM by modeling the counts of true positives and true negatives directly using binomial distributions. This framework allows the model to better handle studies with small sample sizes or extreme proportions of sensitivity and specificity. Like the LMM, it accounts for between-study variability through random effects, but by operating on the raw counts, it can provide more accurate estimates when the standard normal approximations used in the LMM might be less reliable.

To fit the GLMM, we specify the model ID chu2006glmm. Additionally one could change the link used within the GLMM framework, but we will simply work with the default logit link:

mod_chu <- fit_metaROC(stm_hba1c, model = "chu2006glmm")
## Hello and welcome to metaROC!
## Requested model: chu2006glmm 
## This is a GLMM for a single threshold per study.
##  See https://doi.org/10.1177/0272989X09353452 for more details.

As we can see, the function provides a brief output message that confirms which model has been fitted. It also supplies the corresponding DOI to Chu et al. (2010), so that users can refer to the original source for more detailed information.

To gain an overview of the most important results, we use the summary.metaROC() method, passing the fitted model as the input parameter. Additionally, we specify a weight of 0.5 for sensitivity in the Youden index computation. We also request Wald CIs for the estimated sensitivity and specificity.

If we had wanted a CI for the AUC, we would need to specify ci_type = "bootstrap" along with the number of bootstrap samples n_boot.

summary(mod_chu)
## 
## *** Results of Single threshold method (STM) ***
## 
## Model: GLMM for single threshold 
## Link: logit 
## 
## Total number of studies: 38 
## Total number of thresholds: 38 
## Number of different thresholds: 13 
## 
## Results with Wald confidence intervals: 
## 
## Youden index (sensitivity weight = 0.5): 0.5528
## Estimated Sensitivity and Specificity [95% CI]:
##  Sens: 0.7276 [0.6739; 0.7754]
##  Spec: 0.8107 [0.7650; 0.8493]
## 
## AUC: 0.8134

The output of the summary method is analogous to the LMM. Following this, the estimated sensitivity and specificity are presented along with their 95% CIs. The estimated sensitivity is 0.7276 [0.6739; 0.7754]. The estimated specificity is 0.8107 [0.7650; 0.8493]. Of particular interest is the estimated AUC of 0.8134 which is slightly lower than the estimated AUC from the LMM.

As the GLMM is an STM, we again only see the single data points for each study. Additionally, it also does not provide a confidence band for the SROC curve, but as for the LMM, we can provide a confidence/prediction ellipse around the estimated mean sensitivity and specificity point.

plot(mod_chu, studies = TRUE, pred_reg = TRUE)

## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...
## you provided a model that does not model the threshold directly, therefore no threshold plots can be computed.

Copula mixed model for bivariate meta-analysis (Nikoloulopoulos 2015)

Nikoloulopoulos (2015) introduces a copula mixed-effects model for bivariate meta-analysis of diagnostic test accuracy studies, which generalizes the standard GLMM by allowing greater flexibility in the joint distribution of study-specific sensitivity and specificity. Unlike the GLMM, which typically assumes a bivariate normal distribution for the random effects, the copula model uses a copula to represent the dependence between sensitivity and specificity as well as between-study variability. The model operates on the original scale of sensitivity and specificity and includes the GLMM as a special case (by assuming a Gaussian copula). SROC curves can be derived from the fitted model, and the AUC can be estimated. However, the model does not report an optimal threshold, as it focuses on summarizing sensitivity and specificity across all reported thresholds rather than identifying a single point of maximal test performance. This flexible dependence structure allows for a more accurate fit to data compared to standard parametric models while retaining a parametric formulation.

To fit the model, we first specify the model identifier nikoloulopoulos2015bivariate. The copula_kind parameter determines the type of copula used to model the dependence between study-specific sensitivity and specificity. Copulas are functions that allow us to describe the dependence structure between two random variables separately from their marginal distributions. In this analysis, we use the clayton copula, which is able to capture positive dependence with stronger association in the lower tail.

mod_nikol <- fit_metaROC(stm_hba1c, model = "nikoloulopoulos2015bivariate")
## Hello and welcome to metaROC!
## Requested model: nikoloulopoulos2015bivariate 
## This is a copula mixed model for bivariate meta-analysis. 
##  See https://doi.org/10.1002/sim.6595 for more details.
## Registered S3 method overwritten by 'car':
##   method           from
##   na.action.merMod lme4

The fitted model is reported as a copula mixed model for bivariate meta-analysis, and the corresponding DOI is provided for reference. Notably, this fitting procedure took slightly longer than the other models because the copula mixed-effects model estimates both the marginal distributions and the dependence structure between sensitivity and specificity, which increases computational complexity and requires more intensive likelihood optimization.

To gain an overview of the most important results, we use the summary.metaROC() method, passing the fitted model as the input parameter. Additionally, a sensitivity weight of 0.5 is used in the Youden index computation. We also request Wald confidence intervals for the estimated sensitivity and specificity.

If we had wanted a CI for the AUC, we would need to specify ci_type = "bootstrap" along with the number of bootstrap samples n_boot.

summary(mod_nikol)
## 
## *** Results of Single threshold method (STM) ***
## 
## Model: Copula model for single threshold 
## 
## Total number of studies: 38 
## Total number of thresholds: 38 
## Number of different thresholds: 13 
## 
## Results with Wald confidence intervals: 
## 
## Youden index (sensitivity weight = 0.5): 0.9046
## Estimated Sensitivity and Specificity [95% CI]:
##  Sens: 0.7240 [0.7084; 0.7396]
##  Spec: 0.7791 [0.7749; 0.7833]
## 
## AUC: 0.8590

The model estimates sensitivity to be 0.7240 [0.7084; 0.7396], specificity to be 0.7791 [0.7749; 0.7833], and an AUC of 0.8590. While the estimated specificity is lower than the estimates from LMM and GLMM, the AUC is slightly higher. Plotting the SROC curve yields a similar estimated structure to the two preceding models, however for this STM no confidence/predictive ellipse is implemented:

plot(mod_nikol, studies = TRUE)

## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...
## you provided a model that does not model the threshold directly, therefore no threshold plots can be computed.

Multiple threshold methods (MTM)

MTM are used when studies report results for one or more diagnostic thresholds. Unlike STM, MTM take advantage of all available threshold information within each study to model the full ROC curve. This allows for a more comprehensive assessment of diagnostic performance across the entire range of possible thresholds.

MTM are particularly useful when one wants to estimate summary ROC curves, account for within-study correlation between thresholds, or identify optimal thresholds using measures such as the Youden index. By considering multiple thresholds and their values per study, (most) MTM implemented in metaROC allow inference on the optimal diagnostic threshold.

For the MTM, we consider three models in this vignette: The LMM proposed by Steinhauser et al. (2016) (ID steinhauser2016logitlmm) models logit-transformed sensitivities and specificities across multiple thresholds. The discrete GLMM proposed by Stoye et al. (2024) (ID stoye2024cloglog) models the discrete hazard to have an unobserved test value in an interval between diagnostic thresholds, which accounts for varying thresholds using a complementary log-log link. The non-parametric approach proposed by Martinez-Camblor (2017) (ID martinez2017np) estimates the SROC curve without assuming a specific parametric form.

Each of these MTM has particular strengths. The logit LMM and discrete GLMM leverage mixed-effects modeling to account for between-study variability while incorporating information from multiple thresholds. The non-parametric model avoids strong parametric assumptions, providing flexibility when the true ROC shape is unknown.

Compared to the STM, MTM allow us to explore the diagnostic performance across multiple thresholds and potentially identify an optimal threshold for HbA1c. This enables comparison of test accuracies at different thresholds. However, in case of the non-parametric model, only the ROC curve itself is modeled, so we will not be able to extract threshold-specific information from this approach.

LMM for multiple thresholds

The logit LMM (Steinhauser et al. 2016) represents diagnostic test accuracy using a hierarchical regression framework that incorporates information from multiple decision thresholds reported within individual studies. Threshold-specific sensitivities and specificities are treated as correlated observations and are linked to the diagnostic threshold via a linear term. Between-study heterogeneity in overall test performance is accommodated via random effects, allowing study-specific deviations from the average accuracy–threshold relationship.

To fit the model we simply have to supply the corresponding model identifier steinhauser2016logitlmm to the fitting function.

mod_steinhauser <- fit_metaROC(hba1c, model = "steinhauser2016logitlmm")
## Hello and welcome to metaROC!
## Requested model: steinhauser2016logitlmm 
## This is an LMM for multiple thresholds.
##  See https://doi.org/10.1186/s12874-016-0196-1 for more details.

The fitted model is reported as an MTM, and the corresponding DOI is provided for reference. Internally, the function calls the R-package diagmeta (Rücker et al. 2022).

To gain an overview of the most important results, we use the summary.metaROC() method, passing the fitted model as the input parameter. Additionally, we specify a weight of 0.6 for sensitivity in the Youden index computation. We also request Wald CIs for the estimated sensitivity and specificity.

If we had wanted a CI for the AUC, we would need to specify ci_type = "bootstrap" along with the number of bootstrap samples n_boot.

summary(mod_steinhauser, sens_weight = 0.6)
## 
## *** Results of Multiple threshold method (MTM) ***
## 
## Model: LMM for multiple thresholds 
## Type of model: DICS 
## 
## Total number of studies: 38 
## Total number of thresholds: 124 
## Number of different thresholds: 26 
## 
## Results with Wald confidence intervals: 
## 
## Youden index (sensitivity weight = 0.6): 0.4792
## Optimal threshold value: 5.7141
## 
## Estimated Sensitivity and Specificity [95% CI]:
##  Sens: 0.7264 [0.6454; 0.7948]
##  Spec: 0.7594 [0.6806; 0.8238]
## 
## AUC: 0.7992

At the beginning of the output, we are reminded that these results correspond to a MTM. Since this a parametric model, this means that now threshold-specific information is provided. The maximum Youden index, calculated using a sensitivity weight of 0.6, is 0.4792. This optimal value is attained at a threshold of 5.7141.

Following this, the estimated sensitivity and specificity are presented along with their 95% CIs. The estimated sensitivity is 0.7264 [0.6454; 0.7948], the estimated specificity is 0.7594 [0.6806; 0.8238], and the estimated AUC is 0.7992, which is lower than the estimated AUC from the STM. We cannot directly compare the estimated sensitivity and specificity to those of the models above, as we varied the weight of sensitivity in the Youden-index here.

Plotting the model results now gives us additional information from the plotting method as the MTM now allows inference on the diagnostic threshold.

plot(mod_steinhauser, studies = TRUE, youd_weight = 0.6)

## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...

## Model sens/spec vs threshold plotted. Press Enter to continue...

## Youden index plotted.

This plot illustrates another advantage of the MTM approach. In addition to the ROC curve, a 95% Wald CI is provided, which reflects the uncertainty around the summary curve. As observed, the interval is relatively wide in the middle of the curve, reflecting the variability across study-specific ROC curves.

Two additional plots are also shown. The first displays sensitivities and specificities (with their 95% CIs) against the threshold, providing insight into how changes in the threshold affect test performance. This can inform decisions about setting the optimal threshold, depending on whether sensitivity or specificity is prioritized. The second plot shows the Youden index as a function of the threshold, calculated using the specified sensitivity weight, highlighting the threshold at which overall test performance is maximized.

GLMM for discrete hazard

The GLMM proposed by Stoye et al. (2024) extends meta‑analytic methodology for diagnostic test accuracy by adapting the time-to-event approach first proposed by Hoyer et al. (2018) to discrete hazards. Instead of modeling (transformations of) sensitivity and specificity, it aims to estimate probabilities of “surviving” (that is being diagnosed as test-positive) a diagnostic threshold. More concretely, the GLMM models the discrete hazard of test values to fall in an interval between subsequent thresholds, simultaneously for diseased and non-diseased individuals. Between-study heterogeneity is accounted for with a bivariate random effect and summary sensitivity (1-specificity) is estimated from the discrete survival functions for diseased (non-diseased) populations.

To fit the model, we have to supply the corresponding model identifier stoye2024cloglog to the fitting function. Since this is a GLMM, we have the option to choose an optimization algorithm. By default, the nAGQ parameter is set to 0, which uses the penalized iteratively reweighted least squares (PIRLS) algorithm. Alternatively, we can set it to 1 to use an Laplace approximation, or to any value greater than 1 to use Gauss-Hermite quadrature. PIRLS and Laplace approximation are implemented through lme4::glmer(), while Gauss-Hermite quadrature is computed using GLMMadaptive::mixed_model(). Choosing the algorithm can affect computational speed and accuracy, particularly for small studies or when the data are sparse, but in most cases the default setting provides reliable estimates.

For this analysis, PIRLS is used. Although this is the default setting and does not need to be explicitly defined, it is included here in the function call for demonstration purposes.

mod_stoye <- fit_metaROC(hba1c, model = "stoye2024cloglog", nAGQ = 0)
## Hello and welcome to metaROC!
## Requested model: stoye2024cloglog 
## This is a GLMM for the discrete Hazard for multiple thresholds.
##  See https://doi.org/10.1002/jrsm.1753 for more details.
## Warning in fit_metaROC_stoye2024dGLMM(data = data, model = "cloglog", nAGQ =
## nAGQ): NAs introduced by coercion

The fitted model is reported as an MTM, and the corresponding DOI is provided for reference.

A warning is displayed during model fitting (“NAs produced by coercion”), but this is not a cause for concern. It typically occurs when a single iteration encounters a value that cannot be transformed, such as during a cloglog or logit transformation. The warning does not indicate a failure of the model, which still converges and produces valid estimates. It serves only as a signal that one iteration required internal handling of a non-numeric or out-of-bounds value.

To gain an overview of the most important results, we use the summary.metaROC() method, passing the fitted model as the input parameter. Additionally, a sensitivity weight of 0.5 is used in the Youden index computation. We also request Wald CIs for the estimated sensitivity and specificity.

If we had wanted a CI for the AUC, we would need to specify ci_type = "bootstrap" along with the number of bootstrap samples n_boot.

summary(mod_stoye)
## 
## *** Results of Multiple threshold method (MTM) ***
## 
## Model: GLMM for multiple thresholds using the discrete hazard 
## Link: cloglog 
## 
## Total number of studies: 38 
## Total number of thresholds: 124 
## Number of different thresholds: 26 
## 
## Results with Wald confidence intervals: 
## 
## Youden index (sensitivity weight = 0.5): 0.5610
## Optimal threshold value: 5.5000
## 
## Estimated Sensitivity and Specificity [95% CI]:
##  Sens: 0.6457 [0.4917; 0.7627]
##  Spec: 0.9152 [0.6727; 0.9966]
## 
## AUC: 0.8495

At the beginning of the output, we are reminded that these results correspond to a MTM and the fitted model is a GLMM for discrete hazards with a complementary log-log link. The maximum Youden index, calculated using a sensitivity weight of 0.5, is 0.5610. This optimal value is obtained at a threshold of 5.5. Compared to the logit LMM the maximum Youden index is substantially higher and the optimal threshold lower.

Following this, the estimated sensitivity and specificity are presented along with their 95% CIs. The estimated sensitivity is 0.6457 [0.4917; 0.7627], while the estimated specificity is 0.9152 [0.6727; 0.9966]. This result is quite interesting. Based on this model, specificity is much higher than sensitivity, indicating that healthy individuals are almost always correctly classified. In contrast, sensitivity is around two-thirds, meaning that only about two-thirds of diseased individuals are correctly identified by the test.

Of particular interest is the estimated AUC of 0.8495, which provides a summary measure of diagnostic accuracy across the included studies. The AUC can also serve to compare models, with higher values reflecting better discriminative performance. An AUC of 0.8495 indicates high diagnostic accuracy, showing that the test distinguishes well between diseased and non-diseased individuals.

All these numerical results are informative, but visualizations can often improve understanding. We will now plot the fitted model, choosing to display the individual study curves from the data in the background by setting studies = TRUE. This will show each study’s ROC curve in grey, providing context for the estimated summary ROC curve. As the threshold is incorporated in the model, we receive additional information from the plotting method.

plot(mod_stoye, studies = TRUE)

## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...

## Model sens/spec vs threshold plotted. Press Enter to continue...

## Youden index plotted.

The plot reminds us that the data we have used are real-world data. This means that information about the true data-generating process is unavailable, so elements such as the true ROC curve cannot be displayed.

Examining the ROC plot, we can see this time that the Estimated ROC-Curve lies towards the upper third of the individual study ROC curves.

This plot illustrates another advantage of the MTM approach. In addition to the ROC curve, a 95% Wald CI is provided, which reflects the uncertainty around the summary curve. As observed, the interval is relatively wide in the middle of the curve, reflecting the variability across study-specific ROC curves. Compared to the Steinhauser model, the CIs appear wider, which is consistent with our earlier observation that the estimated sensitivities and specificities exhibit substantial variability across studies.

Two additional plots are also shown. The first displays sensitivities and specificities (with their 95% CIs) against the threshold, providing insight into how changes in the threshold affect test performance. This can inform decisions about setting the optimal threshold, depending on whether sensitivity or specificity is prioritized. In this plot, the CIs remain wide. Notably, the range of thresholds where sensitivity and specificity are both meaningfully above zero is narrower than in the Steinhauser model. This suggests that the discrete GLMM identifies a relatively constrained threshold range where the test performs reliably, with performance dropping sharply outside this range.

The last plot displays the Youden index as a function of the threshold, calculated using the specified sensitivity weight, highlighting the threshold at which overall test performance is maximized. In this case, the peak of the curve resembles a plateau, spanning thresholds from approximately 5.3 to 5.6. Additionally, the range of threshold values where the Youden index is close to zero is much smaller than in the previous model.

Non-parametric model

Martinez-Camblor (2017) proposes a fully non-parametric approach for estimating an SROC curve in meta-analyses of diagnostic studies with multiple thresholds. Unlike parametric or semiparametric hierarchical models, this method constructs the SROC curve directly from the observed sensitivity–specificity pairs across all reported thresholds without assuming a specific functional form. The approach can incorporate random effects to account for variability between studies while using all reported thresholds rather than just one per study. Fixed-effects versions are also possible when between-study heterogeneity is assumed to be minimal.

To fit the model, the corresponding model identifier martinez2017np must be supplied to the fitting function. This model can incorporate either fixed or random effects, controlled via the variant parameter, which can be set to "fixed-effects" or "random-effects" (the latter being the default).

mod_martinez <- fit_metaROC(hba1c, model = "martinez2017np", variant = "random-effects")
## Hello and welcome to metaROC!
## Requested model: martinez2017np 
## This is a non-parametric model for multiple thresholds.
##  See https://doi.org/10.1177/0962280214537047 for more details.
## Number of papers included in meta-analysis: 38 
## Model considered: random-effects
## The area under the summary ROC curve (AUC) is 0.773.
## The optimal specificity and sensitivity (in the Youden index sense) for summary ROC curve are 0.797 and 0.645, respectively.

The fitted model is reported as non-parametric MTM, and the corresponding DOI is provided for reference.

The fitting process also returns some basic information that would normally be provided by the summary.metaROC() method. Specifically, we are informed that 38 studies were included in the meta-analysis and that a fixed-effects model was fitted. Additionally, the output provides the estimated AUC and the optimal sensitivity and specificity. We will discuss these in more detail when applying the summary.metaROC method.

We assign the default sensitivity weight of 0.5 and request Wald CIs as before.

summary(mod_martinez)
## 
## *** Results of Multiple threshold method (MTM) ***
## 
## Model: Non-parametric model 
## Variant: random-effects 
## 
## Total number of studies: 38 
## Total number of thresholds: 124 
## Number of different thresholds: 26 
## 
## Results with Wald confidence intervals: 
## 
## Youden index (sensitivity weight = 0.5): 0.4417
## Estimated Sensitivity and Specificity [95% CI]:
##  Sens: 0.6449 [0.6287; 0.6610]
##  Spec: 0.7968
## 
## AUC: 0.7731

At very beginning we are again reminded that we fitted an MTM. However, this method does not report an optimal threshold, as it focuses on jointly summarizing sensitivity and specificity across studies rather than identifying a single point of maximal test performance. Still for estimated sensitivity and specificity we are provided with CIs. Afterwards, we are reminded which model was fitted together with model specific information, such as in this case the variant. Here, we used random effects for model fitting. Next, some formal details about the meta-analysis are presented: in total, 38 studies were included, reporting 124 thresholds, of which 26 were distinct. Also the CIs are wald CIs as requested. The maximum Youden index, calculated using a sensitivity weight of 0.5, is 0.4422. However, the model cannot report on the optimal threshold so this is only of limited interest to us.

Following this, the estimated sensitivity and specificity are presented. We only get a CI for the sensitivity. The estimated sensitivity is 0.6544 [0.6443; 0.6646], indicating that on average about 65.44% of diseased individuals are correctly identified across the included studies. The estimated specificity is 0.7878, meaning that on average about 78.78% of non-diseased individuals are correctly classified.

Of particular interest is the estimated AUC of 0.76845, which provides a summary measure of diagnostic accuracy across the included studies. The AUC can also serve to compare models, with higher values reflecting better discriminative performance. An AUC of 0.76845 indicates reasonable diagnostic accuracy. However, this is the lowest AUC observed among all models computed so far. Additionally, the model does not provide information on the optimal threshold.

As usual we will continue with visualizations as these could provide more context. We will now plot the fitted model, choosing to display the individual study curves from the data in the background by setting studies = TRUE. This will show each study’s ROC curve in grey, providing context for the estimated summary ROC curve.

plot(mod_martinez, studies = TRUE)

## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...
## you provided a model that does not model the threshold directly, therefore no threshold plots can be computed.

Again the plot reminds us that the data we have used are real-world data. This means that information about the true data-generating process is unavailable, so elements such as the true ROC curve cannot be displayed. Also since we fitted a non-parametric model, we do not receive any threshold related plots.

Copula model

stoye2024copula(reference to be added) propose a copula model for multiple thresholds. They substitute random effects with copulas, leading to the ability to directly model the dependence between sensitivity and specificity and an increased control over the estimation procedure.

This model can be fitted with the corresponding model identifier stoye2024copula. The argument copula_kind specifies the type of copula used to model the dependence between sensitivity and specificity. We choose a Clayton copula, allowing for asymmetric dependence. The marginal distributions of the diagnostic processes are defined via marginal_kind, with the Weibull distribution providing a flexible parametric form for modeling threshold effects. The option norm_approx controls whether the binomial marginal distributions are approximated by a normal distribution during estimation, which can improve numerical stability and reduce computational burden in larger datasets.

mod_stoy_cop <- fit_metaROC(hba1c, model = "stoye2024copula", copula_kind = "clayton", 
                            norm_approx = TRUE, marginal_kind = "weibull")
## Hello and welcome to metaROC!
## Requested model: stoye2024copula 
## This is a copula model for multiple thresholds.
##  See XX for more details.

The fitted model is reported as a copula model for multiple thresholds, as the paper is not published yet the DOI is not yet included.

To gain an overview of the most important results, we use the summary.metaROC() method, passing the fitted model as the input parameter.

summary(mod_stoy_cop)
## 
## *** Results of Multiple threshold method (MTM) ***
## 
## Model: Copula model for multiple thresholds 
## Copula kind: clayton 
## Marignal distribution: weibull 
## Normal approximation: TRUE 
## Fixed copula parameters during estimation: FALSE 
## 
## Total number of studies: 38 
## Total number of thresholds: 124 
## Number of different thresholds: 26 
## 
## Results with Wald confidence intervals: 
## 
## Youden index (sensitivity weight = 0.5): 0.5873
## Optimal threshold value: 5.8549
## 
## Estimated Sensitivity and Specificity [95% CI]:
##  Sens: 0.6939 [0.6838; 0.7037]
##  Spec: 0.8934 [0.8870; 0.8997]
## 
## AUC: 0.8559

At very beginning we are again reminded that we fitted an MTM. Afterwards, again we are reminded which model was fitted together with model specific information, such as in this the copula information such as the copula kind used, the marginal distribution, whether a normal approximation was used or the copula parameters were fixed during estimation. Next, some formal details about the meta-analysis are presented: in total, 38 studies were included, reporting 124 thresholds, of which 26 were distinct.

Then we are told the CIs are wald CIs as requested. The maximum Youden index, calculated using a sensitivity weight of 0.5, is 0.5545. Additionally this Youden Index is achieved at a threshold of 5.8063.

Following this, the estimated sensitivity and specificity are presented. The estimated sensitivity is 0.6593 [0.6313; 0.6847], indicating that on average about 65.93% of diseased individuals are correctly identified across the included studies. The estimated specificity is Spec: 0.8952 [0.8901; 0.9002], meaning that on average about 89.52% of non-diseased individuals are correctly classified. Of particular interest is the estimated AUC of 0.8559, estimating good diagnostic accuracy of HbA1c.

We will continue with visualizations as these could provide more context. We will now plot the fitted model, choosing to display the individual study curves from the data in the background by setting studies = TRUE. This will show each study’s ROC curve in grey, providing context for the estimated summary ROC curve. As the threshold is incorporated in the model, we receive additional information from the plotting method.

plot(mod_stoy_cop, studies = TRUE)

## Study specific ROC curves plotted

## Estimated ROC + grey study curves plotted. Press Enter to continue...

## Model sens/spec vs threshold plotted. Press Enter to continue...

## Youden index plotted.

Again the plot reminds us that the data we have used are real-world data. This means that information about the true data-generating process is unavailable, so elements such as the true ROC curve cannot be displayed. As the model incorporates the threshold, we also receive both a sensitivity and specificity against threshold and a youden index against threshold plot. The confidence bands in the sensitivity and specificity–threshold plots are very narrow, and the curves appear reflectionally symmetric. Finally, the Youden index plot exhibits a bell-shaped curve, with the optimal threshold being clearly defined.

References

Baechle, Christina, Ferdinand V Stoye, Nafiseh Shokri-Mashhadi, et al. 2026. “Reassessment of the Diagnostic Accuracy of HbA1c and Glucose for Type 2 Diabetes: A Systematic Review and Meta-Analysis of Observational Studies.” Diabetes/Metabolism Research and Reviews 42 (4): e70160. https://doi.org/10.1002/dmrr.70160.
Chu, Haitao, and Stephen R Cole. 2006. “Bivariate Meta-Analysis of Sensitivity and Specificity with Sparse Data: A Generalized Linear Mixed Model Approach.” Journal of Clinical Epidemiology 59 (12): 1331–32. https://doi.org/10.1016/j.jclinepi.2006.06.011.
Chu, Haitao, Hongfei Guo, and Yijie Zhou. 2010. “Bivariate Random Effects Meta-Analysis of Diagnostic Studies Using Generalized Linear Mixed Models.” Medical Decision Making 30 (4): 499–508. https://doi.org/10.1177/0272989X09353452.
Doebler, Philipp, and Bernardo Sousa-Pinto. 2022. Mada: Meta-Analysis of Diagnostic Accuracy. https://CRAN.R-project.org/package=mada.
Hoyer, Annika, Stefan Hirt, and Oliver Kuss. 2018. “Meta-Analysis of Full ROC Curves Using Bivariate Time-to-Event Models for Interval-Censored Data.” Research Synthesis Methods 9: 62–72. https://doi.org/https://doi.org/10.1002/jrsm.1273.
Martinez-Camblor, Pablo. 2017. “Fully Non-Parametric Receiver Operating Characteristic Curve Estimation for Random-Effects Meta-Analysis.” Statistical Methods in Medical Research 26 (1): 5–20. https://doi.org/10.1177/0962280214537047.
Nikoloulopoulos, Aristidis K. 2015. “A Mixed Effect Model for Bivariate Meta-Analysis of Diagnostic Test Accuracy Studies Using a Copula Representation of the Random Effects Distribution.” Statistics in Medicine 34 (29): 3842–65. https://doi.org/10.1002/sim.6595.
Reitsma, Johannes B, Afina S Glas, Anne WS Rutjes, Rob JPM Scholten, Patrick M Bossuyt, and Aeilko H Zwinderman. 2005. “Bivariate Analysis of Sensitivity and Specificity Produces Informative Summary Measures in Diagnostic Reviews.” Journal of Clinical Epidemiology 58 (10): 982–90. https://doi.org/10.1016/j.jclinepi.2005.02.022.
Rücker, Gerta, and Martin Schumacher. 2010. “Summary ROC Curve Based on a Weighted Youden Index for Selecting an Optimal Cutpoint in Meta-Analysis of Diagnostic Accuracy.” Statistics in Medicine 29 (30): 3069–78. https://doi.org/10.1002/sim.3937.
Rücker, Gerta, Susanne Steinhauser, Srinath Kolampally, and Guido Schwarzer. 2022. Diagmeta: Meta-Analysis of Diagnostic Accuracy Studies with Several Cutpoints. https://CRAN.R-project.org/package=diagmeta.
Steinhauser, Susanne, Martin Schumacher, and Gerta Rücker. 2016. “Modelling Multiple Thresholds in Meta-Analysis of Diagnostic Test Accuracy Studies.” BMC Medical Research Methodology 16 (August): 97. https://doi.org/https://doi.org/10.1186/S12874-016-0196-1.
Stoye, Ferdinand V, Claudia Tschammler, Oliver Kuss, and Annika Hoyer. 2024. “A Discrete Time-to-Event Model for the Meta-Analysis of Full ROC Curves.” Research Synthesis Methods 15 (6): 1031–48. https://doi.org/10.1002/jrsm.1753.