Results: Diagnosis of ADHD

Bradley S. Peterson; Joey Trampush; Margaret Maglione; Maria Bolshakova; Morah Brown; Mary Rozelle; Aneesa Motala; Sachi Yagyu; Jeremy Miles; Sheila Pakdaman; Mario Gastelum; Bich Thuy (Becky) Nguyen; Erin Tokutomi; Esther Lee; Jerusalem Z. Belay; Coleman Schaefer; Benjamin Coughlin; Karin Celosse; Sreya Molakalapalli; Brittany Shaw; Tanzina Sazmin; Anne N. Onyekwuluje; Danica Tolentino; Susanne Hempel

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Peterson BS, Trampush J, Maglione M, et al. ADHD Diagnosis and Treatment in Children and Adolescents [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2024 Mar. (Comparative Effectiveness Review, No. 267.)


4. Results: Diagnosis of ADHD

The Key Question (KQ) is divided into four subquestions:

KQ1a. What is the comparative diagnostic accuracy of approaches that can be used in the primary care practice setting or by specialists to diagnose attention deficit hyperactivity disorder (ADHD) among individuals younger than 7 years of age?
KQ1b. What is the comparative diagnostic accuracy of electroencephalogram (EEG), imaging, or approaches assessing executive function that can be used in the primary care practice setting or by specialists to diagnose ADHD among individuals aged 7 through 17?
KQ1c. For both populations, how does the comparative diagnostic accuracy of these approaches vary by clinical setting, including primary care or specialty clinic, or patient subgroup, including age, sex, or other risk factors associated with ADHD?
KQ1d. What are the adverse effects associated with being labeled correctly or incorrectly as having ADHD?

The gold standard or reference standard against which diagnostic tools were compared was diagnosis by a mental health specialist, such as a psychologist, psychiatrist, or other care provider. In many cases, clinicians used published scales or semi-structured diagnostic interviews to ensure a well-validated and reliable process for confirming the diagnosis of ADHD according to the Diagnostic and Statistical Manual of Mental Disorders (DSM), as outlined in more detail in the evidence table. Many identified studies included a broad age range rather than differentiating clearly between children younger (KQ1a) and older (KQ1b) than seven years of age. Hence, before addressing the Key Questions, we added a section describing the results for parental ratings, teacher ratings, clinician tools, and biomarkers. That section summarizes results by test; most of its studies evaluated a combined sample of children and adolescents. The KQ1a section describes all diagnostic approaches for children younger than seven years of age, regardless of the test applied. The KQ1b section describes EEG, imaging, and executive function tests for children aged seven and older.

4.1. KQ1, ADHD Diagnosis Key Points

Key points pertaining to the diagnosis of ADHD are as follows.

Multiple approaches showed promising diagnostic performance (e.g., using parental rating scales), but estimates of performance varied considerably across studies, and the strength of evidence (SoE) was generally low.
Diagnostic test performance likely depends on whether youth with ADHD are being differentiated from typically developing children or from clinically referred children who had some kind of mental health or behavioral issue.
Rating scales for parent, teacher, or self-assessment as a diagnostic tool for ADHD have high internal consistency but poor to moderate reliability between raters, indicating that obtaining ratings from multiple informants (the youth, both parents, and teachers) may be valuable to inform clinical judgement.
Studies evaluating neuropsychological tests of executive functioning (e.g., Continuous Performance Test) used study-specific combinations of individual cognitive measures, making it difficult to compare performance across studies.
The diagnostic performance of biomarkers, EEG, and magnetic resonance imaging (MRI) scans varies greatly across studies, and their ability to aid clinical diagnosis of ADHD remains unclear. Studies have rarely assessed test-retest reliability, no findings have been replicated prospectively using the same measure in independent samples, and real-world effectiveness studies of diagnostic performance have not been conducted.
Very few studies have assessed the performance of diagnostic tools for ADHD in children under the age of 7 years, and more research is needed.
The identified diagnostic studies did not assess the adverse effects of being labeled correctly or incorrectly as having a diagnosis of ADHD.
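The key points above note poor to moderate agreement between raters. As a rough illustration of why raw percent agreement can overstate reliability, the following minimal Python sketch computes Cohen's kappa, a chance-corrected agreement statistic, for two informants' binary classifications. The ratings are hypothetical, and the included studies may have used other agreement statistics (for example, intraclass correlations for continuous scale scores).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters labeling the same children."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of children given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1.0:
        # Both raters used one identical label for every child; agreement is perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical screens (1 = meets ADHD cutoff) for ten children.
parent = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
teacher = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
raw_agreement = sum(p == t for p, t in zip(parent, teacher)) / len(parent)
print(f"raw agreement {raw_agreement:.2f}, kappa {cohens_kappa(parent, teacher):.2f}")
```

In this made-up example the two informants agree on 7 of 10 children, yet kappa is only about 0.35, because much of that agreement would be expected by chance when most children screen negative.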

4.2. KQ1, ADHD Diagnosis Summary of Findings

We identified 231 studies addressing the performance of tests aiming to diagnose ADHD.¹⁸^, ²¹^, ²⁴^, ²⁷^, ²⁸^, ¹¹¹^, ¹¹²^, ¹¹⁵^, ¹¹⁷^, ¹¹⁹^–¹²¹^, ¹²⁴^, ¹³⁴^, ¹³⁵^, ¹⁴⁰^–¹⁴³^, ¹⁵²^, ¹⁵³^, ¹⁵⁷^, ¹⁵⁹^, ¹⁶²^, ¹⁶⁷^–¹⁷⁰^, ¹⁷²^, ¹⁷⁷^, ¹⁷⁹^, ¹⁸¹^–¹⁹²^, ¹⁹⁷^, ¹⁹⁸^, ²¹⁰^, ²¹¹^, ²¹³^, ²¹⁴^, ²¹⁸^, ²²³^, ²³⁰^, ²³¹^, ²³³^, ²³⁴^, ²³⁷^, ²⁴¹^, ²⁴²^, ²⁴⁴^–²⁴⁶^, ²⁵¹^, ²⁵³^, ²⁶⁰^, ²⁶³^, ²⁶⁷^, ²⁷⁶^, ²⁷⁷^, ²⁸²^–²⁸⁵^, ²⁸⁷^, ²⁹³^, ²⁹⁷^–³⁰¹^, ³⁰³^, ³⁰⁷^, ³⁰⁹^, ³¹¹^, ³¹²^, ³¹⁴^–³¹⁶^, ³¹⁹^, ³²²^, ³²³^, ³²⁷^, ³³¹^, ³³⁶^, ³³⁸^–³⁴⁰^, ³⁴²^, ³⁴⁴^, ³⁴⁶^, ³⁴⁷^, ³⁵¹^, ³⁵²^, ³⁵⁵^, ³⁵⁶^, ³⁵⁹^, ³⁶²^, ³⁶⁵^, ³⁶⁶^, ³⁶⁹^, ³⁷⁰^, ³⁷⁹^, ³⁸²^, ³⁸⁵^, ³⁸⁸^–³⁹¹^, ³⁹³^–³⁹⁵^, ³⁹⁷^, ⁴⁰⁰^–⁴⁰⁵^, ⁴⁰⁷^, ⁴⁰⁸^, ⁴¹²^, ⁴¹³^, ⁴¹⁵^–⁴¹⁷^, ⁴²⁰^–⁴²⁴^, ⁴²⁷^, ⁴²⁹^, ⁴³⁴^, ⁴³⁶^–⁴³⁸^, ⁴⁴⁵^–⁴⁵⁰^, ⁴⁶²^–⁴⁶⁵^, ⁴⁶⁷^–⁴⁷⁰^, ⁴⁷³^, ⁴⁷⁵^, ⁴⁷⁷^, ⁴⁷⁹^, ⁴⁸²^, ⁴⁸⁶^, ⁴⁸⁷^, ⁴⁹¹^, ⁴⁹³^–⁴⁹⁶^, ⁴⁹⁸^–⁵⁰²^, ⁵⁰⁶^, ⁵¹⁴^–⁵¹⁶^, ⁵¹⁸^, ⁵¹⁹^, ⁵²⁴^, ⁵²⁷^, ⁵²⁸^, ⁵³⁶^, ⁵³⁷^, ⁵⁴¹^–⁵⁴³^, ⁵⁴⁶^–⁵⁴⁹^, ⁵⁵³^, ⁵⁵⁸^, ⁵⁵⁹^, ⁵⁶³^, ⁵⁶⁴^, ⁵⁶⁶^, ⁵⁷⁰^, ⁵⁷¹^, ⁵⁷⁶^, ⁵⁸⁰^–⁵⁸⁴^, ⁵⁸⁷^, ⁵⁹¹^, ⁵⁹²^, ⁵⁹⁹^, ⁶⁰⁰^, ⁶⁰³^, ⁶⁰⁵^, ⁶⁰⁷^, ⁶¹⁴^, ⁶¹⁵^, ⁶²⁵^, ⁶²⁷^, ⁶³⁰^–⁶³³^, ⁶³⁵^, ⁶³⁸^, ⁶³⁹^, ⁶⁴¹^, ⁶⁴²^, ⁶⁴⁴^, ⁶⁴⁷ The methodological rigor and the reporting varied substantially in the identified studies. The potential for risk of bias in the studies is documented in Figure 5. The critical appraisal for the individual studies is in Appendix D.

Figure 5

Risk of bias in Key Question 1 ADHD diagnostic studies. Notes: ADHD = attention deficit hyperactivity disorder

Selection bias was likely present in two-thirds of studies. Samples were often restricted and did not necessarily represent the full range of children with ADHD; for example, some studies explicitly reported using a convenience sampling strategy. Index test issues were present in ten percent of studies. Although the review was restricted to studies reporting a clinical diagnosis of ADHD for participants, reference standard issues were also present in a small number of studies, in particular due to lack of details on procedures and/or diagnosticians.¹¹¹^, ¹⁴²^, ²³³^, ³⁴²^, ⁴⁰⁵^, ⁴¹²^, ⁴⁵⁰^, ⁵¹⁶^, ⁵⁵³^, ⁶⁴² Flow and timing were rated as high risk of bias in several studies.¹¹¹^, ¹²¹^, ¹⁴³^, ¹⁶²^, ¹⁷²^, ³¹²^, ³¹⁹^, ³⁵¹^, ³⁷⁹^, ⁵⁰¹ Typically this was due to an unclear participant flow (e.g., it was unclear whether the diagnosis was known before the index test results were available).

We also assessed possible applicability issues that could influence the generalizability of the reported data. Figure 6 shows the summary of rated applicability. The applicability for the individual studies is in Appendix D.

Figure 6

Key Question 1 applicability rating. Notes: N/A = Not applicable

Several studies used samples that did not represent the general population of children with ADHD, usually because children with co-morbidities were excluded. In addition, several studies took place in specialty care settings with diagnostic and treatment options that go beyond the standard course of action for children with ADHD.

4.3. Summary ADHD Diagnosis by Tests for All Age Groups

We broadly differentiated between parental ratings, teacher ratings, tools for clinicians, teen self-reports, neuropsychological tests, imaging, EEG, biomarkers, activity markers, and other markers (e.g., electrocardiogram [EKG] indicators). Studies evaluated a large number of different tools within these broader categories. In addition, where studies used the same diagnostic tool (e.g., a rating scale), authors used different components of the tool (e.g., specific subscales) or combined components in a variety of ways (e.g., different neuropsychological parameters). We identified 68 studies that used machine learning algorithms to determine the best diagnostic approach.²⁸^, ¹¹⁵^, ¹²⁰^, ¹²¹^, ¹⁴³^, ¹⁵²^, ¹⁵⁷^, ¹⁷²^, ¹⁷⁹^, ¹⁸¹^, ¹⁸²^, ¹⁸⁵^–¹⁸⁸^, ¹⁹¹^, ²¹¹^, ²¹⁴^, ²²³^, ²³³^, ²³⁴^, ²⁴⁵^, ²⁵³^, ²⁸²^, ²⁸³^, ²⁹⁹^, ³⁰³^, ³²²^, ³²³^, ³⁴⁰^, ³⁵⁵^, ³⁵⁶^, ³⁶⁹^, ³⁷⁰^, ³⁸⁸^, ³⁹⁴^, ⁴⁰⁰^, ⁴⁰²^, ⁴⁰³^, ⁴⁰⁷^, ⁴⁰⁸^, ⁴¹²^, ⁴²⁰^, ⁴²⁹^, ⁴³⁴^, ⁴³⁸^, ⁴⁴⁹^, ⁴⁵⁰^, ⁴⁶⁷^, ⁴⁶⁸^, ⁴⁷³^, ⁴⁹⁴^, ⁴⁹⁵^, ⁵¹⁸^, ⁵⁴¹^, ⁵⁴³^, ⁵⁷¹^, ⁵⁸¹^, ⁵⁸²^, ⁵⁹¹^, ⁵⁹²^, ⁵⁹⁹^, ⁶⁰³^, ⁶³⁰^–⁶³³^, ⁶⁴¹ The earliest of these was published in 2012,²⁸ and they came from 21 different countries, primarily the United States²⁸^, ¹⁵²^, ²²³^, ²³³^, ²³⁴^, ²⁸²^, ²⁹⁹^, ³²³^, ⁴⁰⁰^, ⁴⁰³^, ⁴¹²^, ⁴⁶⁷^, ⁴⁹⁵^, ⁵¹⁸^, ¹¹⁸⁸ and China.¹⁸⁵^, ¹⁸⁷^, ¹⁸⁸^, ¹⁹¹^, ³⁹⁴^, ⁴⁰⁷^, ⁴⁰⁸^, ⁵⁷¹^, ⁵⁸¹^, ⁶³⁰^, ⁶³²^, ⁶⁴¹ A third of these studies used EEG markers as the data source,¹¹⁵^, ¹²⁰^, ¹⁴³^, ¹⁵⁷^, ¹⁷²^, ¹⁷⁹^, ¹⁸⁷^, ¹⁸⁸^, ³²²^, ³⁴⁰^, ³⁷⁰^, ³⁹⁴^, ⁴¹²^, ⁴³⁸^, ⁴⁴⁹^, ⁴⁶⁸^, ⁴⁷³^, ⁴⁹⁴^, ⁵⁹²^, ⁸⁸³ and another third used MRI.¹⁹¹^, ²⁸²^, ⁴⁹⁵^, ⁵¹⁸^, ⁵⁷¹^, ⁵⁸¹^, ⁶³⁰^, ⁶³³^, ¹¹⁸⁸ The remaining studies used neuropsychological test components, rating scale scores, activity estimates, or other sources.

Some studies achieved 100 percent sensitivity with the help of machine learning (corresponding specificity 100%).¹⁴³^, ¹⁵² Other studies maximized specificity, and some achieved 100 percent specificity in machine learning supported diagnostic models (corresponding sensitivities 100, 97, 75, 98, and 100%, respectively).¹²¹^, ¹⁴³^, ¹⁵²^, ³⁷⁰^, ⁴⁵⁰ Across machine learning supported studies, accuracy ranged from 61 percent²⁸² to 100 percent.¹⁴³^, ¹⁵²^, ⁴⁶⁸
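Accuracy, the summary figure reported by many of the machine learning studies, is a prevalence-weighted blend of sensitivity and specificity: accuracy = sensitivity × prevalence + specificity × (1 − prevalence). The same classifier can therefore post different accuracies in a balanced case-control sample and in a sample closer to real-world prevalence. A minimal Python sketch of that identity, using hypothetical inputs rather than values from any included study:

```python
def accuracy(sensitivity, specificity, prevalence):
    """Overall accuracy: sensitivity weighted by prevalence plus specificity weighted by (1 - prevalence)."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be proportions between 0 and 1")
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Hypothetical classifier: 90% sensitivity, 70% specificity.
for prevalence in (0.5, 0.1):
    print(prevalence, round(accuracy(0.90, 0.70, prevalence), 2))

# A trivial rule that labels every child "no ADHD" (sensitivity 0, specificity 1)
# reaches 90% accuracy when only 10% of the sample has ADHD.
print(round(accuracy(0.0, 1.0, 0.1), 2))
```

The hypothetical classifier drops from 80 percent accuracy in a balanced sample to 72 percent at 10 percent prevalence, while a rule that never diagnoses ADHD scores 90 percent, which is why accuracy is best read alongside sensitivity and specificity.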

Given that most studies included younger (typically 5- and 6-year-olds) and older children, the following section describes diagnostic tools relevant to all age groups. Some studies evaluated more than one test (e.g., a parental rating and a teacher rating).

4.3.1. Parental Ratings

We identified 59 studies using parental ratings to diagnose ADHD.¹⁸^, ¹¹⁷^, ¹³⁴^, ¹⁶⁸^, ¹⁶⁹^, ¹⁹⁰^, ²¹⁸^, ²²³^, ²³⁰^, ²³³^, ²³⁴^, ²⁴¹^, ²⁴²^, ²⁴⁴^, ²⁵¹^, ²⁶³^, ²⁸⁵^, ²⁸⁷^, ²⁹⁷^, ³⁰⁰^, ³⁰¹^, ³¹¹^, ³¹⁴^, ³³¹^, ³³⁶^, ³³⁹^, ³⁴²^, ³⁴⁴^, ³⁵⁹^, ³⁶²^, ³⁹⁰^, ³⁹¹^, ⁴²³^, ⁴²⁴^, ⁴²⁷^, ⁴⁴⁷^, ⁴⁴⁸^, ⁴⁶³^, ⁴⁶⁴^, ⁴⁸²^, ⁴⁸⁷^, ⁴⁹¹^, ⁴⁹⁸^, ⁵⁰²^, ⁵¹⁴^–⁵¹⁶^, ⁵¹⁹^, ⁵²⁷^, ⁵²⁸^, ⁵⁴⁷^, ⁵⁵³^, ⁵⁵⁸^, ⁵⁵⁹^, ⁵⁸⁴^, ⁵⁸⁷^, ⁶⁰⁵^, ⁶³⁸^, ⁶⁴² The earliest study meeting inclusion criteria was published in 1985.⁵¹⁴ Evaluations of parental rating tools came from five different English-speaking countries, but most studies were from the United States.¹³⁴^, ¹⁶⁹^, ¹⁹⁰^, ²³⁰^, ²³³^, ²³⁴^, ²⁴¹^, ²⁴²^, ²⁴⁴^, ²⁵¹^, ²⁶³^, ²⁸⁵^, ²⁹⁷^, ²⁹⁹^, ³¹¹^, ³³¹^, ³³⁶^, ³³⁹^, ³⁴²^, ³⁴⁴^, ³⁵⁹^, ³⁹⁰^, ³⁹¹^, ⁴²³^, ⁴²⁴^, ⁴²⁷^, ⁴⁴⁸^, ⁴⁶³^, ⁴⁶⁴^, ⁴⁸²^, ⁴⁸⁷^, ⁴⁹¹^, ⁴⁹⁸^, ⁵⁰²^, ⁵¹⁴^–⁵¹⁶^, ⁵¹⁹^, ⁵²⁷^, ⁵²⁸^, ⁵⁴⁷^, ⁵⁵³^, ⁵⁵⁸^, ⁵⁵⁹^, ⁵⁸⁴^, ⁶⁰⁵^, ⁶³⁸^, ⁶⁴² The populations studied were predominately males, with participants ranging in age from two to 18. Four studies exclusively included children younger than seven years old.³³¹^, ⁵¹⁶^, ⁵¹⁹^, ⁵⁵⁹ For studies that distinguished between ADHD presentations, most of the participants were diagnosed with the combined or inattentive presentations. In one study focusing on preschool-age children who presented with disruptive behavior disorders, 57 percent of participants were diagnosed with the hyperactive/impulsive presentation.³³¹ While ADHD participants with co-occurring disorders were not excluded from most studies, only a few purposely included children with specific co-occurring disorders such as disruptive behavior disorders³³¹ or autism.²³⁴^, ⁴⁴⁷ However, about half of the identified studies drew on clinical samples rather than on general population samples of typically developing children; i.e., they identified children undergoing a diagnostic workup for a potential diagnosis of ADHD, conduct disorders, autism, or depression.

In half of the identified studies, White participants made up more than 70 percent of the sample. One study evaluated diagnostic accuracy in a sample in which over 50 percent of participants were Black/African American,⁴⁶²^, ⁵³⁶ and one study was identified in which 85 percent of participants were Hispanic or Latino.⁵⁵³ Studies reported predominantly on estimated sensitivity and specificity. Some studies also reported the area under the curve (AUC) as a summary measure of test performance, but other key outcomes were reported less frequently. Figure 7 plots the sensitivity and specificity of the parental rating scales evaluated in each study.

Figure 7

Sensitivity and specificity of parental rating scales. Notes: Evaluated tools: ADHD-RS-IV, ADHD-SC4-P, ADHD-SRS-H, ADHD-SRS-Im, ARS, BASC-2-EF, BASC-3, BASC-PRS, BRIEF, BRIEF2, BRIEF-P+BRIEF-T+DKEFS, CBCL, CBCL-A, CBCL-AD/H, CBCL-Ag, CBCL-SP, Conners’...

The studies reporting sensitivity and specificity show wide variation in diagnostic accuracy estimates. The two measures are not independent of each other: a cutoff that yields high sensitivity can come at the cost of low specificity, and vice versa. The figure also shows that studies evaluated a large range of different parental rating scales, with few studies reporting on the same tool.
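The dependence between sensitivity and specificity noted above arises because both are computed from the same score cutoff. The following minimal Python sketch applies three cutoffs to a small set of hypothetical T-scores (not data from any included study), treating a score at or above the cutoff as a positive screen:

```python
def sensitivity_specificity(adhd_scores, control_scores, cutoff):
    """Treat score >= cutoff as a positive screen; return (sensitivity, specificity)."""
    true_positives = sum(score >= cutoff for score in adhd_scores)
    true_negatives = sum(score < cutoff for score in control_scores)
    return true_positives / len(adhd_scores), true_negatives / len(control_scores)


# Hypothetical T-scores for children with and without ADHD per the reference standard.
adhd_scores = [72, 68, 65, 61, 58]
control_scores = [66, 60, 55, 52, 48]

for cutoff in (60, 65, 70):
    sens, spec = sensitivity_specificity(adhd_scores, control_scores, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Raising the cutoff from 60 to 70 in this toy example trades sensitivity (80% down to 20%) for specificity (60% up to 100%), one reason single sensitivity or specificity values from studies using different cutoffs are hard to compare.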

The most frequently evaluated diagnostic tool was the CBCL (Child Behavior Checklist), used alone or in combination with other scales, with different cutoffs, and with different subscales (most frequently the attention deficit/hyperactivity problems subscale). Reported sensitivity for the CBCL ranged from 71 percent in a study differentiating ADHD from oppositional defiant disorder³³¹ to 84 percent in two studies, one conducted in an outpatient pediatric medical clinic and the other in a sample of children with traumatic brain injury.¹⁹⁰^, ⁶⁰⁵ Reported specificity for this parental scale ranged from 33 percent⁵⁸⁷ to 93 percent¹⁹⁰ (the pediatric medical clinic sample). The reported AUC for the CBCL ranged from 0.55³⁴⁴ to 0.93,¹⁹⁰ with three independent studies reporting AUCs of 0.83 or 0.84.²⁵¹^, ³³¹^, ⁴⁹⁸ The evidence table in the appendix shows the results for all diagnostic and psychometric outcomes of interest for all identified studies.
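The AUC values reported for the CBCL summarize discrimination across all possible cutoffs. One standard interpretation, equivalent to the Mann-Whitney U statistic divided by the number of case-control pairs, is the probability that a randomly chosen child with ADHD scores higher than a randomly chosen child without ADHD, with ties counted as one half. A minimal Python sketch using hypothetical scores:

```python
def auc(adhd_scores, control_scores):
    """AUC as the probability that a random child with ADHD outscores a random control (ties count 0.5)."""
    if not adhd_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for a in adhd_scores:
        for c in control_scores:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(adhd_scores) * len(control_scores))


adhd_scores = [72, 68, 65, 61, 58]     # hypothetical T-scores, ADHD per reference standard
control_scores = [66, 60, 55, 52, 48]  # hypothetical T-scores, no ADHD
print(auc(adhd_scores, control_scores))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation; the pairwise loop is quadratic in sample size and is meant for clarity, not efficiency.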

Table 3 shows the findings for the outcomes of interest together with the number of studies and study identifiers for parental rating scales. For the main results, we report findings from population samples that differentiated ADHD from typically developing children separately from results obtained in clinical samples, because the study population was identified as one of the sources of heterogeneity in reported results, as documented in KQ1c. Results for the main analyses are shown across studies and tools. Where at least two different author groups reported on the same rating scale, we also provide results for that specific scale.

Table 3

KQ1 summary of findings and strength of evidence for parental ratings.

Studies of parental ratings reported mainly on sensitivity and specificity. A few studies reported perfect sensitivity or perfect specificity for parental ratings, but no study reported both. Given the large range of different measures evaluated by study authors, these diagnostic studies provided little information on the reliability of the measures. We downgraded the strength of evidence for study limitations (lack of detailed reporting), imprecision (large variation in reported diagnostic performance), and inconsistency (when consistency could not be assessed because no study was identified, because only one study was identified reporting on the test and outcome of interest and results had not been replicated by another author group, or because only limited data points were available). None of the included studies provided information on the effect of misdiagnosis. None of the identified studies reported the costs associated with obtaining parental ratings.

4.3.2. Teacher Ratings

We identified 23 studies using teacher ratings to diagnose ADHD.¹⁸^, ¹¹⁹^, ¹⁸³^, ²¹⁸^, ²⁴²^, ²⁹⁹^, ³⁰¹^, ³¹⁴^, ³⁴²^, ³⁵⁹^, ³⁶²^, ³⁹¹^, ⁴⁶³^, ⁴⁷⁹^, ⁴⁸²^, ⁴⁹¹^, ⁵¹⁹^, ⁵²⁷^, ⁵²⁸^, ⁵⁵⁸^, ⁵⁵⁹^, ⁵⁸⁷^, ⁶⁴² The earliest study meeting eligibility criteria was published in 1998,⁴⁷⁹ and studies came from four different English-speaking countries, primarily the United States.²⁴²^, ²⁹⁹^, ³⁴²^, ³⁵⁹^, ³⁹¹^, ⁴⁶³^, ⁴⁷⁹^, ⁴⁸²^, ⁴⁹¹^, ⁵¹⁹^, ⁵²⁷^, ⁵²⁸^, ⁵⁵⁸^, ⁵⁵⁹^, ⁶⁴² The populations studied were predominately males between the ages of three and 18. Two studies exclusively included children younger than seven years old,⁵¹⁹^, ⁵⁵⁹ and two exclusively included children eight years or older.¹¹⁹^, ³⁵⁹ For studies that distinguished between ADHD presentations, most of the participants were diagnosed with the combined or inattentive presentations. Almost all of the studies reported race and ethnicity demographics; in 14 studies White participants made up more than 70 percent of the sample, and in one study over 85 percent of the participants were Black/African American.

ADHD participants with co-occurring disorders were not excluded from most of the studies. Studies were divided between clinical samples and samples recruited from a less selective population. No study restricted its sample to children who all had a dual diagnosis, such as ADHD and conduct disorder.

Studies reported a variety of outcomes, with sensitivity and specificity being the most frequently reported outcomes. Figure 8 plots the reported sensitivity and specificity for teacher rating scales.

Figure 8

Sensitivity and specificity of teacher rating scales. Notes: Evaluated tools: ADHD-RS, ADHD-RS-IV-I, ADHD-SC4-T, BASC-2-EF, BASC-3, BRIEF, BRIEF-P+BRIEF-T+DKEFS, CTRS, CTRS-R, DBD, ECI-4, SNAP-IV, TRF, TRF+Conners-3-T(S), TRF-A, TRF-Ag, WMRS.

The figure shows the large range in reported sensitivity and specificity. It also shows that studies have evaluated many different teacher rating tools.

The Teacher Report Form, alone or in combination with Conners teacher rating scales, and using the total or the subscale of attention problems, was evaluated in more than one study.²⁴²^, ³⁰¹^, ³⁴²^, ⁵⁸⁷ Reported sensitivity ranged from 72 percent³⁰¹ to 79 percent.⁵⁸⁷ Reported specificity estimates ranged from 64 percent⁵⁸⁷ to 76 percent.²⁴² Two of the studies reported on AUC and found 0.65³⁴² for the attention problem subscale and 0.77³⁰¹ in combination with the Conners 3 teacher short form. No two studies reported on rater agreement, internal consistency, or test-retest reliability for the same teacher rating scale.

Table 4 shows the findings for the outcomes of interest together with the number of studies and study identifiers.

Table 4

KQ1 summary of findings and strength of evidence for teacher ratings.

Across all teacher rating studies, sensitivity in individual studies reached up to 97 percent in a clinical sample, but the corresponding specificity was only 26 percent.³¹⁴ We downgraded the strength of evidence for imprecision (large variation in reported diagnostic performance) and for inconsistency (when consistency could not be assessed because only one study was identified reporting on the test and outcome of interest and results had not been replicated by another author group). Identified diagnostic accuracy studies did not report on several of the other key outcomes.
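The 97 percent sensitivity and 26 percent specificity reported above can be translated into likelihood ratios, which express how much a positive or negative rating shifts the probability of ADHD. The sketch below uses the standard formulas (LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, applied to pre-test odds). The 50 percent pre-test probability is a hypothetical value chosen for illustration, not a figure from the review or the cited study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios; inputs must lie strictly between 0 and 1."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Sensitivity and specificity from the teacher-rating study cited above (97% and 26%).
lr_positive, lr_negative = likelihood_ratios(0.97, 0.26)
print(f"LR+ {lr_positive:.2f}, LR- {lr_negative:.2f}")

# Assumed (hypothetical) 50% pre-test probability in a referred sample.
print(f"after a positive rating: {post_test_probability(0.5, lr_positive):.0%}")
print(f"after a negative rating: {post_test_probability(0.5, lr_negative):.0%}")
```

Under that assumption, a positive rating moves the probability only from 50 percent to about 57 percent (LR+ about 1.31), whereas a negative rating lowers it to about 10 percent, illustrating why a highly sensitive but poorly specific scale is more informative when negative than when positive.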

4.3.3. Teen/Child Self-Reports

We identified six studies using teen/child self-reports to diagnose ADHD.¹⁴²^, ¹⁶⁸^, ²³¹^, ²⁹⁷^, ⁴⁹¹^, ⁵⁰⁶ The earliest study was published in 2002,⁵⁰⁶ and data came from two countries: the United States²³¹^, ²⁹⁷^, ⁴⁹¹ and Canada.¹⁴²^, ¹⁶⁸^, ⁵⁰⁶ Self-reports were primarily completed by adolescents; however, one study provided a research assistant to read the questions to participants under 11 years old.²⁹⁷ Only one study documented the ADHD presentation: 10 percent inattentive presentation, 4 percent hyperactive/impulsive presentation, and 25 percent combined presentation.⁴⁹¹ Two studies reported race and ethnicity demographics: in one, White participants made up 61 percent of the sample,²⁹⁷ and in the other, 89 percent of the participants were Black/African American.⁴⁹¹

Studies reported a limited number of outcomes, with sensitivity, specificity, and AUC being the most frequently reported. No two identified studies evaluated the same self-report measure, and reported diagnostic performance varied widely. Table 5 shows the findings for the outcomes of interest together with the number of studies and study identifiers.

Table 5

KQ1 summary of findings and strength of evidence for self-reports.

Evidence on the diagnostic performance of teen self-reports was limited. We downgraded for inconsistency (consistency across studies could not be judged because only one study was identified reporting on the test and outcome of interest). For several outcomes, our searches identified no studies, and the strength of evidence is insufficient.

4.3.4. Combined Ratings

We identified 13 studies that assessed the diagnostic performance of ratings combined across informants.¹⁸^, ¹⁸⁹^, ²⁷⁷^, ²⁹⁷^, ³⁰³^, ⁴⁰⁵^, ⁴⁶⁷^, ⁴⁷⁹^, ⁵²⁷^, ⁵⁴⁸^, ⁵⁵⁹^, ⁵⁷⁰^, ⁶⁰⁰ The studies compared the information from multiple raters to the reference standard. Studies combined information sources in different ways, often selecting individual variables with the help of machine learning. Only one of these studies compared the performance of combined multi-informant data with that of single informants: it found negligible improvement when adding youth self-report to parent report using an adaptive testing questionnaire (AUC youth only 0.71; parent only 0.85; combined 0.86) in a treatment-seeking population.²⁹⁷

The studies reported only on selected accuracy measures. One study combined parent and teacher ratings on the Conners scales by requiring youth to meet the diagnostic cutoff (T-score ≥65) in one setting and to show substantial symptoms (T-score ≥60) in the other setting. It reported a diagnostic sensitivity of 84 percent and specificity of 36 percent for the combined rating when distinguishing ADHD from other clinically referred youth.¹⁸ One study reported findings from a discriminant function analysis of mother, father, and teacher ratings on the Conners scale when distinguishing youth with ADHD, whether intellectually gifted or not, from typically developing, intellectually gifted youth. It found that the discriminant function using all three informants distinguished the typically developing youth from the two ADHD groups but did not distinguish the two ADHD groups from one another.²⁷⁷ A study in 4- to 7-year-old children used machine learning to combine parent and teacher ratings on the BRIEF to distinguish youth with ADHD from typically developing controls. It reported an average diagnostic accuracy of 0.93, with teacher ratings being the most informative in the machine learning algorithm, though it did not formally compare accuracy for combined informants with accuracy for either informant alone. The study also found that adding neuropsychological test measures and cortical thickness measures to the machine learning algorithm did not meaningfully improve diagnostic performance over use of the BRIEF alone.⁴⁶⁷ The best AUC was reported by a machine learning supported study combining parent and teacher ratings (AUC 0.98).⁴⁰⁵
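The two-setting rule described above for the Conners scales can be expressed as a simple decision function. This minimal Python sketch assumes one summary T-score per informant; the study's actual choice of subscales, and how it handled several qualifying scales, is not described here, so the function name and default cutoffs are illustrative only.

```python
def meets_combined_rule(parent_t, teacher_t, primary_cutoff=65, secondary_cutoff=60):
    """True if one informant reaches the primary cutoff and the other reaches the secondary cutoff."""
    parent_primary = parent_t >= primary_cutoff and teacher_t >= secondary_cutoff
    teacher_primary = teacher_t >= primary_cutoff and parent_t >= secondary_cutoff
    return parent_primary or teacher_primary


# Hypothetical (parent, teacher) T-scores.
for parent_t, teacher_t in [(70, 62), (62, 70), (70, 58), (63, 63)]:
    print(parent_t, teacher_t, meets_combined_rule(parent_t, teacher_t))
```

Because either informant can supply the higher score, the rule is symmetric; a child with moderately elevated scores from both informants (63 and 63 here) still screens negative, which is one way such a rule can trade sensitivity against specificity.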

The studies did not report reliability measures for ratings combined across informants; they reported psychometric performance only within individual informant groups. For example, one study reported intraclass correlation coefficients (ICCs) from 0.31 to 0.59 across subscales for individual parent and teacher ratings on the BRIEF.⁵⁷⁰ Another study reported the range of Cronbach’s alpha estimates across teacher and parent ratings for individual scales, all indicating substantial internal consistency (the lowest Cronbach’s alpha was 0.72; all other values were above 0.90).⁴⁶⁷
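Internal consistency figures such as the Cronbach’s alpha values above come from a standard formula that compares the variances of individual items with the variance of the total score: alpha = k / (k − 1) × (1 − sum of item variances / total-score variance), where k is the number of items. A minimal Python sketch with hypothetical ratings (the included studies' item-level data are not available here):

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` has one row per respondent and one column per item."""
    n_items = len(responses[0]) if responses else 0
    if len(responses) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    items = list(zip(*responses))  # transpose to one tuple of scores per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1.0 - sum_item_variances / total_variance)


# Hypothetical ratings from five respondents on a three-item subscale.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(ratings), 3))
```

Sample variances are used throughout; because the same denominator appears in the item and total variances, using population variances instead would give the same alpha.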

4.3.5. Clinician Tools

We identified 24 studies evaluating additional tools that could be used by clinicians or the healthcare system (beyond neuropsychological tests; parent, teacher, or self-report ratings; biomarkers; or imaging) to aid the diagnosis of ADHD.²⁷^, ¹²¹^, ¹⁶⁷^, ¹⁸¹^, ²⁹⁸^, ²⁹⁹^, ³¹¹^, ³³⁸^, ³⁵⁵^, ³⁶²^, ³⁸⁵^, ³⁸⁸^, ³⁸⁹^, ⁴⁰⁰^, ⁴⁰³^, ⁴⁰⁷^, ⁴¹⁶^, ⁴¹⁷^, ⁴³⁴^, ⁴³⁷^, ⁴⁹⁹^, ⁵⁴²^, ⁵⁶⁶^, ⁶²⁷ The earliest identified study was published in 2009.⁶²⁷ Evaluations came from three different countries, including eight from the United States.²⁷^, ²⁹⁹^, ³¹¹^, ³⁸⁹^, ⁴⁰⁰^, ⁴⁰³^, ⁵⁴²^, ⁵⁶⁶ The populations studied were predominately males between the ages of three and 18. Most studies did not distinguish between ADHD presentations, but three studies were restricted to the combined presentation.¹²¹^, ⁴¹⁶^, ⁶²⁷ Where studies reported the race and ethnicity of the sample, the percentage of White children ranged from 52 to 100 percent, Black or African American children from two to 44 percent, Hispanic/Latino children from three to 20 percent, and Asian children from one to three percent.

Studies used different tools, including diagnostic interview guides and observation tools. Several studies measured child activity levels as an objective test, for example through an actometer or a commercially available activity tracker,¹²¹^, ¹⁸¹^, ²⁹⁸^, ³⁵⁵^, ⁴⁰⁰^, ⁴⁰³^, ⁴¹⁶^, ⁴³⁷^, ⁶²⁷ and two evaluated direct observation as a diagnostic tool.¹⁶⁷^, ³⁶² Three studies used insurance claims-based algorithms or medical health record indicators.⁴³⁴^, ⁵⁴²^, ⁵⁶⁶ The remaining studies addressed unique interventions and questions; for example, one study focused on the clinical utility of the International Classification of Diseases, 11th Revision (ICD-11) diagnostic guidelines,⁴⁹⁹ and another evaluated a clinician diagnosis combined with an assessment aid that integrated EEG and theta/beta ratio data.²⁷

Studies are difficult to compare because they assess different tools and approaches. Studies reported a variety of outcomes, with sensitivity and specificity being the most frequently reported. Table 6 shows the findings for the key outcomes of interest together with the number of studies and study identifiers. Where all identified studies evaluated the same tool, the first column of the table indicates the tool; otherwise, estimates are reported across all tools.

Table 6

KQ1 summary of findings and strength of evidence for clinician tools.

We downgraded the strength of evidence for imprecision (very large variation in reported diagnostic performance) and for inconsistency (when consistency could not be assessed because only one study was identified reporting on the test and outcome of interest and results had not been replicated by another author group). The tools were difficult to compare and answered study-specific questions.

4.3.6. Biomarkers

We identified seven studies using proposed biomarkers obtained from biospecimens to diagnose ADHD.³⁰⁹^, ⁵⁰¹^, ⁵⁶³^, ⁵⁸³^, ⁶⁰³^, ⁶³⁵^, ⁶⁴⁴ EEG and imaging approaches are reported in sections 4.3.7 and 4.3.8, and the evidence table (Appendix C, Table C.1.) shows additional, less common approaches to diagnosing ADHD, such as eye movement tracking. Five identified studies used blood measures, including membrane potential ratio⁵⁶³ and erythropoietin/erythropoietin receptor,³⁰⁹ and three of these studies analyzed miRNA obtained from blood samples.⁶⁰³^, ⁶³⁵^, ⁶⁴⁴ The other two studies evaluated urine indicators.⁵⁰¹^, ⁵⁸³ The earliest identified study was published in 2007.⁵⁰¹ Evaluations came from five different countries, including one from the United States.⁵⁶³

The populations studied were predominately males between the ages of six and 17. Most studies required participants to not be taking stimulant medication. For studies that distinguished between ADHD presentations, most of the participants were diagnosed with the combined presentation.⁵⁶³^, ⁶³⁵^, ⁶⁴⁴ Only two studies mentioned race and ethnicity demographics, one where all of the participants were Han Chinese⁶⁰³ and the other where the majority of participants were Black/African American.⁵⁶³ None of the studies used a clinical sample or children with a consistent co-morbidity.

Table 7 shows the findings for the outcomes of interest together with the number of studies and study identifiers. Given the clinical diversity of the biomarkers (e.g., differences in invasiveness and technological requirements of tests), we include results across all biospecimen evaluations, blood markers, miRNA specifically, and urine indicators where more than one study was identified that reported on the outcome.

Table 7

KQ1 summary of findings and strength of evidence for biomarkers.

Biomarker studies reported mainly on sensitivity and specificity. At least one study achieved very high sensitivity.³⁰⁹ Little information was provided in the studies regarding the reliability of the markers or combinations of markers. None of the included studies provided information on the effect of misdiagnosis. None of the identified studies reported the costs associated with analyzing biomarkers.

4.3.7. EEG

We identified 45 studies using EEG markers to diagnose ADHD.²⁷^, ¹¹¹^, ¹¹⁵^, ¹²⁰^, ¹²⁴^, ¹⁴³^, ¹⁵⁷^, ¹⁷²^, ¹⁷⁹^, ¹⁸²^, ¹⁸⁶^–¹⁸⁹^, ¹⁹²^, ¹⁹⁷^, ²⁴⁵^, ³¹²^, ³²²^, ³⁴⁰^, ³⁵¹^, ³⁵⁶^, ³⁶⁵^, ³⁶⁶^, ³⁷⁰^, ³⁹⁴^, ³⁹⁵^, ³⁹⁷^, ⁴⁰⁴^, ⁴⁰⁸^, ⁴¹²^, ⁴¹³^, ⁴¹⁵^, ⁴²⁰^, ⁴³⁸^, ⁴⁴⁹^, ⁴⁶⁵^, ⁴⁶⁸^, ⁴⁷³^, ⁴⁸⁷^, ⁴⁹⁴^, ⁵⁴⁶^, ⁵⁴⁸^, ⁵⁹²^, ⁶⁴¹ The earliest identified study was published in 2003.⁵⁴⁶ EEG evaluations were published in 17 different countries, primarily Iran and China, with four studies published in the United States.²⁷^, ⁴¹²^, ⁴⁸⁷^, ⁵⁴⁸ The populations studied were predominately males between the ages of six and 17, with only three studies including children as young as four years old.¹⁵⁷^, ³⁴⁰ One study included only female participants,¹⁹⁷ and seven studies included only males.¹¹¹^, ¹⁷⁹^, ⁴¹²^, ⁴¹³^, ⁴⁴⁹^, ⁴⁶⁸^, ⁴⁷³ In several studies, participants were required to demonstrate an IQ of 80 or higher and almost half of the studies required that participants not take stimulant medication or stop medication several days before testing. For studies that distinguished between ADHD presentations, most focused on the combined and inattentive presentations. Race and ethnicity demographics were not mentioned in most studies.

While ADHD participants with co-occurring disorders were not excluded from most studies, only a few studies purposely included children with specific co-occurring disorders, such as conduct disorder or other behavioral disorders, to evaluate diagnostic test performance in these children.¹⁴³ The large majority of studies used unselected samples, i.e., they compared children with ADHD to typically developing children.

Studies used EEG signals obtained during a resting state with eyes closed or eyes open, while participants performed neuropsychological tests, and/or as event-related potentials. Studies varied in the reported detail (e.g., number of electrodes and channels, and frequency and duration of the recording); the documented information is shown in the evidence table in the appendix. Two-thirds of studies used machine learning algorithms to select parameters for classification. Several studies explicitly reported combining EEG data with specific demographic variables or rating scale results.²⁷^, ¹²⁴^, ¹⁴³^, ¹⁸⁹^, ¹⁹²^, ³¹²^, ³⁵¹
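Many EEG approaches, including the assessment aid mentioned under clinician tools, rely on spectral features such as the theta/beta power ratio. The sketch below shows one simple way to compute that ratio for a single channel using a plain discrete Fourier transform from the Python standard library. The band limits (theta 4 to 8 Hz, beta 13 to 30 Hz) are a common convention rather than the definition used by any particular included study, and the signal is synthetic; real pipelines typically use an FFT library and add filtering, artifact rejection, and windowed spectral estimates.

```python
import cmath
import math


def band_power(signal, fs, low_hz, high_hz):
    """Sum of squared DFT magnitudes for frequency bins in [low_hz, high_hz)."""
    n = len(signal)
    resolution = fs / n  # Hz per DFT bin
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * resolution
        if low_hz <= freq < high_hz:
            # Direct DFT coefficient for bin k (slow but transparent).
            coefficient = sum(
                x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)
            )
            total += abs(coefficient) ** 2
    return total


def theta_beta_ratio(signal, fs, theta=(4.0, 8.0), beta=(13.0, 30.0)):
    """Theta band power divided by beta band power (band limits are assumptions)."""
    return band_power(signal, fs, *theta) / band_power(signal, fs, *beta)


# Synthetic 4-second "recording" at 256 Hz: a 6 Hz theta rhythm with twice the
# amplitude of a 20 Hz beta rhythm, so the power ratio should be about 2**2 = 4.
fs = 256
t = [i / fs for i in range(4 * fs)]
synthetic = [
    2.0 * math.sin(2 * math.pi * 6 * ti) + 1.0 * math.sin(2 * math.pi * 20 * ti)
    for ti in t
]
print(round(theta_beta_ratio(synthetic, fs), 2))
```

Because both test frequencies fall exactly on DFT bins, no spectral leakage occurs and the ratio reflects only the squared amplitude ratio; recorded EEG would spread power across neighboring bins.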

Table 8 shows findings for the outcomes of interest together with the number of studies and study identifiers.

Table 8

KQ1 summary of findings and strength of evidence for EEG.

EEG studies predominantly reported accuracy estimates. Sensitivity in individual studies ranged widely, from 46 percent¹⁹⁷ to 100 percent (with corresponding specificities of 71%);¹⁴³, ⁴¹³ the range was narrower in studies restricted to older children. Studies in clinical samples reported a narrower range of sensitivity and specificity than studies differentiating children with ADHD from neurotypically developing children, but these clinical samples were either small or augmented EEG predictions with demographic variables. Some studies combined EEG data with demographics; one study reported a sensitivity of 100 percent (with a corresponding specificity of 100%) for this approach.¹⁴³ We downgraded the strength of evidence for imprecision (large variation in performance across studies). In addition, we downgraded for study limitations, as diagnostic approaches were often not well described. For some outcome measures, no study was identified, so the effects associated with the test could not be determined.

4.3.8. Imaging

We identified 19 studies using neuroimaging.²⁸, ¹⁹¹, ²⁸², ³¹⁹, ⁴⁰⁰, ⁴⁶⁴, ⁴⁶⁷, ⁴⁹⁵, ⁵¹⁸, ⁵²⁴, ⁵⁴⁹, ⁵⁷¹, ⁵⁸⁰, ⁵⁸¹, ⁵⁹¹, ⁶³⁰, ⁶³¹, ⁶³³ Studies were predominantly published in the U.S. and China. A publicly available dataset (ADHD-200) produced numerous analyses.¹⁹¹, ²⁸², ⁴⁹⁵, ⁵⁸¹ The populations studied were predominantly male and between the ages of six and 17, with one study including only male participants.⁶³⁰ In several studies, participants were required to have an IQ of 80 or higher to be included in the sample.⁴⁹⁵, ⁵⁴⁹, ⁵⁷¹, ⁶³⁰, ⁶³¹ A quarter of the studies required that participants either not take stimulant medication or stop it several days before testing.⁵⁷¹, ⁶³⁰, ⁶³³ A third of the studies included only right-handed participants.⁴⁰⁰, ⁴⁹⁵, ⁵⁷¹, ⁶³⁰ Studies that distinguished between ADHD presentations mostly focused on the combined and inattentive presentations; a minority specified including individuals with the hyperactive/impulsive presentation.¹⁹¹, ²⁸², ⁵⁴⁹, ⁶³³ Nearly all studies did not report race and ethnicity.

While ADHD participants with co-occurring disorders were not excluded from most of the studies, no studies specifically assessed test performance in children with specific co-occurring disorders. One study differentiated children with ADHD from those with dyslexia,⁵²⁴ and one evaluated the diagnostic performance of an algorithm differentiating ADHD from autism.²⁸² All studies used unselected, general samples rather than clinical samples referred for further diagnostic workup (in which a large proportion of children will be diagnosed with ADHD, conduct disorder, autism, or depression).

All but two imaging studies used MRI to diagnose ADHD. However, studies used MRI in different ways: some used functional MRI, some structural MRI, and some combinations of structural and functional MRI, with or without magnetic resonance spectroscopy. Two studies used near-infrared spectroscopy, but the applications and diagnostic models differed.²¹¹, ⁶³¹ Most of the imaging studies used a large number of indicators and machine learning algorithms to detect markers and optimize the classifications. The reporting of the variable selection process varied, and it was often unclear which exact indicators were included in the model used to determine diagnostic accuracy. Some of the identified studies combined imaging parameters with demographic or other clinical data in the prediction model.¹⁹¹, ²¹¹, ²⁸², ⁴⁰⁰, ⁴⁶⁷, ⁴⁹⁵, ⁶³¹, ⁶³³

Reported diagnostic accuracy estimates varied widely. Table 9 shows the findings for the outcomes of interest, together with the number of studies and study identifiers. The table summarizes findings across all imaging studies, for MRI studies specifically, and for imaging studies that combine imaging parameters with other variables (e.g., demographics) for prediction.

Table 9

KQ1 summary of findings and strength of evidence for neuroimaging.

Studies reported primarily on sensitivity, specificity, and accuracy. Across all neuroimaging studies, reported sensitivity varied widely. We downgraded the strength of evidence for imprecision (large variation in performance reported across studies). In addition, we downgraded for study limitations, as the individual diagnostic models were often not well described and the number and type of predictor variables feeding into the models were unclear. For some outcomes, no study was identified, and it was not possible to determine the effects associated with the diagnostic modality. Some studies combined neuroimaging data and demographics, though the relevance is unclear, since the only demographic characteristic likely to be associated with a diagnosis of ADHD is sex, given the higher prevalence in males.

4.3.9. Neuropsychological Tests

We identified 74 studies using neuropsychological tests that assessed executive function and/or encompassed a variety of cognitive assessments, including continuous performance tests, to diagnose ADHD.¹⁸, ²¹, ²⁴, ¹¹², ¹¹⁹, ¹³⁵, ¹⁴⁰, ¹⁴¹, ¹⁵², ¹⁵³, ¹⁵⁹, ¹⁶², ¹⁷⁰, ¹⁷⁷, ¹⁸⁴, ¹⁸⁵, ¹⁹⁰, ¹⁹⁸, ²¹³, ²³⁷, ²⁴⁶, ²⁵³, ²⁶³, ²⁶⁷, ²⁷⁶, ²⁸⁴, ²⁹³, ²⁹⁸, ³⁰⁷, ³¹⁵, ³¹⁶, ³²³, ³²⁷, ³⁴⁶, ³⁴⁷, ³⁵¹, ³⁵², ³⁷⁹, ³⁸², ³⁹³, ⁴⁰¹, ⁴⁰², ⁴¹⁷, ⁴²¹, ⁴²², ⁴³⁶, ⁴⁴⁵, ⁴⁴⁶, ⁴⁵⁰, ⁴⁶², ⁴⁶⁷, ⁴⁶⁹, ⁴⁷⁰, ⁴⁷⁵, ⁴⁷⁷, ⁴⁸², ⁴⁸⁶, ⁴⁹³, ⁴⁹⁶, ⁵⁰⁰, ⁵¹⁵, ⁵³⁷, ⁵⁴¹, ⁵⁴³, ⁵⁶⁴, ⁵⁷⁶, ⁶⁰⁷, ⁶¹⁴, ⁶¹⁵, ⁶²⁵, ⁶²⁷, ⁶³², ⁶³⁹, ⁶⁴⁷ Rating scales of executive function are described in the parent and teacher rating section at the beginning of the chapter.

The earliest study evaluating neuropsychological tests as diagnostic tools was published in 1999,⁴⁹⁶ and evaluations came from 18 different countries, primarily the United States. The populations studied were predominantly male and between the ages of six and 18. Three studies included three- and four-year-old children.¹⁶², ³¹⁵, ⁴⁶⁷ In several studies, participants were required to have an IQ of 70 or higher,²⁴, ³⁴⁶, ³⁵², ³⁶⁵, ⁴⁶⁷, ⁴⁶⁹, ⁵⁰⁰ with some studies requiring an IQ of at least 80²¹, ¹⁵², ²⁵³, ⁶⁴⁷ or 85.³⁷⁹, ⁴⁴⁶, ⁴⁸⁶ Two-thirds of the studies required that participants either not take stimulant medication or stop it several days before testing. In studies that distinguished between ADHD presentations, most participants were diagnosed with the combined or inattentive presentation. About a third of the studies reported race and ethnicity, with seven studies in which White participants made up half or more of the sample,²¹, ¹⁶², ¹⁷⁰, ²⁶³, ⁴⁶², ⁶⁰⁷ one study in which all participants were Asian,³⁹³ one study in which over 50 percent were Black/African American,⁴⁶² and one study in which 83 percent of the participants were Hispanic or Latino.⁴⁶⁷

ADHD participants with co-occurring disorders were not excluded from most of the studies. Some studies used clinical samples of participants referred for a diagnostic work-up, in which all children presented with attention problems or other symptoms indicative of ADHD or a different clinical diagnosis.²⁴, ¹⁵³, ¹⁶², ²⁶³, ³¹⁵ One study specifically examined distinguishing between children with ADHD, children with developmental dyslexia, and children with both disorders.⁴⁴⁶ The remaining studies used neurotypically developing children as controls rather than clinical samples.

Studies described a wide range of test batteries, but over 50 studies used continuous performance tests (CPTs) to diagnose children and adolescents. CPTs provide multiple behavioral outputs relevant to ADHD, including omission errors (reflecting inattention), commission errors (reflecting impulsivity), and reaction time standard deviation (reflecting moment-to-moment response variability). Studies varied in their use of traditional visual CPTs, such as the TOVA (Test of Variables of Attention), or more novel, multifaceted CPT approaches. These latter “hybrid” CPT paradigms included CPTs that combined auditory and visual attentional demands in the same task, CPTs that monitored physical movements during task administration, and virtual reality CPTs built on environments designed to emulate real-world distractibility in a classroom setting. The included studies used idiosyncratic combinations of individual cognitive measures to achieve the best performance. However, multiple studies reported on the attention and impulsivity measures included in the CPTs.
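The three CPT outputs described above can be illustrated with a minimal sketch. It assumes a hypothetical trial record (whether the stimulus was a target, whether the child responded, and the reaction time in milliseconds) and simple scoring rules: an omission is a missed target, a commission is a response to a non-target, and variability is the standard deviation of reaction times on correctly answered targets. Published CPTs such as the TOVA apply their own scoring and norming rules, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from statistics import stdev
from typing import Optional


@dataclass
class Trial:
    """Hypothetical record of one CPT trial (not a format used by any specific test)."""
    is_target: bool                # True if the child should respond to this stimulus
    responded: bool                # True if the child pressed the response button
    rt_ms: Optional[float] = None  # reaction time in milliseconds, if a response occurred


def cpt_summary(trials: list[Trial]) -> dict[str, float]:
    """Return omission rate, commission rate, and reaction-time SD for one session."""
    targets = [t for t in trials if t.is_target]
    non_targets = [t for t in trials if not t.is_target]
    nan = float("nan")

    # Omission: no response to a target (commonly read as inattention).
    omissions = sum(1 for t in targets if not t.responded)
    # Commission: a response to a non-target (commonly read as impulsivity).
    commissions = sum(1 for t in non_targets if t.responded)
    # Variability: SD of reaction times on correctly answered targets.
    hit_rts = [t.rt_ms for t in targets if t.responded and t.rt_ms is not None]

    return {
        "omission_rate": omissions / len(targets) if targets else nan,
        "commission_rate": commissions / len(non_targets) if non_targets else nan,
        "rt_sd_ms": stdev(hit_rts) if len(hit_rts) >= 2 else nan,
    }


# Hypothetical session: four target trials and two non-target trials.
session = [
    Trial(True, True, 410.0),
    Trial(True, False),
    Trial(True, True, 530.0),
    Trial(True, True, 450.0),
    Trial(False, True, 300.0),
    Trial(False, False),
]
print(cpt_summary(session))
```

In this hypothetical session, one of four targets is missed (omission rate 0.25) and one of two non-targets draws a response (commission rate 0.5).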

Studies reported a variety of statistical parameters to determine the accuracy of the diagnostic approach. Sensitivity, specificity, and accuracy were the most frequently reported diagnostic measures. Table 10 shows the findings for all key outcomes of interest, together with the number of studies and study identifiers. Where we found more than one study reporting on the same test or test component, the table also summarizes performance for that test or component specifically.
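Because sensitivity, specificity, and accuracy are reported side by side throughout this chapter, a minimal sketch of how each is derived from a two-by-two table against the reference standard (diagnosis by a mental health specialist) may help. The counts are hypothetical. The two example tables share the same sensitivity and specificity but differ in the proportion of children with ADHD; accuracy changes with that proportion, whereas sensitivity and specificity are each computed within a single reference-standard group.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy measures from a 2x2 table of test result vs. reference standard."""
    return {
        # Share of children with ADHD (per the reference standard) whom the test flags.
        "sensitivity": tp / (tp + fn),
        # Share of children without ADHD whom the test correctly classifies as negative.
        "specificity": tn / (tn + fp),
        # Share of all children classified correctly; depends on the case mix.
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical tables with identical sensitivity (0.80) and specificity (0.90):
# 50 children with ADHD and 100 without, versus 100 with ADHD and 50 without.
few_cases = diagnostic_metrics(tp=40, fp=10, fn=10, tn=90)
many_cases = diagnostic_metrics(tp=80, fp=5, fn=20, tn=45)
print(few_cases["accuracy"], many_cases["accuracy"])
```

Accuracy is about 0.87 for the first table and 0.83 for the second, even though the test performs identically within each group.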

Table 10

KQ1 summary of findings and strength of evidence for neuropsychological tests.

Studies evaluating neuropsychological tests reported predominantly on sensitivity and specificity. Although some studies reported perfect diagnostic performance for neuropsychological tests,¹⁵² those studies reported the performance of composite measures (unique, study-specific combinations of individual cognitive measures), making it difficult to compare test performance across studies. The range in performance was narrower in studies restricted to children aged seven and above. Reliability measures were rarely reported in the identified studies. No study addressed the effects of misdiagnosis. Costs were reported in only one study. We downgraded the strength of evidence for imprecision (large variation in performance reported across studies). For some outcome measures, no study was identified, and it was not possible to determine the effects associated with the test.

4.4. KQ1a. What is the comparative diagnostic accuracy of approaches that can be used in the primary care practice setting or by specialists to diagnose ADHD among individuals younger than 7 years of age?

We identified only 12 studies that reported exclusively on children younger than seven years of age.¹⁶², ¹⁶⁷, ¹⁸⁹, ³¹⁶, ³³¹, ⁴¹², ⁴¹⁶, ⁴³⁷, ⁴⁶⁷, ⁵¹⁶, ⁵¹⁹, ⁵⁵⁹ The earliest identified study was published in 2002,⁵⁵⁹ and data came from the United States, Portugal, Spain, the Netherlands, Germany, Taiwan, and New Zealand. Where reported, the proportion of girls ranged from none to 41 percent, and the proportion of Caucasian children ranged from 54 to 90 percent. We identified three studies that explicitly reported on diagnostic performance data collected in primary care.¹⁶², ⁴⁴⁵, ⁶⁰⁵ Several studies used clinic populations of children referred for diagnostic evaluation, and these children often presented with multiple co-occurring disorders.

Studies evaluated parent ratings, teacher ratings, combined ratings, activity measures, EEG, imaging, and neuropsychological tests. Studies reported a variety of outcomes, with sensitivity and specificity being the most frequently reported. Sensitivity in this age group reached up to 97 percent in a study evaluating activity ratings,⁴¹⁶ while a study evaluating a continuous performance test showed the lowest sensitivity (42%).¹⁸⁹ Reported specificity was 91 percent in a study using parental ratings to diagnose ADHD,³³¹ but EEG data achieved a specificity of only 38 percent.¹⁸⁹ Few of these diagnostic studies reported reliability measures. The results across studies for the key outcomes are shown in the summary of findings table at the end of the chapter; all other measures (where reported) are shown in the evidence table in the appendix. We did not identify any study reporting on the adverse effects following a misdiagnosis (not being diagnosed or being incorrectly diagnosed) in this age group. In addition, none of the diagnostic studies mentioned costs of tests in this subsample.

The summary of findings table at the end of this chapter shows the diagnostic performance in this young age group in more detail. The table summarizes the limited available evidence across identified studies, together with the strength of evidence. Strength of evidence was either low due to the limited evidence, or insufficient due to the lack of studies in this age group reporting on the outcomes of interest.

4.5. KQ1b. What is the comparative diagnostic accuracy of EEG, imaging, or approaches assessing executive function that can be used in the primary care practice setting or by specialists to diagnose ADHD among individuals aged 7 through 17?

We identified 61 studies that reported exclusively on children aged seven and older. The earliest identified study was published in 1989. Data came from 23 different countries, most frequently the United States and China. Six studies were restricted to boys, whereas one study included 75 percent girls.⁴⁴⁶ The proportion of White children ranged from 44⁴⁶⁴ to 100¹¹² percent. The proportion of Hispanic or Latino children ranged from one⁶⁰⁷ to 20⁴⁰⁰ percent. The proportion of Black or African American children ranged from five³⁵⁹ to 34⁶⁰⁷ percent. The proportion of Asian children ranged from one⁵⁷⁰ to 100⁶⁴¹ percent. The proportion of multiracial youth (where reported) ranged from eight⁴⁰⁰ to 20⁴⁶⁴ percent.

Studies evaluated parent ratings, teacher ratings, combined ratings, teen/child self-report, continuous performance tests, executive functioning, activity measures, EEG, MRI, and neuropsychological tests. Studies reported a variety of outcomes, with sensitivity and specificity being the most frequently reported. Few of these diagnostic studies reported reliability measures. We did not identify any study reporting on the adverse effects following a misdiagnosis (not being diagnosed or being incorrectly diagnosed) in this age group. In addition, none of the diagnostic studies mentioned costs of tests in this subsample. The results across studies for the key outcomes and interventions are shown in the summary of findings table at the end of the chapter; all other measures (where reported) and results for other interventions evaluated in this age group are shown in Appendix C, Table C.1.

4.5.1. Diagnostic Accuracy of EEG in Youth Aged 7 Through 17

We identified 16 studies that used EEG to diagnose youth.¹¹¹, ¹²⁰, ¹⁷², ²⁴⁵, ³¹², ³⁵¹, ³⁷⁰, ³⁹⁴, ³⁹⁷, ⁴⁰⁸, ⁴³⁸, ⁴⁴⁹, ⁴⁶⁵, ⁴⁹⁴, ⁵⁴⁶, ⁶⁴¹ The first study meeting eligibility criteria was published in 2003.¹¹¹, ¹²⁰, ¹⁷², ²⁴⁵, ³¹², ³⁵¹, ³⁷⁰, ³⁹⁴, ³⁹⁷, ⁴⁰⁸, ⁴³⁸, ⁴⁴⁹, ⁴⁶⁵, ⁴⁹⁴, ⁵⁴⁶, ⁶⁴¹ Study locations included 11 different countries, with several studies conducted in China³⁵¹, ³⁹⁴, ⁴⁰⁸, ⁶⁴¹ and Iran.²⁴⁵, ⁴³⁸, ⁴⁹⁴ The proportion of included girls ranged from none¹¹¹, ⁴⁴⁹ to 56 percent.³⁹⁴ Race and ethnicity were rarely reported; one study included only Asian youth.³⁵¹ The ADHD presentation was often not reported; where it was, two studies reported that two-thirds of children had the combined presentation,³¹², ⁴⁶⁵ and one study was restricted to inattentive ADHD.³⁵¹ Studies usually did not exclude children with comorbidities, but only one study specifically assessed the effect of co-occurring ODD (oppositional defiant disorder) on diagnostic accuracy.³⁷⁰

Reported sensitivity, specificity, accuracy, and AUC values ranged widely across studies, as documented in the summary of findings table. Studies varied in how much detail they provided on the parameters that contributed to the diagnostic performance; together with the wide range of reported diagnostic performance, this resulted in low strength of evidence for these outcomes of interest.
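Several sections report AUC values. As a minimal sketch, assuming a continuous test score on which higher values indicate ADHD (the scores below are hypothetical), the AUC can be computed as the probability that a randomly chosen child with ADHD scores higher than a randomly chosen child without ADHD, counting ties as one half (the Mann–Whitney formulation). Studies in this review may have computed AUC differently, for example from a fitted model's predicted probabilities.

```python
def auc_mann_whitney(cases: list[float], controls: list[float]) -> float:
    """AUC as P(case score > control score), with ties counted as one half."""
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case with every control and count correctly ordered pairs.
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical marker scores; higher values are assumed to indicate ADHD.
adhd_scores = [0.9, 0.8, 0.6]
control_scores = [0.7, 0.4, 0.3]
print(auc_mann_whitney(adhd_scores, control_scores))
```

Here 8 of the 9 case-control pairs are ordered correctly, giving an AUC of about 0.89; an AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.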

Studies did not report on rater agreement between EEG readers, internal consistency of measurements, or test-retest reliability. Identified studies also did not describe the impact of misdiagnosis and they did not mention costs. Hence, the evidence was determined to be insufficient for these outcomes of interest.

4.5.2. Diagnostic Accuracy of Imaging in Youth Aged 7 Through 17

We identified eight studies that used imaging to diagnose ADHD in this age group; all evaluated MRI.¹⁹¹, ²⁸², ⁴⁰⁰, ⁴⁶⁴, ⁴⁹⁵, ⁵¹⁸, ⁵⁷¹, ⁵⁸¹ The first studies meeting eligibility criteria were published in 2018.¹⁹¹, ⁵⁷¹ Study locations were the United States and China. The proportion of included girls ranged from 14⁵⁷¹ to 45²⁸² percent. Race and ethnicity were rarely reported, but in the two U.S. studies that provided a participant breakdown, the proportion of White children was 44 and 55 percent, Hispanic 19 and 20 percent, Black six and 14 percent, and Asian two and six percent.⁴⁰⁰, ⁴⁶⁴ Several studies stated that youth with all ADHD presentations were included. Studies typically did not exclude youth with other comorbidities, but only one study assessed the effect of autism on the diagnostic accuracy.⁵¹⁸

The reported sensitivity, specificity, accuracy, and AUC values varied widely across studies. Given the wide range of reported diagnostic accuracy measures in this age group, the strength of evidence was judged to be low regarding successfully diagnosing ADHD with imaging data. Rater agreement for human imaging readers, internal consistency, test-retest reliability, impact of misdiagnosis, and costs were not described, so the strength of evidence was insufficient to support evidence statements for these outcomes of interest.

4.5.3. Diagnostic Accuracy of Executive Function in Youth Aged 7 Through 17

While a number of studies evaluated neuropsychological tests in this age group, not all emphasized executive function characteristics for the diagnosis of ADHD. We identified 14 studies with an emphasis on executive function assessment.¹¹⁹, ¹⁵³, ¹⁵⁹, ²¹³, ²⁸⁴, ³⁵¹, ³⁵², ³⁷⁹, ⁴⁴⁶, ⁴⁶⁵, ⁵⁴¹, ⁶⁰⁷, ⁶¹⁴, ⁶²⁵ The earliest study was published in 1989.¹⁵⁹ Evaluations were conducted in six countries, most frequently the United States.¹⁵⁹, ²¹³, ⁶⁰⁷, ⁶²⁵ The reported proportion of girls ranged from none³⁵², ⁶¹⁴ to 74 percent⁴⁴⁶ across studies. Race and ethnicity were rarely reported, but several identified studies included only or predominantly White youth.¹¹², ²¹³, ⁶⁰⁷, ⁶²⁵ Several studies were restricted to, or predominantly included, youth with the combined ADHD presentation.¹¹⁹, ²⁵³, ³⁵², ⁶²⁵ Studies typically did not exclude youth with comorbidities, but none assessed the effect of a specific comorbidity on the diagnostic performance of the executive function test.

Sensitivity, specificity, accuracy, and AUC values ranged widely within and across the identified studies as documented in the summary of findings table. None of the identified studies assessed the performance of the same diagnostic test, and most of the studies described unique combinations of test components that were used to diagnose ADHD. All identified studies are documented in detail in the appendix. We determined the strength of evidence to be low for diagnostic outcomes of interest.

Studies did not report on rater agreement or internal consistency of the test components, but one study reported on temporal stability. The study reported correlations between two test administrations of 0.81 (p<0.05) for the total test score in a Tower of London–Drexel task (assessing total move and rule violation scores), 0.79 (p<0.05) for total time violations, and 0.42 (p<0.005) for total rule violations.²¹³ Studies did not report on the impact associated with a misdiagnosis or on the costs of the tests. Given the lack of studies and our inability to judge the consistency of results across studies, we determined the strength of evidence to be insufficient.

4.6. KQ1c. For both populations, how does the comparative diagnostic accuracy of these approaches vary by clinical setting, including primary care or specialty clinic, or patient subgroup, including age, sex, or other risk factors associated with ADHD?

We did not identify studies comparing accuracy across settings in direct, head-to-head comparisons. Hence, we had to address this KQ with indirect analyses across studies. Our analyses were further limited because studies provided insufficient detail on diagnostic performance (e.g., they did not clearly report false positives and false negatives), so the analyses could not be based on a meta-analytic model. Instead, we used the summary performance measures reported by the study authors to explore potential effect modifiers. The most commonly reported diagnostic performance measures were sensitivity and specificity, and most analyses were possible only for these outcomes.

Figure 9 plots reported sensitivity by setting.

Figure 9

Sensitivity by setting. Notes: N/A = not available

The figure plots the reported sensitivity in the different settings included in the dataset and shows the range within and across settings. A simple regression analysis of the reported sensitivities indicated that setting is associated with reported sensitivity (p=0.03). However, the result should be interpreted with caution, as it does not take study size or quality into account and was not established within a meta-analytic model. The corresponding reported specificities are shown in Figure 10.
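The simple regression of reported sensitivity on setting can be sketched as follows, assuming setting is dummy-coded; in that case the overall regression F-test is equivalent to a one-way analysis of variance across settings. The setting labels and sensitivity values are hypothetical, and, in line with the caveat above, every study counts equally regardless of its size or quality. If SciPy is available, a p-value could be obtained with `scipy.stats.f.sf(f, df_between, df_within)`; the software used for the review's analyses is not specified here.

```python
from statistics import mean


def setting_f_statistic(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """F statistic (and its degrees of freedom) for reported values grouped by setting.

    Equivalent to the overall F-test of a regression on dummy-coded settings,
    i.e., a one-way ANOVA. Each study counts once, whatever its size or quality.
    """
    values = [v for vs in groups.values() for v in vs]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Variation of setting means around the grand mean, and of studies around their setting mean.
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in groups.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical reported sensitivities (%) from studies in two settings.
reported = {
    "primary care": [70, 75, 80],
    "specialty clinic": [85, 90, 95],
}
print(setting_f_statistic(reported))
```

For these hypothetical values the function returns F = 13.5 on 1 and 4 degrees of freedom.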

Figure 10

Specificity by setting. Notes: N/A = not available

Reported specificity values ranged considerably, within as well as across settings. A simple regression analysis of the reported specificities did not indicate that setting is systematically associated with reported specificity (p=0.70). However, the result should be interpreted with caution, as it does not take study size or quality into account and was not established within a meta-analytic model. The equivalent analysis for reported accuracy (p=0.006) indicated that reported accuracy was statistically significantly associated with setting. The analysis for AUC was not significant (p=0.28).

We also evaluated whether studies in clinical samples (i.e., children referred for a clinical diagnosis, or samples including children with oppositional defiant disorder or autism) and studies with primarily neurotypically developing children reported different diagnostic performance values. Figure 11 plots the sensitivity results for the two participant populations.

Figure 11

Sensitivity by clinical population.

Across studies, analyses detected a statistically significant difference in reported sensitivity depending on whether a study used a clinical sample or compared children with ADHD to neurotypically developing children (p=0.04). On average, sensitivity was lower in clinical samples than in studies differentiating youth with ADHD from neurotypically developing youth (mean 75, SD 18 vs mean 81, SD 15). However, the analysis should be interpreted with caution, as it is not based on a meta-analytic model and relies on sensitivity values as reported by the original authors.
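Why the absence of a meta-analytic model matters can be shown with a minimal sketch on hypothetical studies: an unweighted mean of reported sensitivities gives a 20-child study the same influence as a 450-child study, whereas weighting by the number of children with ADHD (equivalent to pooling true positives across studies) does not. This is only an illustration of weighting; a formal diagnostic accuracy meta-analysis would typically use a bivariate random-effects model, which the sketch does not attempt.

```python
from statistics import mean


def unweighted_and_pooled(studies: list[tuple[float, int]]) -> tuple[float, float]:
    """Return (unweighted mean, case-weighted mean) of reported sensitivities.

    Each study is (reported sensitivity as a proportion, number of children with ADHD).
    """
    unweighted = mean(s for s, _ in studies)
    total_cases = sum(n for _, n in studies)
    # Weighting by cases is equivalent to summing true positives across studies.
    weighted = sum(s * n for s, n in studies) / total_cases
    return unweighted, weighted


# Hypothetical: two small studies with high sensitivity, one large study with lower.
studies = [(0.95, 20), (0.90, 30), (0.70, 450)]
print(unweighted_and_pooled(studies))
```

For these hypothetical studies the unweighted mean is 0.85, while pooling across all 500 children with ADHD gives 0.722, because the large study dominates.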

Figure 12 plots the specificity stratified by population.

Figure 12

Specificity by clinical population.

The analysis indicated that the reported specificity was associated with the population used to establish diagnostic accuracy (p<0.001). On average, clinical samples reported lower specificities than studies in neurotypical samples (mean 68, SD 24 vs mean 83, SD 14). The result suggests that the type of study population may be a source of the heterogeneity seen across studies. However, the result should be interpreted with caution, as the data were not analyzed in a meta-analytic model and the analysis used the diagnostic performance data as reported by the original authors.

Figure 13 plots the AUC values reported in included studies stratified by clinical versus neurotypical samples.

Figure 13

Specificity by clinical versus neurotypical samples.

The analyses also detected a statistically significant difference in reported accuracy based on the population included in the evaluation sample (p<0.001). On average, the reported accuracy was lower in clinical samples than in studies that differentiated youth with ADHD from neurotypically developing youth (mean 0.76, SD 0.13 versus mean 0.88, SD 0.09). However, the analysis should be interpreted with caution, as it is not based on a meta-analytic model, and the number of included data points is smaller than for sensitivity and specificity. There were insufficient data available for analyses of other outcomes.

We further aimed to investigate whether the age of the participants is associated with the achieved diagnostic performance. Most studies included a range of ages, but studies differed in whether they included young children. Figure 14 plots sensitivity by minimum age in the sample.

Figure 14

Sensitivity by minimum age.

Across studies, we did not detect a statistically significant linear association between the minimum age in the sample and reported sensitivity (p=0.54). However, few studies included younger children, which limited the statistical power to detect differences; in addition, this is an indirect comparison across studies that does not take study size into account and should be interpreted with caution. The equivalent figure for specificity is shown in Figure 15.

Figure 15

Specificity by minimum age.

Across studies, we did not detect a statistically significant linear association between the minimum age in the sample and reported specificity (p=0.37). However, this is an indirect analysis across studies that is not based on a meta-analytic model and should therefore be interpreted with caution. We also categorized studies as including younger versus older children. A dichotomous indicator differentiating between young (under 7) and older children (7 and over) also did not indicate a systematic effect for sensitivity (p=0.98), specificity (p=0.35), accuracy (p=0.09), or AUC (p=0.28).
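The age analyses regress reported performance on a study-level covariate. A minimal sketch of a simple, unweighted linear regression of reported sensitivity on the minimum age in each sample, returning the slope and its t statistic, follows; the data are hypothetical. Coding the covariate as 0 (under 7) or 1 (7 and over) instead turns the slope into the difference in mean sensitivity between the two groups, which corresponds to the dichotomous analysis described above. If SciPy is available, a two-sided p-value could be obtained as `2 * scipy.stats.t.sf(abs(t), n - 2)`.

```python
from math import sqrt
from statistics import mean


def slope_and_t(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares slope of y on x and its t statistic (n - 2 df)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance around the fitted line gives the slope's standard error
    # (a perfect fit would make it zero, so the t statistic would be undefined).
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical studies: minimum age in the sample and reported sensitivity (%).
min_age = [3, 5, 7, 9]
sensitivity = [70, 76, 74, 80]
print(slope_and_t(min_age, sensitivity))
```

For these hypothetical studies the slope is 1.4 percentage points per year of minimum age, with t of about 2.47 on 2 degrees of freedom.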

We also analyzed the gender distribution in the identified studies, as the reported accuracy of a diagnosis may be associated with the gender of the participants. Figure 16 plots sensitivity and specificity against the percentage of female participants.

Figure 16

Sensitivity and specificity by proportion of female participants.

Across samples, the proportion of girls was not associated with reported sensitivity (p=0.63) or specificity (p=0.80). Analyses of reported accuracy (p=0.34) and reported AUCs (p=0.90) also did not detect an effect, and there were insufficient data for further analyses. However, the number of female participants was small across studies, which lowers the statistical power to detect an effect.

There were insufficient numbers of studies to evaluate any other risk factors or participant variables on the diagnostic outcomes of interest.

4.7. KQ1d. What are the adverse effects associated with being labeled correctly or incorrectly as having ADHD?

Identified studies did not address consequences for patients correctly or incorrectly receiving a diagnosis of ADHD, or adverse effects associated with being labeled correctly or incorrectly as having ADHD. One study highlighted that a missed diagnosis has implications for accessing funding in the Australian healthcare system (e.g., the National Disability Insurance Scheme) but provided no further empirical data.⁴⁴⁷ None of the included studies reported on stigma associated with being diagnosed or labeled with ADHD.

4.8. Summary of Findings. KQ1a–d

Table 11 provides a very broad overview of the identified research. Results of the individual studies are shown in the evidence table in Appendix C, Table C.1.

Table 11

KQ1 summary of findings and strength of evidence for the diagnosis of ADHD.

As documented in the summary of findings table, tests to diagnose ADHD were very diverse, and studies reported a wide range of diagnostic and psychometric performance. Strength of evidence ratings were low or insufficient for all outcomes. We downgraded results for study limitations (lack of detail on the selected tests, on the machine learning algorithms used to select variables, and on the exact variables included in the final model contributing to the effect estimate), imprecision (large variation in reported diagnostic performance across studies), and/or lack of replication in more than one study assessing the same test (i.e., consistency could not be assessed). Few studies evaluated the diagnosis of ADHD in young children. More studies were available for older children; however, studies did not report on all outcomes of interest. We downgraded the strength of evidence for study limitations where the evidence base consisted primarily of studies that provided insufficient detail on the diagnostic strategy (e.g., which cutoffs were used, which variables exactly entered models). We downgraded for imprecision where studies reported a large range of possible diagnostic performance. The strength of evidence for other outcomes was downgraded for inconsistency because consistency could not be assessed, as no replication of the documented effect was identified.

Effect modifier analyses were hindered by the lack of reported detail needed to assess effects in meta-regressions. Indirect analyses using simple regression indicated that the diagnostic setting may influence diagnostic accuracy estimates. Further analyses suggested that study population characteristics (e.g., whether the comparison was made with neurotypically developing children or within clinical samples) may also affect estimates. Because both aspects may be associated with key outcomes for this review, and they are related (e.g., clinical samples are seen in specialty care), we stratified the test-specific result presentation by neurotypical or clinical sample.

We did not identify studies reporting on the impact of correctly or incorrectly labeling youth as having ADHD or the impact of an incorrect diagnosis, and the strength of evidence is insufficient to make any evidence statements.

Bookshelf ID: NBK602987
