This report describes the 2023 YRBSS methodology, including sampling, data collection, data processing, weighting, and data analyses. This overview and methods report is one of 11 reports in the MMWR supplement featuring 2023 YRBS data. The other 10 reports provide the most recent national data on the following topics: 1) health behaviors and experiences among AI/AN students; 2) social media use; 3) experiences of racism at school; 4) adverse childhood experiences (ACEs); 5) mental health and suicidal thoughts and behaviors; 6) transgender identity; 7) verbally asking for consent at last sexual contact; 8) breakfast consumption; 9) physical activity; and 10) report of unfair discipline at school. In total, five individual questions and one set of eight questions (ACEs) were added to the 2023 YRBS questionnaire to examine urgent and emerging student health behaviors and experiences. Public health practitioners and researchers can use YRBS data, along with results from site-level surveys, to examine the prevalence of youth health behaviors, experiences, and conditions; monitor trends; and guide interventions. This supplement does not include data from site-level surveys; however, those results can be found in CDC's web-based applications for YRBSS data, including YRBS Explorer (https://yrbs-explorer.services.cdc.gov), Youth Online (https://nccd.cdc.gov/youthonline/App/Default.aspx), and the YRBS Analysis Tool (https://yrbs-analysis.cdc.gov).
Historically, YRBS has been administered during the spring of odd-numbered years to students in grades 9-12 enrolled in U.S. public and private schools. Although the previous YRBS was not administered until fall 2021 because of the COVID-19 pandemic, the 2023 survey resumed the typical timing and was conducted during the spring semester (January-June) 2023. Biennial administration of the YRBS allows CDC to assess temporal changes in behaviors among the U.S. high school population. YRBS, conducted among a nationally representative sample of students in grades 9-12 enrolled in U.S. public and private schools, provides comparable data across survey years and allows for comparisons between national and site-level data.
The YRBS questionnaire uses single-item measures to monitor and describe a wide variety of health behaviors and conditions. In 2023, the questionnaire consisted of 107 questions. Of those, 87 questions were included in the standard questionnaire that all sites used as the basis for their site-level questionnaires. Twenty questions reflecting areas of particular interest to CDC and other partners were added to the standard questionnaire. As in all cycles, the previous cycle's standard questionnaire was revised to allow for the inclusion of questions assessing emerging issues and risk behaviors among high school students. Subject matter experts from CDC, academia, other Federal agencies, and nongovernmental organizations proposed changes, additions, and deletions to the questionnaire. CDC made further refinements to the questionnaire on the basis of feedback from cognitive testing with high school students. The YRBS questionnaire was offered in both English and Spanish.
All questions, except those assessing height, weight, and race, were multiple choice, with a maximum of eight mutually exclusive response options and only one possible answer per question. A recent test-retest study of most of the 2023 survey questions demonstrated substantial reliability among these questions (1). The wording of each question, recall periods, response options, and operational definitions for each variable are available in the 2023 YRBS questionnaire and data user's guide. (YRBSS data and documentation are available at https://www.cdc.gov/yrbs/data/index.html.)
The shift from paper-and-pencil to electronic survey administration allowed CDC to introduce new questionnaire features. First, for questions related to tobacco products, prescription opioid medicine, and contraceptives, the tablet displayed images to enhance students' understanding of the question or response options. Second, the questionnaire included skip patterns, meaning that students who responded that they did not engage in a particular behavior (e.g., current cigarette smoking) were not shown subsequent questions regarding that behavior (e.g., number of cigarettes smoked per day). Questions that were skipped appropriately on the basis of responses to a previous question were not coded as missing in the data set; instead, they were assigned a response option noting that the student did not engage in the behavior measured in the subsequent question. For example, a student who responded "no" to "Have you ever smoked a cigarette, even one or two puffs?" would not be shown the question, "During the past 30 days, on how many days did you smoke cigarettes?" but their response to that question in the data set would be coded as 0 days. Third, electronic data collection allowed for real-time logic checks, reducing the amount of editing required after data collection (i.e., the questionnaire was programmed so that if students entered an invalid response for items such as height and weight, they were prompted to correct it).
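As an illustration of this recoding rule, the following minimal Python sketch (with hypothetical variable names; the actual YRBS variable names are documented in the data user's guide) shows how a skipped follow-up question is assigned a substantive value rather than a missing code:

```python
# Minimal sketch of the skip-pattern recoding described above.
# Variable names (ever_cigarette, days_smoked_30day) are hypothetical;
# the published data user's guide documents the actual variables.

def recode_skip(record: dict) -> dict:
    """If the gateway question was answered 'no', code the skipped
    follow-up as 'did not engage' (0 days) rather than missing."""
    if record.get("ever_cigarette") == "no":
        # Student never smoked, so the 30-day question was not shown;
        # assign 0 days instead of leaving the field missing.
        record["days_smoked_30day"] = "0 days"
    return record

print(recode_skip({"ever_cigarette": "no"}))
# {'ever_cigarette': 'no', 'days_smoked_30day': '0 days'}
```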
The sample for the 2023 YRBS included two components. The main sample was designed to provide nationally representative data. The supplemental sample was designed to be used in combination with the main sample to increase the number of AI/AN participants.
For the main sample, the sampling frame consisted of all regular public schools (including charter schools), parochial schools, and other private schools with students in at least one of grades 9-12 in the 50 U.S. states and the District of Columbia. Alternative schools, special education schools, schools operated by the U.S. Department of Defense or the Bureau of Indian Education, and vocational schools serving students who also attended another school were excluded. Schools with ≤40 students enrolled in grades 9-12 (combined) also were excluded. The sampling frame was constructed from data files obtained from MDR (formerly Market Data Retrieval) and the National Center for Education Statistics (NCES). NCES data sources included the Common Core of Data (https://nces.ed.gov/ccd) for public schools and the Private School Survey (https://nces.ed.gov/surveys/pss) for private schools.
A three-stage cluster sampling design was used to produce a nationally representative sample of students in grades 9-12 who attend public and private schools. The first-stage sampling frame comprised 1,257 primary sampling units (PSUs), which consisted of entire counties, groups of smaller adjacent counties, or parts of larger counties. PSUs were categorized into 16 strata according to their metropolitan statistical area status (i.e., urban or nonurban) and the percentages of Black or African American (Black) and Hispanic or Latino (Hispanic) students in each PSU. Of the 1,257 PSUs, 60 were sampled with probability proportional to overall school enrollment size for that PSU. For the second-stage sampling, secondary sampling units (SSUs) were defined as a physical school with grades 9-12 or a school created by combining nearby schools to provide all four grades. From the 60 PSUs, 180 SSUs were sampled with probability proportional to school enrollment size. To provide adequate coverage of students in small schools, an additional 20 small SSUs were selected from a subsample of 20 of the 60 PSUs. These 200 SSUs corresponded to 204 physical schools. The third stage of sampling comprised random sampling of one or two classrooms in each of grades 9-12 from either a required subject (e.g., English or social studies) or a required period (e.g., homeroom or second period). All students in sampled classes who could independently complete the questionnaire were eligible to participate. Schools, classes, and students that refused to participate were not replaced.
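The probability-proportional-to-size (PPS) selection used at each stage can be illustrated with a simplified sketch. The following Python example uses hypothetical enrollment counts, omits stratification, and approximates without-replacement PPS with numpy's weighted sampling; it is not the production sampling program:

```python
import numpy as np

rng = np.random.default_rng(2023)

# Hypothetical grade 9-12 enrollment for each of the 1,257 PSUs.
enrollment = rng.integers(1_000, 50_000, size=1_257)

# Selection probability proportional to overall enrollment size.
p = enrollment / enrollment.sum()

# Draw 60 PSUs; choice(..., replace=False) only approximates the strict
# without-replacement PPS schemes used in production survey designs.
sampled_psus = rng.choice(1_257, size=60, replace=False, p=p)
print(np.sort(sampled_psus)[:10])  # first 10 sampled PSU indices
```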
The sampling frame for the AI/AN supplemental sample was constructed using the same data sources and process used for the main sampling frame. As an additional step, the sampling frame was restricted to public schools with an estimated enrollment of ≥28 students in each grade to most efficiently reach AI/AN students. As with the main sample, Bureau of Indian Education schools were not included in the frame because of their unique nature and location on lands that often are tribally controlled (2). Although this more restricted frame limited the coverage when using the supplemental sample alone, sample representation of the AI/AN population was expanded when the supplemental sample was combined with the main sample, which represents all schools, including schools with <28 students in each grade as well as nonpublic schools.
As with the main sample, the supplemental sample used a three-stage cluster sampling design. The first-stage sampling frame comprised the same 1,257 PSUs. In the second stage, 55 SSUs were sampled with probability proportional to the aggregate AI/AN school enrollment size in grades 9-12; these 55 SSUs corresponded to 114 physical schools. The third stage of sampling followed the same process as for the main sample, except that two classrooms in each grade were selected to participate to maximize the number of AI/AN students.
Institutional review boards at CDC and ICF, the survey contractor, approved the protocol for YRBS. Data collection was conducted consistent with applicable Federal law and CDC policy.* Survey procedures were designed to protect students' privacy by allowing for anonymous participation. Participation was voluntary, and local parental permission procedures were followed before survey administration. During survey administration, students completed the self-administered questionnaire during one class period using tablets that had been programmed with the survey instrument. Trained data collectors visited each school to distribute the tablets to the students and collect them after survey completion. The tablets were not connected to the Internet. Instead, students' data were saved to the tablets, and data collectors synchronized all locally stored data to a central repository at the end of each day.
The shift from paper-and-pencil to electronic questionnaire administration provided several benefits. First, electronic data collection reduced the time needed for students to complete the survey. Whereas the paper-and-pencil version of the survey used in previous cycles took a full 45-minute class period to complete, the tablet version was typically completed in 25 minutes. This decrease is a result of the increased speed of touching a response on a tablet compared with filling in a bubble on a scannable booklet using a pencil, as well as the use of skip patterns. Second, students have been found to prefer electronic surveys over paper-and-pencil surveys because of their familiarity with and comfort using electronic devices (3). Third, electronic administration eliminated the use of paper. Not only is this a more environmentally friendly approach, but it also increased the speed at which the data could be compiled. Rather than waiting for completed booklets to be shipped and scanned, data were available for processing as soon as the tablets were synchronized. This also allowed CDC to track data collection progress in near real time. Finally, students who were absent on the day of data collection and could not complete the questionnaire on a tablet were able to complete a web-based version of the questionnaire in a setting similar to the tablet administration when they returned to school; 323 surveys were completed using this web-based platform rather than the tablet, which increased overall completion rates by eliminating the need for schools to mail questionnaires back to the survey contractor.
The main sample and the AI/AN supplemental sample were combined to create a single sample file for the 2023 national survey. At the end of the data collection period, 20,386 questionnaires were completed in 155 schools. The national data set was cleaned and edited for inconsistencies. Missing data were not statistically imputed. A questionnaire failed quality control when <20 responses remained after editing or when it contained the same answer to ≥15 consecutive questions. Among the 20,386 completed questionnaires, 283 failed quality control and were excluded from analysis, resulting in 20,103 usable questionnaires. The school response rate was 49.8%, the student response rate was 71.0%, and the overall response rate (i.e., [student response rate] x [school response rate]) was 35.4%.
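The editing criteria and the response rate arithmetic described above can be expressed directly. The following Python sketch paraphrases the quality-control rule (excluding missing responses from the run check is a simplifying assumption about the actual editing programs) and reproduces the overall response rate calculation:

```python
from itertools import groupby

def fails_quality_control(responses):
    """Flag a questionnaire with <20 valid responses after editing or
    with the same answer to >=15 consecutive questions. Missing
    responses (None) are excluded from the run check here, a
    simplifying assumption about the actual editing programs."""
    n_valid = sum(r is not None for r in responses)
    answered = [r for r in responses if r is not None]
    longest_run = max((len(list(g)) for _, g in groupby(answered)), default=0)
    return n_valid < 20 or longest_run >= 15

print(fails_quality_control(["b"] * 15 + ["a", "c", "d", "e", "f"]))  # True

# Overall response rate = school response rate x student response rate.
school_rate, student_rate = 0.498, 0.710
print(f"overall response rate: {school_rate * student_rate:.1%}")  # 35.4%
```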
Race and ethnicity were ascertained from two questions: 1) "Are you Hispanic or Latino?" (yes or no) and 2) "What is your race?" (American Indian or Alaska Native [AI/AN], Asian, Black or African American [Black], Native Hawaiian or other Pacific Islander [NH/OPI], or White). For the second question, students could select more than one response option. (Persons of Hispanic or Latino origin might be of any race but are categorized as Hispanic; all racial groups are non-Hispanic.) Except for the report in this MMWR supplement that focused on AI/AN students, students were classified as Hispanic or Latino and are referred to as Hispanic if they answered "yes" to the first question, regardless of how they answered the second question. For example, students who answered "no" to the first question and selected only Black or African American to the second question were classified as Black or African American and are referred to as Black. Likewise, students who answered "no" to the first question and selected only White to the second question were classified and are referred to as White. Race and ethnicity were classified as missing for students who did not answer the first question and for students who answered "no" to the first question and did not answer the second question. Students who selected more than one response option to "What is your race?" were classified as multiracial. This classification of race and ethnicity aligns with the Office of Management and Budget standards in place at the time of the survey (https://www.govinfo.gov/content/pkg/FR-1997-10-30/pdf/97-28653.pdf). Although using uniform classifications facilitates trend interpretation and between-group comparisons, preferred terminology and classification practices are evolving; the Office of Management and Budget released new standards after the 2023 YRBS cycle was completed (https://www.federalregister.gov/documents/2024/03/29/2024-06469/revisions-to-ombs-statistical-policy-directive-no-15-standards-for-maintaining-collecting-and). In addition, the uniform classification of race and ethnicity does not describe the heterogeneity and unique experiences of students within a particular racial or ethnic group (4).
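These classification rules can be summarized as a small decision function. The following Python sketch applies the rules as stated above; the function and category labels are illustrative, not the actual data-processing code:

```python
def classify_race_ethnicity(hispanic, races):
    """Apply the classification rules described above.
    hispanic: "yes", "no", or None (question not answered)
    races: set of race options selected (possibly empty)
    Illustrative only; not the actual YRBS data-processing code."""
    if hispanic == "yes":
        return "Hispanic"      # regardless of responses to the race question
    if hispanic is None or not races:
        return None            # race and ethnicity classified as missing
    if len(races) > 1:
        return "Multiracial"   # more than one race option selected
    return next(iter(races))   # single non-Hispanic race

print(classify_race_ethnicity("yes", {"White"}))                     # Hispanic
print(classify_race_ethnicity("no", {"Black or African American"}))  # Black
print(classify_race_ethnicity("no", set()))                          # None
```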
To obtain a sufficient sample size for analyses of health behaviors, experiences, and conditions by sexual identity, students were categorized as heterosexual if they chose that response option, and students who selected gay or lesbian, bisexual, "I describe my sexual identity some other way," or "I am not sure about my sexual identity/questioning" were usually grouped together as LGBQ+ (Table 1). Although this binary categorization often was necessary for statistical analysis, LGBQ+ populations are not a single homogeneous group, and this categorization might result in a loss of understanding of the unique experiences of these sexual identity subgroups (5). Students also were categorized into those who had no sexual contact, those who had sexual contact with only the opposite sex, or those who had sexual contact with only the same sex or with both sexes on the basis of their responses to the question, "During your life, with whom have you had sexual contact?" Students who had no sexual contact were excluded from analyses related to sexual behaviors. Female students who had sexual contact with only females were excluded from analyses on condom use.
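A parallel Python sketch (response labels paraphrased from the questionnaire wording) illustrates the sexual identity grouping and the exclusion rules for sexual behavior analyses:

```python
# Response labels below are paraphrased, not the exact questionnaire text.
LGBQ_PLUS = {
    "gay or lesbian",
    "bisexual",
    "I describe my sexual identity some other way",
    "I am not sure about my sexual identity (questioning)",
}

def sexual_identity_group(response):
    """Collapse sexual identity responses into the two analytic groups."""
    if response == "heterosexual (straight)":
        return "Heterosexual"
    return "LGBQ+" if response in LGBQ_PLUS else None  # otherwise missing

def include_in_condom_use_analysis(sex, sexual_contacts):
    """Apply the exclusion rules: students with no sexual contact are
    excluded from all sexual behavior analyses, and female students
    with only female contacts are excluded from condom use analyses."""
    if sexual_contacts == "no sexual contact":
        return False
    return not (sex == "female" and sexual_contacts == "same sex only")

print(sexual_identity_group("bisexual"))                          # LGBQ+
print(include_in_condom_use_analysis("female", "same sex only"))  # False
```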
Weights were applied to the final sample so that responses were generalizable to the U.S. student population in grades 9-12. For the 2023 YRBS, weights were calculated separately for the main sample and the AI/AN supplemental sample. The calculation of the weights followed the same process for both samples. First, a weight based on student sex, race and ethnicity, and grade was applied to each record to adjust for school and student nonresponse. Next, the two weighted data sets were concatenated, and combined weights were calculated as final survey weights. Finally, the overall weights were scaled so that the weighted count of students equaled the total sample size and the weighted proportions of students in each grade matched the national population proportions. Therefore, in the national data set, weighted estimates are nationally representative of all students in grades 9-12 attending U.S. public and nonpublic schools.
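The final scaling step can be illustrated as follows. The Python sketch below uses hypothetical weights and hypothetical national grade proportions; the nonresponse adjustment and concatenation steps are assumed to have already occurred:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical records after nonresponse adjustment and concatenation
# of the main and AI/AN supplemental samples.
df = pd.DataFrame({
    "grade": rng.choice([9, 10, 11, 12], size=20_103),
    "weight": rng.uniform(0.2, 5.0, size=20_103),
})

# Hypothetical national proportions of students by grade.
national_prop = {9: 0.26, 10: 0.26, 11: 0.25, 12: 0.23}

# Scale weights within each grade so that weighted grade proportions
# match the national proportions and the weighted count of students
# equals the total sample size.
n = len(df)
for grade, prop in national_prop.items():
    mask = df["grade"] == grade
    df.loc[mask, "weight"] *= n * prop / df.loc[mask, "weight"].sum()

print(round(df["weight"].sum()))                 # 20103
print(df.groupby("grade")["weight"].sum() / n)   # matches national_prop
```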
Findings presented in this MMWR supplement are derived from analytic procedures similar to what is described in this overview report. For more information about the detailed analyses presented in other reports in this supplement (e.g., variables analyzed, custom measures, and data years), see Methods in each individual report.
All statistical analyses used SAS-callable SUDAAN (version 11.0.3 or 11.0.4; RTI International) to account for the complex sampling design and weighting. In all reports, prevalence estimates and CIs were computed for variables used in those reports. Prevalence estimates with a denominator <30 were considered statistically unreliable and therefore were suppressed. In certain reports, chi-square tests were used to examine associations between health behaviors, experiences, or conditions and demographic characteristics (e.g., sex, race and ethnicity, grade, sexual identity, and sex of sexual contacts). Pairwise differences between groups (e.g., male versus female students) were determined using t-tests. All analyses used a domain analysis approach to ensure accurate calculation of standard errors, CIs, and p values when certain variables had missing data. Prevalence differences and ratios were calculated using logistic regression with predicted marginals. All prevalence estimates and measures of association used Taylor series linearization. All tests were considered statistically significant at the p<0.05 level. Prevalence ratios were considered statistically significant if 95% CIs did not cross the null value of 1.0.
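The suppression rule and a design-weighted point estimate can be sketched in a few lines. The following Python example uses hypothetical data and does not replicate SUDAAN's Taylor series variance estimation or full domain analysis:

```python
import numpy as np

def weighted_prevalence(indicator, weights):
    """Return the weighted prevalence, or None (suppressed) when the
    unweighted denominator is <30. Records with missing responses are
    dropped; SUDAAN's domain analysis and Taylor series variance
    estimation are not replicated in this sketch."""
    indicator = np.asarray(indicator, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = ~np.isnan(indicator)
    if keep.sum() < 30:
        return None  # statistically unreliable; suppress the estimate
    return np.average(indicator[keep], weights=weights[keep])

rng = np.random.default_rng(7)
y = rng.choice([0.0, 1.0, np.nan], size=500, p=[0.60, 0.35, 0.05])
w = rng.uniform(0.5, 3.0, size=500)
print(weighted_prevalence(y, w))            # weighted prevalence, ~0.37
print(weighted_prevalence(y[:20], w[:20]))  # None: denominator <30
```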
For analyses of temporal trends reported in the YRBSS web applications and the Youth Risk Behavior Survey Data Summary & Trends Report: 2013-2023 (https://www.cdc.gov/yrbs/dstr/index.html), logistic regression analyses were used to examine linear and quadratic changes in estimates, controlling for sex, grade, and racial and ethnic changes over time. A p value of <0.05 associated with a regression coefficient was considered statistically significant. Linear and quadratic time variables were treated as continuous and were coded using orthogonal coefficients calculated with PROC IML in SAS (version 9.4; SAS Institute). A minimum of 3 survey years was required to calculate linear trends, and a minimum of 6 survey years was required to calculate quadratic trends. Separate regression models were used to assess linear and quadratic trends. When a significant quadratic trend was identified, Joinpoint (version 5.0.2; National Cancer Institute) was used to automate identification of the year when the trend changed. Then, regression models were used to identify linear trends occurring before and after the change in trend. A quadratic trend indicates a statistically significant but nonlinear change in prevalence over time. A long-term temporal change that includes a significant linear and quadratic trend demonstrates nonlinear variation (e.g., leveling off or change in direction) in addition to an overall increase or decrease over time. Cubic and higher-order trends were not assessed.
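Orthogonal time coefficients of the kind generated with PROC IML can be constructed in several ways; the following Python sketch uses a QR decomposition (one common construction, whose scaling might differ from the coefficients actually used):

```python
import numpy as np

def orthogonal_time_coefficients(years):
    """Return orthogonal linear and quadratic contrast coefficients for
    the given survey years via QR decomposition (one common
    construction; the scaling produced by PROC IML may differ)."""
    t = np.asarray(years, dtype=float)
    X = np.column_stack([np.ones_like(t), t, t**2])
    Q, _ = np.linalg.qr(X)     # columns are mutually orthogonal
    return Q[:, 1], Q[:, 2]    # linear and quadratic contrasts

linear, quadratic = orthogonal_time_coefficients(
    [2013, 2015, 2017, 2019, 2021, 2023]
)
print(np.round(linear, 3))
print(np.round(quadratic, 3))
print(round(float(linear @ quadratic), 12))  # ~0.0: contrasts are orthogonal
```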
For analyses of 2-year changes in the YRBSS web applications, prevalence estimates from 2021 and 2023 were compared by using t-tests for behaviors, experiences, or conditions assessed with identically worded questions in both survey years. Prevalence estimates were considered statistically different if the t-test p value was <0.05.
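This comparison reduces to a two-sample t-test on the design-based estimates. The following Python sketch uses hypothetical prevalences and standard errors and a large-sample normal approximation for the p value:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical 2021 and 2023 prevalence estimates and their
# design-based standard errors (as produced by SUDAAN).
p_2021, se_2021 = 0.22, 0.011
p_2023, se_2023 = 0.19, 0.010

t = (p_2021 - p_2023) / sqrt(se_2021**2 + se_2023**2)
p_value = 2 * (1 - norm_cdf(abs(t)))  # large-sample approximation
print(f"t = {t:.2f}, p = {p_value:.3f}")  # statistically different if p < 0.05
```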
National and site-level YRBS data (1991-2023) are available in a combined data set from the YRBSS data and documentation website (https://www.cdc.gov/yrbs/data/index.html), as are additional resources, including data documentation and analysis guides. Data are available in both Access and ASCII formats, and SAS and SPSS programs are provided for converting the ASCII data into SAS and SPSS data sets. Variables are standardized to facilitate trend analyses and combining data. YRBSS data also are available online via three web-based data dissemination tools: Youth Online, YRBS Analysis Tool, and YRBS Explorer. Youth Online allows point-and-click data analysis and creation of customized tables, graphs, maps, and fact sheets (https://nccd.cdc.gov/Youthonline/App/Default.aspx). Youth Online also performs statistical tests by health topic and filters and sorts data by race and ethnicity, sex, grade, and sexual orientation. The YRBS Analysis Tool allows real-time data analysis of YRBS data that generates frequencies, cross-tabulations, and stratified results (https://yrbs-analysis.cdc.gov). YRBS Explorer is an application featuring options to view and compare national, state, and local data via tables and graphs (https://yrbs-explorer.services.cdc.gov).