*Robert Brame. 21st Century Criminology: A Reference Handbook. Editor: J Mitchell Miller. Sage Publication. 2009.*

Since the inception of their field as an area of scientific inquiry, criminology and criminal justice (CCJ) researchers have used quantitative data to describe and explain criminal behavior and social responses to criminal behavior. Although other types of data have been used to make important contributions to criminological thought, the analysis of quantitative data has always played an important role in the development of knowledge about crime. This chapter discusses the various types of quantitative data typically encountered by CCJ researchers. Then, some of the logical and inferential issues that arise when researchers work with quantitative data are described. Next, the chapter considers different analytic frameworks for evaluating evidence, testing hypotheses, and answering research questions. Finally, a discussion of the range of methodological approaches used by contemporary CCJ researchers is provided.

## Quantitative Data Sources

CCJ researchers commonly work with data collected for official recordkeeping by government or quasi-government agencies. Such data often include records of criminal events, offender and victim characteristics, and information about how cases are handled or disposed. Detailed information about crimes known to the police and crimes cleared by arrest is available in the Uniform Crime Reports (UCR) and the National Incident Based Reporting System (NIBRS). In addition, for purposes of specific research projects, criminal justice agencies often make their administrative records available to criminologists, provided that appropriate steps are taken to protect individual identities. For example, the Bureau of Justice Statistics has conducted two major studies of recidivism rates for prisoners returning to the community in multiple states. Such projects require coordinated use of state correctional databases and access to criminal records, including arrests, convictions, and reincarceration.

More recently, researchers have also relied on information collected through direct interviews and surveys with various populations. In these surveys, respondents are asked about their involvement in offending activities, victimization experiences, background characteristics, perceptions, and life circumstances. Analyses of data collected through the National Crime Victimization Survey; the Arrestee Drug Abuse Monitoring program; the RAND inmate survey; the National Youth Survey; the National Longitudinal Survey of Youth; the Adolescent Health Study; Monitoring the Future (MTF); Research on Pathways to Desistance; and the Office of Juvenile Justice and Delinquency Prevention’s longitudinal youth studies in Rochester, New York; Pittsburgh, Pennsylvania; and Denver, Colorado, have all made important contributions to criminological thought and public policy.

Researchers have also attempted, in some studies, to assemble detailed quantitative databases that combine information from both administrative records and direct surveys on the same individuals. Among other findings, this research has consistently shown that most crime victimizations are not reported to the police and that most offending activities do not result in an arrest.

## Logical and Inferential Issues

The analysis of quantitative crime-related data, like any other type of analysis, depends primarily on the question one is asking and the capabilities of the data available. This section briefly discusses some of the most prominent issues that crime researchers consider when analyzing quantitative data.

__Time Horizon__

Regardless of the data source, research projects using quantitative data can generally be characterized as cross-sectional or longitudinal. Cross-sectional studies examine individuals or populations at a single point in time, whereas longitudinal studies follow the same individuals or populations over a period of time. Among longitudinal studies, an important consideration is whether the data will be collected prospectively or retrospectively. In prospective studies, individuals are enrolled in the study and then followed to see what happens to them. In retrospective studies, individuals are enrolled in the study, and researchers then examine historical information about them. Some studies include both prospective and retrospective elements. For example, the Research on Pathways to Desistance study enrolled adolescent offenders in Phoenix, Arizona, and Philadelphia to see how these offenders adapt to the transition from adolescence to adulthood. In that sense, the study is prospective; however, historical information about the individuals included in the study is available and has been collected retrospectively as well.

In most studies, it is clear whether the project is cross-sectional or longitudinal, but there are exceptions. For example, the MTF study repeatedly surveys nationally representative samples of high school seniors. This study can be viewed as cross-sectional because it does not survey the same individuals repeatedly, but it can also be viewed as longitudinal because the same methodology for drawing the sample and analyzing the data is repeated over time. Similar issues arise with UCR and NIBRS data. Often, specific studies using a repeated cross-sectional data source, such as MTF, UCR, or NIBRS, will tend to emphasize either cross-sectional or longitudinal features of the data.

__Unit of Analysis__

It is also useful to think about research projects in terms of the basic source of variation to be studied. For example, some studies focus on variation in crime between communities, whereas other studies examine variation in criminality between individual persons. Still other studies attempt to describe and explain variation in behavior over time for the same community or individual. In some studies, the unit of analysis is unambiguous, whereas in other instances, there may be multiple logical analysis units (e.g., multiple observations on the same person and multiple persons per community). These studies are generally referred to as hierarchical or multilevel analyses. An important issue arising in these analyses is lack of independence among observations belonging to a logical higher-order group. For example, individuals who live in the same community or who attend the same school are not likely to be truly independent of each other.

__Sampling__

The list of all cases that are eligible to be included in a study is called the sampling frame. The sample included in the study will either be identical to the sampling frame or it will be a subset of the sampling frame. In some instances, the sampling frame is explicitly defined; at other times, the sampling frame is vague. Researchers generally describe the manner in which the sample was selected from the sampling frame in terms of probability or nonprobability sampling. In probability sampling, each case in the sampling frame has a known, non-zero probability of being selected for the sample. Samples selected in any other way are called nonprobability samples. The most basic form of probability sampling is simple random sampling, when each member of the sampling frame has an equal probability of being selected for the sample. More complicated forms of probability sampling, such as stratified random sampling, cluster sampling, and stratified multistage cluster sampling, are all commonly used in CCJ research.
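The difference between simple random sampling and stratified random sampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame, the `urban` flag, and the function names are hypothetical and are not drawn from any study discussed in this chapter.

```python
import random

def simple_random_sample(frame, n, seed=0):
    """Draw n cases without replacement; each case has selection probability n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample of fixed size within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for case in frame:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical sampling frame: 1,000 cases, 20% of them flagged as urban.
frame = [{"id": i, "urban": i % 5 == 0} for i in range(1000)]

srs = simple_random_sample(frame, 100)
strat = stratified_sample(frame, lambda case: case["urban"], 50)
```

In the stratified draw, urban cases are deliberately oversampled (50 of 200 versus 50 of 800), so selection probabilities differ across strata but remain known and non-zero, which is what makes it a probability sample.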

The use of probability sampling allows researchers to make clear statements about the generalizability of their results. Although this is a desirable feature of probability samples, much CCJ research is based on nonprobability samples. The 1945 and 1958 Philadelphia birth cohort studies conducted by Marvin Wolfgang and his colleagues (Wolfgang, Figlio, & Sellin, 1972) focused on an entire population of individuals rather than a sample. Still, one can view the choice of the years 1945 and 1958 as a means of sampling. In fact, when populations are studied, there is almost always a way to conceive of them as nonprobability samples. In other studies, a researcher may survey all children in attendance at a school on a particular day. The resulting sample would be called a convenience or availability sample. Still other research projects rely on the purposive selection of certain numbers of people meeting particular criteria to ensure representation of people from different groups (e.g., males, females, blacks, whites). These samples are usually called quota samples. A key feature of nonprobability samples is that one is not able to make explicit probabilistic statements about quantities in the population based on what one observes in the sample. Nevertheless, nonprobability samples are quite useful and necessary for addressing many interesting research and policy questions that arise in CCJ research.

__Target Population__

A key aspect of any scientific work is the identification of empirical regularities that transcend specific individuals, places, or times. Thus, the population to which the results of a study generalize is of considerable importance. In general, researchers tend to prefer studies that identify the target population and discuss how well the results are likely to generalize to that population. But the target population is sometimes ambiguous. If one studies all individuals in attendance at a particular school on a given day, one could argue that the sample is synonymous with the target population. The research community, however, is not likely to be interested in what is occurring at that individual school unless it somehow relates to what is occurring at other schools in other locations and at other times. This ambiguity means that one cannot make precise statements about the generalizability of the results to other settings. Thus, clear statements about the composition and boundaries of the target population are often the exception rather than the rule.

__Concepts and Variables__

Scientific theories describe relationships between concepts. In this sense, concepts represent the key elements of a well-developed theory. Concepts are verbal cues or symbols that sometimes refer to simple or complicated sources of variation. Sex (male vs. female), for example, refers to a simple, objective source of variation, whereas the meaning of concepts such as delinquency or socioeconomic status is potentially quite complicated. Still, reference to concepts for purposes of theory and hypothesis development can be sufficient. For purposes of conducting empirical tests of theories and hypotheses, however, more rigor and specificity are required.

Variables are the language of actual empirical work. A researcher’s description of a variable explicitly defines how the concept in question is to be measured for purposes of an actual research project. An operational description or definition of a variable attends to how the variable was measured and what values the variable can take on. Variables such as sex and race are categorical, whereas variables such as age and income are quantitative. Categorical variables can be nominal (unordered categories) or ordinal (ordered categories, but the distance between categories is not well-defined). Quantitative variables can be interval (equal distance between categories) or ratio (existence of a true zero). Still another type of variable, of particular interest to criminologists, is a count of events. Event-count variables represent the number of times an event occurs within some period of time. One way to think of an event-count variable is to consider a two-category variable: Either an event occurs or does not occur within some small time interval. If one adds up the number of times an event occurs over many of these small time intervals, one gets a total count of events.

Some concepts are too broad to be measured effectively with a single variable. Socioeconomic status, for example, is often linked to a combination of at least three subordinate concepts: (1) educational attainment, (2) income, and (3) occupational prestige. Often, variables associated with closely related subordinate concepts can be combined into a scale or index that measures the conceptual variation of interest. There are different ways to form scales and indexes. Some are driven by mathematical decision rules based on correlations between the items comprising the scale or index, and others are based on conceptual considerations.

__Descriptive and Causal Inference__

Still another important feature of any quantitative study is whether it emphasizes description or the identification of cause–effect relationships. Descriptive inference is a characterization or summary of important features of a population. For example, the main objective of the 1993 Bureau of Justice Statistics recidivism study was to estimate the percentage of offenders released from prison in 1993 who experienced subsequent involvement with the criminal justice system within 3 years of their release. No effort was made to explain variation in the recidivism rate; instead, the goal was pure description.

Causal inference is the process of distinguishing between a correlation or statistical association between two or more variables and a cause–effect relationship between those variables. In order for a variable x to be considered a cause of variable y, three criteria must be satisfied: (1) x precedes y in time, (2) x and y are statistically associated, and (3) the statistical association between x and y is not spurious (i.e., there is no other variable that can account for or explain the statistical association between x and y). It turns out that establishing the first two criteria is reasonably straightforward. Convincingly demonstrating nonspuriousness, however, is much more difficult. This issue is discussed in more detail in the “Analytic Methods for Causal Inference” section.

__Validity__

The word validity is often used in two broad contexts in CCJ research. It may be used to indicate whether (or to what extent) a specific measure is an accurate characterization of the concept being studied. For example, one might ask whether an IQ test is a valid measure of intelligence. The word validity is also used as a way of characterizing a study or particular methodological approach. In this case, the concern is whether the study or method is likely to faithfully present the world as it really operates or whether it will distort the phenomena under study in some important way. As an example of this usage, one might consider whether a study with a pretest outcome measurement followed by an intervention and then a posttest outcome measurement but no control group (a group that does not experience the intervention) is a valid study.

A number of different types of validity appear in the CCJ literature. A few common types are discussed here. Assessments of face validity are subjective judgments about whether a measurement or methodology is likely to yield accurate results. If a measure successfully predicts variation in a logically linked outcome, one can say that it rates high on criterion or predictive validity. For example, if one has a parole risk assessment instrument that is designed to predict likelihood of recidivism and the instrument, in fact, does do a good job of recidivism prediction, then one can say that it exhibits criterion validity. Measures with good construct validity are correlated with well-established indicators of the phenomenon in question. Such measures should also be independent of indicators that are not relevant to the phenomenon in question.

Studies with high internal validity take convincing steps to ensure that the logic of the study as applied to the individuals actually being studied is sound. External validity, on the other hand, refers to the generalizability of the study’s results to individuals other than those actually included in the study. Internal validity tends to be maximized when the researcher is able to exert a great deal of control over the study and the environment in which the study is conducted (e.g., a laboratory setting). Unfortunately, when the researcher exerts great control, the conditions of the study sometimes become more artificial and less realistic. This raises questions about how well the study results will generalize to other cases. To the extent that the researcher attempts to allow for more realistic study environments (and greater external validity), this will often lead to less control over the study, which produces threats to internal validity. Researchers desire studies that maximize both internal and external validity, but this is often difficult to achieve.

__Reliability__

Reliability refers to the consistency, stability, or repeatability of results when a particular measurement procedure or instrument is used. Researchers aspire to the use of instruments and procedures that will produce consistent results (provided that the phenomena under study have not changed). There are different ways of assessing and quantifying reliability. One approach is to take a measurement at a particular point in time and then repeat that same measurement at a later point in time. The correlation between the two measurements is called test–retest reliability. Another approach is to conduct multiple measurements with some variation in the precise measurement method; for example, multiple questionnaires with variations in the wording of various items can be administered to the same individuals. The correlation between the various instruments is called parallel forms reliability.
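As a rough illustration of test–retest reliability, the sketch below computes the Pearson correlation between two administrations of the same instrument. The scores and variable names are hypothetical, and the Pearson coefficient is assumed here as the correlation measure; the chapter does not prescribe a specific one.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six respondents measured at two points in time.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

test_retest = pearson_r(time1, time2)
```

The same function could be applied to scores from two differently worded questionnaires to approximate parallel forms reliability.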

In some instances, researchers need to code various pieces of information into quantitative research data. A concern often arises about whether the coding rules are written in such a way that multiple properly trained coders will reach the same coding decisions. Interrater reliability is considered to be high when there is a high correlation between the decisions of multiple coders who have reviewed the same information.

Reliability can also be assessed by examining correlations between multiple indicators of the same underlying concept. Assume, for example, that a researcher believes that a key influence on criminal behavior is an individual’s level of self-control. Because there is no single definitive measure of self-control, the researcher might measure many indicators and characteristics of individuals that are believed to be manifestations of one’s level of self-control (e.g., time spent on homework each day, grades in school, time spent watching television). One way of assessing the reliability of a scale or index that combines this information is to calculate the correlations between all of the indicators, which can then be used to calculate internal-consistency reliability. High levels of internal-consistency reliability imply that the various characteristics and indicators being studied are closely related to each other.
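One common way to turn inter-item correlations into a single internal-consistency figure is the standardized form of Cronbach's alpha, which depends only on the number of items and their average pairwise correlation. The chapter does not name a specific coefficient, so the choice of alpha is an assumption here, as are the three hypothetical self-control indicators and their scores.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def standardized_alpha(items):
    """Standardized Cronbach's alpha: k * r_bar / (1 + (k - 1) * r_bar)."""
    k = len(items)
    pairs = list(combinations(items, 2))
    mean_r = sum(pearson_r(a, b) for a, b in pairs) / len(pairs)
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Hypothetical indicator scores for five respondents (higher = more self-control).
homework = [1, 2, 3, 4, 5]
grades = [2, 1, 4, 3, 5]
low_tv = [1, 3, 2, 5, 4]

alpha = standardized_alpha([homework, grades, low_tv])
```

Alpha rises both when the indicators are more strongly intercorrelated and when more indicators are added to the scale.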

__Relationship between Reliability and Validity__

Measures or procedures for capturing measurements can be highly reliable but also invalid. It is possible, for example, to obtain consistent but wrong or misleading measurements. Measures or procedures can also be both unreliable and invalid. In general, however, if a measure is valid it must also, by definition, be reliable.

__Estimates and Estimators__

An estimate is a person’s guess about the value of some interesting quantity or parameter for a target population. Researchers obtain an estimate by applying a formula or estimator to observed data that can be used to develop inferences about the target population. The most straightforward case is when one studies observed data from a simple random sample drawn from a well-defined target population. The goal is to infer the value of a parameter or quantity in the population on the basis of what one observes in the sample. A researcher plugs the observed data into an estimator and then uses the estimator, or formula, to calculate an estimate of the quantity of interest in the population.

__Estimator Properties: Bias, Efficiency, and Consistency__

In the case of a probability sample drawn from a well-defined population, there is a true population parameter or quantity that researchers seek to estimate on the basis of what they see in the sample. An important issue is whether the estimator applied to the sample will, over the course of drawing many, many probability samples, on average lead to the correct inference about the population parameter. If the average of the parameter estimates is different from the true population parameter, one says that the estimator is biased.

Sometimes there are different unbiased estimators or formulas that could be used to estimate a population quantity. An important question is how to choose one estimator over another. Generally speaking, in this situation researchers would prefer the unbiased estimator that exhibits the least amount of variation in the estimates generated over many samples drawn from the same population. The estimator that exhibits the minimum amount of sample-to-sample variation in the estimates is the most efficient estimator. For example, the sample mean, the sample median, and the sample mode (see “Measures of Central Tendency” section) are all valid estimators for the population mean of a normally distributed variable. The sample mean, however, is a more efficient estimator than the sample median, which is itself more efficient than the sample mode.
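The ideas of bias and efficiency can be illustrated with a small simulation: draw many samples from a normal population whose true mean is zero, apply two estimators to each sample, and compare the results. The sample size, number of replications, and seed below are arbitrary choices made for illustration.

```python
import random
import statistics

def simulate_estimators(n=25, reps=2000, seed=1):
    """Apply the sample mean and sample median to many normal samples (true mean 0)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = simulate_estimators()

# Bias: the average estimate across samples should sit close to the true value, 0.
avg_mean = statistics.fmean(means)
avg_median = statistics.fmean(medians)

# Efficiency: the estimator with less sample-to-sample variation is more efficient.
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
```

Both averages land near zero, consistent with unbiasedness, while the variance of the medians exceeds the variance of the means, consistent with the sample mean being the more efficient estimator for normal data.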

In some circumstances, an unbiased estimator is not available. When this happens, researchers typically try to use a consistent estimator. A consistent estimator is biased in small samples, but the bias decreases as the size of the sample increases. Many commonly used estimators in the social sciences, such as logistic regression (discussed later in this chapter), are consistent rather than unbiased.

## Assessing Evidence

A statistical model is a description of a process that explains (or fails to explain) the distribution of the observed data. A problem that arises in quantitative CCJ research is how to consider the extent to which a particular statistical model is consistent with the observed data. This section describes several common frameworks for thinking about this correspondence.

__Relative Frequency__

In quantitative crime research, decisions about whether to reject or fail to reject a particular hypothesis are often of central importance. For example, a hypothesis may assert that there is no statistical association between two variables in the target population. A test of this hypothesis amounts to asking the following question: What is the probability of observing a statistical association at least as large (either in absolute value or in a single direction) as the one observed in this sample if the true statistical association in the target population is equal to zero? Put another way, assume that there is a target population in which the statistical association is truly equal to zero. If a researcher drew many simple random samples from that population and calculated the statistical association in each of those samples, he or she would have a sampling distribution of the statistical association parameter estimates. This theoretical sampling distribution could be used to indicate what percentage of the time the statistical association would be at least as large as the association the researcher observed in the original random sample.

Generally speaking, if the percentage is sufficiently low (often, less than 5%), one would reject the hypothesis of no statistical association in the target population. A concern that arises in these kinds of tests is that the hypothesis to be tested is usually very specific (i.e., the statistical association in the target population is equal to zero). With a very large sample size it becomes quite likely that the so-called test statistic will lead a person to reject the hypothesis even if it is only slightly wrong. With a very small sample size, the test statistic is less likely to lead one to reject the hypothesis even if it is very wrong. With this in mind, it is important for researchers to remember that hypothesis tests based on the relative frequency approach are not tests of whether the statistical association in question is large or substantively meaningful. It is also important to keep in mind that the interpretation of statistical tests outside of the framework of well-defined target populations and probability samples is much more ambiguous and controversial.
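The logic of the relative frequency approach can be mimicked by simulation: build the sampling distribution of a correlation under a population in which the true association is zero, then ask how often a sample correlation is at least as large in absolute value as the one observed. The normal population, the observed correlations, the sample sizes, and the function names below are all assumptions chosen for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def null_sampling_distribution(n, reps, seed=2):
    """Correlations from repeated samples of size n when x and y are truly unrelated."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rs.append(pearson_r(x, y))
    return rs

def two_sided_p(observed_r, null_rs):
    """Share of null correlations at least as large in absolute value as the observed one."""
    return sum(abs(r) >= abs(observed_r) for r in null_rs) / len(null_rs)

# An observed correlation of 0.30 in a sample of 50 cases.
p_value = two_sided_p(0.30, null_sampling_distribution(n=50, reps=5000))

# The same weak correlation (0.10) judged at two different sample sizes.
p_small_n = two_sided_p(0.10, null_sampling_distribution(n=50, reps=2000))
p_large_n = two_sided_p(0.10, null_sampling_distribution(n=400, reps=2000))
```

The last two lines illustrate the sample-size concern raised above: the same weak association is unremarkable in the small sample but becomes much rarer under the null hypothesis in the large one, even though its substantive size has not changed.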

__Bayesian Methods__

Researchers often find the relative frequency framework to be technically easy to use but conceptually difficult to interpret. In fact, researchers and policymakers are not necessarily so concerned with the truth or falsehood of a specific hypothesis (e.g., that a population parameter is equal to zero) as they are with the probability distribution of that parameter. For example, it might be of more interest to estimate the probability that a parameter is greater than zero rather than the probability that a sample test statistic could be at least as large as it is if the population parameter is equal to zero. Analysis conducted in the Bayesian tradition (named after the Rev. Thomas Bayes, who developed the well-known conditional probability theorem) places most of its emphasis on the estimation of the full probability distribution of the parameter(s) of interest. In general, Bayesian methods tend not to be as widely used as relative frequency (or frequentist) methods in CCJ research. This is probably due to the training received by most criminologists, which tends to underemphasize Bayesian analysis. Because Bayesian analyses can often be presented in terms that are easier for policy and lay audiences to understand, it is likely that Bayesian methods will become more prominent in the years ahead.
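A minimal sketch of the Bayesian style of question follows, assuming a hypothetical two-group comparison of recidivism proportions with uniform Beta(1, 1) priors. With binomial data, each group's posterior is then a beta distribution, and the probability that the difference between groups is greater than zero can be approximated by drawing from the two posteriors. The counts, priors, and function name are illustrative assumptions, not results from any study.

```python
import random

def posterior_prob_reduction(reoff_t, n_t, reoff_c, n_c, draws=20000, seed=3):
    """Approximate P(control rate - treatment rate > 0 | data) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    count = 0
    for _ in range(draws):
        # Posterior for each group: Beta(1 + reoffenders, 1 + non-reoffenders).
        p_t = rng.betavariate(1 + reoff_t, 1 + n_t - reoff_t)
        p_c = rng.betavariate(1 + reoff_c, 1 + n_c - reoff_c)
        if p_c - p_t > 0:
            count += 1
    return count / draws

# Hypothetical data: 30 of 100 treated and 45 of 100 control cases reoffend.
prob = posterior_prob_reduction(30, 100, 45, 100)
```

The result is read directly as the probability that the parameter of interest (the difference in recidivism rates) is greater than zero, which is the kind of statement the relative frequency framework does not provide.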

__Parameter Estimation and Model Selection__

CCJ researchers typically rely on quantitative criteria to estimate parameters and select statistical models. Common criteria for parameter estimation include least squares (LS) and maximum likelihood (ML). LS estimators minimize the sum of the squared deviations between the predicted and actual values of the outcome variable. ML estimators produce estimates that maximize the probability of the data looking the way they do. Provided the necessary assumptions are met, LS estimators are unbiased and exhibit minimum sampling variation (efficiency). ML estimators, on the other hand, are typically consistent, and they become efficient as the sample size grows (asymptotic efficiency).
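For a single predictor, the LS criterion has a closed-form solution, sketched below with made-up data. The function names and values are hypothetical; the point is only that the fitted line yields a smaller sum of squared deviations than nearby alternatives.

```python
def least_squares_line(x, y):
    """Intercept and slope that minimize the sum of squared deviations for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(x, y, intercept, slope):
    """Sum of squared deviations between actual and predicted outcome values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical predictor and outcome values.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(x, y)
best = sse(x, y, intercept, slope)
worse_slope = sse(x, y, intercept, slope + 0.1)
worse_intercept = sse(x, y, intercept + 0.1, slope)
```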

Model selection involves the choice of one model from a comparison of two or more models (i.e., a model space). The most prominent model selection tools include F tests (selection based on explained variation) and likelihood-ratio tests (selection based on likelihood comparisons). An important issue with these tests is that they typically require that one model be a special case of the other models in the model space. For these approaches, tests are therefore limited to comparisons of models that are closely related to each other. Increasingly, model selection problems require researchers to make comparisons between models that are not special cases of each other. In recent years, two more general model selection criteria have become more widely used: (1) the Akaike information criterion (AIC) and (2) the Bayesian information criterion (BIC). These criteria can be used to compare both nested and non-nested models provided the outcome data being used for the comparison are the same. Like F tests and likelihood-ratio tests, AIC and BIC penalize for the number of parameters being estimated. The logic for penalizing is that, all other things equal, we expect a model with more parameters to be more consistent with the observed data. In addition to penalizing for parameters, the BIC also penalizes for increasing sample size. This provides a counterweight to tests of statistical significance, such as the F test and the likelihood-ratio test, which are more likely to select more complicated models when the sample size is large. As modeling choices continue to proliferate, it seems likely that use of AIC and BIC will continue to increase.
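In their standard forms, AIC equals 2k minus twice the maximized log-likelihood and BIC equals k times ln(n) minus twice the maximized log-likelihood, where k is the number of estimated parameters and n is the sample size; lower values are preferred. The sketch below applies both to two hypothetical models; the log-likelihoods, parameter counts, and sample size are invented for illustration.

```python
from math import log

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * log(n) - 2 * log_likelihood

# Hypothetical fits to the same outcome data for n = 400 cases.
n = 400
simple = {"log_likelihood": -520.0, "k": 3}
complex_ = {"log_likelihood": -516.0, "k": 6}

aic_simple = aic(simple["log_likelihood"], simple["k"])
aic_complex = aic(complex_["log_likelihood"], complex_["k"])
bic_simple = bic(simple["log_likelihood"], simple["k"], n)
bic_complex = bic(complex_["log_likelihood"], complex_["k"], n)
```

With these particular numbers, AIC favors the more complicated model while BIC favors the simpler one, because the BIC penalty per parameter grows with the logarithm of the sample size.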

## Methods for Descriptive Inference

This section briefly considers some descriptive parameters often studied in CCJ research. The first two subsections deal with parameters that are usually of interest to all social scientists. The final three subsections emphasize issues of particular importance for CCJ research.

__Measures of Central Tendency__

Central tendency measures provide researchers with information about what is typical for the cases involved in a study for a particular variable. The mean or arithmetic average (i.e., the sum of the variable scores divided by the number of scores) is a common measure of central tendency for quantitative variables. The mean has an advantage in that each case’s numerical value has a direct effect on the estimate; thus, the mean uses all of the information in the scores to describe the “typical” case. A problem with the mean is that cases with extreme scores can cause the mean to be much higher or much lower than what is typical for the cases in the study. In situations where the mean is affected by extreme scores, researchers often prefer to use the median as a measure of central tendency. The median is the middle score of the distribution; half of the cases have scores above the median, and the other half have scores below the median. The median can also be viewed as the 50th percentile of the distribution. Unlike the mean, the median does not use all of the information in the data, but it is also not susceptible to the influence of extreme scores. For categorical variables, the mode (i.e., the most frequently occurring category) is often used as a measure of central tendency. For dichotomous or two-category variables, the most commonly used measure of central tendency is the proportion of cases in one of the categories.
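These measures are simple to compute with Python's standard `statistics` module. The data below are hypothetical and are chosen so that one extreme score pulls the mean well above the median.

```python
import statistics

# Hypothetical offense counts for seven individuals; one case is extreme.
offense_counts = [0, 1, 1, 2, 3, 4, 45]
mean_count = statistics.mean(offense_counts)
median_count = statistics.median(offense_counts)

# Hypothetical categorical variable: most serious offense type.
offense_types = ["property", "violent", "property", "drug", "property"]
modal_type = statistics.mode(offense_types)

# Hypothetical dichotomous variable: 1 = rearrested, 0 = not rearrested.
rearrested = [1, 0, 0, 1, 0, 0, 0, 1]
proportion_rearrested = sum(rearrested) / len(rearrested)
```

Here the mean is four times the median, which illustrates why the median is often preferred for skewed variables such as offense counts.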

__Measures of Dispersion__

In addition to summarizing what is typical for the cases in a study, researchers usually consider the amount of variation as well. Several summaries of variation, or dispersion, are commonly reported in the literature. The most common measure of dispersion for quantitative variables is the variance and/or its square root, the standard deviation. Many interesting social science variables are either normally or approximately normally distributed (i.e., the distribution looks like a bell-shaped curve). In these types of distributions, approximately two thirds of the cases fall within 1 standard deviation of the mean, and about 95% of the cases fall within 2 standard deviations of the mean. Thus, for variables with a bell-shaped distribution, the standard deviation has a very clear interpretation. This is particularly important because sampling distributions are often assumed to have normal distributions. Thus, the standard error calculation that appears in much quantitative CCJ research is actually an estimate of the standard deviation of the sampling distribution. It can be used to form confidence intervals and other measures of uncertainty for parameter estimates in the relative frequency framework.
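The sketch below checks the "two thirds" and "95%" figures against the standard normal distribution and then computes a standard error and an approximate 95% confidence interval for a sample mean. The scores are hypothetical, and the interval uses the rough 2-standard-error rule described above rather than an exact critical value.

```python
import statistics
from math import sqrt

# Share of a normal distribution within 1 and 2 standard deviations of the mean.
z = statistics.NormalDist(mu=0.0, sigma=1.0)
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

# Hypothetical sample of scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sample_mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)           # square root of the sample variance
standard_error = sample_sd / sqrt(len(scores))  # estimated SD of the sampling distribution

# Approximate 95% confidence interval: mean plus or minus 2 standard errors.
ci_low = sample_mean - 2 * standard_error
ci_high = sample_mean + 2 * standard_error
```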

For qualitative or categorical variables, a common measure of dispersion is the diversity index, which measures the probability that cases come from different categories. Some CCJ researchers have used the diversity index to study offending specialization and ethnic–racial heterogeneity in communities and neighborhoods. A generalized version of the diversity index that adjusts for the number of categories is the index of qualitative variation, which indicates the extent to which individuals are clustered within the same category or distributed across multiple categories.
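A common formulation, assumed here because the chapter does not give the formulas, defines the diversity index as one minus the sum of squared category proportions (the probability that two cases drawn at random, with replacement, fall in different categories) and the index of qualitative variation as that value rescaled by K/(K - 1), where K is the number of categories. The category counts below are hypothetical.

```python
def diversity_index(counts):
    """1 - sum of squared proportions across categories."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def index_of_qualitative_variation(counts):
    """Diversity index rescaled so that an even spread across K categories equals 1."""
    k = len(counts)
    return diversity_index(counts) * k / (k - 1)

# Hypothetical counts of an offender's arrests across three offense types.
arrest_counts = [50, 30, 20]
d = diversity_index(arrest_counts)
iqv = index_of_qualitative_variation(arrest_counts)

# Complete specialization versus a perfectly even spread.
d_specialist = diversity_index([100, 0, 0])
iqv_even = index_of_qualitative_variation([10, 10, 10, 10])
```

Values near zero indicate clustering in a single category (specialization), and values near one indicate cases spread evenly across categories.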

__Criminal Careers__

Over the past three to four decades, criminologists have developed the concept of the criminal career. According to researchers who study criminal career issues, within any given time period the population can be divided into two groups: (1) active offenders and (2) everyone else. The percentage of the population in the active offender category is the crime participation rate. Within that same time period, active offenders vary in several respects: (a) the number of offenses committed, (b) the seriousness of the offenses committed, and (c) the length of time the offender is actively involved in criminal activity. A key idea within the criminal career framework is that the causes of participation may not be the same as the causes of offense frequency, seriousness, or the length of time the offender is active.

There is an extensive body of research devoted to estimating these parameters for general and higher-risk populations, and more recent research has treated these criminal career dimensions as outcomes in their own right. For example, a large amount of research has been devoted to the study of offense frequency distributions. This literature shows that in both general and high-risk populations offense frequency distributions tend to be highly skewed, with most individuals exhibiting low frequencies and a relatively small number of individuals exhibiting high frequencies. Among the most prominent findings in the field came from Wolfgang et al.’s (1972) study of the 1945 Philadelphia male birth cohort, which showed that about 6% of the boys in the cohort were responsible for over 50% of the police contacts for the entire cohort.
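The concentration of offending described here can be summarized by asking what share of all recorded events is attributable to the most active fraction of a cohort. The sketch below uses a made-up cohort of 100 people, not the Philadelphia data, and the function name `share_of_contacts` is hypothetical.

```python
def share_of_contacts(counts, top_fraction):
    """Share of all events accounted for by the most active fraction of people."""
    ordered = sorted(counts, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical cohort: 60 people with no contacts, 30 with one, 10 with ten.
cohort = [0] * 60 + [1] * 30 + [10] * 10
print(share_of_contacts(cohort, top_fraction=0.10))
```

In this invented cohort, the most active 10% of individuals account for 100 of the 130 contacts, the kind of skew the offense frequency literature reports.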

__Recidivism Rates__

A particularly important parameter for criminal justice policy is the rate at which individuals who have offended in the past commit new crimes in the future (the recidivism or reoffending rate). Recidivism rates are based on three key pieces of information: (1) the size of the population of prior offenders at risk to recidivate in the future, (2) the number of individuals who actually do reoffend by whatever measure is used (i.e., self-report of new criminal activity, rearrest, reconviction, return to prison), and (3) a known follow-up period or length of time that individuals will be followed. Recidivism is also sometimes studied in terms of the length of time that lapses between one’s entry into the population of offenders at risk to recidivate and the timing of one’s first recidivism incident.
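The three ingredients of a recidivism rate can be combined in a few lines. This sketch assumes that every person in the at-risk population is observed for the full follow-up period and that the recidivism measure (for instance, rearrest) has been recorded as the number of days to the first new event, with `None` for people with no observed event. The data are hypothetical.

```python
def recidivism_rate(days_to_first_event, follow_up_days):
    """Proportion of the at-risk population with a new event within follow-up."""
    at_risk = len(days_to_first_event)
    reoffended = sum(
        1 for days in days_to_first_event
        if days is not None and days <= follow_up_days
    )
    return reoffended / at_risk

# Hypothetical release cohort: days until first rearrest, None if none observed.
cohort = [100, None, 400, 900, None, 30, 1200, None]
print(recidivism_rate(cohort, follow_up_days=365))   # one-year rate
print(recidivism_rate(cohort, follow_up_days=1095))  # three-year rate
```

The same cohort yields different rates under different follow-up periods and different recidivism measures, so both need to be reported alongside the rate.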

__Trajectories and Developmental Pathways__

With the advent of a large number of longitudinal studies of criminal and precriminal antisocial and aggressive behaviors, researchers have become increasingly interested in the developmental course of criminality as people age. To aid in the discovery of developmental trends and patterns, criminologists have turned to several types of statistical models that provide helpful lenses through which to view behavior change. The most prominent of these models are growth curve models, semiparametric trajectory models, and growth curve mixture models. These all assume that there is important variation in longitudinal patterns of offending. Some individuals begin offending early and continue at a sustained high rate of offending throughout their lives, whereas others who begin offending early seem to stop offending during adolescence and early adulthood. Some individuals avoid offending at all, whereas others offend in fairly unsystematic ways over time. Growth and trajectory models provide ways of summarizing and describing variation in the development of criminal behavior as individuals move through the life span.

## Analytic Methods for Causal Inference

The foundation of a sound quantitative criminology is a solid base of descriptive information. Descriptive inference in criminology turns out to be quite challenging. Criminal offending is covert activity, and exclusive reliance on official records leads to highly deficient inferences. Despite important challenges in descriptive analysis, researchers and policymakers still strive to reach a better understanding of the effects of interventions, policies, and life experiences on criminal behavior. Much of the CCJ literature is therefore focused on efforts to develop valid causal inferences. This section discusses some of the most prominent analytic methods used for studying cause and effect in CCJ research.

__Independent Variables and Outcomes__

CCJ researchers typically distinguish between independent variables and dependent or outcome variables. In general, researchers conceive of dependent or outcome variables as variation that depends on the independent or predictor variables. Thus, independent variables explain variation in dependent or outcome variables. Sometimes researchers use stronger language, suggesting that independent variables cause variation in dependent variables. The burden of proof for use of the word cause is very high, however, and many researchers are careful to qualify their results if they do not think this burden of proof has been met.

__Contingency Tables__

Contingency tables are a useful way of presenting frequency distributions for two or three categorical variables at the same time. For example, if a person wanted to create a measure of offending participation (either someone offends in a particular time period or he or she does not) and then compare the distribution of that variable for individuals who are employed and those who are not employed, a contingency table could be constructed to display this information. Several measures of the strength of the statistical association (analogous to a correlation coefficient) have been designed for contingency tables. Although contingency tables are not often used for studying cause–effect relationships (except in randomized experiments), they are quite useful for exploratory data analysis and foundational work for more elaborate statistical models.
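The employment and offending example can be tabulated as follows. The eight paired observations are invented for illustration, and `participation_rate` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical paired observations: employment status and offending participation.
employed = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
offended = ["no", "no", "yes", "no", "no", "yes", "yes", "yes"]

# Each cell of the contingency table counts one (employed, offended) combination.
table = Counter(zip(employed, offended))

def participation_rate(table, employment_status):
    """Proportion of a given employment group that offended."""
    yes = table[(employment_status, "yes")]
    no = table[(employment_status, "no")]
    return yes / (yes + no)

for status in ("yes", "no"):
    print(f"employed={status}: participation rate = {participation_rate(table, status)}")
```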

__Measures of Association__

Researchers often want to summarize the strength of the statistical association between two variables. Correlation coefficients and other measures of association are used for this purpose. In general, measures of association are arrayed on a scale of -1 to 1 or 0 to 1, where 0 usually represents no association at all and -1 or 1 represents a perfect negative or positive association. Measures of association have been developed for categorical and quantitative variables. Some measures of association, such as the relative risk ratio and the odds ratio, are calibrated so that 1 implies no statistical association, whereas numbers close to zero and large positive numbers indicate strong association. Researchers often conduct tests of statistical significance to test the hypothesis of “no association” in the population.
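For a two-by-two table, the relative risk ratio and the odds ratio can be computed directly from the four cell counts. The cell labels `a` through `d` and the counts used below are assumptions made for this sketch.

```python
def relative_risk(a, b, c, d):
    """Risk of the outcome in group 1 divided by the risk in group 2.

    a, b: group 1 cases with and without the outcome
    c, d: group 2 cases with and without the outcome
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of the outcome in group 1 divided by the odds in group 2."""
    return (a * d) / (b * c)

# Hypothetical counts: 30 of 100 in group 1 and 10 of 100 in group 2 reoffend.
print(relative_risk(30, 70, 10, 90))
print(odds_ratio(30, 70, 10, 90))
```

When the two groups have identical outcome rates, both measures equal 1, which is the "no association" value described above.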

__Chi-Square, t Tests, and Analysis of Variance__

CCJ researchers are able to draw on a wide variety of tools for conducting tests of statistical significance. In a contingency table setting, researchers often are interested in testing the hypothesis that two categorical variables are statistically independent. The chi-square test of independence is frequently used for this purpose. Sometimes, a researcher will want to test the hypothesis that the mean of a continuous variable is the same for two populations. The independent samples t test is most often used to conduct this test. In addition, researchers may need to test the hypothesis that the mean of a continuous variable remains the same at two time points. In this setting, the paired samples t test will most likely be used. Finally, if a researcher wants to test the hypothesis that a continuous variable has the same mean in three or more populations, then analysis of variance will be used. There are many statistical tests for many types of problems. Although these are among the most common applications, many others are available for more complicated situations.
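The chi-square test of independence compares the observed cell counts with the counts expected if the two variables were independent. The sketch below computes only the test statistic and its degrees of freedom for a table given as a list of rows; obtaining a p value would require a chi-square distribution function or a statistical table, which is omitted here. The counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical 2 x 2 table: rows are groups, columns are reoffended yes/no.
statistic, df = chi_square_statistic([[30, 70], [10, 90]])
print(statistic, df)
```

With 1 degree of freedom, a statistic above 3.841 leads to rejection of the independence hypothesis at the conventional .05 level.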

__Linear Regression__

Linear regression models are a class of statistical models summarizing the relationship between a quantitative or continuous outcome variable and one or more independent variables. Careful use of these models requires attention to a number of assumptions about the distribution of the outcome variable, the correctness of the model’s specification, and the independence of the observations in the analysis. If the assumptions underlying the model are valid, then the parameter estimates can provide useful information about the relationship between the independent variable or variables and the outcome variable.
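For the simplest case of one independent variable, the least squares estimates have a closed form, sketched below. The function name `simple_ols` and the data are illustrative; the sketch reports only the intercept and slope and does not check any of the assumptions discussed above.

```python
from statistics import mean

def simple_ols(x, y):
    """Least squares intercept and slope for one independent variable."""
    mean_x, mean_y = mean(x), mean(y)
    # Slope: covariation of x and y divided by the variation in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on the line y = 1 + 2x.
intercept, slope = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)
```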

__Regression for Qualitative and Counted Outcomes__

Many outcome variables in CCJ are not continuous or do not meet some of the distributional assumptions required for linear regression. Statistical models for these variables, therefore, do not fit well into the linear regression framework. Examples of this problem include dichotomous and event-count outcomes. For dichotomous outcomes, researchers often estimate logistic or probit regression models; for counted outcomes, specialized models for event counts are usually estimated (e.g., binomial, Poisson, negative binomial).
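The sketch below does not estimate either kind of model; it only shows the functional forms that set them apart from linear regression. A logistic model turns a linear predictor into a probability between 0 and 1, and a Poisson model assigns probabilities to the counts 0, 1, 2, and so on. The coefficient and rate values are arbitrary numbers chosen for illustration.

```python
from math import exp, factorial

def logistic_probability(intercept, slope, x):
    """Probability of the outcome implied by a logistic model at value x."""
    return 1.0 / (1.0 + exp(-(intercept + slope * x)))

def poisson_pmf(count, rate):
    """Probability of observing exactly `count` events when the mean is `rate`."""
    return exp(-rate) * rate ** count / factorial(count)

# Arbitrary illustrative values, not estimates from any data set.
for x in (0, 1, 2, 3):
    print(x, round(logistic_probability(-1.0, 0.8, x), 3))
for count in (0, 1, 2, 3):
    print(count, round(poisson_pmf(count, rate=1.5), 3))
```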

__Structural Equation Models__

CCJ researchers sometimes have well-developed ideas about the relationships between a complex system of independent and dependent variables. These ideas are usually based on theories or findings from previous empirical research. Structural equation models can be used to investigate whether the relationships between the variables in the system are in accord with the researcher’s predictions.

__Interrupted Time Series Analysis__

A time series analysis is based on the study of a particular cross-sectional unit (e.g., a community or city) over a sustained period of time. Over that period of time, the study takes repeated measurements of the phenomenon of interest (e.g., the number of gun homicides each month). Sometimes, an intervention occurs (e.g., the introduction of a new law restricting access to handguns) and the researcher has access to both the preintervention time series and the postintervention time series. These time series can be combined into a single interrupted time series analysis to study the effect of the intervention on the series. Researchers conducting interrupted time series analysis usually include both a series in which the intervention occurs and a series in which there is no intervention (a control series). If there is an apparent effect of the intervention in the interrupted time series analysis and the effect reflects a genuine causal effect, then there should be no corresponding change in the control series.
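The logic of comparing an interrupted series with a control series can be sketched with a simple before-and-after comparison of means. This is a deliberate simplification, not a full time series model: it ignores trend, seasonality, and serial correlation, all of which a real interrupted time series analysis would need to address. The monthly counts and function names are hypothetical.

```python
from statistics import mean

def pre_post_change(series, intervention_index):
    """Mean after the intervention minus mean before it."""
    before = series[:intervention_index]
    after = series[intervention_index:]
    return mean(after) - mean(before)

def net_change(treated, control, intervention_index):
    """Change in the treated series beyond the change in the control series."""
    return (pre_post_change(treated, intervention_index)
            - pre_post_change(control, intervention_index))

# Hypothetical monthly counts; the intervention takes effect at index 4.
treated_city = [10, 12, 11, 9, 6, 5, 7, 6]
control_city = [8, 9, 8, 7, 8, 8, 9, 7]
print(pre_post_change(treated_city, 4))
print(pre_post_change(control_city, 4))
print(net_change(treated_city, control_city, 4))
```

A drop in the treated series with no corresponding change in the control series is the pattern that would be consistent with a genuine intervention effect.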

__Models for Hierarchical and Panel Data__

As discussed earlier (see the “Unit of Analysis” section), some data sets have more than one logical unit of analysis. For example, the National Longitudinal Survey of Youth follows the same individuals repeatedly over a sustained period of time (panel data). Other studies, such as the MTF study, sample schools and then sample multiple individuals within each school. A variety of modeling tools (e.g., fixed effect, random effect, hierarchical, and multilevel models) exist for working with these kinds of data. An important feature of all of these tools is that they attend specifically to dependence within higher order units of analysis.
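One simple way to see how such tools handle dependence within higher order units is the within-unit (demeaning) transformation that underlies fixed effect estimators. The sketch below subtracts each unit's own mean from its observations, so that stable differences between units drop out. It illustrates the transformation only and is not a complete panel model; the identifiers and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def within_transform(unit_ids, values):
    """Express each observation as a deviation from its own unit's mean."""
    by_unit = defaultdict(list)
    for unit, value in zip(unit_ids, values):
        by_unit[unit].append(value)
    unit_means = {unit: mean(vals) for unit, vals in by_unit.items()}
    return [value - unit_means[unit] for unit, value in zip(unit_ids, values)]

# Hypothetical panel: two people, each observed in two waves.
person = ["a", "a", "b", "b"]
offenses = [1, 3, 10, 14]
print(within_transform(person, offenses))
```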

__Counterfactual Reasoning and Treatment Effects__

Increasingly, CCJ researchers are thinking about cause and effect in terms of counterfactual reasoning. Ultimately, this is an exercise in observing what actually occurs under a specific set of circumstances and then asking how things might have occurred differently if the circumstances had been different. The hypothetical aspect of the problem is a counterfactual, because it involves speculation about what might have occurred but actually did not occur. Counterfactual reasoning is particularly applicable to the problem of estimating treatment effects. For example, a researcher considers a group of people who received a particular treatment and observes their outcomes. What the researcher would like to know (but cannot know for sure) is what outcomes these same people would have experienced if they had not received the treatment. The difference between the actual, observed outcome and the hypothetical outcome is the treatment effect. CCJ researchers usually look to the experience of a control group to estimate the hypothetical outcome. An important problem in CCJ research is the identification of appropriate control groups.

__Randomized Experiments__

A randomized experiment is a study in which individuals are randomly assigned to treatment or control groups prior to treatment. They provide a useful framework for estimating valid counterfactuals because random assignment to treatment and control conditions ensures that the groups are statistically comparable to each other prior to treatment. Thus, the experience of the control group provides a very convincing answer to the question of what would happen to the treatment group if the treatment group did not receive treatment.
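A minimal sketch of the two steps in a randomized experiment, random assignment followed by a comparison of group outcomes, appears below. The function names, the fixed seed, and the outcome values are assumptions made for the example; the outcome lists are invented rather than simulated from any model.

```python
import random
from statistics import mean

def randomize(ids, seed=0):
    """Randomly split a list of identifiers into treatment and control groups."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def difference_in_means(treated_outcomes, control_outcomes):
    """Estimated treatment effect: treated mean minus control mean."""
    return mean(treated_outcomes) - mean(control_outcomes)

treatment_ids, control_ids = randomize(range(10))
print(treatment_ids, control_ids)

# Hypothetical outcomes (1 = rearrested, 0 = not) recorded after treatment.
print(difference_in_means([0, 1, 0, 0], [1, 1, 0, 1]))
```

Because assignment depends only on the random shuffle, the control group's mean outcome serves as the estimate of what the treatment group would have experienced without treatment.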

__Natural Experiments and Instrumental Variable Estimators__

For a variety of reasons, randomized experiments are not possible in many instances, but sometimes conditions that closely approximate an experiment occur because of a key event or policy change. When researchers recognize these conditions, a natural experiment is possible, even when more conventional studies fail. Consider the problem of estimating the effect of police strength on crime rates. Estimating correlations and conventional regression models cannot help much with this problem because the direction of causation is ambiguous: street crime almost certainly has an effect on police strength, and police strength almost certainly has some effect on street crime. Natural experiments can provide more convincing evidence. A recent study conducted in Washington, D.C., is illustrative (Klick & Tabarrok, 2005). It was based on the insight that changes in terror alert levels lead to meaningful changes in the presence of police on the street. The researchers examined what happened to crime rates when street-level police presence increased and decreased as terror alert levels changed. Researchers sometimes refer to estimators built on natural experiments as instrumental variable estimators, and they can provide a powerful method for estimating treatment effects when randomized experiments cannot be conducted.
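With a two-valued instrument, the simplest instrumental variable estimator is the Wald ratio: the difference in the mean outcome across the two instrument values divided by the difference in the mean treatment. The sketch below applies it to invented numbers in which `z` marks high-alert days, `x` is police presence, and `y` is the daily crime count. It is not a reproduction of the cited study's data or estimation strategy.

```python
from statistics import mean

def wald_estimate(z, x, y):
    """Instrumental variable (Wald) estimate of the effect of x on y.

    z: instrument coded 0/1, x: treatment level, y: outcome.
    """
    x_on = [xi for zi, xi in zip(z, x) if zi == 1]
    x_off = [xi for zi, xi in zip(z, x) if zi == 0]
    y_on = [yi for zi, yi in zip(z, y) if zi == 1]
    y_off = [yi for zi, yi in zip(z, y) if zi == 0]
    # Change in the outcome per unit change in treatment induced by the instrument.
    return (mean(y_on) - mean(y_off)) / (mean(x_on) - mean(x_off))

# Hypothetical days: alert status, police units deployed, crimes recorded.
alert = [0, 0, 1, 1]
police = [10, 10, 14, 14]
crimes = [100, 100, 92, 92]
print(wald_estimate(alert, police, crimes))
```

The estimate is credible only if the instrument affects the outcome solely through the treatment, an assumption that must be argued on substantive grounds rather than checked by the calculation.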

__Matching__

Another approach to developing valid counterfactuals is to identify a group of cases that receive treatment and then identify another group of cases (the control group) that are similar to the treatment cases but do not receive treatment. To ensure that the treatment and control groups are similar, researchers match the groups on characteristics that are thought to be important. The direct matching approach guarantees that the treatment and control groups look alike on the matched characteristics. A problem is that the groups may look different from each other on characteristics that were not matched. Thus, in general, counterfactuals produced by the matching approach will not be as convincing as those produced by a randomized or natural experiment. However, in instances where experiments are not possible, direct matching designs can still provide convincing evidence about treatment effects. A generalization of the matching design involves matching on indexes based on combinations of variables. Propensity scores, which increasingly appear in the CCJ literature, are one such index. It can be shown that matching on a properly created index can lead to treatment and control groups that look like each other on many characteristics. It is likely that CCJ researchers will rely more and more heavily on matching designs and propensity scores to study treatment effects, in particular when randomized experiments are not possible.
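Direct matching can be sketched as follows: each treated case is compared with the untreated cases that share exactly the same values on the matched characteristics, and the differences are averaged. The function name `exact_match_effect`, the characteristics (sex and number of prior arrests), and the outcomes are hypothetical. Propensity score matching would replace the tuple of characteristics with an estimated index and is not shown.

```python
from collections import defaultdict
from statistics import mean

def exact_match_effect(treated, controls):
    """Average treated-minus-matched-control outcome under exact matching.

    treated, controls: lists of (characteristics_tuple, outcome) pairs.
    Treated cases with no exact match among the controls are dropped.
    """
    pool = defaultdict(list)
    for characteristics, outcome in controls:
        pool[characteristics].append(outcome)
    differences = [
        outcome - mean(pool[characteristics])
        for characteristics, outcome in treated
        if characteristics in pool
    ]
    return mean(differences) if differences else None

# Hypothetical cases: (sex, prior arrests) and outcome (1 = rearrested).
treated = [(("male", 1), 0), (("female", 0), 1)]
controls = [(("male", 1), 1), (("male", 1), 1), (("female", 0), 1), (("male", 3), 0)]
print(exact_match_effect(treated, controls))
```

The groups are alike on the matched characteristics by construction, but nothing in the procedure balances characteristics that were left out of the tuple.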

## Conclusion

Some aspects of quantitative CCJ research have remained relatively constant throughout the field’s history. Some CCJ research problems are very much like problems studied in other fields, and some are quite different, yet there has always been a major emphasis on description and learning about how much crime is occurring and what populations are at highest risk of criminal involvement and victimization. Other aspects, such as repeatedly and systematically following the same individuals over time and rigorously measuring the effects of changing policies, are more recent developments. CCJ is an interdisciplinary field that relies on insights from sociology, psychology, economics, political science, and statistics as well as its own rapidly emerging traditions. One thing is certain: Analytic methods in the field will continue to evolve. It is critical that quantitative CCJ researchers monitor developments in their own field and stay well connected with developments in other allied fields to strengthen their efforts at descriptive and causal inference.