Measurement and Research in Criminology; Program Evaluation

Jodi Lane. 21st Century Criminology: A Reference Handbook. Editor: J Mitchell Miller. 2009. Sage Publications.

Program evaluation is one of the most important methods for assessing the process and impact of criminal justice programs and policies. In general, in program evaluations, researchers (e.g., university professors or people who work for private research organizations) work with practitioners (e.g., police or probation officers) to determine the effectiveness of programs designed to change offenders or people who might become offenders. This latter group of clients might be considered those at risk of committing crime; often, they are juveniles, but sometimes they are adults. In program evaluation researchers collect information, or data, about the program, staff, and clients; analyze it; and then report back to the program leaders and other important stakeholders regarding the results. Stakeholders might include policymakers, people who designed or developed the program, those who funded it, or interested community members. Sometimes evaluation results are then used by policymakers and practitioners to inform changes in their programs and policies.

Program evaluations have been increasingly important to criminal justice practice in the last few decades, in part because of limited resources. Criminal justice programs cost a lot of money (often millions of dollars), and decision makers, such as legislators, governors, county councils, or funding agencies such as the National Institute of Justice (NIJ) and the Office of Juvenile Justice and Delinquency Prevention, often have to make decisions about which programs to fund or continue to fund. There are typically many more programs requesting money than there is money to fund them. One way to distinguish between programs is to use the scientific evidence provided in program evaluations to determine which programs are most effective at reaching their goals. For example, legislators might want to fund programs that are best at reducing crime, decreasing recidivism, or preventing youth from getting into trouble. It is now typical for funding organizations to require agencies to include an evaluation component in their program plan when asking for funding, especially if there is not hard evidence of success already. Evaluations conducted to determine whether programs are performing to expectations are called summative evaluations (Rossi, Lipsey, & Freeman, 2004, p. 36).

Evaluations are also useful in helping people involved in administering the program learn more about it. Although it might seem that those involved in administering the program would know the most about it, they often do not know everything that might be useful. Evaluators can often provide some of this information, such as how well they are meeting the goals of the program (e.g., preventing recidivism or increasing school attendance), whether the program is serving the people it is intended to serve, and whether staff members are providing the program to their clients in the way they are supposed to. Program evaluations that are conducted specifically to help programs improve are called formative evaluations (Rossi et al., 2004, p. 34).

Staff members who do the daily work in programs sometimes truly believe that they are making a difference in the lives of their clients but have no scientific evidence on which to base their opinions. Instead, they use their own personal experience with individual cases to judge the impact of their actions. For example, a staff member might remember that one of his clients stopped committing crime, went to college, and started a successful career, or he might remember that he spent many hours with a youth and that the girl said it changed her life. Individual instances of success can make staff members feel very good about doing their jobs and can convince them that the program is working. This might be enough information for them to want to continue implementing the program. Some might believe that changing one life is worth all the effort. Program decision makers, however, are often more interested in the effects of their program on clients more generally. They may wonder about the percentages of clients who are remaining crime or drug free, staying in school, maintaining jobs, and so on, and how their program compares to other programs on these measures. Program evaluations can answer these types of questions.

The results of program evaluations also help other jurisdictions make decisions about what types of programs to implement. For example, a county probation department might be interested in making program changes to help prevent its clients from progressing into juvenile correctional facilities (being locked up). The administration might look to other jurisdictions to determine what programs have been implemented for juveniles on probation, how well they worked, and for whom they worked. They might also look for specific details about how to implement the programs (e.g., what to include or not include as treatment components, the types of staff needed, or problems to anticipate when creating or putting a new program into operation).

Academic audiences, or university professors and their students, also learn from program evaluations. Professors teach students about which programs are effective and which are not. Just as important, members of the academic community contribute to the scientific understanding of criminal justice by conducting research on programs and policies to determine whether they work. Sometimes researchers conduct program evaluations because they want their unique research skills to apply to real-world issues—they want to make a difference. Researchers usually realize that the results they produce from their studies may have an effect on how staff, offenders, and others are treated in the programs they evaluate or on whether a program receives or continues to receive funding. Collaboration among researchers and policymakers and practitioners is critical, because policymakers and practitioners are rarely trained in the scientific method and often do not know where to look for scientific research. Researchers who work in the field with people in criminal and juvenile justice programs can make research accessible and understandable. Policymakers and practitioners, in turn, provide academics with key details about the programs and their daily activities to help with the research. Sometimes, although not always, researchers also go beyond the purely practical reasons for conducting evaluations to test academic theories of crime, crime prevention, or offender treatment to see whether they work when applied in real situations.

Important Types and Components of Program Evaluation

According to Rossi et al. (2004), authors of one of the most important and widely used textbooks on program evaluation, there are generally five types of program evaluation that might occur: (1) needs assessment, (2) assessment of program theory, (3) process evaluation, (4) impact evaluation, and (5) efficiency analysis. Often, researchers conduct more than one type of evaluation when examining a particular program. This chapter discusses each of these types of program evaluation in more detail, including what they are and the types of research methods they might include. This chapter also describes examples of important evaluation studies that have been conducted regarding juvenile and criminal justice, to illustrate the types of results that these studies produce.

Needs Assessment

According to Rossi et al. (2004), needs assessment involves the activities that researchers use to “systematically describe and diagnose social needs” (p. 102). In terms of program evaluation, researchers conducting the needs assessment aim to answer questions such as “Is this proposed (or current) program needed?” and “If so, what specific services would fill the need?” In the case of ongoing programs, questions might include “Are the provided services meeting the needs of the clients?” In truth, programs are often implemented without conducting a systematic needs assessment, because the individuals implementing the program simply believe the program is needed.

One might begin a needs assessment by examining the character and degree of the problem a program plans to address. First, the researchers might want to determine how the people who are developing and implementing the program define the problem (see Rossi et al., 2004). For example, if the program staff believe they need to address juvenile crime, researchers might determine what the staff mean by juvenile crime. Do they mean all juvenile crime, from minor offenses to very serious ones; violent crime only; property crime only; status offenses (i.e., those offenses that would not be crimes if the youth were adults); or crimes committed by particular offenders (e.g., gangs)? Researchers often can determine the answer by repeatedly asking probing questions to prompt staff to be more specific. For example, if staff say “juvenile crime,” the researchers can ask “What do you mean?” If the staff say, “You know, violent crime and property crime,” then the researcher can say “What do you mean by that?” Then, if the response is “Crimes like murder, burglary, and assault,” the researchers will know exactly what the staff hope to address with the program.

Another important component of needs assessment is helping program implementers articulate the program’s target group; that is, whom the program intends to serve (Rossi et al., 2004). In other words, who will or should the program’s clients be? Targets are usually individuals with certain demographic and personal characteristics (e.g., at-risk youth, probationers, or prisoners), but sometimes the targets might be organizations (e.g., police or probation departments, or prisons). A program evaluator might help a program better define who the targets will be: for example, which at-risk youth or what type of police departments. It is very common for program evaluators to spend a lot of time talking to program developers and staff members to further define what might at first be discussed in general terms. Sometimes, different staff members disagree about the definition of their target group, even if they use the same words to describe the problem, and the evaluator can help them agree on a compromise definition. If the program developers decide that they want to help (or target) at-risk youth, a researcher might set out to determine how program developers and staff define at risk (do they want to focus on those who come from single-parent families, are skipping school, have been arrested for petty crimes, are gang members or associates, etc.?). If the program is focusing on serving police departments (e.g., providing weapons training), the evaluator might ask the staff to further define the type of department on which they would like to focus. For example, would they like to focus on urban, suburban, or rural departments; large or small ones; or those that deal with a lot of crime or just a little? Does the program hope to target departments with certain types of policing styles or administrative structures? Do they want to focus on training all staff, only line officers, only administration, and so on?

Once the program has better defined the target group, evaluators might set out to determine the degree of the defined problem, or how much of it is out there and where it is located (see Rossi et al., 2004). So, if a program defines at-risk youth as those coming from single-parent families, evaluators might look to see whether they can determine how many single-parent families live in the community and whether they are concentrated in certain areas (e.g., particular neighborhoods or school districts) or are equally dispersed throughout the city. Learning where these families live will help the program determine where to deliver services. Evaluators might also try to determine the characteristics of people affected by the problem. For example, are there more girls or boys in single-parent families? What are the racial and ethnic origins of those living in these family structures? What are the average income levels of these families? What are the specific problems these people face that might lead the children to criminal behavior?
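The question of how much of the problem exists and where it is located can be illustrated with a simple tally. The following is a minimal sketch only: the household records, neighborhood names, and function names are hypothetical and are not drawn from any study discussed in this chapter.

```python
from collections import Counter

# Hypothetical household records: (neighborhood, is_single_parent).
households = [
    ("Northside", True),
    ("Northside", True),
    ("Northside", False),
    ("Eastside", True),
    ("Eastside", False),
    ("Eastside", False),
    ("Downtown", False),
    ("Downtown", False),
]


def single_parent_counts(records):
    """Count single-parent households in each neighborhood."""
    return Counter(area for area, single in records if single)


def concentration(records):
    """Share of all single-parent households found in each neighborhood."""
    counts = single_parent_counts(records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {area: n / total for area, n in counts.items()}


counts = single_parent_counts(households)
shares = concentration(households)
print(counts.most_common())  # neighborhoods ordered by number of target families
print(shares)
```

In this invented example, most of the target families are in one neighborhood, which is the kind of finding that would suggest where services might be delivered.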

If the focus is on training police departments on the efficient use of weapons, the researcher might look to determine how many of them need this type of training module. For example, the evaluator might look to see whether relevant departments already have a training module on weapon use, how the content of the training in use compares with the content of that to be offered, whether some departments have more complaints about the misuse of weapons than others, and so on. Answering questions such as these will help determine whether a program is needed, who needs it, and what type of services would be most relevant to those who would need or use the program, as well as where the services might best be located (see Rossi et al., 2004).

Program Theory Assessment

Another type of evaluation that one might conduct involves determining whether the program itself is defined, thought out, and developed well (Rossi et al., 2004). Program theory is not the same as academic theory, although criminological or criminal justice theories are sometimes embedded in programs. Instead, program theory is typically practice driven, and program developers and staff often determine their theory from personal experience, training workshops or presentations they have attended, articles they have read in practitioner magazines, or other people to whom they have talked about the issues they face in their work. They often use lay terms to discuss their program philosophy and thoughts, but well-versed evaluators can often find the propositions of academic theories in practitioners’ descriptions. For example, a probation officer might mention that children get into trouble when their parents or friends are also troublemakers and so might argue that the focus should be on separating children from negative peers or on improving parenting skills. After more discussion, an evaluator might see elements of learning theory or social control theory in the thoughts behind the program.

As Rossi et al. (2004) noted, “The program theory explains why the program does what it does and provides the rationale for expecting that doing so will achieve the desired results” (p. 134). To some people, it might seem obvious that programs would start with this sort of program description (or their reason for existing) before they started developing the particular service components they wanted to deliver. However, the program theory may or may not be well articulated, either in written or verbal form. Sometimes staff members go about their work without a clearly defined reasoning for doing so or at least without a reasoning that they can verbalize well. For example, staff may choose to provide services a certain way because they were trained to do so or because it is what they have been doing for a long time and is what they know and are comfortable providing. Sometimes the program theory is a good one, and sometimes it is not. Evaluators trying to assess program theory attempt first to understand what the program theory is and then to evaluate whether it is a good one (see Rossi et al., 2004).

Readers will notice an important theme emerging: that one of the most important tasks of evaluators throughout the process is asking questions. With regard to program theory, evaluators often look at program documents to see whether they can glean details from written information. Important documents might include annual program reports, notes from presentations given by staff, grant proposals written for the program, flyers or pamphlets, and so on. This is a good place to start, because it provides an evaluator with some context with which to ask more questions of the people involved in designing or implementing the program. Documents are rarely sufficient, though, because what is written down may or may not represent the reality for the program.

A lot of the ideas that guide programs are not written down but are embedded in the design of program service components. Evaluators usually work to understand program theory by asking probing questions regarding the ideas behind service plans and by visiting the program site to observe daily activities there. Example questions might include the following: (a) “Why do you do what you do?” (e.g., give youth individual counseling), (b) “How do you think doing this will change the clients’ behavior?” (e.g., Will it keep them from committing crime? Prevent angry outbursts?), and (c) “Why do you think it will have this effect?” (e.g., Why will individual counseling keep youths from committing crime?). Researchers generally continue to ask probing questions until they feel they understand the rationale behind the program. When evaluators look for the assumptions regarding how a program changes clients’ behaviors (i.e., the cause and effect), they are examining the program’s impact theory (Rossi et al., 2004, p. 139). For example, program personnel might say something like “Allowing offenders to release their pain through individual counseling sessions will prevent them from committing crime again.” Although this is not necessarily an uncommon sentiment expressed in social service programs, academic research indicates that this is an example of a faulty assumption, because less structured treatments (e.g., those that allow people to express their emotions) alone rarely prevent recidivism (Andrews et al., 1990). When evaluators examine assumptions about the process of service delivery (e.g., how it should work and what should be provided), they are looking at the program’s process theory (Rossi et al., 2004, p. 141).

In general, researchers ask questions of multiple people involved in the design and implementation of the program, in hopes of getting a more rounded, less biased view of what is driving service delivery. Yet talking to multiple people often illustrates important inconsistencies in key stakeholders’ views of why the program exists. Some of these stakeholders will have more power than others, and so their influence may be greater. Sometimes the evaluator may lead discussions among stakeholders to help them articulate their ideas and possibly compromise if there is disagreement. It is often the evaluator’s job to understand and synthesize this information into a coherent description of the program theory, if one is not articulated already. Then, it is often helpful for the evaluator to draft a statement of the program theory to provide to the people working there to see whether what is written adequately and appropriately describes the philosophy behind their program (Rossi et al., 2004).

Program missions, goals, and objectives are important pieces of program theory, and these become critical later in evaluating the process and impact of the program (see Rossi et al., 2004). The mission, when written, is a brief statement of the organization’s philosophy and reason for existence (Sylvia, Sylvia, & Gunn, 1997). Program goals include broad statements of anticipated accomplishments. A goal for a new prison program might be to reduce recidivism among program clients. Objectives follow from the goals and are more specific statements about the goals that include measurable criteria to determine whether those goals and the related objectives are met. For example, an objective related to the goal of reducing recidivism might be to reduce the number of client rearrests for violent crime (e.g., murder, rape, robbery, and assault) within 1 year of being released from prison. Another related objective might be to reduce the number of reincarcerations among clients within 3 years of prison release. Programs may have many objectives related to each goal, but again, goals and objectives may not be enumerated by the program in measurable ways until a program evaluator works with staff to articulate them in these terms.
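To illustrate what a measurable objective can look like in practice, the sketch below computes the share of clients rearrested within a follow-up window after release. It is a minimal example under stated assumptions: the client records, field names, and the `rearrest_rate` function are hypothetical, and a real evaluation would need to settle which offenses count and how release and arrest dates are recorded.

```python
from datetime import date

# Hypothetical client records: release date and date of first rearrest
# for a violent crime (None if the client was not rearrested).
clients = [
    {"id": 1, "released": date(2020, 1, 10), "rearrested": date(2020, 6, 1)},
    {"id": 2, "released": date(2020, 2, 1), "rearrested": None},
    {"id": 3, "released": date(2020, 3, 15), "rearrested": date(2021, 9, 30)},
    {"id": 4, "released": date(2020, 5, 20), "rearrested": None},
]


def rearrest_rate(records, window_days=365):
    """Proportion of clients rearrested within window_days of release."""
    if not records:
        return 0.0
    hits = sum(
        1
        for r in records
        if r["rearrested"] is not None
        and 0 <= (r["rearrested"] - r["released"]).days <= window_days
    )
    return hits / len(records)


one_year_rate = rearrest_rate(clients)
print(f"Rearrested within 1 year: {one_year_rate:.0%}")
```

Changing `window_days` gives the same kind of measure for a longer follow-up period, such as the 3-year window mentioned for reincarceration.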

After the program theory and the more specific goals and objectives are determined, evaluators who are assessing program theory then might attempt to determine whether the program theory is a good one. There are different issues to consider. First, does the program theory address the needs noted in the needs assessment (if one was conducted)? Second, are the goals and objectives well defined and reasonable? In other words, can this program realistically accomplish the goals and objectives it wants to accomplish? Some programs indicate goals that are out of their reach. For example, they might wish to reduce the local crime rate, when they might do better if they focused on reducing crime among the clients they serve. Third, are the program’s plans for making changes in their clients’ behavior logical based on scientific research? Also, do the program theory and service plans make sense given the resources available to the program (e.g., are the plans too lofty based on the amount of money, staff, services, etc., that are available to the program)? An evaluation of program theory is important, because if the theory is faulty, then it may be impossible for the program to be successful no matter how well people follow the implementation plans (see Rossi et al., 2004).

Process Evaluation

The third type of evaluation, and one of the most common, focuses on measuring the process or implementation of a program. The goal of this type of evaluation is to determine how well the program is maintaining fidelity to the original program design and to describe what services are being delivered (see Rossi et al., 2004). This is a critically important piece of the puzzle when researchers are trying to understand the impact of a program. Consequently, it is often good to conduct a process evaluation and an impact evaluation together. Knowing what goes on in a program helps explain client results. For example, if people who participate in a program fail, is it because the program was delivered poorly or because there was something about the people involved that prevented the desired effects from occurring (Rossi et al., 2004)?

Usually, researchers conducting a process evaluation spend a lot of time observing the daily activities of the program and watching what people do, how they go about their jobs, how clients are treated, and what service components are actually being delivered and to whom. Researchers might sit in on treatment groups and write down what is being covered or taught and record attendance, for example. They might also observe normal activities of the program, being careful to see whether different staff members do things differently. This might involve “hanging out” and taking a lot of notes. Researchers might also interview staff and clients, asking detailed questions about their perceptions of what is delivered and how well. In addition, evaluators usually look at program records, such as case files, to determine what services are being delivered to each client (e.g., how often a probation officer is meeting with the client, to what services the client has been referred, or whether the client is being drug tested). Some of the questions evaluators might hope to answer by observing the program include the following: (a) Are the intended targets (e.g., at-risk youth) receiving the services, (b) are the services being delivered as designed, (c) does service delivery vary by site or staff member, (d) are clients happy with the services they receive, and (e) do staff believe they are delivering the services well (Rossi et al., 2004)?

Ideally, program monitoring occurs throughout the implementation of the program, because what goes on in a program can vary day by day. Sometimes events occur that make major changes in how programs work (e.g., people are hired, fired, or quit; program managers make different decisions about how people should deliver services; or referrals of clients to the program increase or decrease). Researchers prefer to have a solid understanding of the inner workings of programs over time, so they can describe what is really happening there rather than relying solely on what is supposed to be occurring. It is not uncommon for evaluators to find that program delivery is very different from the plan. For example, counselors might have a large binder of sequential lessons to use in their treatment groups, but they may not ever open the book and instead use their own knowledge and experience to guide what they discuss with clients. A process evaluation would attempt to find out the content the counselors were actually delivering and how it matched or varied from the curriculum in the binder as well as whether the people served in the groups were those who were supposed to be treated.

A major goal of process evaluation, then, is to systematically record and document the activities of the program (including what is delivered and to whom it is delivered) and to determine whether the activities match the original plan for service delivery (or the program as it is described publicly in documents and presentations). Most important, the results of a process evaluation may help explain program results (e.g., client impact). For example, if a program is not showing the anticipated results with clients, the process evaluation may point to problems with program implementation that could explain the lack of impact. Process evaluations may also be useful purely for managerial purposes. Program managers might learn that certain services are not being delivered at all, some are being delivered poorly, and others are being delivered well, or they might learn that their targets are not being served. For example, a program might target school dropouts but find that the clients using the services are all good students who excel in school. This knowledge could then be used to help program leaders make management decisions about how to change the program and client recruitment and help mold staff guidance and training. In addition, process evaluation can help provide important information about service delivery to key stakeholders, such as those who funded the program or community members. The evaluation might tell them whether the program is delivering what it promised (see Rossi et al., 2004).

Impact Evaluation

Probably the most common form of published evaluation in criminology and criminal justice is impact or outcome evaluation, which is designed to determine whether a program creates positive change in its clients. Even if a program addresses the needs of its clients, has a good program theory, and is implemented the way it was designed, it is unlikely to continue or be funded if it cannot demonstrate that it can reach its goals and objectives and help its clients. An outcome is the observable characteristic or behavior of the client that the program is supposed to change (e.g., self-esteem, crime commission, drug use, school attendance, employment, peer choices, etc.). Impact or outcome evaluations are designed to determine whether the program actually did change these key client characteristics. Sometimes clients change while in the program but because of something other than the program itself (e.g., getting older, or internal motivation changes not addressed by the program; see Rossi et al., 2004).

The characteristics that are considered outcomes usually can be measured in people not served by the program also, which can help determine whether the change was caused by the program itself. Researchers might collect the same measures (e.g., criminal behavior) for people in a comparison group to see whether these individuals are different from those in the program. In experimental evaluations, which are considered the gold standard of evaluation methods, clients might be randomly assigned (in a method similar to flipping a coin) to either the program of interest (the experimental group) or to no treatment or a different treatment (a control group; see Boruch, 1997; Campbell & Stanley, 1963; Rossi et al., 2004). For example, the experimental group might be youth in a new juvenile prevention program, and the control group might be those sent to typical diversion programs or to no program at all. In the case of random assignment, the experimental and control groups as a whole should be similar on relevant characteristics at the outset (e.g., sex, race, prior criminal history), and any change demonstrated in the experimental group but not in the control group can be attributed to the program rather than to something else (e.g., gender, different backgrounds, different living conditions, etc.). In theory, if the program is causing the change, the people in the control group should not change on the outcomes the program addresses but participants in the program should change (see Rossi et al., 2004). If both groups change, the evaluator must determine why.

Sometimes random assignment to groups is not an option (e.g., people are already in prison or on probation), and so researchers use other methods to create study groups. For example, they might determine what the key background characteristics are (e.g., age, race, gender, criminal history, and family background) that might affect the outcomes of interest and match people in each group on these traits. One way to match is to make sure each group contains a certain percentage of people with each characteristic (e.g., half women and half men). Another, more difficult but better approach is to match individuals in each group. So, if one group has a 16-year-old black female who has committed a burglary, then the other group should include someone with those characteristics as well.
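The two ways of forming study groups described above, random assignment and individual matching, can be sketched in a few lines. This is an illustration only: the function names, client records, and matching keys are hypothetical, and the matching shown is simple exact matching on every listed trait rather than any particular method used in the studies cited here.

```python
import random


def randomly_assign(client_ids, seed=None):
    """Shuffle clients and split them into experimental and control groups."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(client_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def match_individuals(program_group, comparison_pool, keys):
    """Pair each program client with an unused comparison person who has
    identical values on every key; clients without a match are left out."""
    available = list(comparison_pool)
    pairs = []
    for client in program_group:
        for candidate in available:
            if all(client[k] == candidate[k] for k in keys):
                pairs.append((client, candidate))
                available.remove(candidate)  # each person is used only once
                break
    return pairs


# Random assignment of ten hypothetical clients.
experimental, control = randomly_assign(range(1, 11), seed=42)

# Individual matching on hypothetical background characteristics.
program = [
    {"id": "P1", "age": 16, "race": "Black", "sex": "F", "offense": "burglary"},
    {"id": "P2", "age": 17, "race": "White", "sex": "M", "offense": "assault"},
]
pool = [
    {"id": "C1", "age": 16, "race": "Black", "sex": "F", "offense": "burglary"},
    {"id": "C2", "age": 15, "race": "White", "sex": "M", "offense": "theft"},
]
pairs = match_individuals(program, pool, keys=("age", "race", "sex", "offense"))
for client, match in pairs:
    print(client["id"], "matched with", match["id"])
```

In the invented data, only the first program client has an exact counterpart in the comparison pool, which shows why individual matching is described as the more difficult approach: the more traits that must agree, the harder it is to find a match for everyone.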

To determine which outcomes to measure, the evaluator usually looks to the program impact theory, mission, goals, and objectives (Rossi et al., 2004). For example, if goals of a program include reducing crime and increasing the number of prosocial (nondelinquent) friends among youth on probation, the evaluator might measure official records of arrest as well as interview youth about their participation in crime and the characteristics of their friends both before and after the program. Sometimes evaluators also measure outcomes of interest to program developers and stakeholders that are not written in the formal impact theory and goals of the program. For example, practitioners might indicate that they think a side effect of their program aimed at reducing delinquency among juveniles is that parents and their children have better relationships. An evaluator might measure the quality of these relationships either through observation or surveys of clients and their families. A third way to determine what outcomes to measure is to look at scientific studies of similar programs to see what they found to be relevant program impacts. Prior research might be especially instructive on the unintended consequences, or those the staff and program evaluators do not naturally anticipate (see Rossi et al., 2004). For example, a side effect of intensive supervision probation (ISP) might be that clients are caught more for technical violations of probation (e.g., dirty drug tests or failure to get a job) because they are watched more (Petersilia & Turner, 1993).

Once an evaluator determines what outcomes he or she wants to measure (e.g., arrests, school attendance), he or she must determine what measures to use to gauge these client characteristics. For example, the evaluator must determine which arrests to measure (e.g., all arrests, arrests for violent crime, property crime, etc., and what offenses are included in each type) and how to gather the information (e.g., through official records from the police or probation or through interviews with staff and clients). The method of measurement may depend on what is available to the researcher. If the researcher cannot gain access to the official records, he or she might be forced to rely on self-reports by clients. In contrast, if the clients are not easy to track down, the researcher may be forced to rely solely on official records. Often, evaluators attempt to gather information on outcomes from multiple sources, to ensure that they can get a better understanding of the true impact of the program.

Efficiency Analysis

The final major type of program evaluation is what Rossi et al. (2004) called efficiency analysis, which is not as common in criminal justice as process and impact evaluation. This may be in part because evaluators are already very busy collecting data for the process and impact studies or because the researchers do not have the expertise to conduct efficiency evaluations well. In addition, sometimes decisions about the continuation of projects are made without cost being the primary consideration (e.g., programs may be stopped because the outcomes are not as good as expected, or they may be continued because someone in power likes the program). Although the costs and benefits can be estimated before a program is implemented, it is more common to analyze them afterward, once the figures are known (e.g., the program knows how much money was spent and what the effects of the program were; see Rossi et al., 2004).

Rossi et al. (2004) discussed two basic types of efficiency analyses: (1) cost-benefit analysis and (2) cost-effectiveness analysis. The goal of this type of program evaluation is to determine whether the benefits gained from a program justify its costs. In a cost-benefit analysis, researchers put monetary values on the program effort (e.g., salaries, supplies, and other program costs) and on the effects (e.g., the medical and property replacement costs saved by preventing victimizations or corrections costs saved by preventing reincarceration in a prison facility). Often, cost-benefit analyses are difficult to conduct in criminal justice, because it is hard to put monetary values on some program effects (e.g., how much does a victimization cost, and so how much was saved by preventing one?). In a cost-effectiveness analysis, researchers examine the costs of reaching certain outcomes but do not put monetary value on the outcomes themselves. For example, if the goal is to sanction offenders, a cost-effectiveness analysis might look at how much it costs per year to put a person in prison versus on probation in the community. In this case, the researcher probably would not try to put a monetary value on the “prevented” behaviors that occurred because the person was on probation or in prison.

To conduct these types of evaluations well, researchers must be able to get information on all of the costs and benefits related to programs, which is not simple (Rossi et al., 2004). Some of these monetary values are readily available (e.g., how much salaries, supplies, or space cost), but some are not (e.g., what is the monetary value of a potential life saved by incarcerating a murderer?).
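The arithmetic behind the two types of efficiency analysis can be illustrated with a minimal sketch. Every dollar figure and client count below is hypothetical, and the function names are illustrative only; a real analysis would require the full accounting of costs and benefits described above.

```python
def cost_benefit(total_cost: float, total_benefit: float) -> tuple[float, float]:
    """Cost-benefit analysis: both costs and effects are expressed in dollars.

    Returns (net benefit, benefit-cost ratio).
    """
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return total_benefit - total_cost, total_benefit / total_cost


def cost_per_outcome(total_cost: float, outcome_units: int) -> float:
    """Cost-effectiveness analysis: cost per unit of outcome.

    The outcome itself (e.g., one person sanctioned for one year)
    is counted but not given a monetary value.
    """
    if outcome_units <= 0:
        raise ValueError("outcome_units must be positive")
    return total_cost / outcome_units


# Hypothetical cost-benefit figures for one program year.
program_cost = 400_000.0     # assumed: salaries, supplies, space
monetized_savings = 500_000.0  # assumed: corrections costs avoided
net, ratio = cost_benefit(program_cost, monetized_savings)

# Hypothetical cost-effectiveness comparison: cost per person per year.
prison_per_person = cost_per_outcome(6_000_000.0, 200)
probation_per_person = cost_per_outcome(400_000.0, 200)

print(f"Net benefit: ${net:,.0f}; benefit-cost ratio: {ratio:.2f}")
print(f"Prison: ${prison_per_person:,.0f} per person per year")
print(f"Probation: ${probation_per_person:,.0f} per person per year")
```

A benefit-cost ratio above 1 would suggest that the monetized benefits exceed the costs, whereas the per-person figures allow sanctions to be compared without pricing the behaviors they prevent.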

Some Important Evaluation Studies in Criminology and Criminal Justice

Evaluation studies are regularly conducted in the field of criminology and criminal justice, and there is not sufficient space here to discuss all of them. Consequently, for illustration purposes, this section discusses important examples of impact evaluations that have been done in policing, corrections, and juvenile justice.


Policing

One of the first well-known evaluations in policing was the evaluation of the Kansas City Preventive Patrol Experiment, conducted during the early 1970s. The evaluation examined the effects of three different types of police patrols on crime, fear of crime, and opinions of the police in Kansas City, Missouri. The three types of patrol were (1) reactive, in which officers only responded to calls; (2) proactive, in which police visibility in the community was at least doubled; and (3) the normal level of patrol for the area (one car per police beat). The findings indicated that there were no significant differences in crime, fear of crime, or attitudes toward police across the different types of patrol (Kelling, Pate, Dieckman, & Brown, 1974). This study prompted police departments across the United States to reconsider how they approached neighborhood patrol.

One of the most famous evaluations in policing was Sherman and Berk’s (1984) evaluation of the Minneapolis Domestic Violence Experiment, in which offenders who came into contact with the police for simple domestic assault were to be randomly assigned to one of three responses: arrest, police advice, or an order to leave their residence for 8 hours. The authors reported that during the following 6 months, people who were arrested initially were less likely to get rearrested for violence than those who were just told to leave their place of residence. In addition, they noted that victims indicated that those who were arrested were less likely to be violent again than those who received advice. However, the random-assignment portion of the study, which would have ensured that the results were more trustworthy, was not implemented correctly by the police officers, who did not always follow the researchers’ directions (Sherman & Berk, 1984). Despite this and other methodological problems, this study led to important changes in policing, whereby many departments decided to mandate arrest in domestic violence cases (Binder & Meeker, 1988). Later researchers attempted to replicate the findings from this study, but their results were inconsistent, sometimes showing that arrest worked to reduce domestic violence and sometimes showing that it increased it or had no effect (see Garner, Fagan, & Maxwell, 1995). Because of these inconsistent results, this study and its effects on policy continue to be regularly discussed.
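The logic of random assignment to several conditions can be shown in a short sketch. This is a generic illustration, not the procedure used in the experiment described above; the condition labels, case identifiers, and function names are assumptions made for the example.

```python
import random
from collections import Counter

# Assumed labels for the three police responses discussed in the text.
CONDITIONS = ("arrest", "advice", "order_to_leave")


def assign_conditions(case_ids, seed=None):
    """Give every case an equal chance of receiving each condition."""
    rng = random.Random(seed)
    return {case_id: rng.choice(CONDITIONS) for case_id in case_ids}


def rate_by_condition(assignments, reoffended):
    """Share of cases in each condition that later reoffended.

    `assignments` maps case id -> condition; `reoffended` is the set of
    case ids with a new offense during the follow-up period.
    """
    totals = Counter(assignments.values())
    failures = Counter(assignments[case_id] for case_id in reoffended)
    return {cond: failures[cond] / totals[cond] for cond in totals}


# Hypothetical usage: six cases, two of which reoffend during follow-up.
assignments = assign_conditions(range(1, 7), seed=42)
print(rate_by_condition(assignments, reoffended={2, 5}))
```

Because chance alone decides which condition each case receives, the groups should be similar on average before the intervention. When officers choose the response themselves, as happened in some cases here, that guarantee is weakened.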

A more recent evaluation, conducted by Braga and colleagues, examined the impact of Boston’s Operation Ceasefire, a problem-oriented policing approach focused on stopping illicit firearm traffickers and deterring gang activities in order to reduce youth violence. This collaborative effort among many criminal justice and social service agencies combined several strategies to decrease violence, including more police presence and enforcement in neighborhoods, prosecutors focusing on gang cases, more probation checks and added terms and conditions of probation, serving outstanding arrest warrants, and providing services to gang members. The results showed that youth homicide, calls to the police about hearing gunshots, and firearm-related assaults decreased in the area (Braga & Kennedy, 2002; Braga, Kennedy, Waring, & Piehl, 2001). This study has been widely discussed in academic and practitioner circles during the last few years because it shows some positive effects of collaborative policing strategies.


Corrections

There have been a number of evaluations in corrections in the past few decades. One of the most famous was carried out by Joan Petersilia and Susan Turner, who conducted a 14-site randomized experiment across nine states from 1986 through 1991 to evaluate the effects of ISP. Clients were randomly assigned to either ISP or a control group (routine probation, parole, or prison). Overall, the researchers found that ISP was associated with more technical violations and therefore more prison commitments, but there were no significant differences in arrests between those on ISP and in the control groups. They also found that ISP cost much more than routine probation because it involved more court appearances and returns to prison but that ISP cost less than prison itself. Finally, the results showed that ISP programs worked best when they provided both treatment and surveillance (Petersilia & Turner, 1993). This evaluation is one of the most respected, because it used random assignment in a field setting (in the criminal justice system) and because the results were interpreted by many scholars to mean that ISP did not work, which eventually led to some programs being shut down and others being focused more on juveniles, for whom treatment was still considered a major goal (Lane, 2006).

Another well-respected evaluation study was conducted by MacKenzie and colleagues and examined correctional boot camps in eight states. Correctional boot camps have military-style programs with exercise, hard work, and stern discipline. The researchers compared people who completed boot camp programs with those who completed other types of sanctions, such as probation, prison, and parole. The authors concluded that the military components of boot camp programs themselves do not reduce recidivism; specifically, they found no impact on recidivism in four states, worse results in one state, and some positive impact in three states. They noted that boot camps that focused on therapeutic components, intensive supervision after release to the community, longer time in the boot camp, and prison-bound offenders seemed to have better results. In addition, they speculated that those programs might have done as well without the particular military-style components (MacKenzie, Brame, McDowall, & Souryal, 1995). A later summary of many studies on boot camps concluded that boot camps did not substantially reduce recidivism but did often show good results in terms of in-program client attitudes and behavior and reduced need for prison space. In addition, the summary noted that, because of this research, some boot camps had closed and some had added more treatment components and more postrelease supervision (Parent, 2003).

The evaluation of the Amity Prison Therapeutic Community is another example of a correctional evaluation that has had academic and practical impact, probably in part because it used a strong design (random assignment) to assign inmates either to the therapeutic community or to no treatment. The therapeutic community was characterized by a three-part treatment. The first part focused on orientation, client assessment, and job assignments. The second part focused on inmates gaining more responsibility and participating in intensive treatment, such as individual and group counseling. The third part focused on preparing the inmates to reenter society. The researchers found that inmates who participated in the therapeutic community and had strong aftercare were less likely to return to prison after 1, 2 (Wexler, De Leon, Thomas, Kressel, & Peters, 1999), and 3 years (Wexler, Melnick, Lowe, & Peters, 1999). Those who participated in the therapeutic community but not necessarily in aftercare did better with recidivism at 1 and 2 years but not at 3 years. The studies also found that parolees who were reincarcerated but spent more time in treatment stayed out of prison longer (Wexler, De Leon, et al., 1999; Wexler, Melnick, et al., 1999).

Juvenile Justice

One of the most widely used programs in juvenile justice is Drug Abuse Resistance Education (D.A.R.E.), a school-based drug prevention program taught by police officers. The 17 lessons focus on teaching information (e.g., about specific drugs) and life skills (e.g., how to resist peer pressure) to elementary schoolers with the ultimate goal of preventing or delaying drug use. A number of individual evaluation studies have examined the effects of D.A.R.E. in different parts of the United States and have found mixed results. Some have found short-term positive impacts, but a meta-analysis (a statistical combination of the results of multiple studies) showed that the effects of D.A.R.E. in the short term were not substantial (Ennett, Tobler, Ringwalt, & Flewelling, 1994). Studies of the longer-term impact of participating in D.A.R.E. also have shown few positive effects. For example, one randomized experiment that compared survey responses for youth who participated in D.A.R.E. versus those in a control group showed that after 1 year, D.A.R.E. had no significant effects on the use of alcohol or cigarettes or on school success or behaviors there (Rosenbaum, Flewelling, Bailey, Ringwalt, & Wilkinson, 1994). Another study that examined the long-term effects of D.A.R.E. after 6 years found that participation had no effect on the use of alcohol, cigarettes, or marijuana for 12th graders but did find some evidence that it was related to less use of hard drugs (e.g., cocaine) in some males (Dukes, Stein, & Ullman, 1997).
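One common way to combine results across studies in a meta-analysis is a fixed-effect, inverse-variance weighted average, in which more precise studies count more heavily. The sketch below illustrates that general idea only; it is not a reconstruction of the analysis reported by Ennett et al. (1994), and the effect sizes and standard errors are hypothetical.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect pooled estimate using inverse-variance weights.

    Each study's weight is 1 / SE^2, so studies with smaller standard
    errors contribute more. Returns (pooled effect, pooled standard error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one positive standard error per effect size")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Hypothetical effect sizes (program minus control) from three studies.
study_effects = [0.10, 0.02, 0.06]
study_std_errors = [0.05, 0.04, 0.08]

estimate, se = pooled_effect(study_effects, study_std_errors)
low, high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"Pooled effect: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

A pooled estimate close to zero, or a confidence interval that includes zero, is the kind of result that would lead reviewers to describe a program's effects as not substantial.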

Gang Resistance Education and Training (GREAT) is another popular program; it focuses on gang prevention and is modeled after D.A.R.E. The police-led GREAT program focuses on teaching 7th graders nine life skills lessons, including information about how to resolve disputes, about cultural differences, and about the negative elements of gang membership, in hopes of helping youth resist peer pressure. The larger goal is to reduce gang activity. Evaluators conducted posttest surveys (1 year after participation in GREAT) of youth who participated in GREAT as well as youth in comparison groups in 11 sites across the United States. Researchers asked students about their attitudes and behaviors, including criminal and gang behaviors. They found that students who participated in the program reported less gang affiliation and gang activity as well as many other positive attributes (e.g., fewer problematic friends, greater self-esteem, and more attachment to their parents) during the year following participation in GREAT (Esbensen & Osgood, 1999).

A later longitudinal study examined the effects of GREAT at 2 years and 4 years postprogram in six locations. It used better comparison groups (assigning some classrooms to get the program and some not to get it), had pretests (surveys before the program), and surveyed youth multiple times during the follow-up period. After 2 years, the researchers found no significant differences between the groups (Esbensen, 2001). After 4 years, they found some generally positive effects of GREAT compared with not getting the program (e.g., in attitudes about peers and the police), but these long-term results did not show less gang activity or criminal behavior over time (Esbensen, Osgood, Taylor, Peterson, & Freng, 2001). Because of these results, the GREAT curriculum was modified (Esbensen, Freng, Taylor, Peterson, & Osgood, 2002).


Conclusion

Evaluations are increasingly important to help guide monetary and administrative decisions regarding criminal justice programs. As resources continue to decline in tough budget years, evaluations will likely become even more important. Evaluators can provide critical information about the need for a program, the soundness of the program’s theory, and its implementation and impact, as well as whether the benefits outweigh the costs. Evaluators have provided this critical information to policymakers and practitioners for decades, and this chapter discussed some important examples of these studies.