Offender Classification

Patricia M Harris. 21st Century Criminology: A Reference Handbook. Editor: J Mitchell Miller. 2009. Sage Publications.

Offender classification refers to a set of formal tools and practices used to establish supervision parameters and services for individuals under the control of correctional agencies. Of particular interest is the determination of the nature of risks posed by individual offenders and identification of their specific treatment needs.

The origins of classification, which is essentially the examination of how offenders differ from one another, can be traced to a typology presented by Cesare Lombroso in his 1876 work entitled Criminal Man. Development of structured instruments for ranking offenders according to risk began in the 1920s, when researchers such as Burgess (1928) explored the utility of statistical methods to distinguish between violators and non-violators among prisoners paroled from state prisons. Though numerous practitioners, criminologists, and forensic psychologists advanced classification research following this time, it was not until the 1980s that formal, structured classification became widespread practice in corrections. Three factors—the discovery that a relatively small proportion of offenders contributed disproportionately to the crime problem, successful inmate litigation calling for improved conditions of confinement in state prison systems, and heightened sensitivity of the public to new crimes committed by offenders released back into the community—accelerated the development and widespread adoption of objective instruments to classify offenders by risk and needs. Today, offender classification is practiced by local, state, and federal corrections agencies throughout the United States. There are now many instruments available for the classification of adult and juvenile offenders, and work on improving prediction of recidivism via statistical methods continues as a focus of agency and university research.

Varieties of Offender Classification

Risk/Needs Assessment

The assessment of risk and needs is the most common form of offender classification. This form may entail separate instruments, one for risk and at least one for needs. Risk instruments, which resemble questionnaires, are completed by interview or by the offender’s self-report, according to the design of the tool in question. Choice of risk instrument also depends upon the correctional setting in which it is to be administered, because risk assessment serves different purposes in different correctional contexts. Risk assessment tools used by probation and parole typically forecast the offender’s risk of rearrest while under community supervision. Instruments employed by prison and jail authorities, on the other hand, predict a detainee’s or inmate’s adjustment to confinement. Jail risk assessment focuses on inmates’ risk of harm to other detainees and need for protective custody, to guide housing decisions as well as to prevent harm to self, particularly suicide. The objectives of prison classification differ depending upon whether external or internal classification is taking place. The purpose of external classification is to determine whether the inmate can be housed in a general population, and if so, at what level (minimum, medium, or maximum custody). The objective of internal classification is to assess the inmate’s risk in the facility to which he or she has been assigned. This determination subsequently affects where the inmate will be housed, with whom, and with what privileges (programs, work assignments, etc.).

The evolution of formal instruments measuring offender risk and needs occurred over four “generations” (Andrews, Bonta, & Wormith, 2006). First-generation instruments were characterized by reliance on professional judgment and experience, also referred to as clinical judgment. Introduction of the salient factor score by the U.S. Parole Commission (Hoffman & Beck, 1974) and the Wisconsin Classification System (Baird, 1979) in the 1970s ushered in a second generation of risk assessments. Also known as actuarial tools, second-generation instruments are empirically derived tools whose items are chosen for their statistical correlations with recidivism.

Reliance on criminal history items (e.g., age at first adjudication, number of prior convictions, number of prior revocations of community supervision) and items assessing other static (i.e., unalterable) offender characteristics are distinguishing features of second-generation instruments. To the extent measures other than criminal history (e.g., drug or alcohol use and association with criminal companions) are used, such items capture historic data only. That is, second-generation tools may assess substance abuse and nature of associates up until the offender’s encounter with the criminal justice system, but not the status of current use or associations.

Third-generation instruments preserve the evidence-based feature of second-generation tools, in that they rely on items exhibiting a statistical correlation with recidivism, but differ in the extent to which dynamic correlates of recidivism are used to assess offender risk. Examples of dynamic factors include use of leisure time, living arrangements, and ability to manage emotions. The use of dynamic measures allows instruments to be sensitive to changes in the offender’s circumstances, thus facilitating reassessments of risk at a later date. If all the information captured in a risk assessment tool consisted only of historical items, this would not be possible.

The Level of Service Inventory-Revised (LSI-R) is a third-generation risk assessment instrument. The LSI-R is a semistructured interview consisting of 54 items measuring criminal history, education and employment, financial status, family/marital relations, living accommodations, extent of criminal companions, alcohol/drug problems, mental health status, and attitudes toward correctional supervision (Andrews & Bonta, 2001). Thus, the LSI-R includes both risk and needs items within the same instrument, whereas the salient factor score and Wisconsin models require administration of additional instruments for the assessment of those same offender needs. In contrast to stand-alone needs assessments, the LSI-R includes only those needs shown by research to influence criminal behavior.

Fourth-generation instruments are referred to as systematic and comprehensive because they measure factors important to treatment effectiveness. Fourth-generation tools embody all principles of effective correctional treatment by combining assessment of both static and dynamic predictors of risk with a case management component for setting treatment goals. The case management plan directs the officer’s focus on the offender’s criminogenic needs and responsivity factors (Andrews et al., 2006).

The evolution of risk/needs classification tools corresponds with research discoveries about the best predictors of offender recidivism (Gendreau, Little, & Goggin, 1996) as well as the nature of effective correctional treatments (Andrews, Zinger, et al., 1990). These discoveries include the principles of risk, needs, and responsivity. The principle of risk is that treatment is most effective on high-risk offenders. The needs principle holds that effective treatments will be those that take dynamic, criminogenic needs (those most likely to cause continued lawbreaking) into account. The responsivity principle indicates that interventions should take offender characteristics such as learning style, motivation, gender, and ethnicity into account when matching subjects to programs. In a meta-analysis of 154 evaluations of adult and juvenile correctional treatments, Andrews, Bonta, and Hoge (1990) discovered that effective programs were those that practiced all three principles. Equally important, failure to observe the principles of effective correctional treatment elevates the risks posed by low- and moderate-risk offenders. Using a sample of 7,366 offenders assigned to either community supervision or residential facilities, Lowenkamp and Latessa (2005) compared recidivism rates across different risk categories. They found that low- and low/moderate-risk offenders experienced higher recidivism rates when placed in residential treatment, compared to placement on community supervision alone.

While there is much empirical support for the supremacy of actuarial instruments over those favoring clinical or diagnostic judgment (see, e.g., Grove & Meehl, 1996; Monahan, 1981), the importance of dynamic variables for prediction accuracy is less certain. In a survey of meta-analytic studies of the capacity of instruments from each generation to predict general recidivism, Andrews et al. (2006) found that first-generation tools had an average predictive validity (r) estimate of .10; second-generation tools, .42; third-generation tools, between .33 and .40; and fourth-generation tools, .41. The larger the value of r, the greater the amount of error one would avoid in relying on the prediction instrument in question. Thus, best validity performances came from instruments that emphasized static factors. In fact, removal of dynamic variables such as employment and living arrangements improved the predictive accuracy of the salient factor score, the assessment tool used by the U.S. Parole Commission (Hoffman, 1994). While the predictive accuracy of third- and fourth-generation tools does not yet outpace that of the second-generation tools, the later instruments are superior for their ability to enlighten the user about criminogenic needs and responsivity factors crucial for achieving successful case outcomes.

Instruments such as the salient factor score, LSI-R, and the Wisconsin model were all designed to predict general recidivism. Also available are assessment tools that predict particular forms of recidivism such as the risk of new violence, or even more specifically, risk of new sexual or domestic violence. Examples include the Violence Risk Appraisal Guide (VRAG), the Static-99 (for sex offenders), the MnSOST-R (for sex offenders), the Spousal Assault Risk Assessment (SARA), and the Domestic Violence Risk Appraisal Guide (DVRAG).

Diagnostic Instruments

Using diagnostic tools, personnel can verify or rule out various mental disorders, as defined and described in the Diagnostic and Statistical Manual of Mental Disorders IV-Text Revision (DSM IV-TR) (American Psychiatric Association, 2000). Though in some cases, forensic staff will rely on a structured interview to confirm or deny the presence of mental disorders as identified in the DSM IV-TR, more typically they will administer formal tests that assess subjects against DSM criteria. For example, the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) is a commonly used mental health assessment for determining whether the subject harbors particular mood or personality disorders included in the DSM IV-TR. The MMPI-2 is a self-appraisal inventory that uses 567 true/false questions to measure subjects’ scores on 10 scales (hypochondriasis, depression, paranoia, psychasthenia, schizophrenia, masculinity/femininity, hysteria, psychopathic deviance, social introversion, and hypomania) (Butcher et al., 2001). The MMPI-2 is commonly used in the classification of inmates but is used more sparingly in populations of individuals under community supervision, such as to confirm mental illness where there are other indicators of the same. Because the MMPI-2 must be scored by a clinician, it is too costly for broad application where the disorders being measured are not widespread.

Some mental disorders, such as alcohol and drug addiction, are common in offender populations. Instruments such as the Substance Abuse Subtle Screening Inventory (SASSI), Addiction Severity Index (ASI), and the Drug Abuse Screening Test (DAST) experience widespread use in both community and institutional corrections. In contrast with the MMPI-2, instruments for the measurement of substance abuse are easily administered and scored by nonclinical staff.

Personality Inventories

Unlike diagnostic tools, which were developed for the population as a whole, most personality inventories used in offender classification were designed solely for use with offenders. The presumption behind use of personality inventories in correctional contexts is that different personality types require diverse officer interaction styles, supervision intensity, and treatment. Though they are not intended to be risk assessment tools, some personality inventories provide useful predictions of recidivism and institutional adjustment. Because they tend to take longer to administer and score relative to risk/needs instruments, personality inventories are usually too costly for use with probation and parole populations. Typical applications include prisons and other residential settings.

Of the many personality inventories currently available for use in classification, commonly used instruments focus on measures of commitment to criminal values and lifestyle. The Psychopathy Checklist-Revised (PCL-R), for example, is a highly regarded instrument for identifying the most antisocial of offenders. Psychopathy is a formal construct consisting of 20 distinct interpersonal, affective, and behavioral characteristics, including glibness/superficial charm, grandiose sense of self-worth, need for stimulation, pathological lying, conning, lack of remorse or guilt, shallow affect, lack of empathy, parasitic lifestyle, poor behavioral controls, promiscuous sexual behavior, early behavioral problems, lack of realistic long-term goals, impulsivity, irresponsibility, failure to accept responsibility for one’s own actions, many short-term marital relationships, juvenile delinquency, revocation of conditional release, and criminal versatility. Upon completion of a semi-structured 3-hour interview, a clinician assigns each item a score of 0, 1, or 2. A score of 0 means the factor is not present, and a score of 2 means the factor is strongly present. A total score of 25 to 30 indicates that the subject is a psychopath.

Though never intended as a risk assessment tool, the PCL-R is frequently used as one. Research finds that psychopaths are more likely to reoffend following release from prison than nonpsychopaths; that psychopathic sex offenders are far more likely to reoffend, including nonsexually, than nonpsychopathic sex offenders; and that treated psychopaths are more likely to reoffend than nontreated psychopaths (Hare, 2003).

The Psychological Inventory of Criminal Thinking Styles (PICTS) is an 80-item, self-report instrument that measures the offender on each of eight dimensions supportive of criminal lifestyles. These include mollification, cutoff, entitlement, power orientation, sentimentality, superoptimism, cognitive indolence, and discontinuity. The instrument, which can be easily scored by nonclinical staff, shows promise in predicting institutional adjustment, recidivism, and program completion. Because it measures dynamic factors, the PICTS can be used to measure change in criminal attitudes over time, through repeated testing (Walters, 2002).


Typologies

Typologies, also referred to as taxonomies, sort offenders into mutually exclusive categories. Typology-based classification tools are available for both community-supervised and incarcerated populations. The value of typologies lies in their capacity to communicate information about responsivity, one of the principles of effective correctional treatment.

The Client Management Classification (CMC) component of the Wisconsin Classification System, used by numerous community corrections agencies, is an interview-based tool leading to the assignment of subjects into one of five “strategy groups” (casework control, environmental structure, limit setting, selective intervention-treatment, and selective intervention-situational). The strategy group serves as a guide to the offender’s criminogenic needs, motivation to change, amenability to supervision, treatment referrals, and recommended manner of interaction between officer and client (Lerner, Arling, & Baird, 1986). The Prisoner Management Classification System (PMC) is a prison/residential setting version of the CMC.

Quay’s Adult Internal Management System (AIMS) was designed to inform housing and program decisions in institutional settings. The AIMS consists of two assessments, the Life History Checklist and the Correctional Adjustment Checklist. The latter is completed by correctional staff after observing the inmate in custody for 2 to 4 weeks. Upon completion of the assessment, inmates are sorted by personality types and then placed into one of three categories: Heavies (prone to violence, manipulation, and predatory behavior), Lights (prone to anxiety and victimization), and Moderates (reliable and hardworking) (Levinson, 1988).

Edwin Megargee (Megargee, Carbonell, Bohn, & Sliger, 2001) employed the MMPI as the basis for a typology sorting offenders into 10 categories. The categories, whose names (e.g., Abel, Item, Delta, How) were assigned randomly and have no intuitive meaning, reflect patterns of MMPI scale responses. While Megargee’s typology has been subjected to extensive research regarding its ability to identify distinct groups of offenders with different behavioral characteristics, particularly with respect to prison adjustment, there is less evidence regarding its successful application to guiding treatment planning.

Classification of Juvenile Offenders

Classification is an important facet in the supervision of juvenile offenders, serving the same objectives (risk management and treatment planning) as it does for adult offenders. However, development of actuarial tools for juveniles has lagged behind evolution of classification tools for adult offenders. There are fewer standardized instruments for the assessment of juvenile risk and needs, and less research affirming their reliability and validity, than exist to date for adults (Hoge, 2002). Promising tools include juvenile-oriented versions of the LSI and PCL, referred to respectively as the Youth Level of Service/Case Management Inventory (YLS/CMI) and the Psychopathy Checklist-Youth Version (PCL-YV). Also available are instruments that measure specific risks, though these should be used cautiously. For example, the creators of the Juvenile Sex Offender Assessment Protocol-II (J-SOAP-II) warn that adolescent populations are inherently unstable and require frequent reassessment (Prentky & Righthand, 2003).

The Jesness Inventory-Revised (JI-R) is a widely used personality assessment tool for juvenile offenders. Introduced in the 1960s, the Jesness Inventory is a 160-item, self-report questionnaire that measures the offender on each of 11 personality, 9 subtype, and 2 DSM IV-TR subscales. Personality scales include social maladjustment, value orientation, immaturity, autism, alienation, manifest aggression, withdrawal-depression, social anxiety, repression, denial, and asocial index. Subtype scales are based on integration theory, formerly known as the I-Level system. Subtype scales reflect increasing levels of perceptual complexity (i.e., the extent to which the offender views the world as threatening or supportive) and interpersonal maturity. DSM IV-TR subscales facilitate diagnosis of juveniles with conduct disorder and oppositional defiant disorder. Preferably, the instrument is administered and scored by a clinician (Jesness, 2003).

Megargee’s MMPI-based typology is also available for juveniles, using a version of the MMPI that was normed on adolescents (MMPI-A). Also available is a version of the CMC for juveniles, called the Strategies for Juvenile Supervision (SJS).

Reliability and Validity of Classification Instruments

Before any classification instrument can be implemented, it is necessary to confirm its validity and reliability. Interview-based classification tools are deemed reliable if different raters assessing the same offender arrive at the same result. This is referred to as interrater reliability. Instruments based on self-reports are deemed reliable if the same offender offers similar answers if assessed more than once, barring the passage of time sufficient to cause legitimate changes in responses. This is called test-retest reliability.
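As a rough illustration of interrater reliability, the sketch below computes simple percent agreement between two raters. The function name and the ratings are hypothetical, and percent agreement is only the most basic index; published reliability studies often report chance-corrected statistics such as Cohen's kappa instead.

```python
def percent_agreement(rater_a, rater_b):
    """Share of offenders to whom two raters assigned the same risk level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical ratings of the same five offenders by two officers.
rater_a = ["low", "medium", "high", "low", "medium"]
rater_b = ["low", "medium", "medium", "low", "medium"]

agreement = percent_agreement(rater_a, rater_b)
print(agreement)  # 0.8: the raters agree on four of the five offenders
```

The same function could be applied to two administrations of a self-report instrument to the same offenders as a crude check on test-retest reliability.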

An instrument is judged to be valid if it really measures what it purports to measure. There are numerous ways of assessing validity, depending upon the classification instrument in question. Personality inventories and typologies are typically examined for construct validity, which uses quantitative means to measure the viability of their various scales or factors.

With respect to risk assessments, the question of predictive accuracy is a key concern. Different statistics are available for summarizing any one tool’s accuracy; currently favored is the area under the curve (AUC) statistic. It is popular because its utility does not depend upon the base rate of the outcome in question (i.e., proportion of offenders who recidivate). That is, it is equally meaningful whether the behavior being predicted is rare or fairly common (Harris & Rice, 2007). The AUC communicates how well the assessment tool improves over a chance prediction (such as determined by the toss of a coin). Higher AUC values indicate greater prediction accuracy and improvement over chance. For example, a value of .50 would inform the user that the instrument does not predict better than chance, whereas a value of .80 indicates substantial improvement over chance.
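One common way to read the AUC is as the probability that a randomly chosen recidivist received a higher risk score than a randomly chosen nonrecidivist, with ties counted as half. The sketch below computes it that way by comparing every such pair; the function name, scores, and outcomes are invented for illustration and do not come from any instrument discussed here.

```python
def auc(scores, recidivated):
    """Area under the ROC curve, computed by pairwise comparison.

    Returns the fraction of (recidivist, nonrecidivist) pairs in which the
    recidivist has the higher score; tied pairs count as one half.
    """
    positives = [s for s, r in zip(scores, recidivated) if r]
    negatives = [s for s, r in zip(scores, recidivated) if not r]
    if not positives or not negatives:
        raise ValueError("need at least one recidivist and one nonrecidivist")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and follow-up outcomes for eight offenders.
scores = [5, 12, 18, 25, 9, 22, 14, 28]
recidivated = [False, True, False, True, False, True, False, True]

auc_value = auc(scores, recidivated)
print(auc_value)  # 0.875: the recidivist outscores in 14 of 16 pairs
```

An instrument that gave every offender the same score would produce only ties and therefore a value of .50, the chance level described above.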

Researchers can use the AUC not just to assess the performance of any one instrument but also to compare different instruments. The best comparisons are those that report results of administration of various tools on the same population of offenders, followed up for the same period of time. Such comparisons are not typical, however. One exception is a study by Barbaree, Seto, Langton, and Peacock (2001), who evaluated the accuracy of six assessment instruments designed for the prediction of general and/or sexual violence, using a sample of 215 sex offenders released from prison and followed up on community supervision for an average of 4.5 years. Outcomes of interest included any new recidivism (measured as either charges or convictions), any serious recidivism, and new sexual offense recidivism. Barbaree and colleagues found that some instruments were good at predicting all three outcomes; none was superior at predicting all three; and of much importance, instruments that were easy to use and score (such as the RRASOR and Static-99) provided good predictions of new sexual violence, the least frequent of the outcomes studied. This kind of information is extremely valuable to agencies seeking to invest in a specific kind of classification instrument when there are several or more to choose from.

Reliability and validity are necessary but not sufficient conditions in the selection of assessment tools. Agencies usually need to consider how much time they can devote to each offender, and the demands imposed by particular instruments on interviewers’ skill sets. Thus while “multitasking” instruments such as the VRAG perform well with respect to the prediction of both new violence and new sexual violence, the high offender caseloads faced by community corrections agencies and baccalaureate status of the typical probation or parole officer make instruments such as the RRASOR and the Static-99, which have relatively few items and do not require much in the way of clinical skills to score, very attractive tools for assessing sex offender risk.

Issues in Classification Implementation and Practice

Classification Protocols

Protocols for offender classification depend upon choice of assessment instruments. Some tools involve interviews by staff, and others depend upon the offender’s self-appraisal. In either case, the instrument will be scored by correctional personnel.

Training is usually required before an individual can administer and score a classification tool. Depending upon the instrument, training may be as brief as 1 or 2 days. Others may require a week-long training. Still others, and particularly most personality inventories, require that the interviewer have an advanced degree or prior experience leading to the accumulation of clinical skills and judgment, in addition to specific training in the application of the instrument itself.

The extent of training and experience an individual has had in the administration of particular tools can have a positive impact on the tools’ reliability and validity. For example, Flores, Lowenkamp, Holsinger, and Latessa (2006) found that the relationship between LSI-R score and supervision outcome was strongest in agencies whose officers underwent training in the use of the instrument and in agencies that had used the LSI for 3 years or more.

Interpretation of Risk Scores

After a risk assessment has been completed, its administrator totals the scores for each risk item. Each risk total is associated with a particular probability of failure among persons having the same or higher score; the score itself is not a probability. Thus, an offender whose risk score is twice as high as another offender’s is not twice as likely to be rearrested. Increasing risk scores represent increasing probabilities of failure, and so should be used merely to rank clients as to risk of the outcome of interest. In order to determine the actual probability associated with a particular score, corrections managers must keep up with statistics regarding the failure rates of persons with each score.
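The distinction between a score and a probability can be made concrete with a small tabulation. The sketch below assumes an agency keeps records of each offender's risk score and whether supervision ended in failure; the record layout, function name, and figures are hypothetical.

```python
def failure_rate_at_or_above(records, score):
    """Observed failure rate among offenders scoring `score` or higher.

    `records` is a list of (risk_score, failed) pairs, where `failed` is
    True if the offender failed supervision during follow-up.
    """
    group = [failed for s, failed in records if s >= score]
    if not group:
        return None  # nobody in the records scored this high
    return sum(group) / len(group)


# Hypothetical agency records: (risk score, failed during follow-up).
records = [
    (4, False), (4, False),
    (8, False), (8, True),
    (12, True), (12, False),
    (16, True), (16, True),
]

for score in (4, 8, 16):
    print(score, failure_rate_at_or_above(records, score))
```

In these invented records a score of 16 is four times a score of 4, yet the observed failure rate rises only from .50 to 1.00. The score ranks offenders; the failure rate has to be read from the agency's own statistics.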

Determination of Offender Risk Levels

Typically, ranges of risk assessment scores are combined into risk levels. For example, offenders scoring 0 to 10 might be placed in a low-risk category, those scoring 11 to 20 in a medium-risk category, and offenders scoring 21 to 30 in a high-risk category. The choice of which ranges of scores should be regarded as indicators of high, medium, and low risk classifications is an important one. Cutoffs delineating each group should take resources into account. In other words, what percentage of clients can an agency reasonably devote treatment and supervision resources to? Not all offenders, or even a large minority, can be treated as high-risk offenders. Devoting more supervision resources to highest-risk cases, even if it is a minority of cases, will result in more crimes prevented than less supervision spread across many cases (Clear & Gallagher, 1985).
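The grouping of scores into levels can be sketched as a simple lookup. The cutoffs below are the illustrative ranges from the example above (0 to 10, 11 to 20, 21 to 30), not those of any real instrument, and the caseload scores are invented. Tabulating the share of cases that falls into each level is one way an agency might check whether its cutoffs fit its resources.

```python
from collections import Counter

# Illustrative cutoffs from the example in the text: each pair is
# (highest score in the band, label). An agency would set its own.
CUTOFFS = ((10, "low"), (20, "medium"), (30, "high"))


def risk_level(score, cutoffs=CUTOFFS):
    """Map a total risk score to a risk level using inclusive upper bounds."""
    if score < 0:
        raise ValueError("score must be non-negative")
    for upper_bound, label in cutoffs:
        if score <= upper_bound:
            return label
    raise ValueError("score exceeds the top of the scale")


# Hypothetical caseload of ten offenders.
caseload_scores = [3, 8, 11, 15, 19, 22, 27, 9, 14, 30]
levels = Counter(risk_level(s) for s in caseload_scores)
share_high = levels["high"] / len(caseload_scores)

print(dict(levels))  # {'low': 3, 'medium': 4, 'high': 3}
print(share_high)    # 0.3
```

If 30% of the caseload is more than the agency can supervise intensively, raising the lower bound of the high-risk band would shrink that group.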

Misuse of Classification Information

Effective use of risk instruments requires users to exercise faith in the overall risk score rather than yield to the influence of explicit descriptions of the offender’s prior criminal activities gleaned while carrying out the assessment interview. However, it can be difficult for classification personnel to ignore unsavory features of an offender’s criminal or social history, even when the outcome of the risk assessment instrument indicates that the client is a low risk. Research indicates that these biases are common in risk decision making (see, e.g., Hilton, Harris, Rawson, & Beach, 2005), and so practitioners must make an overt choice to side with the recommendations indicated by the assessment results.


Overclassification

Overclassification refers to the channeling of relatively low-risk offenders to unnecessarily secure supervision levels or institutional placements. Overclassification can result from failure to administer reliable and valid risk assessment tools, but also from failure to base supervision decision making on the results of an instrument when a good one is administered. Overclassification has serious consequences, namely, increased risk posed by overclassified offenders. Overclassified offenders experience a greater likelihood of becoming higher-risk offenders for various reasons, not the least of which may be increased exposure to other true high-risk offenders and diminished opportunities, such as employment, as liberties are curtailed (Clements, 1982).

Agencies promote overclassification when their risk assessment instruments include items that lack predictive validity. The decision to rely solely on statistically justified assessment items is not always easy for an agency to accept when omitted factors exhibit strong face validity with the outcome in question. Items have face validity if it seems they should affect the behavior being predicted. For example, while it would appear that a history of past escapes and severity of current offense exert an influence on a prison inmate’s risk of institutional misconduct, in fact, these characteristics are not valid predictors and including them will degrade the instrument’s predictions of inmate behavior (Austin, 2003).

What happens when valid and reliable offender classification instruments are not used? Without classification, a greater number of mistakes are made in efforts to identify true high-risk offenders. The cost of corrections also increases. Corrections becomes more expensive when decision makers erroneously believe they need additional high-security prisons in the wake of overclassification of low-risk offenders, and when underclassified higher-risk offenders are mistakenly assigned to community supervision or other minimally secure settings.

Overclassification may be difficult to avoid with respect to prisons and jails. For example, the majority of jails in use today contain mainly maximum-security housing, which undermines meaningful custody classification. In many jails, inmate segregation according to gender, age (juvenile or adult), and detention status (sentenced or unsentenced) takes precedence over classification according to risk of violence or need for mental health care. As a result, jail administrators are unable to take advantage of many gains in offender classification that are available to other corrections agencies (Austin, 1998; Brennan & Austin, 1997).

Risk Prediction Errors

Risk prediction instruments, even those that are actuarial, are not perfect. One factor influencing the capacity to accurately predict behavior is called the base rate, the frequency of the outcome of interest (e.g., arrest, conviction, incarceration) in the sample being used to create the instrument. The more infrequent a behavior, the greater the difficulty in successfully predicting it. Thus, researchers are more successful predicting general recidivism than specific recidivism, such as particular violent acts.

All prediction tools produce both correct and incorrect predictions, also referred to as prediction errors. There are two possible kinds of errors: false positives and false negatives. False positives refer to incorrect predictions that offenders will commit new crimes. False positive errors are committed when individuals who would not have committed new crimes are subjected to secure settings or restrictive community supervision. False negatives refer to incorrect predictions that offenders will not commit new crimes. False negative errors occur when incarceration or restrictive conditions of supervision are not imposed on offenders who will commit new crimes. The terms true and false refer to the accuracy of the prediction. The terms negative and positive refer to the content of the prediction, that is, exhibiting a behavior (positive) or not exhibiting it (negative).
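The four possible results of a prediction can be tallied by pairing each prediction with the eventual outcome. The sketch below is a minimal illustration; the function name and the six cases are invented.

```python
from collections import Counter

# Maps (predicted to reoffend, actually reoffended) to the outcome label.
OUTCOME_LABELS = {
    (True, True): "true positive",
    (True, False): "false positive",
    (False, True): "false negative",
    (False, False): "true negative",
}


def tally_outcomes(predicted_high_risk, reoffended):
    """Count correct and incorrect predictions by type."""
    if len(predicted_high_risk) != len(reoffended):
        raise ValueError("inputs must be the same length")
    return Counter(
        OUTCOME_LABELS[(bool(p), bool(r))]
        for p, r in zip(predicted_high_risk, reoffended)
    )


# Hypothetical predictions and follow-up outcomes for six offenders.
predicted_high_risk = [True, True, False, False, True, False]
reoffended = [True, False, False, True, True, False]

tally = tally_outcomes(predicted_high_risk, reoffended)
print(dict(tally))
# {'true positive': 2, 'false positive': 1, 'true negative': 2, 'false negative': 1}
```

Here the second offender was supervised as high risk without going on to reoffend (a false positive), and the fourth was treated as low risk but did reoffend (a false negative).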

Because prediction errors with the current state of knowledge cannot be eliminated, the best to hope for is that their frequency can be reduced. With respect to any particular instrument, however, efforts to reduce one error type result in an increase in the other. To illustrate, suppose it is known from prior studies that 20% of felony offenders on community supervision can be expected to commit new felonies over the next 3 years. If there were a prediction instrument that was 100% accurate, identifying all of these recidivists-to-be would be very easy: the offenders could be rank-order classified by risk score and a cutoff established delineating the highest-scoring 20%. But when risk instruments are imperfect, setting the cutoff at the highest-scoring 20% means that some of the offenders designated as high risk will be false positives, and that some of the offenders whose scores fell below the cutoff will be false negatives. By setting a more inclusive cutoff (for example, at the highest-scoring 30%), the instrument will capture more true positives while at the same time increasing false positives.
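The trade-off between the two error types can be worked through with a small invented cohort. The sketch below assumes 20 offenders, 4 of whom (20%) reoffend, scored by an imperfect instrument; it then flags the highest-scoring 20% and the highest-scoring 30% as high risk. All names and figures are hypothetical.

```python
def errors_when_flagging_top(scores_and_outcomes, share):
    """Flag the highest-scoring `share` of offenders as high risk and
    count the resulting true positives and both kinds of error."""
    ranked = sorted(scores_and_outcomes, key=lambda rec: rec[0], reverse=True)
    n_flagged = round(share * len(ranked))
    flagged, unflagged = ranked[:n_flagged], ranked[n_flagged:]
    true_pos = sum(1 for _, reoffended in flagged if reoffended)
    false_pos = len(flagged) - true_pos
    false_neg = sum(1 for _, reoffended in unflagged if reoffended)
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Hypothetical cohort: scores 1 to 20, with the offenders scoring
# 20, 18, 16, and 12 going on to reoffend (a 20% base rate).
RECIDIVIST_SCORES = {20, 18, 16, 12}
cohort = [(score, score in RECIDIVIST_SCORES) for score in range(1, 21)]

top_20 = errors_when_flagging_top(cohort, 0.20)
top_30 = errors_when_flagging_top(cohort, 0.30)
print(top_20)  # {'true_positives': 2, 'false_positives': 2, 'false_negatives': 2}
print(top_30)  # {'true_positives': 3, 'false_positives': 3, 'false_negatives': 1}
```

In this cohort, widening the high-risk group catches one more recidivist and removes one false negative, but at the price of one more offender supervised as high risk who would not have reoffended.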

Which kinds of incorrect predictions are most likely to occur depends upon the values attached to each kind of error. If those conducting the assessment are risk averse, meaning that they emphasize public safety and strive to avoid new victimizations, they are likely to set low cutoffs delineating high- from low-risk offenders, resulting in a greater number of offenders (justly or unjustly) falling into the category judged high risk. If the assessors value justice for the individual being sanctioned in an effort to avoid unnecessarily restrictive supervision or confinement, they would establish more stringent cutoffs, resulting in fewer offenders judged threats to the community. Which type of error is worse? That depends upon policymakers’ value frameworks.
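One way to picture how values drive the choice of cutoff is to attach a relative cost to each error type and look for the cutoff with the lowest total cost. This is only a sketch under stated assumptions: the cost weights, the 20 invented offenders (ranked from highest to lowest risk score, with the eventual recidivists at ranks 1, 2, 5, and 9), and the helper names are all hypothetical, and real agencies do not necessarily set cutoffs this way.

```python
def errors_when_flagging(reoffended_by_rank, n_flagged):
    """Return (false positives, false negatives) when the n_flagged
    highest-ranked offenders are designated high risk."""
    flagged = reoffended_by_rank[:n_flagged]
    unflagged = reoffended_by_rank[n_flagged:]
    false_positives = sum(1 for reoffends in flagged if not reoffends)
    false_negatives = sum(1 for reoffends in unflagged if reoffends)
    return false_positives, false_negatives


def least_costly_cutoff(reoffended_by_rank, cost_fp, cost_fn):
    """Return how many top-ranked offenders to flag so that the weighted
    sum of errors is smallest (ties go to the smaller group)."""
    def total_cost(n_flagged):
        fp, fn = errors_when_flagging(reoffended_by_rank, n_flagged)
        return cost_fp * fp + cost_fn * fn

    return min(range(len(reoffended_by_rank) + 1), key=total_cost)


# Invented data: True marks an offender who goes on to reoffend.
REOFFENDED_BY_RANK = [rank in (1, 2, 5, 9) for rank in range(1, 21)]

# Risk-averse weighting: a false negative counts ten times a false positive.
print(least_costly_cutoff(REOFFENDED_BY_RANK, cost_fp=1, cost_fn=10))
# Individual-justice weighting: a false positive counts ten times a false negative.
print(least_costly_cutoff(REOFFENDED_BY_RANK, cost_fp=10, cost_fn=1))
```

Under the risk-averse weights, the least costly choice in this invented example places 9 of the 20 offenders in the high-risk group; under the individual-justice weights it places only 2 there. The instrument is the same in both cases; only the values attached to the errors differ.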

Even though the best prediction instruments are subject to error, what is important to keep in mind is that good instruments yield predictions that improve upon chance or an officer’s subjective judgment. Using a valid and reliable risk assessment tool helps to avoid the higher rate of error associated with using no actuarial instrument at all.


On occasion, it may be necessary for officers and caseworkers to disregard the risk score generated by even the highest-quality assessment instruments and process. For example, overrides may be appropriate when the subject of the assessment exhibits a lengthy history of serious criminal activity but encounters his first arrest relatively late in that career. The gainfully employed, vocationally skilled, drug- and alcohol-avoidant, married serial killer who commits several homicides before experiencing even one arrest will achieve a low risk score on most general risk instruments. Obviously, following the recommendation of the risk tool would be a very bad decision.

Positive institutional adjustment is a factor that might warrant a lower custody level. History of committing rape or membership in a gang might warrant higher custody levels (Austin, 1998). Whatever the reason for the override, the total rate of overrides should remain low. Frequent overrides are an indication that the classification system is not working or is not understood by users.

Situational and Environmental Influences on Offender Behaviors

Prediction accuracy is difficult to achieve in volatile environments. Prisons, for example, harbor a variety of environmental and situational features that can give rise to circumstances favorable to violence, independent of the violence-proneness of any particular inmate. Encounters with gang members, sudden and unpopular decisions by prison administration, and even a facility’s architecture create opportunities for misconduct above and beyond the prediction presented by the formal risk assessment process (Austin & McGinnis, 2004).

Neighborhoods are also important influences on an offender’s risk. Neighborhoods vary widely with respect to the employment and housing opportunities they offer to reentering offenders, and they vary with respect to criminogenic influences such as crime rates, poverty, and residential instability. Using neighborhood-level census data in combination with data on a sample of offenders on community supervision in the Portland, Oregon, area, Kubrin and Stewart (2006) confirmed that recidivism rates were higher among offenders in disadvantaged neighborhoods, even after taking individual-level characteristics into account.

Controversies in Offender Classification

Universality of Risk Assessment Tools

An ongoing topic in classification research is the question of whether risk instruments developed in one jurisdiction or for a particular correctional setting are transferable to other offender populations or contexts. When the Wisconsin model was first developed, it was rapidly adopted by probation agencies nationwide. Yet, a close look at the impact of applying the tool on a sample of probationers in New York City found that the instrument did not predict recidivism as capably as anticipated. Six of the items failed to exhibit any correlation with outcome, including drug and alcohol use, prior convictions, and prior revocations. Efforts to reweight instrument items to produce a better prediction were only marginally productive (K. N. Wright, Clear, & Dickson, 1984). Questions of transferability continue to haunt other risk assessment tools, such as the LSI-R, as well (see, e.g., Dowdy, Lacy, & Unnithan, 2002). While researchers have made great strides in producing more universally applicable classification tools, periodic evaluation of the impact of particular instruments on an agency’s ability to successfully identify and supervise high-risk offenders is always a worthwhile endeavor.

Applicability Across Gender

Though it is the practice of most jurisdictions in the United States to use the same classification instruments on male and female offenders, researchers disagree about the applicability of “gender-neutral” tools to female offenders. Commonly used risk assessment tools are prone to overestimate the risks posed by female offenders, whose role in violent criminal activities is frequently limited to that of accomplices to male offenders. Where females take the lead in violent activity, their crimes tend to occur within the context of long-term relationships and so do not present risks to the public. Factors such as seriousness of current offense, use of violence, and substance abuse do not predict adjustment of female offenders to custody settings, though these factors perform better for males (Brennan & Austin, 1997). Research indicates that because developmental pathways to crime are different for women, other variables better predict the behaviors of female inmates, among them, marital status, family structure of the childhood home, child abuse, and reliance on public assistance, to name a few (Bloom, Owen, & Covington, 2003; Hardyman & Van Voorhis, 2004).

Applicability to Different Cultures and Races

Ethnicity is a less well-understood variable in offender classification. On the one hand is the question of whether individual factors carry the same weight in prediction of new criminal behavior across different groups; on the other is the question of whether and to what extent assessment protocols should take culture into account to ensure a reliable and valid result. Whiteacre’s (2006) research on the LSI-R, for example, revealed that the instrument led to a higher rate of classification errors for African Americans compared with whites and Hispanics when particular cutoff scores were used. Severson and Duclos (2005) point out that in the aggregate, American Indians are less open to interviewers’ questions about mental and physical health, and use of alcohol and drugs, compared with other groups. The prominence of the narrative style in American Indian culture and its embrace of mental illness call for modification of both assessment items and protocols. Others indicate that the success of correctional practices, generally, relies on practitioners’ appreciation for the role played by proximity, paralanguage, density of language, history of discrimination, and other culture-specific variables (Umbreit & Coates, 2000).


Classification is the foundation of effective correctional supervision and treatment. Numerous tools are available for classification, such that it is possible for personnel of varying skill backgrounds to identify the risk and treatment needs of offenders in a wide variety of correctional settings. The most common form of assessment is classification for risk, possibly due to the stronger emphasis on crime control relative to rehabilitation aims of the criminal justice process. Risk assessment is also the most controversial form of classification, inasmuch as its errors are more visible and consequential to the public (in the case of false negatives) and the offender (in the case of false positives).

Many advances have taken place with respect to measuring offender risk and needs. Enhanced statistical methods and discoveries regarding the correlates of criminal offending have allowed researchers to make predictions that are great improvements over clinical judgments. However, new arenas for prediction await these advances. For example, classification takes place after sentencing, not before, though clearly better sentencing decisions could be made in the presence of classification results. The determination of which sex offenders should be eligible for community notification takes place without the benefit of state-of-the-art prediction tools. Similarly, decision making regarding which sex offenders should be subjected to civil commitment resembles clinical judgments from an earlier era. There is a need for application of classification research beyond traditional contexts.