Milagros Bravo. Handbook of Racial & Ethnic Minority Psychology. Editor: Guillermo Bernal. Sage Publications. 2003.
Research on ethnic minorities requires instrumentation that is sensitive to cultural variations. Psychological research on minorities usually involves comparisons among different ethnic groups. These comparisons demand instruments capable of identifying similar psychological phenomena in dissimilar groups. A challenge to the researcher who is studying diverse ethnic groups or cultures is to ensure that the assessment tools are equivalent across groups. Only by achieving this equivalence will it be possible to compare substantive results not confounded by instrumentation artifacts. Attaining this equivalence to study ethnic minorities in the United States sometimes requires translations into languages other than English, for example, to study Hispanics, Asian Americans, or Native Americans. For all ethnic groups, even for those whose native language is English, such as African Americans, cultural adaptations are necessary. This chapter describes conceptual and methodological challenges involved in the use of structured instruments in the study of ethnic minorities. It presents a comprehensive model for translating and adapting research instruments for use in an ethnic group or culture other than that in which the instrument was originally developed. Within the model, techniques usually used to attain equivalency across languages and cultures are described. Some of the difficulties involved in the process are discussed and illustrated.
Conceptual Considerations
Culture serves as a web that structures human thought, emotion, and interaction (Canino & Guarnaccia, 1997). It is a dynamic process in which social transformations, social conflicts, power relationships, and migrations affect views and practices. Culture is the product of group values, norms, and experiences as well as of individual innovations and life histories. Although ethnic minorities share a common context with mainstream culture, each group has unique cultural characteristics that permeate their lives. These characteristics are the product of the continued interaction of their culture of origin with the dominant or majority culture. Cultures and subcultures vary not only by national, regional, or ethnic background but also by age, gender, and social class. Ethnic minorities also vary by whether the studied group is composed of migrants or natives to the host country. All these considerations must be taken into account when studying ethnic minorities.
Most investigators agree on the value of cross-ethnic and cross-cultural research findings and on the need to make research culturally sensitive (Canino, Lewis-Fernández, & Bravo, 1997). It has been emphasized that the goal of cultural sensitivity is to increase the scientific accuracy of the research rather than merely promoting multicultural political correctness (Rogler, 1999a). However, there is disagreement as to the degree of cultural or ethnic modifications that should be incorporated into research instruments.
Cross-cultural studies can be approached from two different perspectives, which together have been called the emic-etic paradigm (Brislin, Lonner, & Thorndike, 1973). The emic perspective involves the evaluation of the studied phenomenon from within the culture and its context, aiming to explain the studied phenomenon’s significance and its interrelationship with other intracultural elements “from the inside.” This approach attempts to describe the internal logic of a culture, its singularity, considering this a necessary step prior to any valid cross-cultural analysis. The etic perspective, on the other hand, is basically comparative. It involves the evaluation of a phenomenon from “outside the culture,” aiming to identify and compare similar phenomena across different cultural contexts.
Both perspectives have been criticized in the pertinent literature (Canino et al., 1997). Critics argue that cross-cultural research based on the emic approach neglects the problem of observation bias. The lack of methodological homogeneity across studies of different cultures can result in the inability to disentangle methodological from substantive factors when variability in cross-cultural comparisons is observed. For example, it may hinder the test of causal hypotheses across cultures. Although a thorough understanding of concepts relevant to one culture is obtained by using this approach, these concepts are not necessarily comparable to those of other cultures. On the other hand, the etic approach has been criticized for emphasizing reliability at the expense of validity. It may impose the appearance of cross-cultural homogeneity that is artifactual to the use of a constricted conceptualization embedded in the instrumentation. This limitation has been called the “cultural fallacy” (Kleinman & Good, 1985). Several investigators have devised strategies that attempt to integrate emic and etic perspectives into one overall research methodology that is both culturally valid and generalizable (see Canino et al., 1997, for examples from mental health research). Similarly, the instrument adaptation model presented in this chapter aims to respond to both the etic and emic perspectives. Its main purpose is to produce instruments that search for the equivalents of psychological phenomena across linguistically and culturally different populations, thus enabling comparisons inherent to the etic perspective. However, it aims to do it in a culturally sensitive way that makes possible the identification of unique cultural characteristics within groups.
Instrument Adaptation Model
Culturally sensitive research involves a continuing and incessant process of substantive and methodological adaptations designed to mesh the process of inquiry with the cultural characteristics of the group being studied (Rogler, 1989). An essential component of such an approach is the development and adaptation of culturally appropriate research instruments. Most psychological research instruments are developed in English in the context of the U.S. mainstream culture. Their use with ethnic minorities requires a careful and thorough adaptation process to produce cross-cultural equivalency. In some instances, this involves translation into another language, but in all cases, it entails cultural adaptations to guarantee the instruments’ pertinence, applicability, and validity in each ethnic group. The adequacy of an instrument in a given culture or subculture does not guarantee its validity in another one (Brislin et al., 1973).
The model presented frames the cultural adaptation of an instrument in the context of the process of establishing validity of a measure (Flaherty, 1987; Flaherty, Gaviria, & Pathak, 1988). It postulates that equivalence between cross-language and cross-cultural versions of an instrument can be achieved by obtaining evidence about their equivalence on five dimensions: (a) semantic, (b) content, (c) technical, (d) criterion, and (e) conceptual equivalence. This model is generally consonant with recent guidelines developed by the International Testing Commission for cross-culturally adapting educational and psychological tests (Hambleton, 1994). In the Spanish-speaking Caribbean island of Puerto Rico, we have used this comprehensive model to translate and adapt a number of research instruments for studying mental health in both adult and children populations (Bravo, Canino, Rubio-Stipec, & Woodbury-Farina, 1991; Bravo, Woodbury-Farina, Canino, & Rubio-Stipec, 1993; Canino & Bravo, 1994). The difficulties and examples presented in the chapter mainly come from this work, but the issues involved are considered to be sufficiently general in character that they can apply to instrument adaptations involving other research topics as well as other cultural and ethnic groups.
Semantic Equivalence
Semantic equivalence requires that the meaning of each item in the instrument is similar in the language of each cultural group. When an already existing instrument is involved, a thorough process of translation is required to attain it. The translation of research instruments for use with ethnic minorities is a difficult and costly endeavor. Sometimes it has been avoided in large-scale surveys by excluding minorities who do not speak English. Besides the obvious bias that this practice entails (i.e., a subgroup of the population with unique characteristics is excluded, and thus the sample is not truly representative of the whole population), it forces minority people who speak some English to answer in their nonpreferred language. Although many people from ethnic minorities speak English, most feel more at ease speaking about emotional or behavioral topics in their native language. Furthermore, many do not have an adequate command of the English language to understand the linguistic nuances of a structured instrument, thus hampering its comprehension. Therefore, the conscientious study of ethnic minorities whose native language is not English requires the translation of research instruments. Even for those minorities whose native language is English (e.g., African Americans), some linguistic adaptations may be necessary to make the instruments more understandable and adequate. Processes described in this chapter are also appropriate for this purpose.
Attaining semantic equivalence is not an easy task. Latin American writer Octavio Paz has observed that as the number of translations in the modern era has increased, skepticism about translation has also grown among philosophical, literary, and linguistic critics (Bravo, Canino, & Bird, 1987). Contrary to earlier times, in which faith in translations was based on religious beliefs about the universality and timelessness of divine truth, at present all text is considered relative, belonging to a specific time and place. Within this context, the modern translator does not search for an impossible identity but for a difficult similarity, intending to produce similar effects with different means.
Moreover, most research instruments are not developed with their translatability in mind (Draguns, 1980), although guidelines for using translatable language on research instruments were formulated some time ago. To ease translation of English into other languages, Brislin et al. (1973) formulated the following rules: (a) Use short, simple sentences; (b) employ the active rather than the passive voice; (c) repeat nouns instead of using pronouns; (d) avoid metaphors and colloquialisms; (e) avoid the subjunctive mode (e.g., use of could or would); (f) avoid adverbs and prepositions telling “where” or “when”; (g) avoid possessive forms; (h) use specific rather than general terms (e.g., cows, pigs instead of livestock); (i) avoid words that indicate vagueness about some event or thing (e.g., probably, frequently); and (j) avoid sentences with two different verbs if the verbs suggest different actions. Even when using these rules, some terms or verbal forms may not have adequate equivalents in other languages.
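Several of these rules lend themselves to mechanical screening during item drafting. The following Python fragment is illustrative only and is not part of Brislin et al.’s guidelines: a hypothetical heuristic that flags draft items violating three of the rules (sentence length, passive voice, vague quantifiers), with made-up word lists and thresholds.

```python
import re

# Illustrative heuristic screen for three of Brislin et al.'s (1973)
# translatability rules; word list and threshold are hypothetical.
VAGUE_WORDS = {"probably", "frequently", "often", "somewhat"}
# Auxiliary verb, optionally followed by an adverb, then a past participle.
PASSIVE_MARKER = re.compile(r"\b(?:is|are|was|were|been|being)\s+(?:\w+ly\s+)?\w+ed\b")

def screen_item(item, max_words=16):
    """Return rule warnings for a draft questionnaire item."""
    words = item.lower().split()
    warnings = []
    if len(words) > max_words:
        warnings.append("sentence may be too long (rule a)")
    if PASSIVE_MARKER.search(item.lower()):
        warnings.append("possible passive voice (rule b)")
    if VAGUE_WORDS & {w.strip(".,?") for w in words}:
        warnings.append("vague quantifier (rule i)")
    return warnings

flags = screen_item("The child was frequently reprimanded by the teacher.")
# flags → ["possible passive voice (rule b)", "vague quantifier (rule i)"]
```

Such a screen cannot replace the bilingual committee’s judgment; it merely flags candidate items for rewording before translation begins.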
The best procedure to enhance equivalence in translations has been labeled decentering because it is not centered on any one culture or language (Brislin et al., 1973). It involves changing the original source version of an instrument if, during the translation process, it is identified that some terms or verbal forms do not have acceptable equivalents in the translated language. Therefore, both the original and translated versions of the instrument are open to revision to increase equivalence across languages. Through iterations of translations and back-translations, appropriate wording in the source and target languages is achieved. When developing instruments for use in diverse ethnic groups, this procedure is the best alternative because when versions of an instrument are decentered, they are in an equal linguistic partnership: The wording in each language is familiar and salient to respondents in the cultural groups involved (Rogler, 1999a). However, because this alternative is not commonly used in the development of instruments, even when its use with ethnic minorities is considered, instruments usually have to be translated as they are.
Cross-language equivalency of research instruments requires a comprehensive process that involves not only translation but also thorough testing in alternative languages. A combination of translation and back-translation techniques is the first step in the process. It involves translating the instrument into the target language and then translating it back to the original source language by someone other than the original translator (Brislin et al., 1973). A bilingual committee then compares the original and the back-translated versions. Members of this committee must be knowledgeable about the constructs that the instruments assess, as well as the population to be studied. They must check whether the instrument’s items retain their original intent and whether the language used is appropriate for the age and education level of the targeted population. It is important to consider language usage in the particular targeted group because the same language can be spoken in different ways by different groups. If semantic differences are observed, the version in the target language is altered. The process is repeated until the meaning of each item in the instrument is similar in both languages.
Although it is a useful technique, back-translation entails certain dangers. To promote similarity between the back-translated version and the original one, one could force the instrument in the target language to conform to the grammatical structure of the source language. For example, in the case of an instrument in English that is translated into Spanish, English syntax could be imposed on the translated version. This results in awkward wording that is difficult to understand by speakers of the target language. Rogler (1999b) presented some examples reported in the literature of this error in instruments used to survey Latinos or Hispanics in the United States, undermining the findings generated. Translation errors identified included errors of grammar or syntax, poorly constructed sentences, use of double negatives, and literal translations of colloquial expressions, among others. The objective of a translation is to produce easy-to-read, smooth, and natural-sounding wording that is still faithful to the original. To achieve this, a professional translator could be involved in the process of developing the first translation and revising the final one to guarantee that the translated version conforms to the rules and usage of the target language. Another alternative is to involve a monolingual committee, representative of the study population, in the process of revising the work produced during back-translation (World Health Organization [WHO], 1998).
Yet the previously described process is still not sufficient to attain semantic equivalence. Field testing of the instrument is essential. Where bilingual participants who are similar to those to be studied and equally proficient in both languages are available, the instrument should be administered to these participants in both languages. Results from both administrations are then compared. If bilingual participants equally proficient in both languages are not available, the instrument is administered in the target language, but feedback from interviewers or debriefing of participants is obtained. Feedback from interviewers should include whether they felt each item was understood, whether respondents’ comments and reactions were consonant with the item’s intent, and whether respondents looked engaged, distracted, or fatigued. The debriefing should ask respondents what they understand each question to be asking, whether they could repeat it in their own words, what came to their minds when they heard a particular phrase or term, and how they chose their answer (WHO, 1998). This information is best obtained in in-depth individual interviews, but focus groups are also an alternative.
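When dual administrations to bilingual participants are compared, agreement on each item can be quantified with a chance-corrected index such as Cohen’s kappa. The following Python sketch is illustrative only; the response data are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two administrations of the
    same binary item (e.g., English vs. Spanish versions)."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement from each version's marginal rates.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a | freq_b) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no answers of eight bilingual respondents.
english = [1, 1, 0, 0, 1, 0, 1, 0]
spanish = [1, 1, 0, 1, 1, 0, 1, 0]
kappa = cohens_kappa(english, spanish)  # → 0.75
```

Low kappa on an item signals a translation that changes how respondents answer, which is precisely the consequence Rogler warns against.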
The importance of field testing for attaining semantic equivalence is illustrated in the following example. In the translation and adaptation process of the Diagnostic Interview Schedule for Children (DISC), the translation of the phrase “to worry a lot” for the assessment of anxiety presented difficulties that were not envisioned by the bilingual research committee (Bravo et al., 1993). The literal Spanish equivalent of “to worry a lot” is preocuparse mucho. Yet, the interviewers who field-tested the translated instrument noticed that the wording had an unintended interpretation. When parents were asked whether their children worried a lot about school, many proudly answered, “Of course he [or she] worries a lot [se preocupa mucho], he [or she] is a very good student,” conveying the message that the child had shown an appropriate and desirable behavior. After analyzing the situation, the bilingual committee concluded that the word in Spanish can have a negative connotation similar to the English word worry, but it can also have a positive one more consonant with that of the English word concern. The item was changed to se preocupa demasiado, the literal equivalent of “worries too much,” to convey the connotation that the behavior was out of the normal and desirable range. This kind of equivalence cannot be achieved through back-translation alone. The purpose of semantic equivalence goes beyond what can be accomplished with this process. As Rogler (1999b) stated, “In research, translation similarity can be considered sufficient when translation has no consequences for the variance in the respondents’ replies, which is contributed by the language of the instrument” (p. 428). The focus is then on semantic and cultural equivalence rather than on linguistic or literal translation.
If the instrument to be translated and adapted is to be used in several countries, with people of varied ethnicity, or even with diverse subgroups within an ethnic minority, revision by an international or culturally diverse committee is recommended. This step was included in the development of the latest Spanish version of the DISC, the DISC IV (Bravo et al., 2001). An intercultural committee, sponsored by the National Institute of Mental Health (NIMH), was composed of people from different Spanish-speaking countries (Spain, Mexico, and Venezuela) and U.S. Hispanic groups (of Mexican, Central American, and South American origin), as well as Puerto Rico. The main goal of this committee was to ensure that the final translated version was applicable to diverse Spanish-speaking groups and comprehensible to people of varied educational levels. Each committee member carefully reviewed the initial version of the instrument translated and adapted in Puerto Rico, and consensus was reached about the appropriate wording of items. When a term common to all groups was not found, several ethnic variations or regionalisms were included in parentheses so the appropriate word could be selected in each place. For example, to translate the phrase “how often,” three phrases in Spanish had to be included: cuán a menudo, con qué frecuencia, and qué tan seguido. In this way, an instrument appropriate for a wide variety of Spanish-speaking groups was developed.
Content Equivalence
Content equivalence refers to whether the content of each item is relevant to each cultural group or population under study, that is, whether it evaluates a phenomenon that occurs in the group and is recognized as real by its members. A committee composed of people who know one or preferably both cultural groups well can attain content equivalence through careful revision. A procedure similar to rational analysis, which is usually employed to obtain evidence about content validity in the development of an instrument, should be employed. That is, a panel of judges, usually composed of experts in the construct to be assessed, decides whether the instrument’s items reflect the concept under study. However, this procedure is sufficient in the judgment of items only when researchers and respondents share the symbolic systems of the same culture (Rogler, 1999a). When researchers and respondents have little or no cultural symbolism in common, it is not sufficient. In this case, detailed cultural observations must supplement it.
These cultural observations should be used for two purposes. First, they should determine whether the construct that the original instrument measures is pertinent to the target cultural group. Second, they should determine whether its operationalization is appropriate. Differences not only across groups but also within the same ethnic group (e.g., socioeconomic or age differences) must be considered in both processes. These determinations sometimes can be made in the selection of the instrument to use in a particular population, even before it is translated, but at other times they are revealed through pilot testing.
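Judges’ relevance ratings can also be summarized numerically. One widely used index, not discussed in this chapter but consistent with the panel-of-judges procedure described above, is Lawshe’s content validity ratio (CVR). The sketch below is a minimal illustration in Python.

```python
def content_validity_ratio(n_essential, n_judges):
    """Lawshe's CVR: +1 when every judge rates the item 'essential',
    0 when exactly half do, and negative when fewer than half do."""
    half = n_judges / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 judges, 9 of whom rate an item essential.
cvr = content_validity_ratio(9, 10)  # → 0.8
```

Items with low or negative CVR in the target group are candidates for the kind of cultural substitution the examples below describe.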
Rogler, Malgady, and Rodriguez (1989) described an instance when the irrelevance of a construct was identified at this stage. They had started by trying to measure spousal relationships through decision-making questionnaires. During the initial phase of the study, it became apparent that the items of the instrument they were considering did not apply to impoverished Puerto Rican couples in New York. The items inquired about decision making regarding where to go on vacation, which school the children should attend, the purchasing of insurance policies, and so on. They thought about changing the content of the items but retaining the operationalization of spousal relationships through decision making. After pretesting these new items, it became clear that the construct of decision making for these families was irrelevant because the margin of choice in their lives was slim. In another study using an impoverished island sample, the alternative, identified through ethnographic observations, was to assess changes in marital relationships through gender-based division of labor (Rogler, 1999b). This example serves to illustrate how a higher level construct that an instrument evaluates (i.e., marital relationship) can be defined by using a more pertinent lower-level construct (i.e., gender division of labor) instead of another lower-level one that was found to be culturally inappropriate (i.e., decision making). These examples illustrate the importance of (a) determining whether the construct that the original instrument measures is pertinent to the target cultural group and (b) identifying an alternative construct, if possible, that can be used to assess the higher-order construct that is the main focus of study. Moreover, it also illustrates the need to take into consideration other variables, besides culture, when making these decisions. 
The decision-making construct was not appropriate for these families, not necessarily because they were Puerto Rican but because they were at the bottom of a socially stratified system, struggling to satisfy their most basic needs. This situation can be applicable to poor people from any ethnic minority or even majority group. It illustrates that cultural aspects as well as other factors such as socioeconomic conditions must be considered in the revision of instruments. As previously stated, cultures and ethnic groups vary not only by national, regional, or ethnic background but also by age, gender, and social class.
Content equivalence should be carefully examined, especially in instruments that aim to assess deviations from “normal” behavior. For example, an examination of an instrument designed to measure family functioning revealed that its operationalization of what constitutes “normal” family functioning was not pertinent for Puerto Rican families. The instrument in question, the Family Adaptability and Cohesion Scale (FACES) (Olson, Portner, & Bell, 1982), contained several items that ran counter to their cultural norms and practices. The respondent was asked to rate the frequency with which, for example, children had a say in their discipline and could help solve family problems, whether household responsibilities shifted from person to person, or whether family members consulted other family members on their decisions. The Puerto Rican family, especially in the low socioeconomic class, is predominantly hierarchical, with a marked division of labor around sex role stereotypes, and children usually have little participation in decision making. “Normal” family functioning therefore cannot be measured with concepts that inquire about sharing in decision making and responsibilities and egalitarian sex role definitions. Again, this characteristic could be more related to the slim margin of decision making that living in poverty can entail than to cultural factors.
A similar problem was identified regarding an instrument that measures social adaptation in children. Adaptation is defined in terms of the way the person’s role performance conforms to the expectations of his or her reference group. Measures are thus based on behaviors or roles that are normative to a given society or context (Katschnig, 1983). Given this contextual definition, one would expect the construct to vary across different cultural and/or socioeconomic groups. We have evidence from a 1985 survey of Puerto Rican children (Bird et al., 1988) that supports this statement. Even after matching children for age, sex, and socioeconomic status, differences between Puerto Rican and Anglo children were observed in the Child Behavior Checklist’s social competence scores. Puerto Rican parents and teachers scored youth considerably lower on the social competence items of this scale than did parents and teachers in the Anglo sample. The adolescents also scored themselves lower in social competence. A closer inspection showed that reports on items that assess use of spare time, such as involvement in sports, hobbies, organizations, or part-time jobs, were significantly lower for Puerto Rican children, contributing to their lower social competence scores. The lack of resources in low-income neighborhoods, which is common on the island, as well as the high unemployment rates that limit the availability of jobs, particularly for the younger age groups, probably accounts for these findings, rather than the conclusion that Puerto Rican children are less socially adapted. Moreover, Puerto Rican children more frequently endorsed an item that measured frequency of contacts with friends and getting along with family and siblings. The latter results may reflect the importance that Puerto Rican culture gives to close family ties and good interpersonal relations (Canino, 1982).
In these situations, the content of items that are not relevant to the specific group to be studied must be replaced with content that is more culturally appropriate. In this case, although the lower-order construct of “use of spare time” can be retained to define a dimension of the higher-order construct of “social adaptation,” it must be operationalized in a more culturally consonant way.
Whether the construct that the instrument measures is pertinent to the target group can also vary by geographical location. The DISC has some items designed to assess seasonal depression. Children and parents are asked whether they had experienced the symptoms characteristic of a dysphoric mood when days were shorter (late fall and winter) as compared to when days were longer (spring and summer). In Puerto Rico, people looked puzzled when asked in this manner. On a tropical island, there are no marked differences in the length of days during the year or striking environmental differences among the various seasons. To convey a similar idea in a more contextually appropriate way, we reworded the item to ask about seasons cuando obscurece más temprano (de octubre a marzo) and cuando obscurece más tarde (de abril a septiembre)—literally, “when it grows dark sooner (from October to March)” and “when it grows dark later (from April to September).” However, even after changing these items to make more sense in our context, we found that none of the children interviewed experienced seasonal depression. We wondered whether this was the result of the translation and adaptation or because seasonal depression is unlikely to occur in a place where there are no marked seasonal changes. Results from the testing of the DISC, done through field trials carried out in collaboration with three North American sites, showed a pattern consistent with the latter. The prevalence of seasonal depression was higher in two communities in the Northeast of the United States, lower in a Southern community, and nonexistent in Puerto Rico (Canino & Bravo, 1999). This empirical finding was consistent with the arguments presented in the international committee’s meetings.
Although some members coming from the continental United States considered the construct and the original items appropriate for their sites, those coming from places nearer the equator (e.g., Venezuela) argued that it was not pertinent to their context.
These examples illustrate the importance of operationalizing and assessing constructs in a culturally appropriate way, after determining whether the construct is adequate for the ethnic groups involved. To carry out these tasks, researchers sometimes must use ethnographic methods, as illustrated in some of the examples presented above. Their use is intended to avoid the imposition of the appearance of cross-cultural homogeneity that is artifactual to the use of a constricted conceptualization embedded in the instrumentation (the “cultural fallacy”) (Kleinman & Good, 1985). Through a combination of psychometric empirical studies and ethnographic inquiry, this bias can be surmounted (in the mental health literature, e.g., see Carstairs & Kapur, 1976; Kinzie et al., 1982; Manson, Shore, & Bloom, 1985).
Technical Equivalence
Technical equivalence is attained if it can be documented that the measuring techniques used are similarly appropriate—that is, produce similar effects—in the different cultures involved. Sometimes differences identified between cultures that have used the same assessment instrument could be due to differences in the assessment technique being used rather than the content of the instrument. It is thus important that the technical equivalence of the instrument is assessed before the onset of the study. A careful consideration of the capabilities of the targeted respondents and their familiarity with the instrument’s format and administration technique is needed. A bicultural committee familiar with the population under study can do this revision. However, field testing is essential. The use of these techniques is illustrated in what follows.
To study drug use, researchers should use self-administered instruments to minimize response bias due to social desirability. However, the lack of appropriate reading and writing skills is a source of inaccuracy when studying inner-city minority populations in which functional illiteracy is prevalent. It is thus necessary to maintain the anonymity of self-reports for sensitive information of this nature and at the same time address the problem of functional illiteracy. For this purpose, Turner, Lessler, and Gfroerer (1992) have developed a computerized audio system in which the respondent is not required to read the item but must answer the question posed by the audio system by pressing a key on the computer. These technological advances, although encouraging, must be carefully assessed to evaluate their appropriateness across and within ethnic groups. For example, this technology could be appropriate for youth familiar with computers or similar technology but not for those unfamiliar with them. Moreover, it could be appropriate for youth but not for older respondents.
Maximizing anonymity through self-report techniques as described earlier might not be sufficient for certain populations. Ethnic minorities and other socially disadvantaged populations may be more prone to deny behaviors such as use of drugs, physical abuse, sexual activity, or other antisocial behaviors because they think they could get into trouble with the authorities. In fact, there is evidence that a significantly greater proportion of African Americans than non-Hispanic Whites admit that they would not be honest in reporting their illicit drug use, even if, hypothetically, they had engaged in this type of behavior (Mensch & Kandel, 1988). The lower rates of drug use among African American populations reported in national surveys sponsored by the National Institute of Drug Abuse may be the result of underreporting.
Data thus suggest that the assessment of sensitive or antisocial behaviors through self-reports will need to be supplemented with other types of assessment to avoid significant underreporting. This is especially so in populations in which this type of behavior is greatly censured or in groups who are at a disadvantage in the society surveyed. The use of key informants and other data sources (e.g., police and medical records or hair or urine analyses for biological detection of drug use) might be advisable. In addition, use of interviewers who live in the participants’ community or prior consistent contact of interviewers with community leaders might be necessary to avoid underreporting of sensitive information.
Testing the reliability of an adapted instrument is an additional way of determining whether the assessment technique is appropriate for the particular group studied. If it is not, inconsistent answers are likely to be obtained. Moreover, reliability results from the adapted instrument that are similar to those obtained with the original version provide further evidence of the technical equivalence of the instrument in both cultures and ethnic groups studied. Other more complex statistical techniques have been developed in the education field to test technical equivalence among different language versions of structured instruments (see, e.g., Hambleton, 1993).
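For diagnostic instruments of this kind, test-retest reliability is commonly summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The following is a minimal sketch in pure Python; the ratings shown are hypothetical, not data from any study cited in this chapter.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two sets of categorical ratings."""
    n = len(ratings_a)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c]
                   for c in set(ratings_a) | set(ratings_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical test-retest data: diagnosis present (1) / absent (0)
time1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
time2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
kappa = cohens_kappa(time1, time2)  # ≈ 0.78
```

Comparable kappas for the adapted and original versions would be one piece of evidence for technical equivalence.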
Criterion Equivalence
Criterion equivalence implies that the interpretation of the results obtained from the measure is similar when evaluated in accordance with the established norms of each culture. It involves techniques similar to those used to assess criterion validity of a measure. However, it is very important that the criterion that serves as a validator is culturally appropriate. Again, the similarity between the observed validity results using the adapted version and those obtained with the original instrument attests to the criterion equivalence among both versions of the instrument.
The criterion equivalence of the Spanish versions of various diagnostic instruments has been established in Puerto Rico by comparing the diagnoses they produce against the clinical judgment of well-trained and experienced
Puerto Rican clinicians, which is used as an external criterion (see, e.g., Bravo et al., 1993; Canino et al., 1987; Rubio-Stipec, Bird, Canino, & Gould, 1990). Because the reliability and validity of clinical judgments have sometimes been questioned, certain procedures have been employed to make the judgments more accurate. These include using the structured interview schedule to organize the interview, requiring that clinicians evaluate the presence of each diagnostic criterion (instead of only the disorder as a whole), and using a best estimate diagnosis (BED). The BED involves the consensus judgment of at least two clinicians about the presence of a disorder in a case (for more details, see Canino & Bravo, 1999). Relatively similar results to those obtained with the English versions of the diagnostic instruments have been obtained when similar comparisons were made, attesting to the criterion equivalence of the instruments’ versions.
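One way such comparisons against a clinical criterion are typically quantified is through the sensitivity and specificity of the instrument's diagnoses relative to the criterion (e.g., the BED). The sketch below uses invented data for illustration only; it is not drawn from the validity studies cited above.

```python
def criterion_agreement(instrument, criterion):
    """Sensitivity and specificity of instrument diagnoses (1 = disorder
    present) against a clinical criterion such as a best estimate diagnosis."""
    tp = sum(i and c for i, c in zip(instrument, criterion))          # true positives
    fn = sum((not i) and c for i, c in zip(instrument, criterion))    # missed cases
    tn = sum((not i) and (not c) for i, c in zip(instrument, criterion))
    fp = sum(i and (not c) for i, c in zip(instrument, criterion))    # overdiagnoses
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical diagnoses for eight respondents
instrument = [1, 1, 0, 0, 1, 0, 0, 1]
criterion  = [1, 1, 0, 0, 0, 0, 1, 1]
sensitivity, specificity = criterion_agreement(instrument, criterion)
```

A pattern of many false positives against the clinical criterion, as in the schizophrenia example discussed next, would show up here as low specificity.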
Results from comparisons between the diagnoses produced by the instruments and those given by clinicians have also been used to enhance the adapted instruments. For example, in the validity study of the Diagnostic Interview Schedule (DIS) (Robins, Helzer, Croughan, & Ratcliff, 1981), an overdiagnosis of schizophrenia relative to the clinical judgment was observed. In searching for the cause of this bias, we identified that certain culturally syntonic experiences were likely to be scored as positive psychotic symptoms by the instrument yet were not considered as such by experienced Puerto Rican clinicians. The reported experiences were usually religious or spiritual in nature, with strong influences from Catholic, Pentecostal, and Spiritism (Espiritismo) belief systems (see Garrison, 1977). These beliefs are common to large segments of the Puerto Rican population (Hohmann et al., 1990) and are prominent in Latin American cultures and literature (see, e.g., Allende, 1986). These experiences usually involve seeing or hearing dead relatives or religious figures (e.g., saints, the Virgin Mary, Jesus Christ) and having premonitions of events to occur. Under certain circumstances, these experiences are culturally valued and even considered a "special gift." As a result of these findings, we introduced some additional items into the instrument used for the epidemiological study to fine-tune the assessment of these experiences. These items gather information about how acceptable and common the reported experience is for the interviewed persons and their relatives or friends ("Have you talked about these experiences with your family, friends, or peers?" "What did they say about them?" "Do you think these experiences happen only to you?").
The answers to these questions, in addition to the description of the experience that was also collected by the instrument, were used by Puerto Rican clinicians to decide whether the experience should be considered psychotic or a culturally syntonic behavior (Guarnaccia, Guevara-Ramos, González, Canino, & Bird, 1992). The addition of these features to the diagnostic instrument increased its criterion equivalence because it made the diagnoses derived from it more similar to the clinical judgments used as the culturally consonant validation criterion.
Conceptual Equivalence
Conceptual equivalence requires that the same theoretical construct be evaluated in the different cultures involved. Procedures similar to those used to attain construct validity of instruments can be used. One of the strategies is to use factor analysis to check the similarities in factor structures among versions of the same instrument. Another strategy is to determine the relationship of the construct with other relevant concepts derived from theory or previous research to test whether hypothesized relationships are confirmed.
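For the factor-analytic strategy, the similarity of two factor structures is often indexed with Tucker's coefficient of congruence between the matched factor loadings of the two language versions, with values near 1 indicating highly similar structures. The loadings below are hypothetical, chosen only to illustrate the computation.

```python
from math import sqrt

def tucker_congruence(loadings_a, loadings_b):
    """Tucker's coefficient of congruence between two vectors of loadings
    for a matched factor in two versions of an instrument."""
    num = sum(a * b for a, b in zip(loadings_a, loadings_b))
    den = sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return num / den

# Hypothetical loadings of five items on one factor, original vs. adapted version
original = [0.71, 0.65, 0.58, 0.62, 0.40]
adapted  = [0.68, 0.70, 0.55, 0.60, 0.35]
phi = tucker_congruence(original, adapted)
```

A high coefficient across all matched factors would support, though not by itself establish, the conceptual equivalence of the two versions.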
We used the latter strategy in the Spanish translation and adaptation of the DISC. We hypothesized that children classified by the DISC as disordered would have higher levels of impairment (as measured by the Children’s Global Assessment Scale) (Shaffer, Gould, & Brasic, 1983), lower levels of adaptive functioning (Beiser, 1990), and more school problems (dropping out, absenteeism, failure, detention, suspension, attending special classes) as compared to children who did not meet DISC diagnostic criteria. These hypothesized relationships were generally confirmed (Bravo et al., 1993). Results thus suggested that the adapted instrument was evaluating phenomena associated with dysfunction in social, psychological, and academic dimensions in children and adolescents, a finding that would be expected from an instrument appropriately evaluating psychiatric disorders in our context.
Cross-Cultural Comparisons
The presented model aims to produce equivalent instruments for varied languages and contexts. Once methodological artifacts have been minimized, comparisons across cultures or ethnic groups may yield illuminating findings, especially if many and varied cultures are involved. That is the case for the Diagnostic Interview Schedule (Robins et al., 1981), which has been used across many diverse sites in Asia, Europe, and North America (see Helzer & Canino, 1992; Weissman et al., 1994; Weissman et al., 1997; Weissman, Bland, Canino, Faravelli, et al., 1996; Weissman, Bland, Canino, Greenwall, et al., 1996). Interesting discrepancies and similarities have been observed (Rubio-Stipec & Bravo, 1999).
Some disorders varied widely in lifetime prevalence (e.g., alcoholism, 0.45%-23%; major depression, 1.5%-19%), but most showed more consistent rates (e.g., bipolar disorder, 0.3%-1.5%; social phobia, 0.5%-2.6%; panic disorder, 0.4%-2.9%; and obsessive compulsive disorder, 1.9%-2.5%). Reasonably consistent age of onset was observed for all studied disorders: social phobia (mid-teens to early 20s), bipolar disorder (late teens to mid-20s), alcoholism (early to mid-20s), panic disorder (early 20s to mid-30s), obsessive compulsive disorder (early 20s to mid-30s), and major depression (mid-20s to early 30s). Consistencies in symptomatic expression across sites were also observed for some disorders, such as alcoholism (rank order correlation of symptoms > .80 within North American sites and within Asian sites), but not for others, such as obsessive compulsive disorder (predominance of obsessions over compulsions for some sites and the opposite pattern for others).
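The rank order correlations of symptoms mentioned above are Spearman correlations between the orderings of symptom frequencies at two sites. A minimal sketch, using the no-ties formula and invented symptom prevalences (not the published DIS data):

```python
def spearman_rho(x, y):
    """Spearman rank correlation between two orderings, assuming no ties."""
    def ranks(values):
        # Rank 1 = most prevalent symptom
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical prevalences of six alcoholism symptoms at two sites
site_a = [0.62, 0.55, 0.48, 0.30, 0.21, 0.15]
site_b = [0.58, 0.50, 0.33, 0.35, 0.18, 0.12]
rho = spearman_rho(site_a, site_b)
```

A coefficient above .80, as reported within the North American and Asian sites, indicates that the symptoms keep roughly the same relative prominence across sites even when absolute prevalences differ.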
Moreover, consistent gender distributions have been generally identified: Major depression, panic disorder, and social phobia tended to be female prevalent; alcoholism tended to be male prevalent; and bipolar disorder tended to be gender balanced. The magnitude of the female-to-male ratios was similar for some disorders (e.g., bipolar disorder, 0.3:1 to 1.2:1; social phobia, 1.4:1 to 1.6:1) but varied widely for others (e.g., major depression [F:M, 1.6:1 to 3.5:1], panic disorder [F:M, 1.3:1 to 5.8:1], and alcoholism [M:F, 4:1 to 25:1]). The male-to-female ratio was particularly high for alcoholism in the Asian and Hispanic cultures compared with Western and Anglo-Saxon cultures (12-25 vs. 4-6 times greater in males) as well as for Mexican American immigrants to the United States (25:1) compared to Mexican Americans native to the United States (4:1). This result suggests that an important societal effect, such as social stigma attached to drinking among females, may be present. It also illustrates the advantage of studying immigrant and native ethnic minorities in the context of larger multinational comparisons to elucidate observed patterns.
A caveat, however, is warranted at this point. Various limitations have been identified in the cross-cultural use of the DIS that may undermine the studies' findings (Rogler, 1999b). They include the overinclusiveness of some items and the absence of equivalent culturally significant symptoms of mental distress when applied to other cultures (for studies of the Hopi, see, e.g., Manson et al., 1985). The previously reviewed articles that reported DIS data do not usually describe in detail the process of translating and adapting the instrument to other languages and cultures. It is therefore not possible to judge whether a culturally sensitive process was used, so the resulting instruments cannot be assumed to be culturally equivalent to the original, and interpretation of findings must take this into account. However, if the comparisons are culturally valid, they are a step in the process of understanding the role of culture in the development and course of disorders.
Discussion
A major undertaking for psychology is the comparison of psychological phenomena across cultural or ethnic boundaries. This line of inquiry is embedded in the current interest in emphasizing the role of culture as an integral part of the study of cognition, emotions, intentions, and behaviors (Sinha, 1996). Psychology had been "culture blind" in the past. This tradition implied not only the denial of the influence of culture on human development but also the hegemony of a Euro-American worldview in the production of theories to explain it. According to Berry (1996), the reaction to this "culture-blind" tradition has been twofold. First, it was characterized by the emergence of conceptualizations and studies that consider culture a factor in the explanation of behavior. Second, it was represented by cross-cultural studies designed to compare the influence of various cultures on particular human behaviors. This cultural perspective is a reflection of debates and transformations in the philosophy of science, as well as the increasing recognition that knowledge is socially constructed (Cole, 1996).
Consonant with this philosophical transformation, Rogler (1999b) has used the concept of procedural norms, taken from the analysis of science as an institutional structured social process, to identify sources of persistent cultural insensitivity in research. Procedural norms are “canons of research that tell scientists what should be studied and how, and they are taught to successive generations of researchers” (p. 424). The procedural norms this author identified are (a) obtaining evidence about content validity based on experts’ rational analysis of concepts, (b) using translations that try to conform to the exact terms used in standardized instruments, and (c) uncritically transferring concepts across cultures. These issues have been addressed in this chapter, and some solutions to tackle them have been proposed. They are, respectively, (a) including cultural observations besides the rational analysis of concepts to attain content equivalence, (b) targeting semantic or cultural equivalence rather than linguistic or literal translation, and (c) determining whether the construct that the original instrument evaluates is pertinent to the target cultural or subcultural group through detailed cultural observations and pilot testing.
However, these methodological procedures, as well as others described before, may produce an instrument that is somewhat or markedly different from the original one. Although most investigators agree on the value of cross-ethnic and cross-cultural research findings and on the need to make research culturally sensitive, there is disagreement as to the degree to which cultural or ethnic diversity should be incorporated into research instruments (Canino et al., 1997). Specifically, how much local cultural diversity can be incorporated into an established instrument before the degree of alteration renders the instrument incapable of measuring the original constructs for which it was designed? This dilemma is difficult to tackle and is at the root of the emic-etic paradigm. On the one hand, cross-cultural comparisons require similarities in observations; on the other, the phenomena observed in each cultural group must be culturally valid for the comparison to be worthwhile.
In what follows, I propose two sets of guidelines to enhance valid cross-ethnic or cross-cultural comparisons derived from the experience of using the adaptation model previously presented and the analysis of pertinent literature. One set of rules applies to the case when a cross-ethnic collaborative process is used to develop the instrument from the start. The suggested steps are as follows.
First, include researchers from varied ethnic and cultural groups in the development of any instrument that is foreseen to be widely used. In this way, constructs and their operationalizations that are appropriate to multiple groups are considered and incorporated. Second, employ translatable English following Brislin et al.’s (1973) rules. Third, translate the instrument to other pertinent languages and use a decentering process to attain semantic equivalence across languages. Fourth, obtain evidence about the content validity of the instrument in the different languages and cultures by using experts’ rational analysis. Fifth, pilot-test the different versions in diverse groups; special attention should be given to low-education and low-socioeconomic status (SES) populations because they are not likely to be represented in the panel of experts, regardless of ethnic group. Cultural observations should be added at this point to ensure that the constructs evaluated, or their operationalizations, as well as the administration procedures are appropriate. Sixth, evaluate the psychometric properties of each language version and targeted population to obtain evidence about their reliability and validity as well as their construct and conceptual equivalence. The whole process is not centered on any one culture or language; all language versions are subject to modifications at any step to attain cross-cultural equivalency. This is the ideal situation, but it is not likely to occur frequently due to the costs, effort, and coordination involved.
The other set of guidelines applies to translations and adaptations of an established instrument. First, examine carefully whether the instrument's constructs and dimensions, as well as its operationalization, are appropriate to the target group (content equivalence). Perform cultural observations (e.g., ethnography, culturally sensitive in-depth interviews or focus groups) in the target populations, besides rational analysis, when studying a population not familiar to the researchers, even within the same ethnic group (e.g., low-education or low-SES people). If lack of pertinence is identified beforehand, resources involved in translation and pretesting could be spared. However, this check for cultural pertinence should continue throughout the whole process of instrument development, especially in the translation and pretesting phases. Second, translate the instrument, but when translating, conform to semantic or cultural equivalents, not the exact terms of standardized instruments. Third, when some cultural discordance is identified within the instrument, substitute culturally concordant equivalents for the discordant elements. If the assessed higher-order construct (e.g., social adaptation) and its lower-order dimension (e.g., use of spare time) are considered culturally appropriate but not its operationalization (e.g., participation in organized sports), work at the item level to attain cultural equivalency; that is, the best alternative is to replace each item with a pertinent operationalization (e.g., participating in informal sport games). When revising items, delete them only after being absolutely sure that no cultural equivalents are possible. If the addition of items is necessary, the analysis of data with and without the added items is sometimes appropriate.
Fourth, if the discordance is of a higher magnitude—that is, at the level of the assessed higher-order construct (e.g., marital relations) or a lower-order dimension used to define it (e.g., decision making)—work at the dimension or instrument level is required. In this case, replace culturally inappropriate lower-level constructs (e.g., decision making) with appropriate ones (e.g., gender-based division of labor) to define the same higher-level construct in the instrument (marital relations). Even in this case, try to maintain the same instrument format, if appropriate, and the same approximate length. Analytical difficulties in the comparison may result, however. An alternative is to develop equivalent cutoff points for each ethnic or cultural group—for example, culturally equivalent scores of "good" marital relations between the decision-making and the division-of-labor operationalizations. Fifth, test the psychometric properties of the newly adapted version in the targeted population to obtain evidence about its reliability and validity as well as its construct and conceptual equivalence. Comparison of results with those obtained using the original version is warranted. Finally, write a thorough description of the methods used to translate and adapt the instrument, and publish it to let readers judge the adequacy of the process.
From what is presented above, it follows that the culturally sensitive perspective advocated in this chapter entails difficult endeavors because multiple methodological challenges must be overcome. On one hand, cross-cultural comparisons require equivalency in observations in the different cultures or ethnic groups, enabling researchers to disentangle whether the differences observed across groups are due to differences in the methods and measures employed or to true ethnic or cultural differences. On the other hand, for the comparisons to be worth the effort, instruments should be translated and adapted in a thorough and culturally sensitive way to avoid the cultural fallacy of imposing the appearance of cross-cultural homogeneity that is artifactual to the use of a constricted conceptualization embedded in the instrument (Kleinman & Good, 1985). The use of careful and comprehensive adaptation concepts and methods, similar to those described and illustrated in this chapter, is thus recommended to attain both goals. As previously stated, culturally sensitive research involves a continuing and incessant process of substantive and methodological adaptations designed to mesh the process of inquiry with the cultural characteristics of the groups being studied (Rogler, 1989). It is a difficult but worthwhile endeavor because the scientific accuracy of the research depends on it.