Systematic Observation

M Elizabeth Vonk. Research Techniques for Clinical Social Workers, 2nd edition. Columbia University Press, 2007.

Direct observation is one of the most natural methods for gathering information. All of us observe phenomena continually and make decisions based on our observations. Similarly, in their practice clinical social workers routinely observe clients and base diagnostic and treatment decisions upon what their clients say and do. In the context of social research, however, observation is conducted more systematically than in daily life or in much of clinical practice.

In its most structured form, systematic observation is a method of quantitative data collection in which one or more observers watch events or behaviors as they occur and reliably record their observations in terms of previously structured codes or numerical categories. In some observational studies the events or behaviors to be observed are first preserved on video or audio recordings. Observations may take place through one-way mirrors as well. Whatever the means of access to the observational data, systematic observation is sufficiently structured to allow testing of research hypotheses or, in the case of practice, to allow for assessment or evaluation.

At the other end of the observational research continuum, unstructured or nonsystematic observation involves one or more observers recording narrative or qualitative accounts of their observations without employing previously structured numerical categories for describing events or behaviors. In social science research this method is used to generate rich descriptions of phenomena or to propose theory. Clinical social workers, however, engage in this form of qualitative observation all the time. Their training and practice experience enable them to make highly refined use of these observations for diagnostic and treatment purposes. Observation for qualitative analysis will be discussed more in chapter 8.

In this chapter the focus is on systematic observation, a procedure with great potential to contribute to diagnostic and treatment decision making. More specifically, we concentrate on the use of systematic observation to study individuals interacting in families, groups, or other social institutions. Our intent is to provide clinical social workers with systematic observational principles and techniques that can assist in diagnostic assessments of these individuals. For those readers interested in the assessment of goals and outcomes for a group or a family, however, we recommend several articles listed in table 3.1 and referenced in the selected bibliography at the end of the chapter.

Systematic Observation and Clinical Social Work

In social research observational techniques are frequently used to measure research participants’ behaviors in “natural” settings such as at home or school. Often they are utilized along with other more obtrusive approaches such as self-administered questionnaires or interviews in order to provide direct measurement of behaviors that may be difficult to reliably capture through the use of self-report instruments. For instance, in some situations research participants may be unaware of the behaviors of interest to the researcher or they may be unable to accurately share information related to the behavior or concept of interest. Systematic observational techniques are useful in such situations. Likewise, clinical social workers can employ systematic observation to generate practice-relevant information that facilitates accurate assessments of clients’ behaviors in natural settings. For instance, systematic observational techniques can be valuable diagnostic tools in schools, hospitals, work with developmentally delayed persons, and more.

Systematic observational techniques are potentially useful in all phases of clinical social work practice. In diagnostic assessment, for example, observations of individuals referred for treatment can be made by social workers, other professionals, or significant others in the referred individual’s environment. The contents of these observations can include both the problems and the strengths the individual brings to social situations. For instance, observational techniques can be used to assess the frequency of provocative behavior within the family, disruptive classroom behaviors, or physically self-destructive behaviors in an institutional setting. Alternatively, these techniques can supply information about positive expressions of affection in a family, cooperative behavior in the classroom, and self-caring behaviors in an institutional setting.

Systematic observation can also be employed to validate information obtained by other, less rigorous procedures. For example, a mother may consult a social worker in a community family service agency for help with her child. A discussion with the mother reveals that she is troubled by her child’s “immature behavior.” The mother is unable to be very specific, however, about the nature of these behaviors or their frequency. In addition, it is unclear whether these behaviors are “abnormal” in frequency and kind when compared to the behaviors of other children of the same age. By visiting the home, and using systematic observation, the social worker can better assess the specific nature and frequency of the behaviors that are problematic to the mother. The social worker can also observe the mother’s efforts to reduce the occurrence of these behaviors. In this way the social worker can determine whether the “problem” is with the child or with the mother’s expectations.

Systematic observation can be used as well to monitor the extent to which clients follow treatment plans. In fact, the monitoring can be done by clients themselves. Thus, for example, marital partners may be instructed to follow a relaxation procedure for five minutes before dinner each evening. The objective of this is to reduce tension and angry outbursts during dinner. Both partners may be asked to record on a simple form the number of times during the week that they followed this procedure as well as the number of arguments they had during dinner. Such systematic recording will give the social worker an indication of the extent to which the clients are complying with treatment and whether the procedures are having any positive effect.

Another way that systematic observation can be used is to assess or monitor the clinical process. A creative example of such use is provided by McDowell (2000), who describes the use of systematic observation to resolve a therapeutic impasse. In this case both clinician and client observed videotaped sessions and identified “helpful” and “nonhelpful” interventions. By comparing and discussing their observations, the pair was able to move forward in treatment.

Finally, systematic observation can be employed to evaluate the effectiveness of treatment. This would require recording problematic behaviors or mood states before, during, and after treatment to determine whether they have declined significantly and whether the benefits persist even after treatment has been terminated. For example, the parent of a twelve-year-old child currently in therapy because of school-related problems may be asked to record the number of times per week the child completes assigned homework. This recording is maintained for a while after treatment ends to determine whether treatment gains persist. If they do not, treatment may need to be modified and continued.

Principles of Systematic Observation

Determining What Is to Be Observed

To observe a phenomenon reliably, one must decide upon the behaviors to include and to exclude from systematic observation. It is impossible to observe and record everything. Consequently, social workers should decide on the purpose of the observation before considering techniques of observation. In the context of clinical social work practice, observation-generated information should assist the social worker in carrying out practice-related tasks. In the assessment phase of practice, the information should be linked to diagnostic decision making.

The focus of observation may be either overt or covert behaviors. Overt behaviors are directly observable by others, for example, crying, laughing, or hitting. Covert behaviors, on the other hand, involve self-reported feelings, moods, or thoughts such as despair, sense of well-being, or feelings of anger. Covert behaviors are directly observable only by those experiencing the feeling states, and are only indirectly observable to others. As a result, systematic observation is often confined to observation of overt behaviors. Since more than one observer can observe the same set of overt behaviors, higher reliability in measurement can be achieved. However, covert behaviors such as thoughts and feelings can also be measured with systematic observation through the use of self-reported observations.

Assessing the Availability of Existing Observational Instruments

Referring to the principles for assessing available instruments presented in the previous chapter, social workers should determine whether instruments exist that can be used directly or serve as stimuli for the construction of new observational devices. Two observational instruments are mentioned here: the Behavior Assessment System for Children (Reynolds and Kamphaus, 1998) and the Behavioral Observation Instrument (Lieberman et al., 1974). The Behavior Assessment System for Children is a broadband instrument used to assess behavioral problems in children and adolescents. The Behavioral Observation Instrument can be used to classify the behaviors of individual clients in clinical settings such as psychiatric units in hospitals. Staff members may be trained in this method to systematically observe how and where patients spend their time.

If appropriate observational instruments do not already exist, an original instrument can be developed using the following principles.

Constructing the Observational Instrument

The basic task in constructing observation instruments is to provide well-defined, easy-to-understand dimensions and categories that can be reliably and accurately recorded. The instrument should be accompanied by a set of instructions that includes operational definitions of dimensions and categories, instructions regarding when observations are to be made, and instructions for making necessary tallies of observations. The number of dimensions and categories should not be excessive.

In an observational instrument each dimension refers to a single behavior or attribute that is to be observed, such as playing, crying, or laughing. Each dimension should be sufficiently specified by an operational definition so that observers can agree that the behavior is or is not taking place. In addition, a dimension can be recorded in terms of frequency, duration, magnitude, or some other quality of it. Thus, for example, playing can be classified in terms of the number of times it occurs in a specified time period, the length of time that it takes place over a specified time period, how actively it is engaged in, and whether it is solitary or involves others. As in the construction of forced-choice questions for questionnaires, observational categories should be mutually exclusive and exhaustive. Hence, whatever the classification scheme, categories should not overlap and, in addition, they should exhaust the range of possibilities along that dimension.

The actual recording of observations can be done on paper using a checklist, a tally sheet, a form, or a rating scale. On a checklist, behavioral dimensions are identified and observers indicate whether or not the behaviors are observed. For example, therapeutic technicians in a psychiatric unit of a hospital may be asked to indicate whether given patients appear depressed, anxious, or manic. Naturally, to use such a checklist properly, technicians must be trained to accurately and reliably observe patients in terms of the foregoing categories.

A tally sheet requires that the observer mark each time a specific behavior occurs within a specified time period. An observer in a classroom, for example, may be asked to record the number of times in an hour that a student speaks without permission or hits other children. The sheet generally includes a space for adding up the total for each behavioral dimension observed.

Similarly, forms may be used to record the duration of a particular behavior. The observer would simply indicate the amount of time a specified behavior occurred over the course of a specified amount of time. A classroom observer, for instance, could record the length of time a child is on task during a specified observation period.

Rating scales attempt to record the intensity or degree to which a dimension is exhibited. For example, an observer in a halfway house for clients recovering from alcohol or drug abuse may be asked to rate residents daily in terms of degrees of depression. The rating scale may be numerical, rating the presence of depression from 0 to 10, with 0 indicating no depression and 10, extreme depression. Or, specific non-numerical categories that are mutually exclusive and exhaustive may be provided, such as “not depressed,” “mildly depressed,” “moderately depressed,” and “severely depressed.” Many of the principles discussed in chapter 1 on questionnaire development apply to the construction of rating scales for systematic observation.

In addition to pen and paper methods, the recording can be done electronically using a notebook, laptop, or handheld computer. With appropriate software, portable and handheld computing devices have an advantage over forms in that the data can be downloaded directly into a database or statistical software package. Table 3.2 lists several available software packages for recording and analyzing observational data.

Whatever the method, it should be user friendly and as nonobtrusive as possible. If the method is not easily utilized by observers, in all likelihood, it will not be carried through. As Bloom, Fisher, and Orme (2003) point out, some observers do well with one of many low-tech nonpaper methods such as dropping a poker chip in a basket each time a particular behavior is observed. After a specified amount of time, the observer simply counts the chips and records the number on a form. This type of method may be particularly useful for busy observers such as parents and teachers.

Determining the Sample of Observations

For reasons of economy, in most clinical situations observations must be made on a selective basis. For example, it might be clinically useful, but prohibitively expensive, to observe a child with a behavioral disorder all day, every day, for a month. As a result, it is necessary to select a sample based on time or situational factors. A time sample involves making observations at previously designated periods of time to obtain what is hopefully a representation of typical behavior. The times chosen for observation may be selected randomly or systematically, for instance, every twenty minutes. Alternatively, the choice of times may be based on situational factors, that is, on some strategic sense of the times that are most critical to the expression of the behavior to be observed. For example, observation of a child’s social skills may take place only at recess and lunch since those are the only unstructured times with peers during the school day.
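The two ways of drawing a time sample described above, systematic and random, can be sketched in Python. This is only an illustration under stated assumptions: the function names, the 9:00 to 15:00 school day, the calendar date, and the fixed seed are all hypothetical and do not come from the chapter.

```python
import random
from datetime import datetime, timedelta


def systematic_times(start, end, every_minutes):
    """Return observation times at a fixed interval, e.g. every twenty minutes."""
    times = []
    current = start
    while current < end:
        times.append(current)
        current += timedelta(minutes=every_minutes)
    return times


def random_times(start, end, count, seed=None):
    """Return `count` distinct observation times, on whole minutes, in time order."""
    total_minutes = int((end - start).total_seconds() // 60)
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    offsets = sorted(rng.sample(range(total_minutes), count))
    return [start + timedelta(minutes=m) for m in offsets]


# Hypothetical school day, used only for illustration.
day_start = datetime(2024, 1, 8, 9, 0)
day_end = datetime(2024, 1, 8, 15, 0)

fixed_schedule = systematic_times(day_start, day_end, every_minutes=20)
random_schedule = random_times(day_start, day_end, count=6, seed=1)
```

A situational sample, by contrast, would not be generated at all: the observer would simply list the periods judged most critical, such as recess and lunch.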

As much as possible, the number of dimensions observed should be small enough and the number of observations made should be large enough to ensure the reliability and validity of the observations made. However, if observers are not full-time observers but must perform other roles, the number of observations should not interfere with management of their other roles. If the demands of these other roles are not taken into account, the result will be fatigue, resentment, and errors in judgment and in recording.

Overall, the observations should be spaced broadly enough so that recording them need not be overly laborious, yet they should be spaced narrowly enough to provide a fairly reliable representation of the situation to be observed. In a family situation, for example, recording parent-child conflicts on a weekly basis would lead to highly unreliable recording. Alternatively, monitoring conflicts every five minutes would be exhausting and unnecessarily precise. As with many other research issues, some balance must be struck between precision and practicality in constructing the instrument.

It should also be noted, in the case of observing behavior within a family or other group, that the social worker must decide how many and which of the individuals to observe. The sample chosen should be unbiased and representative of the group overall. (See chapter 12 for a more in-depth discussion on sampling.)

Choosing the Observer

As indicated earlier, any number of individuals may be involved in the collection of observational data. What is important, however, is that the observer has familiarity with the situation being observed and with the dimensions and categories that are to be recorded. Consequently, only a trained clinician should be used to make complex clinical judgments. Alternatively, if the categories of behavior to be recorded are simple and part of everyone’s day-to-day language and experience, nonexperts may be used. Accordingly, only trained clinicians should make judgments about whether a patient appears “manic.” It does not take a professional, however, to record when someone is laughing.

Whatever their usual role in the situation to be observed, the observers should try to refrain from projecting their own biases and values on the behaviors observed. This is often difficult. It is facilitated, however, by an instrument that includes clear instructions for making and recording observations and is composed of simple, behaviorally specific dimensions and categories. Valid and reliable observational data are also greatly facilitated with training. Even experienced clinicians may require training in order to do systematic observation.

Training the Observers

To properly use an observational instrument, observers must be adequately trained. The purpose of such training is to standardize the use of the observational instrument, to reduce the number of errors and ambiguous recording of observations, to increase reliability, and to promote objectivity. The amount of training necessary depends on the complexity of the instrument and the clarity and specificity of dimensions, categories, and instructions. The more complex or abstract the dimension, the more training is required.

Observers in training should first be instructed about the purpose of the instrument, how it is to be used, and the definitions of dimensions and categories. Each dimension to be observed should be clearly and specifically defined so as to avoid confusion among the observers. This can be accomplished through written instructions, oral presentations, and discussion for clarifying ambiguities or misunderstandings. Role-plays also can be helpful for simulating the situations to be observed and trying out the instrument.

Observers should be instructed to observe and record behaviors in as unbiased a way as possible. Ideally, observers should not be aware of the expected changes in behavior. In clinical practice, however, observers’ expectations and levels of optimism (or pessimism) about the effectiveness of interventions are likely to bias their observations such that they will support their expectations of success or failure. The use of more than one observer may help to compensate for this bias. Often, however, only one observer is available. Moreover, the observer is frequently the clinician who developed the instrument, requiring a high level of self-control and objectivity so that the instrument is not used in a biased fashion to support a predetermined conclusion.

Pretesting the Observational Instrument

Before an observational instrument is actually implemented, it should be pretested so that reliability estimates can be made and ambiguities in procedures can be corrected. This can be accomplished by using the instrument on a trial basis in actual clinical situations or with videotapes, audiotapes, or role-plays of them. In pretesting, one should establish whether instructions are clear, whether the observers can use the instrument reliably and accurately, and whether use of the instrument causes undue fatigue or boredom. The latter elements, if present, will reduce reliability over time.

Having two independent observers pretest the instrument is a useful procedure to help ensure reliability. However, that may not be practical in many clinical situations. If only one person is used for observing many events over time, it is desirable to have another observer do spot checks of small segments of events occasionally to test reliability. If that is not practical, and events are recorded on video or audiotape, observers can spot-check their own reliability by recoding an event and comparing the two data sets. If events are not recorded, observers should, of course, be instructed to do the coding while the event is taking place. Coding done after the event is likely to be inaccurate and distorted.

Assessing the Validity of the Observational Instrument

In the context of systematic observation, validity refers to the extent to which the observational instrument measures what it claims to measure. The two types of validity that are particularly relevant to observational instruments are content and concurrent validity. Here, content validity refers to the logical connection between the observational instrument and the information pertinent to diagnostic assessment. This relationship can be established by having two or more experts review the dimensions and categories of these dimensions in the instrument to determine whether they constitute appropriate indicators of the clinically relevant information sought. For example, observation of a child at play would not be a valid way to measure that child’s ability to stay on-task in the classroom.

Concurrent validity refers to the extent to which the information generated by the observational instrument is consistent with information generated by other means. Thus, one would expect observational data regarding a child’s level of social skills to correspond with the level indicated by the score on a questionnaire completed by the child’s teacher. Although concurrent validity may be difficult to demonstrate and may not be practically obtainable in many instances, we recommend, at minimum, that an observational instrument exhibit a high level of content validity before using it to make diagnostic assessments.

Determining the Reliability of the Observational Instrument

To ensure a high degree of reliability in the observations, observers should be trained to make and record their observations in a nonbiased, nonintrusive manner. When a number of observers are used, training and practice are necessary until a high level of interobserver reliability is achieved. Here, interobserver reliability refers to the extent to which two or more independent observers, observing the same situation without mutual consultation, using the same form, agree in their judgments and coding of the event. A simple index of interobserver reliability is a percentage based on the number of agreements between the judges, relative to the total number of judgments that they make, multiplied by 100. In other words, the number of agreements would be divided by the total possible number of agreements and then multiplied by 100. A score above 80 percent is generally regarded as fairly high. Thus, two judges who agree in eight out of ten judgments exhibit an 80 percent index of interobserver reliability.
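The percentage index just described can be written out as a short Python function. This is a minimal sketch; the function name and the behavior codes in the example are hypothetical, chosen only to reproduce the eight-out-of-ten case in the text.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of judgments on which two observers recorded the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both observers must code the same number of events.")
    if not codes_a:
        raise ValueError("At least one judgment is required.")
    agreements = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 100 * agreements / len(codes_a)


# Ten judgments by two independent observers (illustrative codes only).
observer_a = ["talk", "seat", "talk", "other", "seat",
              "talk", "other", "seat", "talk", "other"]
observer_b = ["talk", "seat", "talk", "other", "seat",
              "talk", "other", "seat", "other", "talk"]

index = percent_agreement(observer_a, observer_b)  # 8 agreements out of 10 -> 80.0
```

The same calculation applies to a single observer who codes a recorded event twice; the two lists would then hold the first and second codings.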

It should be noted, however, that the larger the number of categories employed, the more difficult it is to reach such a high level of agreement. In other words, the greater the number of choices available for coding observations, the greater is the probability of disagreement. Thus, if there are two categories for recording a dimension, by chance alone, observers are likely to agree 50 percent of the time. If there are three categories, chance alone would lead to agreement 33 percent of the time even if both observers were blindfolded during observation sessions. The probability of chance agreement would decrease with each additional response category.
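The chance figures quoted above follow from a simple calculation, sketched here in Python. The sketch assumes that the two observers code independently of each other and, for the 50 and 33 percent figures, that every category is equally likely to be chosen. The function names are hypothetical.

```python
def chance_agreement_percent(proportions_a, proportions_b):
    """Expected percent agreement if two observers code independently.

    Each argument lists the proportion of observations an observer places
    in each category, in the same category order.
    """
    if len(proportions_a) != len(proportions_b):
        raise ValueError("Both observers must use the same categories.")
    return 100 * sum(pa * pb for pa, pb in zip(proportions_a, proportions_b))


def equal_use(k):
    """Proportions for an observer who uses all k categories equally often."""
    return [1 / k] * k


two = chance_agreement_percent(equal_use(2), equal_use(2))    # 50 percent
three = chance_agreement_percent(equal_use(3), equal_use(3))  # about 33 percent
four = chance_agreement_percent(equal_use(4), equal_use(4))   # 25 percent

# When categories are used unevenly, chance agreement is higher than 1/k:
# two observers who each code 90 percent of events into one of two
# categories would agree about 82 percent of the time by chance alone.
skewed = chance_agreement_percent([0.9, 0.1], [0.9, 0.1])
```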

In order to take the probability of chance agreement into account, interobserver reliability can be tested in two ways. The first involves collapsing contiguous categories of observation so that there are only two. Thus, if a rating scale includes four categories of anxiety (no anxiety, low anxiety, moderate anxiety, and high anxiety), and observations are roughly equally distributed among the categories, the no and low anxiety categories can be added together and treated as one category and the moderate and high anxiety categories added together and treated as another single category. Agreement will then be calculated on the basis of the two newly constructed categories. If the observations are not equally distributed among the categories, the split between the two should be done in such a way that roughly 50 percent of the observations fall into each of the newly constructed categories. Thus, if almost all the observations fall into the moderate and high anxiety categories, the no, low, and moderate anxiety categories should be collapsed and added together, and the high anxiety category should be kept intact.
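The collapsing rule can be sketched as a small Python helper that looks for the split between contiguous categories that comes closest to a 50/50 division of the observations. The function name and the counts are hypothetical; the chapter describes the rule only in words.

```python
def best_split(ordered_categories, counts):
    """Split ordered categories into two contiguous groups as near 50/50 as possible.

    `ordered_categories` runs from lowest to highest; `counts` maps each
    category to the number of observations coded into it.
    """
    if len(ordered_categories) < 2:
        raise ValueError("At least two categories are needed to form two groups.")
    total = sum(counts[c] for c in ordered_categories)
    if total == 0:
        raise ValueError("No observations to split.")

    best_index, best_gap = 0, None
    running = 0
    # Try every cut point between adjacent categories.
    for i, category in enumerate(ordered_categories[:-1]):
        running += counts[category]
        gap = abs(running / total - 0.5)
        if best_gap is None or gap < best_gap:
            best_index, best_gap = i, gap

    lower = ordered_categories[: best_index + 1]
    upper = ordered_categories[best_index + 1:]
    return lower, upper


scale = ["none", "low", "moderate", "high"]

# Roughly equal distribution: none/low versus moderate/high.
even_split = best_split(scale, {"none": 25, "low": 25, "moderate": 25, "high": 25})

# Most observations in the top two categories: keep "high" on its own.
skewed_split = best_split(scale, {"none": 2, "low": 3, "moderate": 45, "high": 50})
```

Once the two groups are chosen, each observer's codes are recoded as "lower" or "upper" and percent agreement is computed on the recoded data.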

Another procedure for roughly estimating reliability is used when there are more than two categories within a dimension and when the distribution of cases within categories is relatively equal. This procedure is based on the difference between the results that one would expect based on chance alone and an arbitrary standard set at 20 to 30 percent higher. So, for example, if a dimension has three categories, the expected agreement by chance alone would be 33 percent of the observations. By adding an additional 20 to 30 percent, a rough criterion of 53 to 63 percent agreement could be established as an acceptable standard of interobserver reliability. Likewise, if there were four categories, agreement in 45 to 55 percent of the observations could be established as an acceptable standard of interobserver reliability. A formula for more precisely calculating an estimate of interobserver reliability, the Kappa coefficient, is often used in research. An acceptable Kappa score is at least .60, with higher scores being more desirable.
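Both the rough criterion and a kappa calculation can be sketched in Python. The chapter does not give a formula for kappa; the sketch assumes the standard two-observer form (Cohen's kappa), in which observed agreement is corrected by the agreement expected from each observer's own category frequencies. Function names and example data are hypothetical.

```python
from collections import Counter


def rough_criterion(number_of_categories):
    """Chance agreement (percent) plus 20 and plus 30, as a (low, high) range."""
    chance = 100 / number_of_categories
    return chance + 20, chance + 30


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two observers coding the same events."""
    n = len(codes_a)
    if n == 0 or n != len(codes_b):
        raise ValueError("Both observers must code the same, nonempty set of events.")
    observed = sum(1 for a, b in zip(codes_a, codes_b) if a == b) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    categories = set(freq_a) | set(freq_b)
    # Agreement expected by chance, given how often each observer used each code.
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is 100 percent.")
    return (observed - expected) / (1 - expected)


four_category_range = rough_criterion(4)  # (45.0, 55.0)

# Two categories used equally often; observers agree on 8 of 10 events.
observer_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
observer_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(observer_a, observer_b)  # approximately 0.60
```

In this example 80 percent raw agreement corresponds to a kappa of about .60, because half of the agreement would be expected by chance with two equally used categories.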

An important principle for estimating interobserver reliability is that the observations are made independently, without the mutual collaboration of the observers. Under these conditions, high agreement among the observers indicates that they understand the dimensions and categories of observation, that the instructions for recording observations are clear, and that the format for recording the observations works. An observational system that is unreliable should not be used: either it should be discarded, or the reasons for its unreliability should be identified and corrected until acceptable levels of reliability are attained.

For each observer used, a high degree of intraobserver reliability is also desirable. In this context, intraobserver reliability refers to the extent to which a single observer repeatedly records the same event in the same way. With videotaped or other electronically recorded events, this is computed relatively easily by viewing or listening to the taped events twice, observing and coding them, and computing a percentage based on the number of agreements in observations relative to the total number of observations made, multiplied by 100. Here again a 70-80 percent reliability is generally required. For observational dimensions in which more than two categories exist, the techniques suggested earlier for estimating interobserver reliability can be used. In either case, there should be a sufficient time-lag between observational sessions, or a sufficient number of events to observe, in order that the observer does not reproduce the observations from memory.

Finally, one should not assume that once established, a high degree of reliability is easily maintained. Over time, observers get tired or bored and their reliability declines. As they become more familiar with the situation to be observed, they may become less attentive to new aspects or prejudge the course of events. To protect against these possibilities, spot checks should be done throughout the assessment or evaluation. If a high degree of reliability is maintained, social workers can have more confidence in the diagnostic or treatment implications drawn from the findings.

Making the Observations

In attempting to be as objective as possible, observers should avoid influencing those being observed. The observers’ stance should aim for neutrality; they should refrain from intervening with the participants in the observed situation. Even if they say nothing, however, their mere presence may significantly affect the behaviors of those under observation. Because of this, it is sometimes helpful to give clients time to get used to the presence of the observer. Real data should not be collected until observers are relatively confident that the participants in the situation are able to ignore their presence and recording activities.

Systematic Observation in Action

Two school social workers who work in the elementary school setting frequently receive referrals from teachers of children who are described as “out of control,” “immature,” or “disruptive.” Quite often the teachers are specific about neither the behaviors to which these labels refer nor their frequency. Due to the heavy volume of referrals that the social workers receive, they need to evaluate which of the children referred are most immediately in need of social work intervention. In addition, they suspect that some of the referrals may be from teachers who either are less tolerant of developmentally appropriate behaviors or may be scapegoating particular children. If this is the case, in-service training for the teachers may help increase classroom control and decrease referrals. They wish to have a better understanding of the behaviors and their frequencies.

The social workers decide to devise an observational instrument that will make it possible to record the frequency of the behaviors teachers find disruptive among those children referred. Such an instrument would make assessment of individual children possible as well as comparison of these children with some of their classmates. In addition, the instrument might indicate whether some teachers are less tolerant than others of normal levels of activity among children or are biased in other ways with regard to specific children.

Determining What Is to Be Observed

The social workers want to observe the classroom behaviors of children who have been referred to them for treatment. In addition they want to be able to compare these children’s behaviors with the behaviors of some of their classmates. For a start, they are concerned about monitoring behaviors that teachers are most likely to describe as the bases for their referrals. These behaviors generally include speaking without permission, getting out of seat without permission, arguing with other children, and refusing to comply with the teacher’s instructions.

Assessing the Availability of Existing Observational Instruments

In the library the social workers discover a number of books and articles in the psychology and education fields pertaining to “classroom management.” Some of these publications contain highly complex observational instruments that would be overly time-intensive to implement. However, some components of a few instruments are suggestive of observational dimensions that seem relevant. The social workers decide to use these elements as a guide to developing their own observational instrument.

Constructing the Observational Instrument

After talking with teachers and reviewing available literature, the social workers decide that there are four behavioral dimensions they would like to include in their instrument: inappropriate talking, appropriate talking, being out of seat inappropriately, and being out of seat appropriately. An “other” dimension is included to cover behaviors not included in the foregoing dimensions. The social workers then begin to develop categories of behavior that fall into each of the dimensions. For example, inappropriate talking may include such things as answering questions without being recognized by the teacher or talking to a classmate when it is not permitted. Appropriate talking may include answering a question when called upon by the teacher or talking to a classmate during a free period. Being out of seat inappropriately may include running around the class or actually leaving the class without permission. On the other hand, running an errand for the teacher or going to the bathroom with the teacher’s permission are appropriate ways to be out of seat. Finally, “other” may include quietly working or reading.

Once the dimensions are specified and agreed upon, the social workers develop an observational form on which they can code, within specified intervals, the occurrence of behaviors of the children referred to them, as well as the behaviors of the neighboring children. Initially, they develop a form that is very simple, including only the five dimensions to be coded for each child. Later, a more refined version might be developed in which the categories within these dimensions could be coded as well. In its present state, however, the specific categories only serve as examples of indicators for coding each of the dimensions. The simplified form might look like this.

With this form individual observations are recorded by “hash marks,” which are totaled for each dimension and for each child at the end of the specified length of observation.
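The hash-mark tallying described here can also be sketched in code. The following is only an illustrative sketch, not the social workers’ actual form: the dimension names follow the text, but the identifier spellings, the child labels, and the recorded marks are hypothetical.

```python
# The five dimensions named in the text; the identifier spellings are assumptions.
DIMENSIONS = (
    "inappropriate_talking",
    "appropriate_talking",
    "out_of_seat_inappropriate",
    "out_of_seat_appropriate",
    "other",
)


def new_form(children):
    """Return an empty tally sheet: one row of zeroed dimensions per child."""
    return {child: {dim: 0 for dim in DIMENSIONS} for child in children}


def record(form, child, dimension):
    """Add one hash mark for `child` under `dimension`."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    form[child][dimension] += 1


# Hypothetical marks from one observational period (three observations per child).
form = new_form(["referred child", "child in front", "child behind"])
record(form, "referred child", "inappropriate_talking")
record(form, "referred child", "inappropriate_talking")
record(form, "referred child", "other")
record(form, "child in front", "other")
record(form, "child in front", "appropriate_talking")
record(form, "child in front", "other")

# Totals for each dimension and each child at the end of the period.
for child, marks in form.items():
    print(child, marks)
```

Keeping one row per child and one column per dimension mirrors the paper form, so totals from several observational periods can be added together cell by cell.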

Determining the Frequency of Observations

Based on other aspects of their workloads, the social workers determine that for pretesting purposes it would be possible for them to engage in three fifteen-minute observational periods on three successive days. On one day observation will take place relatively early in the morning at 9:30 A.M. The next day observation will begin at 11:30 A.M. On the final day, observation will be conducted in the afternoon at 2:00 P.M.

At the beginning of each fifteen-minute observational period, the behavior of the referred child and of the children immediately in front of and behind him or her will be recorded on the form. Using a stopwatch, observers will repeat this process every five minutes, resulting in three observations for each child during a fifteen-minute observational period. At the end of each observational period, the results are tallied for each child. Then, at the end of the three-day period, these tallies can be added together.
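The sampling plan can be laid out as a simple schedule. The sketch below assumes that observations fall at minutes 0, 5, and 10 of each fifteen-minute period, which is one reading of the text; the start times come from the example, while the session labels and function name are hypothetical.

```python
from datetime import datetime, timedelta

# Session start times from the example, written in 24-hour (hour, minute) form.
SESSION_STARTS = {"day 1": (9, 30), "day 2": (11, 30), "day 3": (14, 0)}


def observation_times(hour, minute, interval=5, count=3):
    """Return the clock times of the observations in one session."""
    # The date is arbitrary; only the clock time matters here.
    first = datetime(2000, 1, 1, hour, minute)
    return [
        (first + timedelta(minutes=interval * i)).strftime("%H:%M")
        for i in range(count)
    ]


schedule = {day: observation_times(*start) for day, start in SESSION_STARTS.items()}
observations_per_child = sum(len(times) for times in schedule.values())

for day, times in schedule.items():
    print(day, times)
print("observations per child over three days:", observations_per_child)
```

Under these assumptions each child is observed nine times over the three days, which is the total that the per-child tallies are summed across.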

Choosing the Observers

During the pretesting stage of the instrument, both social workers observe the same children in the same classroom, at the exact same times. This common observation of the same events is necessary for checking interobserver reliability. In other situations it might be appropriate to have the teacher, a student teacher, or a teacher’s aide conduct simultaneous observations. In any case, once observers become well-trained in the use of the instrument, they may be used to collect the primary information without the presence of a dual observer.

Training Observers and Pretesting the Observational Instrument

Since the social workers were themselves the authors of the observational forms and the observers, no additional training was necessary. However, after pretesting the form, they found a few instances of disagreement, primarily in the category of talking behaviors. Through discussion, they arrived at a more refined definition of what constitutes appropriate and inappropriate talking behavior.

In the future, if the workers decide to have others use this instrument, they will need to offer training sessions and conduct reliability tests of the new observers’ use of the instrument. However, since they intend to use the instrument themselves, it is ready to be used by each of them for diagnostic assessment of the children referred to them.

Assessing the Validity of the Observational Instrument

The social workers judge the instrument to be valid based on the logical connection between the instrument and the behaviors that are of concern to the teachers and social workers. However, if there are vast discrepancies between teacher reports concerning specific children’s behaviors and the information generated by the forms, the question of concurrent validity arises. One possible explanation is that the form has low concurrent validity and needs to be refined further. The other possibility is that the teacher’s reports are not valid and are, in fact, based on a biased perception of the particular child’s behavior. To deal with this problem, the social workers may want to do another round of observations before changing the instrument. If the results based on the structured observations remain consistently at variance with the teacher’s reports, then the teacher should be presented with this information and the social workers’ interventions may be directed toward the teacher.

Determining the Reliability of the Observational Instrument

After the pretest of the observational instrument, the social workers compare their results on each of the dimensions for each of the youngsters observed, across all three observation sessions. By dividing the number of agreements by the total number of observations for each dimension and multiplying by one hundred, they establish an 80 percent or higher interobserver reliability for each dimension. Consequently, they assume that they can each employ the instrument reliably enough to use it independently for diagnostic-assessment purposes.
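The percent-agreement calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two-letter codes, the ten paired observations, and the function name are hypothetical, and agreement on a dimension is taken to mean that both observers either did or did not code that dimension at a given observation.

```python
# Hypothetical codes: IT/AT = inappropriate/appropriate talking,
# OI/OA = out of seat inappropriately/appropriately, OTHER = other.
DIMENSIONS = ("IT", "AT", "OI", "OA", "OTHER")


def percent_agreement(codes_a, codes_b, dimension):
    """Agreements divided by total observations, times one hundred."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("observers need the same nonzero number of observations")
    agreements = sum(
        (a == dimension) == (b == dimension) for a, b in zip(codes_a, codes_b)
    )
    return 100 * agreements / len(codes_a)


# Ten hypothetical paired observations; the observers differ only on the second.
observer_a = ["IT", "AT", "OTHER", "OI", "OTHER", "IT", "OTHER", "OA", "AT", "OTHER"]
observer_b = ["IT", "IT", "OTHER", "OI", "OTHER", "IT", "OTHER", "OA", "AT", "OTHER"]

reliability = {
    dim: percent_agreement(observer_a, observer_b, dim) for dim in DIMENSIONS
}
for dim, pct in reliability.items():
    print(f"{dim}: {pct:.0f} percent agreement")
```

With this invented data the single disagreement lowers agreement on both talking dimensions to 90 percent and leaves the others at 100 percent, so every dimension would meet the 80 percent threshold used in the example. Simple percent agreement does not correct for agreement expected by chance.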

Making the Observations

To influence the situation as little as possible, the social workers first made arrangements with the teachers in whose rooms they would conduct their observations. In addition, they spent about twenty minutes in the classroom the day before they conducted their systematic observation. Also, before each set of structured observations was conducted, the social workers spent ten minutes in the classroom so that teachers and students would be accustomed to their presence and presumably behave in a natural manner when the systematic observations began.


  1. Based on the school social work example above, develop a sampling procedure and simple rating scale that could be used if the social workers decided that they wanted to measure the duration rather than the frequency of the behaviors they were observing.
  2. With at least one colleague, observe fifteen minutes of a videotaped clinical interview. After the tape is finished, write a brief narrative statement describing the level of anxiety exhibited by the client. After each person has finished writing, discuss the indicators, such as client’s statements or nonverbal behaviors, on which you based your decision about level of anxiety. Next, devise a systematic observational instrument for recording the number of times the client exhibits one of the agreed-upon indicators of anxiety. Replay and code the tape. Wait a day, view once more, and repeat coding. Compute the interobserver and intraobserver reliability of the instrument. How reliable is it? Which form of reliability is higher? Comment on the validity of your observation instrument.