Understanding the Overlap in Programme Evaluation Terminology
This appendix supplements the commentary in The Drum Beat (Issue #302) by Jane Bertrand, dated June 6, 2005.
Available evaluation textbooks often provide useful glossaries that help health professionals and students of programme evaluation grasp the lexicon of the field. However, many people are unclear how one set of terms (e.g., formative-process-summative evaluation) relates to another (e.g., input-process-output-outcome). Are they totally different? Do they overlap?
Figure 1 presents the most widely used terms in programme evaluation and is designed to shed light on this issue of overlapping terminology. Specifically, it outlines what different types of evaluation actually measure and how one set of terms relates to another. The figure consists of a series of six horizontal bars, which are meant to be read (1) left to right, to get a sense of chronological order, as in the case of formative, process, and summative evaluation, and (2) vertically, to compare the terms on one bar to those in the same "column" on another, with the aim of identifying how one set of terms (formative-process-summative) relates to another (inputs, process, outputs, outcomes). For example, the bar on "programme evaluation" runs the full width of the chart, indicating that it is an umbrella term that covers the full range of different types of evaluation. The same is true of the term "monitoring and evaluation" (M&E). The different bars of the chart are as follows.
What is measured? The first bar on the chart (in yellow) describes "what is measured" in terms that do not require specialised knowledge of evaluation. These include level of funding, activities completed, service statistics, and changes in behaviour, to name a few.
Formative, process, and summative evaluation. The second bar - in blue - relates the terms "formative, process, and summative" to "what is measured" in bar 1. For a fuller description of the terms, see The Drum Beat #302 (June 2005).
Monitoring inputs, outputs, and outcomes. The third bar, in lighter blue, indicates how the terms "input, output, and outcome" link to the previous two bars. One might question why "process" doesn't get its own box on the chart; indeed, we often hear mention of "process indicators." In this chart, we have instead used the term "functional outputs," which quantify the activities conducted (e.g., number of persons trained, number of booklets produced, number of schools enrolled, number of community meetings held). This contrasts with "service outputs," which relate to measures of quality of care and access to services, in those programmes where they apply. For example, "length of waiting time" would be one service output; "number of VCT clinics per 100,000 population" would be another, in this case measuring access. The third type of output measures service utilisation: number of clinic visits, number of VCT tests performed, number of ORS packets distributed. Box 1 provides more detailed definitions of input-process-output-outcome.
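For readers who find it helpful to see this three-way classification made concrete, the short sketch below groups a handful of indicators by output type. It is a minimal illustration only: the indicator names and figures are invented, not drawn from any actual programme.

```python
# Minimal sketch: grouping hypothetical output indicators by the three
# output types described above (functional, service, utilisation).
# All names and values are illustrative, not from any real programme.
from collections import defaultdict

indicators = [
    # (indicator name, output type, value)
    ("Persons trained",                    "functional",  250),
    ("Booklets produced",                  "functional",  10_000),
    ("Average waiting time (minutes)",     "service",     35),
    ("VCT clinics per 100,000 population", "service",     1.8),
    ("Clinic visits",                      "utilisation", 4_200),
    ("VCT tests performed",                "utilisation", 1_150),
]

by_type = defaultdict(list)
for name, output_type, value in indicators:
    by_type[output_type].append((name, value))

for output_type, items in by_type.items():
    print(f"{output_type} outputs:")
    for name, value in items:
        print(f"  {name}: {value}")
```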
Monitoring. Figure 1 also attempts to shed some light on the term "monitoring," one of the most widely used terms in the field of programme evaluation (see the light green bar). Monitoring is a very broad term that can apply to tracking many aspects of the programme, from inputs used to start the programme to long-term outcomes such as fertility or mortality. The term generally implies some type of tracking or measurement over time, without the benefit of causal attribution. For example, government and donor agencies may track the level of contraceptive prevalence in a given country over a 20+ year period, with a strong notion that their programmatic inputs have contributed to the continued rise in contraceptive prevalence. However, this type of monitoring does not take into account and cannot rule out other factors (e.g., changes in economic conditions, urbanisation, literacy levels) that may also influence this change in prevalence over time.
Impact assessment. The light green bar distinguishes monitoring (tracking change) from impact assessment (i.e., did the intervention in question cause the observed results?). In Figure 1, we set apart - as if in a class by themselves - those evaluation methodologies that allow for causal attribution. As described in the Drum Beat commentary, these include experimental designs (rarely feasible when evaluating large-scale communication programmes), quasi-experimental designs, and post-test-only cross-sectional studies using advanced statistical techniques (e.g., propensity scoring, structural equation modelling). These designs yield causal inferences that are probabilistic or plausible, depending on the rigour of the design (Habicht et al., 1999).
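To make the idea of causal attribution from observational data a little more concrete, the sketch below illustrates one such technique - inverse-probability weighting based on propensity scores - on simulated data. It is offered purely as an illustration under invented assumptions; the variable names, effect sizes, and modelling choices are hypothetical and do not represent the design of any study cited here.

```python
# Minimal sketch of propensity-score weighting on simulated data.
# Exposure to a (hypothetical) communication programme depends on
# background covariates, which also affect the behavioural outcome,
# so a naive comparison of exposed vs. unexposed groups is biased.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Background covariates influencing both exposure and behaviour.
education = rng.normal(size=n)
urban = rng.binomial(1, 0.4, size=n)

# Programme exposure depends on the covariates.
p_exposed = 1 / (1 + np.exp(-(0.8 * education + 0.6 * urban - 0.2)))
exposed = rng.binomial(1, p_exposed)

# Behavioural outcome (e.g., reported condom use) with a true
# programme effect of +0.10 on the probability scale.
p_outcome = 0.3 + 0.05 * education + 0.05 * urban + 0.10 * exposed
outcome = rng.binomial(1, np.clip(p_outcome, 0, 1))

# Step 1: model the propensity to be exposed, given the covariates.
X = np.column_stack([education, urban])
propensity = LogisticRegression().fit(X, exposed).predict_proba(X)[:, 1]

# Step 2: inverse-probability weights balance the groups on covariates.
weights = np.where(exposed == 1, 1 / propensity, 1 / (1 - propensity))

# Step 3: the weighted difference approximates the programme effect.
naive = outcome[exposed == 1].mean() - outcome[exposed == 0].mean()
weighted = (np.average(outcome[exposed == 1], weights=weights[exposed == 1])
            - np.average(outcome[exposed == 0], weights=weights[exposed == 0]))
print(f"Naive difference:    {naive:.3f}")
print(f"Weighted difference: {weighted:.3f}")  # closer to the true +0.10
```

The point of the sketch is the logic, not the code: without adjustment, the groups differ on factors other than the programme, so the naive difference overstates the effect; weighting on the estimated propensity to be exposed restores comparability, which is what allows a plausible (not certain) causal claim.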
Programme performance. The next bar involves "programme performance" (in pink). This term relates to the different types of monitoring (inputs, outputs, and behavioural outcomes), but stops short of long-term outcomes. The reason is that factors other than the programme, such as the state of the economy or the social status of women, also contribute substantially to changes in fertility, mortality, and morbidity.
M&E and Programme Evaluation. The final two bars - "monitoring and evaluation" (peach) and "programme evaluation" (light turquoise) - run the full width of the page, indicating that both are umbrella terms used to describe any aspect along this continuum. Regarding M&E, some have made the distinction that monitoring refers to the act of tracking outputs or outcomes, whereas evaluation explains why the change occurred. However, this distinction is by no means universally accepted. It is probably more accurate to say that, within the field of evaluation, there is no common agreement on where monitoring ends and evaluation starts, especially since "programme evaluation" is often used to describe the full gamut of evaluation activities.
The underlying premise of the framework in Figure 1 is a causal chain: inputs are used to create and implement a series of activities (processes) that produce measurable outputs and lead to outcomes. Missing from this causal chain is some sense of the context that influences the process. For example, a donor agency may give ample funding for the planning and implementation of an HIV/AIDS programme, resulting in an elaborate mass media programme and community-based activities. Yet this may or may not lead to a change in HIV-related behaviour or avert the transmission of HIV, depending on competing motivations and barriers to behaviour change that are deeply embedded in the socio-cultural environment.
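One simple way to see the chain (and what it leaves out) is to write it down as a plain data structure, as in the sketch below. The entries are hypothetical, loosely modelled on the HIV/AIDS example above, and "context" is included only to flag the factors that the chain itself does not capture.

```python
# Minimal sketch: the input -> process -> output -> outcome chain as a
# plain data structure. All entries are hypothetical; "context" lists
# factors outside the chain that can still shape the results.
causal_chain = {
    "inputs":    ["donor funding", "trained staff", "studio facilities"],
    "processes": ["design radio spots", "air mass media campaign",
                  "hold community meetings"],
    "outputs":   ["120 radio spots aired", "85 community meetings held",
                  "15,000 booklets distributed"],
    "outcomes":  {"short_term": "knowledge of HIV transmission",
                  "intermediate": "reported condom use",
                  "long_term": "HIV incidence"},
    "context":   ["economic conditions", "urbanisation",
                  "socio-cultural barriers to behaviour change"],
}

for stage, items in causal_chain.items():
    print(f"{stage}: {items}")
```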
It is not particularly useful to the field for one researcher or set of researchers to play "word police" regarding the terminology used in programme evaluation. On the other hand, health professionals and students trying to understand the basic terminology of this field may find the overlap of concepts and the different classifications for types of evaluation somewhat confusing. Figure 1 represents the attempt of one researcher to bring greater clarity to the range of terms used in connection with programme evaluation.
Box 1. Input-Process-Output-Outcome
This set of terms is also in widespread use, especially in connection with log frames (logical frameworks). The specific terms are defined in the following ways (Bertrand and Escudero, 2002):
Input: The human and financial resources, physical facilities, equipment, and operational policies that enable programme activities to be implemented.
Process: The multiple activities - both planning and implementation - carried out to achieve the objectives of the programme.
Output: The results of these activities at the programme level, in two forms: the number of activities performed (e.g., number of service providers trained, number of radio spots aired, number of students reached in a school-based programme) and measures of service utilisation (e.g., number of contraceptives distributed, number of oral rehydration packets sold, number of calls to an HIV/AIDS hotline, number of antenatal visits).
One advantage of using the term "long-term outcome" rather than "impact" to measure changes in mortality, morbidity, or fertility is that it does not presume that the programmatic intervention under evaluation caused the change. Rather, it implies that the intervention contributed to this change, along with other factors including macro-level social and economic variables.
References
Bertrand, Jane T. (2005). "Evaluating Health Communication Programmes," The Drum Beat, Issue 302, The Communication Initiative.
Bertrand, Jane T. and Gabriela Escudero. (2002). Compendium of Indicators for Evaluating Reproductive Health Programs. MEASURE Evaluation Project, Carolina Population Center, University of North Carolina at Chapel Hill, MEASURE Evaluation Manual Series, no. 6.
Habicht, J.P., C.G. Victora, and J.P. Vaughan. (1999). "Evaluation designs for adequacy, plausibility and probability of public health programme performance and impact," International Journal of Epidemiology 28: 10-18.
Comments
I do not totally agree with the definition above for outcome - maybe we should differentiate between outcomes and impacts, and we can also have different generations of impact. An outcome is, in my view, the very first result of an output (product or event): the number of people attending a meeting is an outcome, as is the number of people listening to a radio programme. This does not mean, per se, that these people have (a) rationally understood or (b) emotionally accepted the message. This understanding and acceptance could be the first generation of impact. Then we have the consequences: the use of condoms, or the boiling of water (if this is economically and socially feasible). The behaviour change may or may not occur, but it is the second-generation change, which may in turn generate a third-generation change: less disease, for example. And so on. Anyhow, BRAVO to Jane T. Bertrand!!