A very brief introduction to evaluation methods: Non-experimental, experimental, and quasi-experimental designs

When you are evaluating a program to see whether it was effective, there are three types of approaches: experimental, quasi-experimental, and non-experimental. The difference between them is that in an experimental design, the researcher determines which participants are in the treatment group (participated in our program) and which are in the control group (did not participate in our program). In non-experimental or quasi-experimental designs, the researcher does NOT control who gets the treatment. Quasi-experimental designs create a group for comparison after the fact.

One common experimental approach is the randomized controlled trial (RCT). This model comes from medical experiments, where researchers might give one group of sick people a pill and another group no pill (or a pretend pill), then see whether the people who got the pill got better. The word randomized refers to the fact that people are placed into the two groups randomly. The assumption behind an experimental model is that if the two groups were the same at the beginning of the experiment, they would be expected to be the same at the end. Any difference between the treatment group and the control group can therefore be attributed to the program (or pill, or intervention) itself. I’ve written on some of the problems with RCTs here.

If an RCT is not feasible, affordable, or ethical, some other options exist. The easiest is a non-experimental design, the simple pre-post.

Pre-post design
The simple pre-post design shows whether the group that participated in our program is different at the end than they were at the beginning. From the perspective of being able to attribute the change to our intervention, it has a couple of flaws. The first is selection bias. If people elected to participate in your program, they might be especially likely to benefit from it. They might be better informed, more motivated, or more organized – characteristics that could influence their outcomes.

Second, while you were running your program, all of the participants will have matured – this is especially going to be a problem if your intervention involves children or youth, who will be dramatically different, and know more, at the end of a year than they did at the beginning. Even adults change as they are exposed to new information and to changes in their environment. Evaluators call these problems maturation effects (natural change in the participants) and history effects (change caused by the environment).

A simple pre-post design might work if you would not expect much change to come from the environment – for example, if your program teaches very specific content that people would not pick up in their everyday interactions, or if you can show that there had been very little change in an important variable for a long period before the intervention. Low-wage workers’ incomes, for instance, would not be expected to change much from year to year, so their income before a training program could be compared to their income after the program.
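To make the pre-post logic concrete, here is a minimal sketch in Python. The wage figures and variable names are invented for illustration, not drawn from any real program; the point is simply that each participant is compared to themselves.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical annual wages for the same six participants before and
# after a training program (illustrative numbers only).
before = [21000, 22500, 19800, 23100, 20400, 21900]
after = [22800, 23900, 20100, 25000, 21700, 23500]

# A pre-post design looks at each participant's own change.
changes = [a - b for a, b in zip(after, before)]
avg_change = mean(changes)

# Paired t-statistic: the average change relative to its standard error.
t_stat = avg_change / (stdev(changes) / sqrt(len(changes)))

print(f"Average wage change: {avg_change:.0f}")
print(f"Paired t-statistic: {t_stat:.2f}")
```

Note that even a large, statistically significant change here cannot by itself be attributed to the program, for exactly the selection-bias and history reasons described above.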

Most experimental and quasi-experimental designs try to address these threats by using a comparison group of people who are aging and maturing over the same period of time. Since they should be influenced by the same history effects as our treatment group, the difference we see between them and the treatment group should be attributable to our program.


Matched individuals
The most complex quasi-experimental designs are based on constructing control groups from individuals who did not participate in our program. Access to administrative datasets is usually required for this. For example, you could compare the wages of participants in your training program to similar workers across the state using Unemployment Insurance datasets, or you could compare educational outcomes between groups of students using test data. In these cases, you would match the individuals in your control group to the individuals in your treatment group by the variables you know, such as age, grade, gender, and race/ethnicity. One way of doing this is propensity score matching.
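A full propensity score model is beyond a short example, but the core matching idea can be sketched in a few lines. This hypothetical snippet pairs each program participant with the most similar non-participant on two observed variables; all names, numbers, and the distance weighting are invented for illustration, and real matching would use more covariates and a fitted propensity model.

```python
# Each record: (age, prior_wage). Hypothetical participants and a
# hypothetical pool of non-participants from an administrative dataset.
treated = [(24, 18000), (31, 22000), (45, 26000)]
pool = [(23, 17500), (29, 21000), (33, 23500), (47, 25000), (52, 30000)]

def distance(a, b):
    # Crude similarity score: rescale each variable so that neither
    # age nor wage dominates the comparison (illustrative weights).
    return abs(a[0] - b[0]) / 10 + abs(a[1] - b[1]) / 5000

# Nearest-neighbor matching: pick the closest non-participant for each
# participant (with replacement, for simplicity).
matches = [min(pool, key=lambda c: distance(t, c)) for t in treated]

for t, c in zip(treated, matches):
    print(f"participant {t} matched to comparison {c}")
```

The matched non-participants then serve as the comparison group when outcomes are compared.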

Unfortunately, for most small-scale studies, methods like propensity score matching won’t be feasible, both because of lack of access to the required administrative datasets and because these methods require a lot of data.

Matched groups
Another approach is to match an entire group that is similar to ours. For example, if we create an intervention in one school, we might compare the outcomes (average test scores, for example) at our school to a different school that is as similar as possible, or maybe a few schools that are similar to ours. If the intervention affected our state (for example, a policy change), we might compare outcomes to other states that have similar populations.

These methods assume that the change that we observe in the control group is caused by history effects. Any change in the treatment group over and above the change that we see in the control group can be attributed to our program or intervention.
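That "change over and above the change in the control group" is a difference-in-differences calculation, which can be sketched in a few lines. The test-score averages below are invented for illustration:

```python
# Hypothetical average test scores (illustrative numbers only).
our_school_before, our_school_after = 62.0, 71.0
comparison_before, comparison_after = 63.0, 66.0

# Change we attribute to history effects, since it appears in the
# comparison group too.
background_change = comparison_after - comparison_before

# Change over and above that background: the estimated program effect.
program_effect = (our_school_after - our_school_before) - background_change

print(f"Estimated program effect: {program_effect:.1f} points")
```

In this made-up example, our school improved by 9 points but the comparison school improved by 3, so only the remaining 6 points would be attributed to the program.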


Any of these methods can work. Choosing one will depend on how directly you need to attribute outcomes to your program and on the feasibility and expense of a larger study. Don’t assume that a simple design is not worth doing: it might not be perfect, but it is usually better than no study, and it can lay the groundwork for more complex studies at a later date. If I can help you set up a research design, click the box below to set up a quick conversation!


About Pieta Blakely

I help mission-based organizations measure their impact so that they can do what they do well. I started my nonprofit career as a teacher in workforce development and adult basic education. It was important work and I was worried that we didn’t really know if we were doing it well. In the process of trying to answer that question, I got a Masters in Education and a PhD in Social Policy, and became an evaluator.