(I Can’t Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research

Frank J. van Rijnsoever 1,2,* (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing)


1 Innovation Studies, Copernicus Institute of Sustainable Development, Utrecht University, Utrecht, The Netherlands

2 INGENIO (CSIC-UPV), Universitat Politècnica de València, Valencia, Spain

Editor: Gemma Elizabeth Derrick, Lancaster University, United Kingdom

Competing Interests: The author has declared that no competing interests exist.

Received 2017 Feb 20; Accepted 2017 Jul 4. Copyright © 2017 Frank J. van Rijnsoever

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Associated Data

S1 Appendix: Technical details. Mathematical details of the simulation.

S1 File: R-code for the simulations. Code for the simulations in R.

S2 File: Simulated data. The simulated data set used for this study.

All relevant data are within the paper and its Supporting Information files.

Abstract

I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached after all the codes in the population have been observed once in the sample. I delineate three different scenarios to sample information sources: “random chance,” which is based on probability sampling, “minimal information,” which yields at least one new code per sampling step, and “maximal information,” which yields the largest number of new codes per sampling step. Next, I use simulations to assess the minimum sample size for each scenario for systematically varying hypothetical populations. I show that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, the minimal and maximal information scenarios are significantly more efficient than random chance, but yield fewer repetitions per code to validate the findings. I formulate guidelines for purposive sampling and recommend that researchers follow a minimal information scenario.

Introduction

Qualitative research is becoming an increasingly prominent way to conduct scientific research in business, management, and organization studies [1]. In the first decade of the twenty-first century, more qualitative research has been published in top American management journals than in the preceding 20 years [2]. Qualitative research is seen as crucial in the process of building new theories [2–4] and it allows researchers to describe how change processes unfold over time [5,6]. Moreover, it gives close-up and in-depth insights into various organizational phenomena [7,8], perspectives, and motivations for actions [1,8]. However, despite the explicit attention of journal editors to what qualitative research is and how it could or should be conducted [8–10], it is not always transparent how particular research was actually conducted [2,11]. A typical topic of debate is what the size of a sample should be for inductive qualitative research to be credible and dependable [9,12] (Note that in this paper, I refer to qualitative research in an inductive context. I recognize that there are more deductive-oriented forms of qualitative research).

A general statement from inductive qualitative research about sample size is that the data collection and analysis should continue until the point at which no new codes or concepts emerge [13,14]. This does not only mean that no new stories emerge, but also that no new codes that signify new properties of uncovered patterns emerge [15]. At this point, “theoretical saturation” is reached; all the relevant information that is needed to gain complete insights into a topic has been found [1,13]. (Note that to prevent confusion, I use the term ‘code’ in this article to refer to information uncovered in qualitative research. I reserve the term ‘concept’ to refer to the concepts in the theoretical framework).

Most qualitative researchers who aim for theoretical saturation do not rely on probability sampling. Rather, the sampling procedure is purposive [14,16]. It aims “to select information-rich cases whose study will illuminate the questions under study” [12]. The researcher decides which cases to include in the sample based on prior information like theory or insights gained during the data collection.

However, the minimum size of a purposive sample needed to reach theoretical saturation is difficult to estimate [9,17–22].

There are two reasons why the minimum size of a purposive sample deserves more attention. First, theoretical saturation seems to call for a “more is better” sampling approach, as this minimizes the chances of codes being missed. However, the coding process in qualitative research is laborious and time consuming. Researchers with scarce resources, in particular, do not want to oversample too much. Some scholars give tentative indications of sample sizes that often lie between 20 and 30 and are usually below 50 [23,24], but the theoretical mechanism on which these estimates are based is unknown.

Second, most research argues that determining whether theoretical saturation has been reached remains at the discretion of the researcher, who uses her or his own judgment and experience [9,22,25,26]. Patton [12] even states that “there are no rules for sample size in qualitative inquiry” (p. 184). As such, the guidelines for judging the sample size are often implicit. The reason for this is that most qualitative research is largely an interpretivist endeavor [27] that requires flexible creative thinking, experience, and tacit knowledge [9]. However, researchers from the fields of management [8,11,28], information sciences [24,29], health [30,31] and the social sciences in general [12,13,32,33], acknowledge the need for transparency in the process of qualitative research. Moreover, not all researchers have the required experience to assess intuitively whether theoretical saturation has been reached. For them, articulating the assessment criteria in a set of guidelines can be helpful [33].

In this paper I explore the sample size that is required to reach theoretical saturation in various scenarios and I use these insights to formulate guidelines about purposive sampling. Following a simulation approach, I assess experimentally the effects of different population parameters on the minimum sample size. I first generate a series of systematically varying hypothetical populations. For each population, I assess the minimum sample sizes required to reach theoretical saturation for three different sampling scenarios: “random chance,” which is based on probability sampling, “minimal information,” which yields at least one new code per sampling step, and “maximal information,” which yields the largest number of new codes per sampling step. The latter two are purposive sampling scenarios.

The results demonstrate that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, when the mean probability of observing codes is low, the minimal information and maximal information scenarios are much more efficient in reaching theoretical saturation than the random chance scenario. However, the purposive scenarios yield significantly fewer repeated observations per code that can be used to validate the findings.

By using simulations, this study adds to earlier studies that base their sample size estimates on empirical data [16,17], or their own experience [22]. Simulating the factors that influence the minimum purposive sample size gives these estimates a theoretical basis [34]. Moreover, the simulations show that the earlier empirical estimates for theoretical saturation are reasonable under most purposive sampling conditions. To my knowledge, there is one earlier study that uses simulations to predict minimum sample size in qualitative research based on random sampling [35]. The present study extends this work by taking into account the process of purposive sampling, using different sampling scenarios.

Based on my analyses, I offer a set of guidelines that researchers can use to estimate whether theoretical saturation has been reached. These guidelines help to make more informed choices for sampling and add to the transparency of the research, but are by no means intended as mechanistic rules that reduce the flexibility of the researcher [10].

In Section 2, I discuss the theoretical concepts about purposive sampling. Section 3 describes the simulation, and the results are presented in Section 4. In Section 5, I draw conclusions, discuss the limitations, and offer recommendations.

Theoretical concepts

I base this section largely on the existing literature on purposive sampling. I also introduce some new ideas that are sometimes implied by the literature, but that were never conceptualized. Table 1 summarizes the main concepts in this paper, and the symbols used to denote them.

Table 1

An overview of the main concepts, definitions, and symbols.

| Concept | Definition | Symbol |
|---|---|---|
| Information source | The unit from which information is gathered | i |
| Population | The total set of information sources that are potentially relevant to answering the research question | J |
| Sub-population | A subset of information sources that are potentially relevant to answering the research question | j |
| Sampling step | The number of information sources sampled so far | n |
| Code | A unique piece of information in the population relevant to the research | C_k |
| Number of codes | The number of unique pieces of information relevant to the research in the population | k |
| Theoretical saturation | All codes are observed at least once | s |
| Probability of reaching theoretical saturation | The probability that each code is observed at least once | p_n |
| Sampling steps to reach theoretical saturation | The number of sampling steps needed to observe each code at least once | n_s |
| Mean probability of observing codes | The mean probability that a code is observed at an information source | Φ̄_c |
| Repetitive codes | Codes that are observed more than once | – |
| Minimum number of repetitive codes | The minimum number of times that a code needs to be observed | v |
| Sampling strategy | How the researcher selects the information sources; commonly empirically based | – |
| Sampling scenario | Three theory-based scenarios on how the sampling process proceeds: random chance, minimal information, maximal information | – |
| Efficiency | The fewer sampling steps that a scenario requires to reach theoretical saturation, the more efficient it is | – |

Populations, information sources, and sampling steps

A population is the “universe of units of analysis” from which a sample can be drawn [36]. However, in qualitative research, the unit of analysis does not have to be the same as the unit from which information is gathered. I call the latter “information sources.” In the context of interviews, information sources are often referred to as informants [16,37], but they can be any source that informs the researcher: other examples are sites to collect observational data, existing documents, or archival data. I refer to the total set of information sources that are potentially relevant to answering the research question as the population.

From this population, one or multiple information sources are sampled as part of an iterative process that includes data collection, analysis, and interpretation. At each iteration the researcher has the opportunity to adjust the sampling procedure and to select a new information source to be sampled. I assume in this paper that at each iteration only one source is sampled; this assumption has no further consequences for the remainder of the paper. Moreover, I use the term “sampling steps” rather than iterations, as this excludes analysis and interpretation. Finally, contrary to formal quantitative sampling terminology, I count as sampling steps only sources that actually participated in the research, thus excluding non-response or the inability to access sources. This eases interpretation.

Sub-populations

A population of information sources is usually not homogeneous. Multiple sub-populations can often be distinguished, for example interviewees, documents, or focus groups. This is important as the researcher can choose different sampling procedures and data collection methods for each sub-population. The exact delineation of sub-populations depends on the judgment of the researcher. However, I argue there are a number of restrictions on the delineation of sub-populations.

First, if there are differences in the type of information source, sampling strategy, type of data, data collection, or methods of analysis, then there are sub-populations. The reason for this criterion is that different methods are needed. These different methods need to be accounted for [32] as they can explain differences in outcomes.

Second, information sources should be interchangeable at the sub-population level. Within a sub-population, no single information source may be critical for reaching theoretical saturation. Hence, no single information source in a sub-population can contain information that is not found in other information sources in that sub-population. The reason for this criterion is that if a particular information source is critical for theoretical saturation, it should by definition be included in the research. Observing critical information is not guaranteed if the inclusion is dependent on a particular sampling strategy. A critical information source should then be treated as a separate sub-population of size one.

Third, if cases or groups are compared, it is important to treat these as sub-populations. For example, distinguishing between sub-populations is a condition for data triangulation, because the researcher effectively compares the results from one sub-population (for example interviews with managers) with the results from another (for example annual reports). Furthermore, comparative case studies [4,38] involve the comparison of sub-populations.

The concept of sub-populations implies that theoretical saturation can be reached at the level of the overall population or at the level of the sub-population. Reaching theoretical saturation in all the sub-populations is not a condition for reaching theoretical saturation at the level of the population, since sub-populations can have an overlap in information. However, it is necessary to reach theoretical saturation in each sub-population in comparative research or when triangulating results, as this is the only way to make a valid comparison.

Codes and theoretical saturation

In most cases of inductive qualitative research, information is extracted from information sources, interpreted and translated into codes. I refer to codes here in the context of inductive qualitative data analysis, which means that they can be seen as “tags” or “labels” on unique pieces of information [13]. Codes can represent any sort of information and may be related to each other (for example, phenomena, explanations, or contextualization). The only conditions that I impose are that each code represents only one piece of information and that two different codes are not allowed to represent the same information. In practice, this means that synonyms are removed during qualitative data analysis. Thereby, codes can be interpreted as unique “bits” of information.

The population contains all the codes that can be potentially observed. At the start of a study, the codes in the population are unobserved and the exact number of codes in the population is unknown. Consulting information sources sampled from the population allows codes to become observed. Theoretical saturation is reached when each code in the population has been observed at least once.

Number of codes and mean probability of observing codes

I let the number of sampling steps required to reach theoretical saturation depend on two population characteristics. First, the larger the number of codes distinguished in the population, the more sampling steps are required to observe them all. The number of codes can vary greatly per study, depending on the complexity of the research question and the amount of theory in the literature. A number of 100 is common. Second, the more often a code is present in the population, the larger the chances that it will be observed. As theoretical saturation takes place at the population level, the distribution of codes in the population is important. For example, interviews can vary in length, or some documents can contain more relevant information than others. In general, one would expect that the higher the “mean probability of observing codes” in a population is, the fewer sampling steps are required to reach theoretical saturation. By definition, these probabilities vary between 0 and 1. A mean probability of observing codes of 0.5 means that, on average, a code is observed at 50% of the information sources.

Purposive sampling allows the researcher to make an informed estimation about the probability of observing a given code at each sampling step, using (theoretical) prior information, like sampling frames [39] or insights gained during the data analysis. (This conceptualization of purposive sampling is also consistent with the notion of theoretical sampling. Both terms are often used interchangeably. Theoretical sampling can be seen as a special case of purposive sampling [14]). However, when the number of codes is large, it is easier simply to estimate the mean probability of observing all the codes in the population. To make such estimations, it is important to consider what the probability of observing codes actually represents. The probability of observing a code depends at least on: the likelihood of an information source actually containing the code, the willingness and ability of the source (or its authors) to let the code be uncovered, and the ability of the researcher to observe the code. These probability estimations are based on the characteristics of the information source and the researcher. The probability of observing a certain code can decrease when the information source (for example an interviewee) has strategic reasons not to share information. The strategic behavior of actors can also lead to the discovery of other additional codes about the motivations and the actions of these actors. The relevance of these codes depends on the research question. In addition, if the researcher has less experience with the technique used to uncover codes from a source or with correctly interpreting information during the data analysis, the probability of observing codes decreases. Having multiple independent coders, on the other hand, can increase the probability of observing a code.
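One simple way to make this decomposition explicit is to write the probability of observing a code as a product of conditional probabilities. This formalization is my own illustration; the paper does not state it as a formula:

```latex
% Illustrative decomposition (not a formula from the paper): the
% probability of observing code c at information source i factors into
% the probability that the source contains the code, that the source
% discloses it, and that the researcher recognizes it once disclosed.
\Phi_{c,i} = P(\text{contains})
             \times P(\text{discloses} \mid \text{contains})
             \times P(\text{recognizes} \mid \text{discloses})
```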

Repetitive codes

Some researchers consider codes that are observed more than once as redundant, since they do not add new information to the data [12,26,32]. I refer to codes that are observed more than once more neutrally as “repetitive codes.” Repetitive codes are important for a methodological purpose: they can help guard against misinformation. That is, information sources may have given false codes, for reasons of social desirability, strategy, or accidental errors.

To guard against misinformation and to enhance the credibility of the research, it can be advisable to aim for a sample in which each code is observed multiple times (this also follows from the logic behind triangulation). One could argue that if a code, after a substantial number of sampling steps, is still observed only once while almost all other codes have a higher incidence, a critical examination of the code is warranted. In many cases, the researcher may already be suspicious of such a code during the analysis. A frequency of one does not mean that the code is wrong by definition; it is possible that the code is just rare or that the low frequency is just a coincidence. However, it is relatively easy to make an argumentative judgment about the plausibility of rare codes (for example based on theory).

Sampling strategies, sampling scenarios, and efficiency

A sampling strategy describes how the researcher selects the information sources. The most elaborate inventory of sampling strategies comes from Patton [12], who identifies 15 purposive sampling strategies for qualitative research. Examples include “maximum variation sampling,” “typical case sampling,” and “snowball sampling”. These strategies are based strongly on research practices, but the underlying theoretical criteria for distinguishing between the strategies are left implicit. For example, a criterion that can explain the difference between “maximum variation sampling,” “typical case sampling,” and “extreme case sampling” is the focus of the research question. “Snowball sampling” and “opportunistic sampling” differ in the way in which they obtain information about the next information source that is to be sampled. “Confirming or disconfirming sampling” and “including politically sensitive cases” as strategies are motivated by a delineation of the population. Overall, Patton [12] acknowledges that purposive sampling in qualitative research can be a mixture of the strategies identified and that some of these strategies overlap. These strategies also make implicit assumptions regarding the prior knowledge of the researcher about the population. For example, “extreme case sampling” implicitly assumes that the researcher has knowledge about the full population; otherwise, he or she would not be able to identify the extreme cases. “Snowball sampling” assumes that the researcher does not have full knowledge of the population, as relevant leads are only identified at each sampling step.

I use the concepts described above to formulate three generic sampling scenarios. I refer to sampling scenarios to avoid confusion with the sampling strategies. The term “scenario” signifies that they are based on theoretical notions, instead of empirical data or observed practices. The three sampling scenarios are based on the number of newly observed codes that a sampled information source adds. This criterion is motivated by the premise of purposive sampling: based on the expected information, the researcher makes an informed decision about the next information source to be sampled at each sampling step. This informed decision implies that the researcher can thus reasonably foresee whether, and perhaps how many, new codes will be observed at the next sampling step. The fewer sampling steps that a scenario requires to reach theoretical saturation, the more efficient it is.

The three scenarios that I identify are “random chance,” “minimal information,” and “maximal information.”

Random chance assumes that the researcher does not use prior information during each sampling step. The researcher randomly samples an information source from the population and adds it to the sample. This scenario is solely based on probability and is considered to be inappropriate for most qualitative studies [14,16]. However, there are good reasons to include this scenario. First, there are conditions under which random chance is an appropriate scenario for sampling. One of these is when no information is gained about the population during the sampling steps, such as when documents or websites are analyzed. Second, random chance can be seen as a worst-case scenario. If a researcher is uncertain about how a sampling process actually worked, it is always possible to explore whether theoretical saturation would have been reached under the conservative conditions of random chance. Third, random chance is the only scenario for which the number of sampling steps can be calculated mathematically. Finally, the random chance scenario can serve as a benchmark to which the number of sampling steps in the other scenarios can be compared.
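As an illustration, the random chance scenario can be sketched in a few lines of R. This is my own minimal sketch, not the author's S1 File code; it assumes a population represented as a logical source-by-code matrix:

```r
# Illustrative sketch of the random chance scenario (not the author's
# S1 File code). The population is assumed to be a logical matrix with
# one row per information source and one column per code; TRUE means the
# source holds that code. Sources are drawn in random order until every
# code has been observed at least once; the number of steps is returned.
sample_random_chance <- function(population) {
  draw_order <- sample(nrow(population))   # random order, no replacement
  observed <- rep(FALSE, ncol(population))
  for (n in seq_along(draw_order)) {
    observed <- observed | population[draw_order[n], ]
    if (all(observed)) return(n)           # theoretical saturation reached
  }
  NA_integer_  # unreachable if every code is present in the population
}
```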

Minimal information is a purposive scenario that works in the same way as random chance, but adds the extra condition that at least one new code must be observed at each sampling step. This is equivalent to a situation in which the researcher actively seeks information sources that reveal new codes, for example by making enquiries about the source beforehand. It is not uncommon for a researcher to discuss topics with a potential interviewee prior to the actual interview to assess whether the interview will be worthwhile. The minimal information scenario captures these kinds of enquiries. Similarly, researchers may be referred to a next source that adds new codes as part of a snowball strategy. Overall, the criterion of observing at least one new code per sampling step seems to be relatively easy to achieve as long as the researcher has some information about the population at each step. This makes the scenario broadly applicable and more efficient than random chance.
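A minimal sketch of this scenario, under the same assumed matrix representation, adds the condition that the sampled source must reveal at least one new code:

```r
# Illustrative sketch of the minimal information scenario. At every
# sampling step a source is chosen at random from those unsampled sources
# that would reveal at least one not-yet-observed code.
sample_minimal_information <- function(population) {
  observed <- rep(FALSE, ncol(population))
  remaining <- seq_len(nrow(population))
  n <- 0
  while (!all(observed)) {
    # number of new codes each remaining source would add
    gains <- rowSums(population[remaining, !observed, drop = FALSE])
    candidates <- remaining[gains >= 1]
    # sample() on a single integer x draws from 1:x, so guard that case
    pick <- if (length(candidates) == 1) candidates else sample(candidates, 1)
    observed <- observed | population[pick, ]
    remaining <- remaining[remaining != pick]
    n <- n + 1
  }
  n
}
```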

Maximal information is a purposive scenario that assumes that the researcher has almost full knowledge of the codes that exist in the population and the information sources in which they are present. At each sampling step, an information source is added to the sample that leads to the largest possible increase in observed codes. This scenario is in line with the theoretical aim of purposive sampling. However, it does not reflect situations in which the population is unknown or too large to survey, and it makes strong assumptions regarding the researcher's prior knowledge about the population. An example of when this scenario might be realistic occurs when the researcher is extremely familiar with the field and the specific setting that he or she is investigating.
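Under the same assumptions, the maximal information scenario becomes a greedy selection. Again, this is an illustrative sketch rather than the author's implementation:

```r
# Illustrative sketch of the maximal information scenario: the researcher
# is assumed to know, for every unsampled source, how many new codes it
# would add, and greedily picks the source with the largest gain.
sample_maximal_information <- function(population) {
  observed <- rep(FALSE, ncol(population))
  remaining <- seq_len(nrow(population))
  n <- 0
  while (!all(observed)) {
    gains <- rowSums(population[remaining, !observed, drop = FALSE])
    pick <- remaining[which.max(gains)]    # largest number of new codes
    observed <- observed | population[pick, ]
    remaining <- remaining[remaining != pick]
    n <- n + 1
  }
  n
}
```

The gains computed for every unsampled source are exactly the strong prior knowledge that this scenario presupposes; dropping that knowledge recovers the random chance scenario.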

Simulation

I use simulations as they allow me to assess the effects of the three scenarios for a series of hypothetical populations that vary systematically regarding (1) the number of codes in the population and (2) the mean probability of observing codes. The controlled setting allows me to assess the relative influence of each of these factors on the reaching of theoretical saturation. In an empirical setting, this would not be possible, because the researcher generally cannot control the characteristics of the population under study, because the number of populations that can be studied is limited, and because it is never entirely certain whether theoretical saturation has been reached [27].

To keep the paper readable for audiences with either a quantitative or qualitative background, I minimize the mathematical details in the main text as much as possible. The full technical details of the simulation are in S1 Appendix, which can be read instead of sections 3.1 and 3.2. To relate sections 3.1 and 3.2 to S1 Appendix, I assign symbols to the most important concepts in the main text, and refer to the appropriate sections of S1 Appendix.

Definitions

I denote the number of sampling steps to reach theoretical saturation by n_s, and the number of codes in the population as k. Theoretical saturation is reached when all k codes are observed (see S1 Appendix Section A: Definitions). I further denote the mean probability of observing codes as Φ̄_c. I take the mean because not all codes have the same probability of being observed; some codes are more difficult to uncover than others (see S1 Appendix Section B: Mean probability of observing codes). However, making the unrealistic assumption that all codes have the same probability of being uncovered allows me to calculate the number of sampling steps mathematically (see S1 Appendix Section C: Reaching theoretical saturation). This calculation is not a result of the paper; it only helps me to validate results from the simulations. When the probabilities of observing codes differ, the number of sampling steps cannot be calculated mathematically; therefore, I use simulations. I denote the required minimum number of occurrences of a code by v, and I calculate the effect of this factor on the number of sampling steps for theoretical saturation (see S1 Appendix Section D: Repetitive codes). Finally, my simulations apply to the sub-population level; the results for the sub-populations can be aggregated to the population level (see S1 Appendix Section E: From sub-population to population).
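For reference, under the equal-probability assumption the calculation takes a simple closed form. The following is my reconstruction of the kind of argument referred to in S1 Appendix Section C, assuming independent sampling steps and a common per-source observation probability Φ_c for every code:

```latex
% Reconstruction under the equal-probability assumption (my sketch of the
% argument referred to in S1 Appendix Section C). A single code is still
% unobserved after n steps with probability (1 - \Phi_c)^n, so the
% probability that all k codes have been observed at least once is
p_n = \left( 1 - \left( 1 - \Phi_c \right)^{n} \right)^{k} .
% Requiring p_n \geq p and solving for n gives the number of sampling
% steps needed to reach theoretical saturation with probability p
% (the inequality flips because \ln(1 - \Phi_c) is negative):
n_s \geq \frac{\ln\left( 1 - p^{1/k} \right)}{\ln\left( 1 - \Phi_c \right)} .
```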

Simulation of scenarios

Using the R-program [40], I generate 1100 hypothetical populations of 5000 information sources. The populations vary systematically in the number of codes (k), from 1 to 101 with increments of 10. I let the mean probability of observing codes (Φ̄_c) vary between 0.09 (1/11) and 0.91 (10/11) (see S1 Appendix Section F: Simulation). Further, in line with my earlier argument about the interchangeability of information sources, I impose the condition that each code must actually be present in at least two information sources in the population.
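To make the setup concrete, a population of this kind could be generated along the following lines. This is an illustrative sketch, not the author's generation procedure from S1 Appendix Section F; in particular, the Beta distribution used to spread the per-code probabilities around the target mean is my own assumption:

```r
# Illustrative sketch of generating one hypothetical population (the
# author's actual procedure is in S1 Appendix Section F and S1 File).
# Each code gets its own observation probability, drawn from a Beta
# distribution whose mean equals the target mean probability (the Beta
# shape parameters are my assumption). Each source then holds a code with
# that code's probability, and columns are redrawn until every code is
# present in at least two sources, as the paper requires.
generate_population <- function(n_sources = 5000, k = 101, mean_phi = 0.5) {
  phi <- rbeta(k, 2, 2 * (1 - mean_phi) / mean_phi)     # E[phi] = mean_phi
  pop <- sapply(phi, function(p) runif(n_sources) < p)  # sources x codes
  repeat {
    rare <- which(colSums(pop) < 2)  # codes held by fewer than two sources
    if (length(rare) == 0) break
    pop[, rare] <- sapply(phi[rare], function(p) runif(n_sources) < p)
  }
  pop
}
```

Combined with the earlier sketches, one simulated run for a single population and scenario would then be, for example, sample_random_chance(generate_population(k = 51, mean_phi = 0.3)).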

For each hypothetical population, I simulate the number of sampling steps necessary to reach theoretical saturation under the three scenarios from Section 2.5. Fig 1 gives a schematic overview of how the algorithms for each scenario operate. The full R-code is available as S1 File: R-code for the simulations; the resulting data are available as S2 File: Simulated data.