Data analysis

You have your data and now it’s time to try to make sense of it. This means that you need to conduct an analysis of the raw information that you obtained via your survey, focus group or interview. It’s important to understand that, no matter how objective you are trying to be, there is always an element of subjectivity in your analysis. Even when you’re doing a statistical analysis you will need to make a conscious choice about which tests you’re going to run, and which data you’ll run those tests on. Depending on the variables you choose, you may end up with a different outcome.

Sometimes it can feel quite disorientating to start with your analysis because the “right” approach may not be clear to you. In particular, the texts used in qualitative data analysis are often multilayered and open to a variety of interpretations, which is especially confusing when you’re just starting out.


Quantitative data analysis

Quantitative data analysis is a powerful research form, emanating in part from the positivist tradition. It is often associated with large-scale research, but can also serve smaller scale investigations, with case studies, action research, correlational research and experiments.

Cohen, Manion & Morrison (2007)

There are several key concepts to understand with respect to the statistical analysis of quantitative data: scales of data, parametric and non-parametric data, descriptive and inferential statistics, dependent and independent variables, and statistical significance.

Scales of data

  1. Nominal scale: numbers are used to denote categories, e.g. 1 = male, 2 = female. These categories are mutually exclusive and the numbers carry no numerical value: in this example, “1” does not mean the quantity one; it simply labels the category “male”. A nominal scale provides classification.
  2. Ordinal scale: used to classify data as well as to introduce an order, e.g. Strongly agree, Agree, Neutral, Disagree, Strongly disagree. A Likert scale is an example of an ordinal scale. It is important to understand that there is no regular calibration between intervals, i.e. the difference between Strongly agree and Agree may not be the same as the difference between Agree and Neutral. An ordinal scale adds the concept of order to the concept of classification.
  3. Interval scale: there is a regular and equal interval between each data point on the scale, which tells us how far apart the data points are. An interval scale adds the concept of a metric (a measure) to the previous two concepts of classification and order. Note that none of these first three scales has a true zero: Celsius temperature, for example, is an interval measure, because 0 °C is an arbitrary point rather than an absence of temperature.
  4. Ratio scale: a ratio scale keeps the concepts of classification, order and metric, but adds the concept of a true zero. This means that values on a ratio scale can be manipulated arithmetically (addition, multiplication, etc.). Distance, money, population, time spent on tasks, income, and marks on a test are all ratio measures, as each is capable of having a ‘true’ zero quantity. (A short code sketch of how these four scales might be represented follows this list.)
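
To make these distinctions concrete, here is a minimal sketch of how the four scales might be represented in Python using pandas (the library choice and the data are illustrative assumptions): nominal and ordinal data map onto unordered and ordered categoricals, while interval and ratio data are plain numbers.

```python
import pandas as pd

# Nominal: categories with no order; any numeric codes would be labels only.
gender = pd.Categorical(["male", "female", "female", "male"])

# Ordinal: categories with an order, but no fixed distance between them.
agreement = pd.Categorical(
    ["Agree", "Strongly agree", "Neutral"],
    categories=["Strongly disagree", "Disagree", "Neutral",
                "Agree", "Strongly agree"],
    ordered=True,
)

# Interval: equal steps but no true zero (e.g. temperature in Celsius),
# so differences are meaningful but ratios are not.
celsius = pd.Series([10.0, 20.0, 30.0])

# Ratio: equal steps plus a true zero (e.g. minutes spent on a task),
# so all arithmetic, including ratios, is meaningful.
minutes_on_task = pd.Series([0.0, 12.5, 25.0])

print(agreement.min(), agreement.max())  # order is defined for ordinal data
print(minutes_on_task.mean())            # arithmetic is meaningful for ratio data
```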

Parametric and non-parametric data

Non-parametric data make no assumptions about the population, usually because the characteristics of the population are unknown. Parametric data assume knowledge of the characteristics of the population; this assumed knowledge is what allows inferences to be made securely.

The distinction matters because the choice of statistical test depends on the kind of data that have been gathered. Nominal and ordinal data are considered to be non-parametric, while interval and ratio data are considered to be parametric. You cannot apply parametric statistics to non-parametric data, although it is possible to apply non-parametric statistics to parametric data. Non-parametric data are often derived from questionnaires and surveys (though these may also gather parametric data), while parametric data tend to be derived from experiments and tests (e.g. examination scores).
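
As an illustration of how this plays out in practice, here is a minimal sketch, assuming scipy is available; the scores are invented for the example. A parametric t-test is applied to interval/ratio examination scores, and its non-parametric counterpart, the Mann-Whitney U test, to ordinal Likert codes.

```python
from scipy import stats

# Interval/ratio (parametric) data: invented examination scores for two groups.
group_a_scores = [62, 71, 58, 80, 67, 74]
group_b_scores = [55, 60, 66, 52, 59, 63]

# Parametric difference test: assumes roughly normally distributed populations.
t_stat, t_p = stats.ttest_ind(group_a_scores, group_b_scores)
print(f"t-test: t = {t_stat:.2f}, p = {t_p:.3f}")

# Ordinal (non-parametric) data: invented Likert responses coded 1-5.
group_a_likert = [4, 5, 3, 4, 5, 4]
group_b_likert = [2, 3, 3, 2, 4, 3]

# Non-parametric counterpart: no assumptions about the population distribution.
u_stat, u_p = stats.mannwhitneyu(group_a_likert, group_b_likert)
print(f"Mann-Whitney U: U = {u_stat}, p = {u_p:.3f}")
```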

Descriptive and inferential statistics

Descriptive statistics are simply those that describe the data and make no attempt to infer or predict; they simply report what has been found. Descriptive statistics include the following (Cohen, Manion & Morrison, 2007):

  • Mode: the score obtained by the highest number of people
  • Mean: the average score
  • Median: the score obtained by the middle person in a ranked group
  • Minimum and maximum scores: the lowest and highest scores obtained
  • Range: the distance between the highest and lowest scores
  • Variance: a measure of how far scores are from the mean
  • Standard deviation: a measure of the dispersal or range of the scores
  • Standard error: the standard deviation of sample means
  • Skewness: how far the data are asymmetrical in relation to a “normal” distribution
  • Kurtosis: how steep or flat the shape of a graph or distribution of data is
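
Most of these measures are one-liners in code. The sketch below, assuming Python’s standard statistics module and scipy are available and using invented test scores, computes each of the statistics listed above.

```python
import statistics
from scipy.stats import kurtosis, sem, skew

scores = [45, 50, 50, 55, 60, 60, 60, 70, 75, 90]  # invented test scores

print("Mode:", statistics.mode(scores))          # most frequently obtained score
print("Mean:", statistics.mean(scores))          # average score
print("Median:", statistics.median(scores))      # middle score of the ranked group
print("Min/Max:", min(scores), max(scores))      # lowest and highest scores
print("Range:", max(scores) - min(scores))       # distance between the extremes
print("Variance:", statistics.variance(scores))  # sample variance
print("Std dev:", statistics.stdev(scores))      # sample standard deviation
print("Std error:", sem(scores))                 # standard error of the mean
print("Skewness:", skew(scores))                 # asymmetry relative to a normal curve
print("Kurtosis:", kurtosis(scores))             # steepness/flatness of the distribution
```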

Inferential statistics attempt to make inferences or to predict something based on the data. These can include hypothesis testing, correlations, regression and multiple regression, difference testing (e.g. t-tests and analysis of variance), factor analysis, and
structural equation modelling.
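
As a small taste of the inferential family, here is a hedged sketch of simple linear regression using scipy.stats.linregress; the revision hours and exam marks are invented. The example also previews the next section’s terminology: hours of revision acts as the independent (input) variable and the exam mark as the dependent (outcome) variable.

```python
from scipy import stats

# Invented data: hours of revision (independent) and exam mark (dependent).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [52, 55, 61, 60, 68, 70, 75, 79]

# Fit a straight line through the data points.
result = stats.linregress(hours, marks)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"r = {result.rvalue:.2f}, p = {result.pvalue:.4f}")

# Use the fitted line to predict a mark for an unseen case (9 hours of revision).
predicted = result.intercept + result.slope * 9
print(f"predicted mark after 9 hours: {predicted:.1f}")
```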

Dependent and independent variables

Quantitative data analysis is often concerned with the relationship between variables (in this context, a variable is considered to be any property that the researcher is interested in). An independent variable is an input variable that causes, in part or in total, a particular outcome. It is a stimulus that influences a response. A dependent variable is the outcome variable, which is caused, in total or in part, by the input variable. It is the effect, consequence of, or response to, an independent variable.

It is important to understand this fundamental concept of statistical analysis because of the following issues that arise (Cohen, Manion & Morrison, 2007):

  • The direction of causality is not always clear: an independent variable may become a dependent variable and vice versa.
  • The direction of causality may be bidirectional.
  • Assumptions of association are not assumptions of causality: just because the variables are related does not mean that the relationship is causal (see the simulation sketch after this list).
  • There may be a range of other factors that have a bearing on an outcome.
  • There may be causes (independent variables) behind the identified causes (independent variables) that have a bearing on the dependent variable.
  • The independent variable may cause something else, and it is the something else that causes the outcome (dependent variable).
  • Causality may be non-linear rather than linear.
  • The direction of the relationship may be negative rather than positive.
  • The strength/magnitude of the relationship may be unclear.
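
The simulation sketch below (numpy assumed; the variable names and scenario are invented) illustrates the third-variable problem from the list above: two variables that never influence each other can still be strongly correlated when a hidden cause drives both.

```python
import numpy as np

rng = np.random.default_rng(42)

# A hidden cause that our imaginary study never measures, e.g. household income.
hidden = rng.normal(size=500)

# Two variables that are both driven by the hidden cause, plus noise.
x = hidden + rng.normal(scale=0.5, size=500)  # e.g. hours of private tutoring
y = hidden + rng.normal(scale=0.5, size=500)  # e.g. test performance

# x and y correlate strongly even though neither causes the other.
print("correlation(x, y):", np.corrcoef(x, y)[0, 1])
```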

Statistical significance

Statistical significance is a measure of how likely it is that a result came about by chance. When testing a hypothesis, it is important for the researcher to be able to assert with some level of confidence that the relationship between variables is a true relationship, rather than a chance finding. Correlation is the concept that is used to describe whether, and to what extent, there is an association between two variables.
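
In practice, a correlation and its significance are usually computed together. A minimal sketch, assuming scipy and using invented paired observations:

```python
from scipy.stats import pearsonr

# Invented paired observations, e.g. attendance (%) and final grade.
attendance = [60, 70, 75, 80, 85, 90, 95]
grade = [55, 58, 65, 70, 72, 80, 84]

r, p = pearsonr(attendance, grade)
print(f"correlation r = {r:.2f}, p = {p:.4f}")

# A common (if crude) convention: treat p < 0.05 as statistically significant.
if p < 0.05:
    print("unlikely to have arisen by chance alone (at the 5% level)")
```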

One should use statistical significance with caution, especially since, in a study with many variables, there is a reasonable chance that some pair of them will show a significant (statistical) relationship. However, this says nothing about whether or not there really is a relationship. In addition, the practice of p-hacking has cast doubt on the value of statistical significance taken on its own.
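
The hedged simulation below (numpy and scipy assumed) makes the many-variables problem concrete: among 20 variables of pure noise there are 190 pairs, and at the conventional 5% threshold roughly 5% of them will appear “significant” even though no real relationship exists.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# 20 variables of pure noise measured on 100 cases: no real relationships exist.
data = rng.normal(size=(100, 20))

pairs = 0
false_positives = 0
for i in range(20):
    for j in range(i + 1, 20):
        _, p = pearsonr(data[:, i], data[:, j])
        pairs += 1
        if p < 0.05:
            false_positives += 1

# Expect roughly 0.05 * 190, i.e. about 9 or 10, spuriously "significant" pairs.
print(f"{false_positives} of {pairs} pairs significant at p < 0.05")
```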

It is beyond the scope of this section to describe the specific statistical analyses that are appropriate for each of the measures presented above.

Qualitative data analysis

Qualitative data analysis involves organizing, accounting for and explaining the data; in short, making sense of data in terms of the participants’ definitions of the situation, noting patterns, themes, categories and regularities.

Cohen, Manion & Morrison (2007)

There are two main forms of qualitative data analysis: content analysis and grounded theory. Content analysis is most simply defined as the process of “summarizing and reporting written data – the main contents of data and their messages. More strictly speaking, it defines a strict and systematic set of procedures for the rigorous analysis, examination and verification of the contents of written data” (Cohen, Manion & Morrison, 2007). Content analysis usually proceeds in a stepwise fashion and, while there are several slightly different approaches, the list below is a useful starting point.

Step 1. Organize and prepare the data for analysis. This involves transcribing interviews, optically scanning material, typing up field notes, or sorting and arranging the data into different types depending on the sources of information.

Step 2. Read through all the data. A first step is to obtain a general sense of the information and to reflect on its overall meaning. What general ideas are participants expressing? What is the tone of those ideas? What is the impression of the overall depth, credibility and use of the information? Sometimes qualitative researchers write notes in margins or start recording general thoughts about the data at this stage.

Step 3. Begin detailed analysis with a coding process. Coding is the process of organizing the material into chunks or segments of text before bringing meaning to information. It involves taking text data or pictures gathered during data collection, segmenting sentences (or paragraphs) or images into categories, and labeling those categories with a term, often a term based in the actual language of the participant (called an in vivo term).
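
Coding itself is an interpretive, human activity, but once segments have been labelled, even a simple data structure helps keep the categories organized. A minimal sketch, with invented interview excerpts and codes:

```python
from collections import defaultdict

# Invented interview excerpts, segmented and hand-coded by the researcher.
segments = [
    ("I never have enough time to prepare my lessons", "workload"),
    ("My colleagues are always willing to help", "peer support"),
    ("The marking takes up my whole weekend", "workload"),
    ("We share materials in the staff room", "peer support"),
]

# Group segments by code so each category can be examined together.
coded = defaultdict(list)
for text, code in segments:
    coded[code].append(text)

for code, texts in coded.items():
    print(f"{code} ({len(texts)} segments)")
    for text in texts:
        print("  -", text)
```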

Step 4. Use the coding process to generate a description of the setting or people as well as categories or themes for analysis. Description involves a detailed rendering of information about people, places, or events in a setting. Then use the coding to generate a small number of themes or categories, perhaps five to seven categories for a research study. These themes are the ones that appear as major findings in qualitative studies and are often used to create headings in the findings sections of studies. They should display multiple perspectives from individuals and be supported by diverse quotations and specific evidence. Sophisticated qualitative studies go beyond description and theme identification and into complex theme connections.
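
Collapsing codes into a handful of themes can likewise be tallied once the researcher has decided which codes belong together; the judgment is the researcher’s, and the code below only counts. A hedged sketch with invented codes and a researcher-defined mapping:

```python
from collections import Counter

# Invented codes assigned across a set of transcripts.
codes = ["workload", "peer support", "workload", "assessment pressure",
         "peer support", "leadership", "workload", "assessment pressure"]

# A researcher-defined mapping that collapses related codes into broader themes.
code_to_theme = {
    "workload": "working conditions",
    "assessment pressure": "working conditions",
    "peer support": "collegiality",
    "leadership": "school culture",
}

theme_counts = Counter(code_to_theme[c] for c in codes)
for theme, count in theme_counts.most_common():
    print(f"{theme}: {count} coded segments")
```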

Step 5. Advance how the description and themes will be represented in the qualitative narrative. The most popular approach is to use a narrative passage to convey the findings of the analysis. This might be a discussion that mentions a chronology of events, the detailed discussion of several themes (complete with subthemes, specific illustrations, multiple perspectives from individuals, and quotations) or a discussion with interconnecting themes. Many qualitative researchers also use visuals, figures, or tables as adjuncts to the discussions.

Step 6. A final step in data analysis involves making an interpretation or meaning of the data. Asking, “What were the lessons learned?” captures the essence of this idea (Lincoln & Guba, 1985). These lessons could be the researcher’s personal interpretation, couched in the understanding that the inquirer brings to the study from her or his own culture, history, and experiences. It could also be a meaning derived from a comparison of the findings with information gleaned from the literature or theories. In this way, authors suggest that the findings confirm past information or diverge from it. It can also suggest new questions that need to be asked – questions raised by the data and analysis that the inquirer had not foreseen earlier in the study. Thus, interpretation in qualitative research can take many forms, be adapted for different types of designs, and be flexible to convey personal, research-based, and action meanings.

Grounded theory is the other important approach to qualitative data analysis. In this approach, theory generation is emergent, which means that it is “more inductive than content analysis; the theory emerges from, rather than existing before, the data” (Cohen, Manion & Morrison, 2007). The aim of using grounded theory is not to “reduce complexity by breaking it down into variables but rather to increase complexity by including context” (Flick, 1998). We will not provide an in-depth review of grounded theory here. If you would like to explore it in more depth, consider Saldaña (2009).

Presenting qualitative findings can be done in several ways, including by groups of participants, by themes/issues, or by research question. Again, there is no single “correct” way of presenting your data. What matters is that the presentation is coherent and internally consistent. It should also be clear to the reader how your interpretation of the data answers your research question.

Conclusion

This section has provided a very brief overview of some of the methods that can be used for analysing qualitative and quantitative data. It is important to note that there are others that have not been included. In addition, the information presented here is, at best, a superficial summary and you are encouraged to explore these topics in more depth.


Readings

Cohen, L., Manion, L., & Morrison, K. (2007). Research methods in education (6th ed.). Routledge.

Creswell, J. W. (2009). Research design: Qualitative, quantitative and mixed methods approaches (3rd ed.). Sage.

Flick, U. (1998). An introduction to qualitative research. Sage.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Sage.

Saldaña, J. (2009). The coding manual for qualitative researchers. Sage.
