In the sample dataset, respondents were asked their gender and whether or not they were a cigarette smoker. There were three answer choices: Nonsmoker, Past smoker, and Current smoker. Suppose we want to test for an association between smoking behavior (nonsmoker, past smoker, or current smoker) and gender (male or female) using a Chi-Square Test of Independence (we'll use α = 0.05).

Before the Test

Before we test for "association", it is helpful to understand what an "association" and a "lack of association" between two categorical variables look like. One way to visualize this is with clustered bar charts. Let's look at the clustered bar chart produced by the Crosstabs procedure. This is the chart that is produced if you use Smoking as the row variable and Gender as the column variable (running the syntax later in this example).

The "clusters" in a clustered bar chart are determined by the row variable (in this case, the smoking categories). The color of the bars is determined by the column variable (in this case, gender). The height of each bar represents the total number of observations in that particular combination of categories.

This type of chart emphasizes the differences within the categories of the row variable. Notice how, within each smoking category, the heights of the bars (i.e., the number of males and females) are very similar. That is, there are approximately equal numbers of male and female nonsmokers, of male and female past smokers, and of male and female current smokers. If there were an association between gender and smoking, we would expect these counts to differ between groups in some way.

Running the Test
Syntax

CROSSTABS
  /TABLES=Smoking BY Gender
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT
  /COUNT ROUND CELL
  /BARCHART.

Output

Tables

The first table is the Case Processing summary, which tells us the number of valid cases used for analysis. Only cases with nonmissing values for both smoking behavior and gender can be used in the test. The next tables are the crosstabulation and chi-square test results. The key result in the Chi-Square Tests table is the Pearson Chi-Square.
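To make the Pearson Chi-Square more concrete, here is a minimal Python sketch of how the statistic is computed from a smoking-by-gender crosstabulation. The counts are hypothetical placeholders, not the sample dataset's values, and the function name `pearson_chi_square` is made up for illustration. A 3×2 table gives 2 degrees of freedom, and in that special case the upper-tail p-value has the closed form exp(-χ²/2), which is what the sketch relies on.

```python
import math

def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Hypothetical counts (NOT the sample dataset).
# Rows: Nonsmoker, Past smoker, Current smoker. Columns: Male, Female.
observed = [
    [120, 130],
    [30, 35],
    [45, 40],
]

chi2, df, expected = pearson_chi_square(observed)

# Closed-form p-value, valid only when df == 2
p_value = math.exp(-chi2 / 2)

alpha = 0.05
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

With bars of similar height in every cluster, as in these made-up counts, the observed and expected counts are close, the statistic is small, and the p-value stays above α.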
Decision and Conclusions

Since the p-value is greater than our chosen significance level (α = 0.05), we do not reject the null hypothesis. Rather, we conclude that there is not enough evidence to suggest an association between gender and smoking. Based on the results, we can state the following:
The chi-square test for independence, also called Pearson's chi-square test or the chi-square test of association, is used to determine whether there is a relationship between two categorical variables.

Assumptions

When you choose to analyse your data using a chi-square test for independence, you need to make sure that the data you want to analyse "passes" two assumptions, because the test is only appropriate if both hold. If your data does not pass them, you cannot use a chi-square test for independence. These two assumptions are:
In the Procedure section, we illustrate the SPSS Statistics procedure to perform a chi-square test for independence. First, we introduce the example that is used in this guide.

Example

Educators are always looking for novel ways in which to teach statistics to undergraduates as part of a non-statistics degree course (e.g., psychology). With current technology, it is possible to present how-to guides for statistical programs online instead of in a book. However, different people learn in different ways. An educator would like to know whether gender (male/female) is associated with the preferred type of learning medium (online vs. books). Therefore, we have two nominal variables: Gender (male/female) and Preferred Learning Medium (online/books).

Setup in SPSS Statistics

In SPSS Statistics, we created two variables so that we could enter our data: Gender and Preferred_Learning_Medium. In our enhanced chi-square test for independence guide, we show you how to correctly enter data in SPSS Statistics to run a chi-square test for independence. Alternatively, see our generic, "quick start" guide: Entering Data in SPSS Statistics.
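As a rough illustration of what this two-variable layout represents, the Python sketch below tallies one record per respondent into a crosstabulation. The records are invented placeholders, not the guide's data; only the variable names Gender and Preferred_Learning_Medium come from the text above.

```python
from collections import Counter

# Hypothetical records: one (Gender, Preferred_Learning_Medium) pair per respondent
records = [
    ("Male", "Online"), ("Male", "Books"), ("Female", "Online"),
    ("Female", "Online"), ("Male", "Online"), ("Female", "Books"),
]

# Count how many respondents fall into each combination of categories
cell_counts = Counter(records)

genders = sorted({gender for gender, _ in records})
media = sorted({medium for _, medium in records})

# Rows are genders, columns are learning media
crosstab = [[cell_counts[(g, m)] for m in media] for g in genders]

print("Columns:", media)
for gender, row in zip(genders, crosstab):
    print(gender, row)
```

Each cell of the resulting table is the count that a chi-square test for independence compares against the count expected if the two variables were unrelated.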
The 13 steps below show you how to analyse your data using a chi-square test for independence in SPSS Statistics. At the end of these 13 steps, we show you how to interpret the results from your chi-square test for independence.
You will be presented with some tables in the Output Viewer under the title "Crosstabs". The tables of note are presented below:

The Crosstabulation Table (Gender*Preferred Learning Medium Crosstabulation)
Published with written permission from SPSS Statistics, IBM Corporation.

This table allows us to understand that both males and females prefer to learn using online materials versus books.

The Chi-Square Tests Table
When reading this table we are interested in the results of the "Pearson Chi-Square" row. We can see here that χ²(1) = 0.487, p = .485. This tells us that there is no statistically significant association between Gender and Preferred Learning Medium; that is, the preference for online learning versus books does not differ significantly between males and females.

The Symmetric Measures Table
Phi and Cramer's V are both measures of the strength of association. We can see that the strength of association between the variables is very weak.

Bar chart
It can be easier to visualize data than to read tables. The clustered bar chart option produces a graph that highlights the group categories and the frequency of counts in these groups.
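The figures reported in the Chi-Square Tests and Symmetric Measures tables can be cross-checked by hand. The Python sketch below uses only the reported statistic (0.487 on 1 degree of freedom); for 1 degree of freedom, the chi-square upper-tail p-value equals erfc(√(χ²/2)). The sample size is not given in the text, so Cramer's V is shown with a clearly hypothetical n, and both function names are illustrative rather than part of SPSS Statistics.

```python
import math

def chi2_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

def cramers_v(statistic, n, n_rows, n_cols):
    """Cramer's V; for a 2x2 table it equals the absolute value of phi."""
    return math.sqrt(statistic / (n * (min(n_rows, n_cols) - 1)))

# Statistic reported in the "Pearson Chi-Square" row
p = chi2_p_value_df1(0.487)
print(f"p = {p:.3f}")

# Assumption: the real sample size is not stated in the text
hypothetical_n = 100
v = cramers_v(0.487, hypothetical_n, 2, 2)
print(f"Cramer's V = {v:.3f} (for hypothetical n = {hypothetical_n})")
```

The computed p-value should agree with the reported p = .485 up to rounding. Because V divides the statistic by the sample size, a small chi-square value on any reasonably sized sample yields a value near zero, which is consistent with the "very weak" association described above.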