Correlation Technique – Phi Coefficient and Point Biserial.
Highlights:
- Definition of correlation
- Correlation coefficient
- Correlation range
- Correlation techniques
- Phi coefficient
- Point Biserial coefficient.
Correlation is a statistical technique used to establish the relationship between two or more variables. The relationship is usually expressed with the aid of a correlation coefficient.
Correlation coefficient: The degree of association is measured by a correlation coefficient, denoted by r. It is sometimes called the Pearson product moment correlation coefficient. The correlation coefficient can be positive or negative. It ranges from -1 through 0 to 1.
When the correlation coefficient is zero, it implies no linear relationship between the variables under consideration. When it is 1, it indicates a perfect positive relationship: the variables rise and fall together exactly. When it is -1, it indicates a perfect inverse relationship: the variables are just as strongly related, but one decreases exactly as the other increases.
- When a correlation coefficient is positive, it means that as one variable increases, the other increases as well, e.g. study time and student achievement. This implies that the more you study, the greater your achievement in what you are studying.
- If the correlation is negative, it means an inverse relationship exists between the variables. This shows that an increase in one variable is accompanied by a decrease in the other variable.
However, it is important to note that correlation does not imply causation.
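The two ends of the range can be checked with a small calculation. The following is a minimal sketch in Python of the Pearson product moment formula; the function name `pearson_r` and the data values are hypothetical, made up only to show a perfectly rising and a perfectly falling relationship.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values (not from the article), chosen to hit the two extremes
hours = [1, 2, 3, 4, 5]
rising = [10, 20, 30, 40, 50]    # increases in step with hours
falling = [50, 40, 30, 20, 10]   # decreases in step with hours

r_up = pearson_r(hours, rising)
r_down = pearson_r(hours, falling)
print(round(r_up, 4), round(r_down, 4))  # 1.0 -1.0
```

Both results describe an equally strong relationship; only the direction differs.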
Correlation Range Interpretation
1.) 0 – 0.3 = weak/low correlation
2.) 0.31 – 0.69 = moderate correlation
3.) 0.7 – 1 = high correlation
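The bands above can be written as a small helper. This is only a sketch under two assumptions that the list does not state: the bands are applied to the absolute value of r, so that -0.8 counts as high, and values falling in the gaps between bands (for example 0.305) are treated as moderate. The function name `correlation_strength` is hypothetical.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the bands listed above."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # assumption: the sign gives direction, the bands give strength
    if size <= 0.3:
        return "weak/low"
    if size < 0.7:
        return "moderate"
    return "high"

print(correlation_strength(0.2))    # weak/low
print(correlation_strength(-0.85))  # high
```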
Types of Correlation Technique
Scale | Nominal | Ordinal | Interval/Ratio |
Nominal | Phi coefficient, Contingency coefficient (C), Cramér's V | – | – |
Ordinal | Rank Biserial | Spearman rank | – |
Interval/Ratio | Point Biserial | Biserial | Pearson Product Moment Coefficient |
Phi Coefficient: This is applied when both variables are nominal and dichotomous. Dichotomous means the variable has only two levels, like gender (male and female).
Example: a researcher is interested in finding the relationship between gender and students' interest in biology and chemistry. Use the data given in Table 1 below.
Gender | Biology/Chemistry |
1 | 1 |
1 | 0 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
1 | 0 |
0 | 1 |
0 | 1 |
0 | 0 |
The values in Table 1 are coded as follows:
Female = 1
Male = 0
Biology = 1
Chemistry = 0
The counts are then arranged in a 2 x 2 table.
Gender | Biology (1) | Chemistry (0) | Total |
Male (0) | A = 2 | B = 3 | A+B = 5 |
Female (1) | C = 3 | D = 2 | C+D = 5 |
Total | A+C = 5 | B+D = 5 | 10 |
For Male
The first cell under biology, which is 2, is obtained by inspection: count the male students who showed interest in biology, that is, the pair 0,1. Go to Table 1 and count how many times the pair 0,1 appears.
The second cell under chemistry, which is 3, is obtained by the same procedure. It is the number of male students who showed interest in chemistry, that is, the pair 0,0, so go to Table 1 and count the number of times you see 0,0.
The total is obtained by summing 2 and 3: (A+B) = 5.
For Female
The first cell under biology for female, which is 3, is obtained by inspection: count the female students who showed interest in biology, that is, the pair 1,1. Go to Table 1 and count how many times the pair 1,1 appears.
The second cell under chemistry for female, which is 2, is obtained by the same procedure. It is the number of female students who showed interest in chemistry, that is, the pair 1,0, so go to Table 1 and count the number of times you see 1,0.
The total is obtained by summing 3 and 2: (C+D) = 5.
Also, for the columns, A+C = 5 and B+D = 5.
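The counting by inspection described above can also be done programmatically. Here is a minimal sketch in Python, assuming Table 1 is entered as (gender, subject) pairs with the coding Female = 1, Male = 0, Biology = 1, Chemistry = 0; the variable names are illustrative.

```python
from collections import Counter

# Table 1 as (gender, subject) pairs, in the order the rows appear
pairs = [(1, 1), (1, 0), (0, 0), (1, 1), (0, 0),
         (1, 1), (1, 0), (0, 1), (0, 1), (0, 0)]

counts = Counter(pairs)  # how many times each pair occurs

A = counts[(0, 1)]  # male, biology
B = counts[(0, 0)]  # male, chemistry
C = counts[(1, 1)]  # female, biology
D = counts[(1, 0)]  # female, chemistry

print(A, B, C, D)    # cell counts: 2 3 3 2
print(A + B, C + D)  # row totals: 5 5
print(A + C, B + D)  # column totals: 5 5
```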
Applying this formula
Phi = (BC - AD) / SQRT((A+B)(A+C)(C+D)(B+D))
Where BC = 3*3 = 9, AD = 2*2 = 4, (A+B) = 5, (A+C) = 5, (C+D) = 5, (B+D) = 5
Phi = (9 - 4) / SQRT(5*5*5*5)
= 5 / 25
= 0.2
This implies there is a weak or low relationship between gender and students’ interest in biology and chemistry.
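The calculation can be reproduced in a few lines. The sketch below assumes the 2 x 2 layout used in this article (rows Male then Female, columns Biology then Chemistry), which is why the numerator is BC - AD; the function name `phi_coefficient` is hypothetical. As a cross-check, it also computes Pearson's r directly on the 0/1 coded data, since phi is Pearson's r applied to two dichotomous variables.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table laid out as in the article:
    rows = Male (0), Female (1); columns = Biology (1), Chemistry (0)."""
    denominator = math.sqrt((a + b) * (a + c) * (c + d) * (b + d))
    return (b * c - a * d) / denominator

phi = phi_coefficient(2, 3, 3, 2)

# Cross-check: Pearson's r on the coded columns of Table 1
gender = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
subject = [1, 0, 0, 1, 0, 1, 0, 1, 1, 0]
n = len(gender)
mean_g = sum(gender) / n
mean_s = sum(subject) / n
sxy = sum((g - mean_g) * (s - mean_s) for g, s in zip(gender, subject))
sxx = sum((g - mean_g) ** 2 for g in gender)
syy = sum((s - mean_s) ** 2 for s in subject)
r = sxy / math.sqrt(sxx * syy)

print(round(phi, 2), round(r, 2))  # 0.2 0.2
```

If the rows or columns of the table were arranged in a different order, the sign of the numerator would need to be adjusted accordingly.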
Point biserial correlation coefficient
This is used to determine the relationship between two variables when one of the variables is measured on a continuous scale (interval/ratio) and the second variable is a nominal dichotomous variable, like gender.
The formula used here is given as
Ypb = ((MeanY1 – MeanY0) / SdY) * SQRT(P * Q)
Where,
MeanY1 = mean of scores of students who got the item correct
MeanY0 = mean of scores of students who got the item wrong
SdY = standard deviation of the scores of students
P = proportion of students who got the item correct
Q = proportion of students who got the item wrong
Example: Suppose a researcher is interested in finding the relationship between students' responses to an item and their achievement in an integrated science examination, given the table below.
S/N | Responses(x) | Scores(y) |
1 | 1 | 15 |
2 | 1 | 10 |
3 | 0 | 9 |
4 | 1 | 6 |
5 | 1 | 8 |
6 | 0 | 17 |
7 | 0 | 18 |
8 | 0 | 11 |
9 | 1 | 20 |
10 | 0 | 6 |
Solution
First, we code the responses of students;
Correct response = 1
Wrong response = 0
S/N | Responses(x) | Scores(y) | y² |
1 | 1 | 15 | 225 |
2 | 1 | 10 | 100 |
3 | 0 | 9 | 81 |
4 | 1 | 6 | 36 |
5 | 1 | 8 | 64 |
6 | 0 | 17 | 289 |
7 | 0 | 18 | 324 |
8 | 0 | 11 | 121 |
9 | 1 | 20 | 400 |
10 | 0 | 6 | 36 |
Total | | 120 | 1676 |
MeanY1 = mean of scores of students who got the item correct(1)
So we have (15+10+6+8+20)/5 = 11.8
MeanY0 = mean of scores of students who got the item wrong(0)
This will be (9+17+18+11+6)/5 = 12.2
SdY = standard deviation of the scores of students = SQRT(1676/10 - (120/10)^2) = SQRT(23.6) = 4.86, i.e. using the raw score formula with N = 10 in the denominator.
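The value 4.86 can be verified as follows. This is a sketch in Python, assuming the raw score formula divides by N (the population form), which is what reproduces 4.86; dividing by N - 1 instead would give about 5.12. The variable names are illustrative.

```python
import math
import statistics

scores = [15, 10, 9, 6, 8, 17, 18, 11, 20, 6]
n = len(scores)

sum_y = sum(scores)                    # 120
sum_y_sq = sum(y * y for y in scores)  # 1676

# Raw score formula with N in the denominator (population form)
sd_raw = math.sqrt(sum_y_sq / n - (sum_y / n) ** 2)

# The same quantity from the standard library, as a cross-check
sd_lib = statistics.pstdev(scores)

print(round(sd_raw, 2), round(sd_lib, 2))  # 4.86 4.86
```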
P = proportion of students who got the item correct = 5/10 = 0.5
Q = proportion of students who got the item wrong = 5/10 = 0.5
Ypb = (11.8 – 12.2)/4.86 * SQRT(0.5 *0.5)
= -0.041
This shows a very weak negative (inverse) relationship between the students' responses to the item and their overall score in integrated science. At -0.041 the coefficient is close to zero, so there is almost no relationship.
If the correlation is moderate or high and negative, it means students who got the item wrong tend to score higher in the exam. Whereas if the correlation is moderate or high and positive, it means students who got the item correct tend to score higher in the exam.
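The whole worked example can be reproduced with a short script. This is a minimal sketch in Python, assuming the standard deviation is the population form (dividing by N), since that is what gives 4.86; the function name `point_biserial` and the variable names are hypothetical.

```python
import math
import statistics

responses = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]   # 1 = correct, 0 = wrong
scores = [15, 10, 9, 6, 8, 17, 18, 11, 20, 6]

def point_biserial(x, y):
    """Point biserial correlation between a 0/1 variable x and scores y."""
    y1 = [score for flag, score in zip(x, y) if flag == 1]  # item correct
    y0 = [score for flag, score in zip(x, y) if flag == 0]  # item wrong
    p = len(y1) / len(y)
    q = len(y0) / len(y)
    sd_y = statistics.pstdev(y)  # assumption: divide by N, not N - 1
    return (statistics.mean(y1) - statistics.mean(y0)) / sd_y * math.sqrt(p * q)

r_pb = point_biserial(responses, scores)
print(round(r_pb, 3))  # -0.041
```

Swapping which group is coded 1 would flip the sign of the result but not its size.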