I met you at the beginning of my thesis and found your advice very helpful. I am in the process of finishing up my data and have a small question that I thought you could help with. I made behavioral observations for my study; one person coded all of the data, and another person coded 20% of the data for reliability. I want to use the kappa equation to determine the reliability between my coders. I know that I have to calculate four numbers: 1) the number of agreements that the behavior took place; 2) the number of agreements that the behavior did not take place; 3) the number of times coder A said yes and coder B said no; and 4) the number of times coder A said no and coder B said yes. My question is, what do I do with these numbers to get a kappa score? I know SPSS will do it if I enter all the data, but that would be hundreds of data points per subject and would take a lot longer than calculating it by hand. Any information you could provide would be greatly appreciated. Thank you! Rebecca
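For a two-coder, yes/no coding scheme, those four counts are all that is needed to compute kappa by hand: observed agreement is the two agreement cells divided by the total, expected chance agreement comes from each coder's marginal totals, and kappa is the agreement beyond chance divided by the maximum possible agreement beyond chance. A minimal sketch of that hand calculation (the cell counts in the example call are made up purely for illustration) might look like this:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a: both coders said yes (agreements that the behavior occurred)
    d: both coders said no  (agreements that the behavior did not occur)
    b: coder A said yes, coder B said no
    c: coder A said no,  coder B said yes
    """
    n = a + b + c + d
    # Observed agreement: proportion of occasions on which the coders agreed.
    p_o = (a + d) / n
    # Expected chance agreement, built from each coder's marginal totals.
    p_yes = ((a + b) / n) * ((a + c) / n)   # both say yes by chance
    p_no = ((c + d) / n) * ((b + d) / n)    # both say no by chance
    p_e = p_yes + p_no
    # Kappa: agreement beyond chance, scaled by the maximum possible
    # agreement beyond chance.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts for illustration only.
print(cohens_kappa(a=40, b=5, c=3, d=52))
```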

There are a number of statistics that have been used to measure inter-rater and intra-rater reliability. A partial list includes percent agreement, Cohen's kappa (for two raters), Fleiss's kappa (an adaptation of Cohen's kappa for three or more raters), the contingency coefficient, Pearson's r and Spearman's rho, the intraclass correlation coefficient, the concordance correlation coefficient, and Krippendorff's alpha (useful when there are multiple raters and multiple possible ratings). The use of correlation coefficients such as Pearson's r can be a poor reflection of the amount of agreement between raters, resulting in extreme over- or underestimation of the true level of rater agreement (6). In this document we will consider only two of the most common measures, percent agreement and Cohen's kappa. An example of the calculation of the kappa statistic is available in Figure 3. Note that the percent agreement is 0.94, while kappa is 0.85, a considerable reduction in the apparent level of agreement. The greater the expected chance agreement, the lower the resulting value of kappa. A similar statistic, called pi, was proposed by Scott (1955).
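To see why greater expected chance agreement pulls kappa down, compare two hypothetical 2x2 coding tables with the same percent agreement but different marginal distributions (these counts are invented for illustration and are not the Figure 3 data):

```python
def agreement_stats(a, b, c, d):
    """Percent agreement, expected chance agreement, and Cohen's kappa
    for a 2x2 table (a = both yes, d = both no, b/c = disagreements)."""
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, p_e, kappa

# Hypothetical tables chosen only to show the effect of chance agreement.
balanced = dict(a=47, b=3, c=3, d=47)   # both codes about equally common
skewed   = dict(a=90, b=3, c=3, d=4)    # one code dominates

for name, table in (("balanced", balanced), ("skewed", skewed)):
    p_o, p_e, kappa = agreement_stats(**table)
    print(f"{name}: percent agreement={p_o:.2f}, "
          f"chance agreement={p_e:.2f}, kappa={kappa:.2f}")
```

With balanced codes, chance agreement is 0.50 and kappa (0.88) stays close to percent agreement (0.94); when one code dominates, chance agreement rises to about 0.87 and kappa drops to roughly 0.54 even though percent agreement is unchanged.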

Cohen's kappa and Scott's pi differ in how p_e is calculated. However, prior research has shown that several factors influence the value of kappa: observer accuracy, the number of codes in the code set, the prevalence of specific codes, observer bias, and observer independence (Bakeman & Quera, 2011). Therefore, interpretations of kappa, including definitions of what constitutes a good kappa, must take these circumstances into account. Percent agreement and kappa both have strengths and limitations. The percent agreement statistic is easy to calculate and directly interpretable. Its main limitation is that it does not take into account the possibility that raters guessed on scores. It may therefore overestimate the true level of agreement between raters. Kappa was designed to take the possibility of guessing into account, but the assumptions it makes about rater independence and other factors are not well supported, and it may therefore lower the estimate of agreement excessively.
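The difference between the two statistics is easiest to see side by side. Both take the form (p_o - p_e) / (1 - p_e); Cohen's kappa computes p_e from each rater's own marginal proportions, while Scott's pi pools the two raters' marginals before squaring. A minimal sketch for a yes/no code, with hypothetical counts chosen only for illustration:

```python
def chance_agreement_cohen(a, b, c, d):
    """Cohen's p_e: product of each rater's own marginal proportions."""
    n = a + b + c + d
    return ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)

def chance_agreement_scott(a, b, c, d):
    """Scott's p_e: squared average (pooled) marginal proportions."""
    n = a + b + c + d
    p_yes = ((a + b) / n + (a + c) / n) / 2   # pooled proportion of "yes"
    p_no  = ((c + d) / n + (b + d) / n) / 2   # pooled proportion of "no"
    return p_yes ** 2 + p_no ** 2

def beyond_chance(p_o, p_e):
    """Kappa and pi share this form; only p_e differs."""
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 2x2 counts: a = both yes, d = both no, b/c = disagreements.
a, b, c, d = 40, 9, 1, 50
p_o = (a + d) / (a + b + c + d)
print("kappa:", round(beyond_chance(p_o, chance_agreement_cohen(a, b, c, d)), 3))
print("pi:   ", round(beyond_chance(p_o, chance_agreement_scott(a, b, c, d)), 3))
```

When the two raters' marginal distributions are similar, as here, kappa and pi come out nearly identical; they diverge as the raters' base rates differ.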