Test Theories: CTT and IRT

Do you know the main test theories and what they consist of?

In psychology, tests are used as measuring instruments. Just as we use a tape measure to measure length, we can use a test to measure intelligence, memory, attention and so on. One difference between the two is that tests are far harder to construct and apply.

Moreover, a single measurement does not allow us to speak of the volume of an object. The same is true of tests: administering just one does not allow us to make a diagnosis or propose an intervention. Tests are therefore important for an evaluation but do not determine it.

This is where the psychologist plays an extremely important role: they must combine the information obtained from the test with other sources to arrive at a coherent assessment, which can then lead to an intervention. To put it another way, the quality of a professional shows when results from different sources have to be analyzed. This is know-how acquired through study, but also through experience.

Brief history of test theories

The origin of tests goes back to the time of the Chinese emperors, around 3000 BC. These emperors used tests to assess the professional skills of the officials who would work for them.

Modern tests grew out of those carried out by Galton (1822-1911) in his laboratory. However, it was James Cattell who first used the term "mental test", in 1890. Since these early tests did not actually reflect human cognitive ability, researchers such as Binet and Simon (1905) introduced cognitive tasks in their new scale to assess aspects such as judgment, comprehension and reasoning.

The Binet scale opened up a tradition of individual scales. Alongside cognitive tests, great advances were also made in personality tests.


Why are test theories necessary?

With all this progress, measurement theories (test theories) began to develop and to shape tests as instruments. Psychometrics emerged to ensure that tests measure what we want them to measure, with as small a margin of error as possible. It requires that every test or measuring instrument be valid and reliable.

Remember that reliability is understood as the stability or consistency of measurements when the measurement process is repeated. Validity refers to the extent to which empirical evidence and theory support the interpretation of test results.

There are therefore two main test theories when it comes to analyzing and building this type of instrument: classical test theory (CTT) and item response theory (IRT).

Classical Test Theory (CTT)

This is the dominant theory in the construction and analysis of tests. It is relatively simple to build tests that meet the requirements of this paradigm, and just as simple to evaluate them against the two criteria mentioned above: reliability and validity.

It grew out of Spearman's work at the beginning of the 20th century. Then, in 1968, Lord and Novick reformulated it and opened the way to the newer IRT approach.

This theory is based on the classical linear model proposed by Spearman: the score a person obtains on a test, which we call the empirical score and denote by X, is made up of two components.

On the one hand there is the true score of the test taker (V) and, on the other, the error (e). It is expressed as follows: X = V + e.

Spearman adds three hypotheses to this theory:

  • First, the true score (V) is the mathematical expectation of the empirical score: the average score a person would obtain if they took the test an infinite number of times.
  • Second, there is no relationship between the size of the true scores and the size of the errors that affect those scores.
  • Finally, measurement errors in one test are not related to measurement errors in another test.

To round off the theory, Spearman defines parallel tests as those that measure the same thing but with different items.
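The linear model and its three hypotheses can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population values (true scores with mean 100 and standard deviation 15, errors with standard deviation 5), the sample sizes and the helper `pearson` are all hypothetical choices made for illustration, and the hypotheses hold here by construction because the errors are drawn independently. The last line also reports the correlation between two parallel forms, which in the classical framework is commonly used as an estimate of reliability (a standard result that the text above does not spell out).

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: true scores V and the errors of two parallel forms.
n_people = 5000
true_scores = [rng.gauss(100, 15) for _ in range(n_people)]
errors_a = [rng.gauss(0, 5) for _ in range(n_people)]
errors_b = [rng.gauss(0, 5) for _ in range(n_people)]

# Empirical scores on each form: X = V + e
observed_a = [v + e for v, e in zip(true_scores, errors_a)]
observed_b = [v + e for v, e in zip(true_scores, errors_b)]

# Hypothesis 1: averaging many administrations for one person approaches V.
person_v = 110.0
repeated = [person_v + rng.gauss(0, 5) for _ in range(20000)]
mean_repeated = statistics.fmean(repeated)

# Hypothesis 2: true scores and errors are unrelated.
r_true_error = pearson(true_scores, errors_a)

# Hypothesis 3: errors on one test are unrelated to errors on another.
r_errors = pearson(errors_a, errors_b)

# Correlation between parallel forms, close to var(V) / var(X) = 225 / 250.
r_parallel = pearson(observed_a, observed_b)

print(f"mean of repeated scores: {mean_repeated:.2f}")
print(f"corr(V, e): {r_true_error:.3f}")
print(f"corr(e_a, e_b): {r_errors:.3f}")
print(f"corr(X_a, X_b): {r_parallel:.3f}")
```

With real data the true score and the error are never observed separately, which is why quantities such as the correlation between parallel forms matter in practice.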

Limits of the classical approach

The first limitation is that, in this theory, measurements are not invariant with respect to the instrument used. This means that if a psychologist evaluates the intelligence of three people with a different test each time, the results are not comparable. Why?

The results of the three instruments are not on the same scale: each test has its own. To compare, for example, the intelligence of several people who have been evaluated with different intelligence tests, the raw scores obtained from each test have to be transformed onto another, common scale.
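One common way to do this kind of transformation is to standardize each raw score against the mean and standard deviation of its test's normative group and then re-express it on a shared scale. The sketch below assumes exactly that; the function name `to_standard_scale`, the norm values and the target scale (mean 100, standard deviation 15) are hypothetical and chosen only for illustration.

```python
def to_standard_scale(raw, norm_mean, norm_sd, target_mean=100.0, target_sd=15.0):
    """Convert a raw score to a common scale via the norm group's z-score."""
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean, in SD units
    return target_mean + target_sd * z


# Two people assessed with two different (hypothetical) intelligence tests.
# Test A norms: mean 30, SD 4. Test B norms: mean 50, SD 12.
score_a = to_standard_scale(34, norm_mean=30, norm_sd=4)
score_b = to_standard_scale(62, norm_mean=50, norm_sd=12)

print(score_a, score_b)  # both are one SD above their norm mean
```

The two raw scores (34 and 62) look very different, yet both land on the same point of the common scale because each sits one standard deviation above its own norm group.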

The problem is that transforming the scores in this way assumes that the normative groups used to build the scales of the different tests are comparable (same mean, same standard deviation), which is difficult to guarantee in practice. The IRT approach represented a huge step forward on this point. With it, the results obtained through different instruments are on the same scale.

The second limitation is the lack of invariance of test properties with respect to the people used to estimate them. In classical test theory, the psychometric properties of a test depend on the sample used to calculate them. This point also finds a solution, even if only a partial one, in the IRT approach.

 

Item Response Theory (IRT)

Item response theory (IRT) was born as a complement to classical test theory. In other words, both theories can be used to evaluate the same test, but each may assign a different score to the items, which can lead to a different result for each person. IRT provides better calibrated instruments, but it involves higher costs and requires specialized professionals.

IRT rests on several assumptions, but perhaps the most important is that any measurement instrument should be consistent with one idea: there is a functional relationship between the values of the variable that the items measure and the probability of answering them correctly. This function is called the item characteristic curve (ICC). What do we assume, then?

Something that may seem very logical but that classical test theory does not take into account. For example, the most difficult items are those that only the most able people can answer. An item that everyone answers correctly would be of no use: it would give no information at all. This is just a small sample of what IRT offers.
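To make the idea of an item characteristic curve concrete, here is a minimal sketch. The text does not say which function is used, so this example assumes the two-parameter logistic model, one widely used choice, in which `b` is the item's difficulty and `a` its discrimination. The function names, the ability values and the item parameters are hypothetical.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct answer at ability theta (assumed 2PL model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Information the item provides at ability theta under the 2PL model."""
    p = icc_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)


abilities = [-2.0, 0.0, 2.0]  # low, average and high ability (hypothetical)

# A very easy item (b = -4) and a hard item (b = 2), same discrimination.
easy = [icc_2pl(t, a=1.0, b=-4.0) for t in abilities]
hard = [icc_2pl(t, a=1.0, b=2.0) for t in abilities]

for t, p_easy, p_hard in zip(abilities, easy, hard):
    print(f"theta={t:+.1f}  P(easy)={p_easy:.3f}  P(hard)={p_hard:.3f}")

# The easy item tells us almost nothing about an average test taker.
print(f"information of easy item at theta=0: {item_information(0.0, 1.0, -4.0):.3f}")
print(f"information of hard item at theta=2: {item_information(2.0, 1.0, 2.0):.3f}")
```

In this sketch nearly everyone in the range passes the easy item, so it barely separates people of different ability, while the hard item is most informative around the ability level that matches its difficulty.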


To conclude, even though these two theories are almost contemporary, IRT seems to have arisen as a response to the limitations of classical test theory. However, there is still a lot of research to be done in this area of psychometrics.

 
