Glossary of terms used by HR Assessors

This section explains some of the most common terms used in the field of psychometrics. The explanations are deliberately succinct and are aimed at readers who neither have nor need a more detailed understanding of these terms.

Accuracy

Accuracy refers to the number of questions a candidate gets correct (NC) out of the number of questions attempted (NA). Ability tests take place under time pressure, and people who focus on accuracy (as opposed to speed) often attempt fewer questions in the time given: they spend longer on each question but make sure that the questions they do attempt are answered correctly. However, they may run out of time before answering every question and so miss the opportunity to gain extra marks. It is recommended that candidates work both quickly and accurately.
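As a brief illustration, the accuracy ratio (NC out of NA) can be sketched in a few lines of Python. The function name and the candidate figures are hypothetical, and treating a candidate with no attempts as having zero accuracy is an assumption made for this sketch only.

```python
def accuracy(number_correct: int, number_attempted: int) -> float:
    """Return NC / NA, the proportion of attempted questions answered correctly."""
    if number_attempted == 0:
        # Assumption: a candidate who attempts nothing is given an accuracy of 0.
        return 0.0
    return number_correct / number_attempted


# Hypothetical candidate: 12 questions attempted, 9 answered correctly.
print(accuracy(9, 12))  # 0.75
```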

Classical Test Theory

Classical Test Theory suggests that any assessment reveals only an individual’s observed score, and that this does not always reflect their ‘true’ score, because something in the environment always affects the individual’s performance (error). Assessments that are designed to determine an individual’s true score are themselves imperfect, as they are man-made and not infallible. Thus:

OBSERVED SCORE = TRUE SCORE + ERROR.

True Score – this is the individual’s true ability and is always constant for a particular person.

Observed Score – this is the score obtained by an individual on an assessment.

Error – anything that may have impacted an individual’s performance on a test.

By practising tests, you minimise the error and therefore increase your observed score. With zero error (which is not possible in practice), your observed score would equal your true score.
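The relationship can be shown with a small Python sketch. All of the numbers are hypothetical: a true score can never be read off directly, and modelling the effect of practice as a smaller negative error is an assumption made purely for illustration (in general, error can push an observed score up as well as down).

```python
def observed_score(true_score: float, error: float) -> float:
    """Observed score = true score + error."""
    return true_score + error


TRUE_SCORE = 16  # hypothetical; a real true score cannot be measured directly

# Hypothetical errors: unfamiliarity with the test format costs 4 marks
# before practice and only 1 mark after practice.
before_practice = observed_score(TRUE_SCORE, -4)
after_practice = observed_score(TRUE_SCORE, -1)

print(before_practice, after_practice)  # 12 15
print(observed_score(TRUE_SCORE, 0))    # 16 (zero error: observed equals true)
```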

Item Bank

Originally, tests provided online were ‘fixed’ tests, in the sense that every candidate took the same test, consisting of the same questions presented in the same order. However, concerns about cheating soon became apparent, as candidates could easily share their reasoning about a particular question. Item banking provided a solution to this problem.

Item banking allows a test to be generated instantly from a bank of items. The items are chosen so that the overall difficulty of the test is comparable to that of the tests that similar candidates have taken. However, each test instance differs in both the order and the content of its questions. This reduces the risk of cheating that concerns employers.

Item Response Theory

Item Response Theory (IRT) is normally used for developing tests that require item banking (see ‘Item Bank’). It is a complex process that allocates a difficulty level to each item and then builds a test instance of comparable difficulty for each candidate, so that every candidate receives a fair test. IRT selects a mix of items, from easy to difficult, to keep the overall difficulty of the test as consistent as possible.
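The idea of generating comparable but different tests from an item bank can be sketched in Python. This is not an IRT implementation: it uses plain stratified random sampling over three difficulty bands as a stand-in, and the item bank, the band names and the blueprint (how many items to draw from each band) are all hypothetical.

```python
import random

# Hypothetical item bank: each difficulty band holds ten item identifiers.
ITEM_BANK = {
    "easy": [f"E{i}" for i in range(1, 11)],
    "medium": [f"M{i}" for i in range(1, 11)],
    "hard": [f"H{i}" for i in range(1, 11)],
}

# Hypothetical blueprint: every test instance gets the same difficulty mix.
BLUEPRINT = {"easy": 2, "medium": 3, "hard": 2}


def generate_test(rng: random.Random) -> list[str]:
    """Draw the blueprint's number of items from each band, then shuffle the order."""
    items: list[str] = []
    for band, count in BLUEPRINT.items():
        items.extend(rng.sample(ITEM_BANK[band], count))
    rng.shuffle(items)
    return items


rng = random.Random(42)  # seeded only so that the sketch is repeatable
test_a = generate_test(rng)
test_b = generate_test(rng)
print(test_a)
print(test_b)
```

Each generated test contains the same number of easy, medium and hard items, so the overall difficulty is held roughly constant, while the particular questions and their order vary from one candidate to the next.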

Normal Distribution

Also known as the ‘bell curve’. In assessment, a normal distribution is a graphical representation of the frequencies of scores on a particular test. If the horizontal axis shows the score on an assessment, very few people obtain a very low score and very few obtain a very high score; most individuals score around the ‘middle’, that is, they get an average score. Research has found that many characteristics, such as weight or height, can be depicted in this way.

Normal distributions are a prerequisite for the scoring systems used in assessment today, particularly the norms and cut-offs that employers apply in their recruitment processes.

Normed Score

A ‘normed’ score is an individual’s total score on an assessment after it has been compared against the scores of a group of individuals who have previously completed the same assessment. It is often reported as a ‘percentile’ (but can also be a T-Score, Sten score, etc.), which shows the percentage of that group the individual has done better than. Imagine that a test of 20 questions, and therefore a maximum possible score of 20, is given to hundreds of individuals, none of whom achieves a score greater than 15. If a new candidate then takes this test and achieves a score of 16, 17, 18, 19 or 20, they would be deemed to be at the 99th percentile, irrespective of which score above 15 they achieve. This is because they have done better than at least 99% of the comparison group.

Number Attempted

This refers to the number of questions you attempted in an assessment, irrespective of whether you answered them correctly or incorrectly.

Number Correct

This refers to the number of questions you get correct in an assessment.

Percentile

Percentiles are not to be confused with percentages, although the principles behind the two are similar. In an assessment context, a percentile expresses a score as a percentage, but relative to the individuals who have previously completed the assessment rather than to the total number of questions in the assessment.

Percentiles tell you what percentage of individuals in the ‘norm’ or ‘comparison’ group an individual has performed better than. For example, an individual who achieves the 30th percentile is deemed to have done better than 30% of the individuals in the comparison group.

Suppose a test of 20 questions, and therefore a maximum possible score of 20, is given to hundreds of individuals, none of whom achieves a score greater than 15. If a new candidate then takes this test and achieves a score of 16, 17, 18, 19 or 20, they would be deemed to be at the 99th percentile, irrespective of which score above 15 they achieve. This is because they have done better than at least 99% of the comparison group.

Using percentiles helps organisations select the best of ‘what’s available’.
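The worked example above can be reproduced with a short Python sketch. The norm group scores are invented for illustration, and the choice to keep the reported percentile between 1 and 99 is an assumption that simply mirrors the example, in which any score above the norm group’s best is reported as the 99th percentile.

```python
def percentile_rank(score: int, norm_scores: list[int]) -> int:
    """Percentage of the norm group scoring below `score`, kept within 1..99."""
    below = sum(1 for s in norm_scores if s < score)
    percentage = 100 * below / len(norm_scores)
    # Assumption: reported percentiles are capped at 99 and floored at 1.
    return min(99, max(1, round(percentage)))


# Hypothetical norm group on a 20-question test; nobody scores above 15.
norm_group = [5, 8, 10, 10, 12, 13, 15, 9, 11, 14]

print(percentile_rank(11, norm_group))  # 50
print(percentile_rank(16, norm_group))  # 99
print(percentile_rank(20, norm_group))  # 99
```

A score of 16 and a score of 20 receive the same percentile because both are compared with the norm group rather than with the number of questions in the test.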

Speed

Speed refers to the number of questions a candidate gets correct (NC) out of the total number of questions (TC). Ability tests take place under time pressure, and people who focus on speed (as opposed to accuracy) often attempt lots of questions very quickly, giving each question minimal time. This focus on speed can increase the risk of error, because the candidate is rushing and not paying enough attention to the detail of each question; however, the candidate is likely to attempt all of the questions.

It is recommended that candidates work both quickly and accurately.
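To make the definition concrete, here is a minimal Python sketch of the speed ratio (NC out of the total number of questions). The function name and both candidates are hypothetical.

```python
def speed(number_correct: int, total_questions: int) -> float:
    """Return NC / TC, the proportion of all questions answered correctly."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    return number_correct / total_questions


TOTAL_QUESTIONS = 20  # hypothetical test length

# Hypothetical cautious candidate: attempts 12 questions, gets 11 correct.
print(speed(11, TOTAL_QUESTIONS))  # 0.55

# Hypothetical fast candidate: attempts all 20 questions, gets 14 correct.
print(speed(14, TOTAL_QUESTIONS))  # 0.7
```

In this made-up case the faster candidate makes more mistakes but still ends up with more correct answers overall, which is why working both quickly and accurately is recommended.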

Sten

This is known as a standardised score and it often corresponds to a ‘normed’ score. The Sten scale ranges from 1 to 10 along a normal distribution. Individuals with a Sten score of 5.5 are said to be at the 50th percentile, which means they have done better (or worse) than 50% of the people who have taken that assessment previously (norm group).

This scale is often used for personality assessments, as it makes it easier to give feedback using the behavioural descriptors that are attached to each value.

T Score

This is known as a standardised score and it often corresponds to a ‘normed’ score. The T-Score scale ranges from approximately 10-90 along a normal distribution. Individuals with a T-Score of 50 are said to be at the 50th percentile which means they have done better (or worse) than 50% of people who have taken that assessment previously (norm group).

This scale is often used for ability assessments. It is also easy to combine the T-scores of two assessments such as Verbal and Numerical reasoning tests to provide a summarised ability score which can then be ranked to choose the best individuals from an applicant pool.
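One way to picture this is the Python sketch below. The applicants and their T-Scores are invented, and combining the two scores by taking a simple average is an assumption; an organisation may weight or combine scores differently.

```python
# Hypothetical applicants with (verbal, numerical) T-Scores.
applicants = {
    "Applicant A": (55, 62),
    "Applicant B": (60, 48),
    "Applicant C": (58, 61),
}

# Assumption: the summarised ability score is the simple mean of the two T-Scores.
combined = {
    name: (verbal + numerical) / 2
    for name, (verbal, numerical) in applicants.items()
}

# Rank applicants from the highest combined score to the lowest.
ranking = sorted(combined, key=combined.get, reverse=True)

print(combined)  # {'Applicant A': 58.5, 'Applicant B': 54.0, 'Applicant C': 59.5}
print(ranking)   # ['Applicant C', 'Applicant A', 'Applicant B']
```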

Z Score

This is known as a standardised score and it often corresponds to a ‘normed’ score. The Z-Score scale ranges from approximately -3 to +3 and corresponds to the statistical distribution descriptor ‘Standard Deviation’. The Z-score also lies along a normal distribution. Individuals with a Z-Score of 0 are said to be at the 50th percentile which means they have done better (or worse) than 50% of people who have taken that assessment previously (norm group).

Z-scores are not used when giving feedback on assessments; instead, they tell you where an individual sits on a distribution (see ‘Normal Distribution’). The scale is used to derive other useful scales, such as T-Scores and Stens.
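The link between these scales can be sketched in Python. The conversions used here are the conventional ones (a T-Score has a mean of 50 and a standard deviation of 10; a Sten has a mean of 5.5 and a standard deviation of 2), which this glossary does not state explicitly. The norm group mean and standard deviation are hypothetical, and the percentile conversion assumes the scores are normally distributed.

```python
from statistics import NormalDist


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Number of standard deviations the raw score lies from the norm group mean."""
    return (raw - norm_mean) / norm_sd


def t_score(z: float) -> float:
    """Conventional T-Score: mean 50, standard deviation 10."""
    return 50 + 10 * z


def sten_score(z: float) -> float:
    """Conventional Sten: mean 5.5, standard deviation 2, kept within 1..10."""
    return min(10.0, max(1.0, 5.5 + 2 * z))


def percentile_from_z(z: float) -> float:
    """Percentage of a normal distribution lying below the given Z-score."""
    return 100 * NormalDist().cdf(z)


# Hypothetical norm group: mean raw score 12, standard deviation 4.
z = z_score(16, norm_mean=12, norm_sd=4)

print(z, t_score(z), sten_score(z))  # 1.0 60.0 7.5
print(percentile_from_z(0.0))        # 50.0
```

A Z-score of 0 maps to a T-Score of 50, a Sten of 5.5 and the 50th percentile, which matches the midpoints given in the entries above.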