What Aptitude Tests Actually Measure

I once took an aptitude test that labelled me “average” at language reasoning.

A while ago I took a Logiks aptitude test for a role. The result labelled me as average in numerical reasoning and language reasoning, and slower than most of the population at answering questions.

That surprised me.

Logiks is a time-based test with numerical, verbal and spatial (referred to as abstract in the Logiks results) reasoning questions.

I was reading novels before I started school (yes, at age 5), I sound like I just ate a thesaurus in general conversation, and I have worked in data and analytics for most of my career. The result did not align at all with my identity. The test was not used as the basis for a go or no-go decision on my application, so although I was unnerved by it, I didn’t take any action. It has played on my mind ever since. I have discussed it with others, and it impacted me so much that I ended up writing this.

A month ago I was advised that another role I was applying for would only place candidates who scored exceptionally high on the CCAT (Criteria Cognitive Aptitude Test). Because of my experience with the previous test, I decided to do a practice test online. My score on the practice test was 33%, with 50% being average. The practice test was very similar in style to Logiks, and again I questioned my identity.

I discussed the test with ChatGPT, and after those discussions and probably four additional hours of practising, I scored in the 95th percentile on the actual CCAT.

An image showing average score for a test called Logiks and then a 95th percentile score for the same person for a similar test called CCAT. — *Same person, similar test, vastly different results.*

The following sections are my deep dive into this experience and how it applies to context.

What These Tests Claim to Measure vs What They Do Measure

Most descriptions of these tests, whether in the test outcome report or on the “so you’ve been asked to take a Logiks test” web sites run by corporate test providers and practice-test providers, say that the tests measure a candidate’s intellectual ability. A number also highlight that the test is not measuring IQ.

*screenshot of explanation of what the CCAT test is measuring from career resource section of organisation that requested the test*

If you take these tests and:

you pause to think you lose,
you try to be accurate you lose,
you brute-force you lose.

Consistently, in forums and discussions online, those who do extremely well say “I didn’t solve everything, I skipped ruthlessly.”

Actually Measuring

pattern recognition under time pressure

mathematical shortcuts

specific types of trained heuristics

switching tasks without context recovery

fast elimination of wrong answers

Claims to Measure

Critical Thinking

Problem Solving

Attention to Detail

Learning & Comprehension

Memory

Perception

Visual Processing

Can You Prepare for These Tests?

The test itself is designed to measure your current cognitive ability, and there is no way to immediately improve that.

Career resource section of site where I took the CCAT

I discussed my results with ChatGPT, which advised me that this test and those like it are not really measuring aptitude, and that they reward speed and specific trained heuristics, not insight. Even so, I only scored 54% on my second attempt. With 50 questions in 15 minutes, I knew I needed to read the question, figure out what I was supposed to be solving, and answer within 18 seconds, so probably 12 seconds for solving. I tried it with a timer and that made me score worse. I also realised that because I knew I could solve the questions, particularly the algebraic ones, I became invested in solving them. But creating an algebraic equation that matches the text, solving for multiple variables and checking for the usual inverse verbal nuance is not a 12 second solve – for me.
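The time budget behind those numbers can be sketched in a few lines of Python. This is only my arithmetic, assuming the standard CCAT format of 50 questions in 15 minutes (which is what produces the 18 second figure). The 6 seconds of reading time is an assumption implied by the gap between 18 and 12 seconds, not something the test publishes.

```python
def seconds_per_question(total_minutes: float, questions: int) -> float:
    """Average time available per question, in seconds."""
    return total_minutes * 60 / questions


# Assumed format: 50 questions in 15 minutes.
budget = seconds_per_question(15, 50)

# Assumption: roughly 6 seconds to read and work out what is being asked.
reading_seconds = 6
solving_seconds = budget - reading_seconds

print(f"Per question: {budget:.0f}s, of which about {solving_seconds:.0f}s is for solving")
```

Seen this way, any question that takes a minute to solve properly costs the time of more than three others, which is why skipping matters so much.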

I practised the cadence a few more times and decided to take the test once I felt that I could ruthlessly guess or skip the longer-solve questions early in the test. I did not increase my aptitude from the 33rd percentile to the 95th percentile in 4 hours. I learned the types of questions I would need to complete, the ones I could answer instantly (clearly verbal reasoning, and yes, I did smirk while I typed that), and when I needed to stop myself on those that I got really invested in but had not solved in the time required.

When I was asked to complete the Logiks test, the internal recruitment consultant advised that I did not need to prepare, but that I should ensure I had sufficient time and would not be interrupted. The organisation that I did the CCAT for had the quote above under a heading about whether or not you can prepare. Both added weight to the ideas that a) my intellectual ability, and therefore my worth as an employee in my field, is practically set in stone, b) it is accurately measured by the test, and c) the test is evidence based (see image below).

*another screenshot from the career resource section of organisation that requested I take the CCAT*

Evaluating Reasoning Ability in a Non-preferred Language

These tests may be useful signals. But they are very sensitive to context – including language familiarity and test strategy.

A colleague told me once “Words are my weapons in my native tongue.” An assessment of verbal reasoning in your preferred language does not have the same context as an assessment of verbal reasoning in your non-preferred language.

Even an assessment of numerical reasoning does not have equivalent context if the question is written in your non-preferred language. These questions usually contain a verbal nuance that inverts or amends what you are required to do with the solved value after you have derived the algebraic formulae and solved for the variables. One example is having to calculate the effort to complete a task with multiple people who work at different rates, but then being asked for the difference in completed items. Another is calculating a discounted price from multiple sale percentages or consecutively applied discounts, but being expected to give the net savings, not the final purchase price.
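To make the discount example concrete, here is a small Python sketch. The price and percentages are made up for illustration and are not from any real test question. It shows how the same working produces two different answers depending on one phrase at the end of the question.

```python
def apply_discounts(price: float, percents: list[int]) -> float:
    """Apply each percentage discount in turn to the running price."""
    for pct in percents:
        price = price * (100 - pct) / 100
    return price


# Hypothetical question: an item listed at 100 is discounted by 20%,
# then by a further 10% at the till.
list_price = 100.0
final_price = apply_discounts(list_price, [20, 10])

# The verbal twist: the question asks how much you saved, not what you paid.
net_savings = list_price - final_price

# A common trap: adding the percentages together (20 + 10 = 30).
naive_price = apply_discounts(list_price, [30])

print(final_price, net_savings, naive_price)
```

The final price is 72 and the net saving is 28, while adding the percentages gives 70. All three are plausible multiple-choice options, and which one is correct depends entirely on reading the last clause of the question carefully.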

Is placing a tricky negation somewhere in a wall of text really going to assess numerical reasoning, when you know candidates have 12 seconds to answer and have been told it’s a test they don’t have to prepare for?

Is attention to detail equivalent for me, scanning 400 words of English to create a solvable equation, and for someone for whom English is their non-preferred language?

The Merriam-Webster thesaurus site says there are 168 synonyms for “no” in English. I’m not convinced I could make a comprehensive list, or recognise all of them as a negation. Could you, in 12 seconds, when the question is actually algebraic?

Does an English verbal reasoning assessment accurately reflect verbal reasoning for everyone?
Does a numerical reasoning assessment accurately reflect numerical reasoning if you are required to apply verbal reasoning to identify the problem you are solving?

The cognitive load required is higher, the translation lag will impact timing and the nuance can impact accuracy.

Hiring and Admissions Equity

Australian Federal law prohibits discrimination based on race, colour, sex, sexual orientation, age, physical or mental disability, marital status, family/carer responsibilities, pregnancy, religion, political opinion, national extraction, or social origin. This applies to advertising a vacancy, selection criteria and to the recruitment process, including job interviews. These laws aim to ensure fairness.

The tests themselves are often introduced and administered to reduce bias.

*screenshot from another testing site describing the neutrality of the test and outcomes.*

A standardised test does not offer a neutral result based on national extraction (preferred language).
A standardised test does not offer a neutral result based on mental disability (ADHD, autism, dyslexia, dyscalculia).

I have also seen these tests listed as requirements for being admitted into educational programs.

If preparation and practice materially change the result (I increased my score by 65%), and familiarity advantages some groups more than others, what exactly are we measuring? Are we decreasing bias? Are we ensuring fairness?

Psychological Impact

I’m writing this as someone who was more than just unnerved by my Logiks results. My employer did not use this test to reject me for a role, and the CCAT score meant that I was in the top 5% of applicants for the other role. This was still very uncomfortable for me; so uncomfortable that I am writing all of these words to tell you about it. Thematically this blog does fit with context about data, but it is and was more than that for me.

*screenshot of my full Logiks test results*

It made me question if my view of myself as someone who is pretty good at solving complex analytical problems and even better at writing and comprehension (in English) is valid.

The gap between my understanding of my cognitive aptitude and these results really bothered me.

I viewed these measurements as defining, even though I highlight how measurement without context adds operational risk every day in my career.

I am very impulsive; cognitively, I want to sprint through everything. I also really want to get the correct answer, and I don’t let go if I know how to do it. When practising it was very difficult for me to “feel” twelve seconds passing. Even when I knew how to score higher, it was extraordinarily difficult to click next on a maths question. It felt like I was choosing to say I was bad at numeric reasoning, not that I was choosing to attempt the most questions I could in the allotted time.

What Does Better Look Like?

I’m not an academic, I’m not a researcher and I do not work in assessment. I’m aware that driving change in this space is not in scope for me or this blog. But if we did want to do this better, what would we do?

Could those who administer these tests be trained to have different discussions with candidates? Could content like this accompany the material I refer to? Yes, transparency will enable gaming, but does that add or remove weight from the claim that the testing is valid or “scientifically proven to predict job performance”? Does our ability to game the result also predict job performance?

Would reducing the time pressure make it a more valid assessment? Would adding information to the test about how it is biased towards certain groups make it more fair?

Despite working in data, I did not recognise this sooner.

A test score is a measurement, and measurements only make sense when we understand what they are actually measuring.
Without that context we risk measuring confidence, familiarity and speed – and calling it capability.

Icons used in this post were obtained from and created by authors on Flaticon