
Psyched For Business Podcast Episode 15

by Richard Anderson - Co-Founder

Episode 15:
What does validity mean in assessments and how do we evaluate it? - with Andrew Munro

 
Richard is joined by Andrew Munro for the second time. Andrew is a chartered psychologist with over 30 years' experience in the corporate sector, a conference speaker, and an author.
 
In this episode we'll cover what validity means in assessments and how it can be evaluated. We will also delve further into the methodologies and metrics that can be used to assess validity, as well as the importance of transparency and simplicity.
 


Episode 15 - Transcript 

Voiceover  0:00  
Welcome to Psyched for Business, helping business leaders understand and apply cutting edge business psychology principles in the workplace.

Richard Anderson  0:12  
Hi, and welcome to Psyched for Business. My name is Richard Anderson. Thank you for joining me. In today's episode, I'm joined by chartered psychologist Andrew Munro. I'm really delighted to have Andrew back on the podcast for the second time; the first episode we recorded was called Why Do Intelligent People Do Stupid Things?, and that's also well worth a listen. But in today's episode, we cover the very interesting and slightly contentious topic of validity in assessment. What does it mean? How is it evaluated? And is it all that it seems? I hope you enjoy the episode. Thanks again for listening. Andrew Munro, welcome back to Psyched for Business. It's great to have you on the show again. Thank you for joining me. How are you doing?

Andrew Munro  0:54  
All good. And thanks for asking me back. I enjoyed our last discussion hugely. That was great fun.

Richard Anderson  1:02  
Yeah, me too, Andrew. And the listeners won't know this, but I'll tell them: you're sitting in a very exotic country at the minute, Albania. You were telling me all about it, and I'm suitably envious, I have to say. We hope you're having a good time out there. But listen, just to briefly recap on the introduction that I gave you in the first podcast: you're a chartered psychologist, of course, and you've worked in the area of assessment - and I'm going to emphasize the word validity, because of course that's what this podcast is all about - for 30 years now. You're a Director of Talent World Consulting, you're also an associate of Envisia Learning, and, I must put this in, I very much enjoyed your book A to Z, and Back Again: Adventures and Misadventures in a Talent World. Andrew, you know, we had a little bit of deliberation, you and I, after the last podcast about what the next one could be about, and we've decided to make it all about the topic of validity.

There's a few reasons why we've done that, and I'm really looking forward to getting stuck into this topic. We both know it's a slightly contentious one, to say the least; it splits a lot of opinion. So yeah, I'm really looking forward to getting stuck in. Good, good. Okay, so if you're happy for me to, I'm going to kick off by summarizing a small section of A to Z and Back Again, your book. So I'm going to open quotes now and say:

A conjuring act pulls the white rabbit of validity out of the hat of a data set. Here, the magic depends on missing the sleight of hand that: one, generalizes with a puff of smoke and well-positioned mirrors from a data set that is a small and unrepresentative sample; number two, waves the wand of statistical trickery and corrections to astonish and baffle; three, saws the glamorous assistant in half but avoids cross-validation with a different data set; and number four, climbs the Indian rope to disappear out of view, without reference to any independent replication. At this point, even the rabbit looks surprised that it managed to jump out of the magician's hat to appear in a test publisher's manual. It's very amusing, it's very hard-hitting, and there's a lot going on there. Start us off, Andrew, if you're happy to, by giving us an unplugged summary of validity, please.

Andrew Munro  3:42  
Okay, Richard, thanks for that passage. I think Masoud, my co-author, and I both got a bit carried away with that conjuring analogy. A wee bit overblown. But the point is, validity is in a bit of a mess. This is an unplugged podcast, and we can put the technical stuff - definitions, methodologies, and so on - into the transcript notes. My working definition for this conversation over the next 30 minutes or so is that validity establishes whether an assessment does what it claims - as simple as that. Without a specific claim it becomes a very slippery construct, one that's open to all sorts of interpretation and endless circular debates.

Richard Anderson  4:35  
And I've certainly seen that, I have to say. But why don't we kick off with an example of what you mean, if you've got one?

Andrew Munro  4:43  
A vendor is selling a language-based assessment. It uses textual analysis to generate insights into personality. And the website announces: our science is validated and has over 20,000 citations on Google Scholar. Okay. I then scroll down the listings for a range of applications, including recruitment - that's what I'm interested in right now - and I read an article called Predicting Mental Health Status in Remote and Rural Farming Communities. I have no doubt whatsoever that linguistic analysis methods can be effective - you might want to run a sentiment analysis on your own LinkedIn posts, Richard, and you'll get the idea. But the vendor is making a generalized statement about a promising methodology; it's not a specific claim. And it's not relevant to my requirement to, for example, implement a test to recruit staff in the social care sector.

Richard Anderson  5:52  
Yeah, yeah, makes total sense. And thinking about it, you, of course, have been on both sides of the table, haven't you? You've been a client, but you've also been a consultant when it comes to validity and tests and those types of things. So if you put yourself back in your shoes when you were a client reviewing claimed assessment validity from different vendors, what was your own experience?

Andrew Munro  6:18  
Oh, lots, actually. But a specific one: one of my bosses asked me to meet a potential new assessment vendor. The specific tool had intuitive appeal. It had a very differentiated position in the test marketplace, and my boss and I both thought it might complement some aspects of our talent processes.

Richard Anderson  6:43  
Bet you were looking forward to the meeting. 

Andrew Munro  6:45  
Yeah, absolutely. So as part of the pitch, the vendor walked through the deck - I don't know if you remember those days. One slide made the extraordinary claim of predictive power, expressed as a validity coefficient of point nine three, a figure unheard of in talent assessment. I asked her how this figure was derived and, practically, what did this level of predictive validity mean?

Richard Anderson  7:17  
Okay. And so just to go back to that validity coefficient - she said nought point nine three, I think you said. So, just for the benefit of our listeners, the validity coefficient: what does that mean in layman's terms? Like the company would, I don't know, have 93% accuracy in future assessment? Or business performance increasing by 93%? What does that mean?

Andrew Munro  7:46  
Well, you ask. And I did, and I was none the wiser. And things actually got a bit more awkward when I asked about the methodology that had generated this figure of point nine three. And we're back to the conjuring trick, and the rabbit from the hat. My boss did the diplomatic thing and concluded the meeting. And that was that, really.

Richard Anderson  8:15  
Well, I'm glad you asked the same question that I did; that makes me feel a little bit better. But okay, let's go into that a bit more. What are the factors that affect validity for you?

Andrew Munro  8:34  
So let me answer your question this way, Richard. A few years ago, with my good friend Dr Paul Barrett, we posted a competition on LinkedIn. An award would be given to the test publisher who could provide evidence of the business impact of a personality test in a selection context. This was going to be a variation of the paranormal challenge - the one where the magician and sceptic James Randi offered $1 million to anyone who could show evidence of a paranormal power or event. Over 1,000 people applied; none were successful. Spoon-bending Uri Geller refused to take the challenge.

Richard Anderson  9:26  
Sounds like a brave challenge that you set in our world of validity, then. So what conditions were set - what were the parameters? What did you require in order to award the million dollars?

Andrew Munro  9:40  
Okay, why don't you have a go yourself, Richard?

Richard Anderson  9:46  
Okay, so let's say, thinking about validity, something like an improvement on the current process maybe, or, I suppose, just tangible evidence of business benefit.

Andrew Munro  10:01  
You're thinking as a sensible person, Richard, but you're not thinking as a psychometrician. I won't run through all the criteria. But, for example, number one, a base rate of current success had to be available. Is the new test an improvement on existing selection processes? Is it even better than tossing a coin?

Richard Anderson  10:25  
Yeah, I get it, that makes sense as a starter. What else did you have in there?

Andrew Munro  10:29  
All right, number two, there had to be a decent sample size. We set it at a modest 150. So this was going to rule out the personality test that relies on validation from 45 bus drivers or 63 zookeepers. It's the kind of nonsense the BPS - the British Psychological Society - test review process gives out Smarties for when it comes to evaluating test publisher submissions.

Richard Anderson  11:02  
Okay, I like it. So, 150, and a base rate of current success. What else?

Andrew Munro  11:10  
And the most important condition - and I think this turned out to be the most demanding: successful candidates were tracked, and meaningful performance data linked to business outcomes - sales, productivity, service - were obtained after a year. So rather than rely on subjective supervisory ratings, objective criteria - work outcomes of some organizational value - had to be applied to meet the criteria of the psychometric challenge.

Richard Anderson  11:46  
Okay. So I'm no expert in validity, as you know, Andrew, but it does crop up from time to time. And one of the things I often observe, or think, is: isn't it pretty difficult in a lot of businesses to obtain the sorts of metrics that you required to meet the challenge?

Andrew Munro  12:08  
It's a fair point. And I know you want to come on to talk about metrics, but quickly I'll mention a paradox. On the one hand, the test publishers say that objective testing is required because managers are completely hopeless in recruitment interviews, performance appraisal and talent reviews. Not true, by the way, but that's the narrative: we need the rigor that psychometric testing provides. And it does provide rigor - I'm not arguing against psychometric testing. But hold on a minute, what's the metric for validation? The very evaluations that you psychometricians are criticizing in the first place.

Richard Anderson  12:56  
Yeah. So the test validation almost hinges on the very judgments whose validity it's arguing against in the first place. Yeah, I see the paradox, 100%. Okay, then, how did the competition run? Were you nervous about the outcome? I mean, a million dollars on the table.

Andrew Munro  13:17  
I had complete faith in Paul. So it was a very entertaining exchange, and we reviewed many, many submissions. There were a fair number of challenges about the criteria and the methodology, which were completely reasonable. A few test publishers got rather heated. But to answer your question: no, no study met the conditions.

Richard Anderson  13:47  
Yeah. So just like Randi then, the million dollars was never awarded?

Andrew Munro  13:51  
Yeah, absolutely. My wife was relieved. That example was based on personality testing in a selection scenario, but we're trying to make a more fundamental point: that we need to look at validity in context. And here I would highlight another factor, which is often neglected: selection ratios. If I'm in the position of recruiting only one out of every 30 applicants, I'm in a very different ballgame to being forced to choose one in three applicants.
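
A minimal sketch in Python (simulated numbers, not from the episode) of the selection-ratio point: with the same assumed test validity, picking the best 1 in 30 applicants by test score buys far more performance than being forced to pick 1 in 3.

```python
import numpy as np

rng = np.random.default_rng(0)
validity = 0.4        # assumed true correlation between test score and later performance
n_applicants = 30_000

# Simulate applicants whose job performance correlates `validity` with their test score.
test = rng.standard_normal(n_applicants)
performance = validity * test + np.sqrt(1 - validity**2) * rng.standard_normal(n_applicants)

def mean_performance_of_hires(selection_ratio):
    """Hire the top fraction of applicants by test score; return the hires' mean performance."""
    n_hired = int(n_applicants * selection_ratio)
    hired = np.argsort(test)[-n_hired:]     # indices of the top test scorers
    return performance[hired].mean()

for ratio in (1 / 3, 1 / 30):
    print(f"selecting 1 in {round(1 / ratio)}: mean performance of hires = "
          f"{mean_performance_of_hires(ratio):+.2f} SD above the applicant average")
```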

Richard Anderson  14:34  
Yeah, I mean, that is a very fair point, and it might actually explain something. One of the things I was always puzzled by was that piece about Google. Google made the decision to abandon the use of cognitive ability tests; they found zero correlation between the test results and the performance of those individuals in the roles. But thinking about the point you just made, if you've got a hiring ratio that's only allowing you to recruit one candidate out of hundreds of applicants, how are you going to see any correlation at all, or much of one anyway?

Andrew Munro  15:15  
For an analytics company, very surprisingly, Google forgot the problem of restriction of range. If the majority of your shortlisted candidates are at the 95th percentile or above on cognitive aptitudes, why would you expect much differentiation in performance from the test scores within a very highly selected group? Here, Google should have looked at other potential predictors in the assessment process.
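
A small illustration (my simulation, not Google's data) of restriction of range: a test with a healthy correlation across the whole applicant pool shows a much weaker one once you only look at candidates at the 95th percentile and above.

```python
import numpy as np

rng = np.random.default_rng(1)
true_validity = 0.5
n = 100_000

# Ability predicts performance with a correlation of 0.5 across the full pool.
ability = rng.standard_normal(n)
performance = true_validity * ability + np.sqrt(1 - true_validity**2) * rng.standard_normal(n)

full_pool_r = np.corrcoef(ability, performance)[0, 1]

# Now restrict to the 95th percentile and above, as in the shortlisting example.
cutoff = np.quantile(ability, 0.95)
mask = ability >= cutoff
restricted_r = np.corrcoef(ability[mask], performance[mask])[0, 1]

print(f"correlation in the full applicant pool: {full_pool_r:.2f}")
print(f"correlation within the top 5% only:     {restricted_r:.2f}")
```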

Richard Anderson  15:46  
Because I think a lot of companies just followed suit, didn't they? Because Google is supposed to be the exemplar, the shining example of best practice, but they're not representative of every company out there. Other companies won't have anything near the volume of applicants that Google gets.

Andrew Munro  16:05  
Yeah, it was a big mistake. Yeah.

Richard Anderson  16:07  
And I think, as I remember it with Google, they had problems when the number crunchers came up with an algorithm for promotion readiness within the business, and the line managers refused to accept it.

Andrew Munro  16:27  
A few issues on that one, Richard. One was about ownership of decision making. Mainly, however, I suspect even Google's managers didn't trust their own algorithm on this one. And with good reason, which I think you want to explore and cover a bit later.

Richard Anderson  16:47  
Excellent. Okay, so just very quickly, to summarize the last part of the conversation we just had there: you're saying that validity must be considered in the round. I get that, but there's something that's niggling a little bit, and I think we should maybe explore it a little further. Would you mind just doubling back on the issue of validity and what it means? Obviously we work with assessment and psychometric technology, but I'm not involved in the validity aspect of it, so I'm sitting on the periphery almost, and I hear the term meta-analysis quite a few times when it comes to validity and validation studies. Would you mind telling our listeners what we mean by meta-analysis, Andrew?

Andrew Munro  17:40  
So meta-analysis is a methodology to consolidate hundreds of different studies from different samples. The procedure is intended to, well, basically iron out the wrinkles - to correct for various statistical anomalies arising from all the vagaries of different research designs - and the outcome is to summarize the evidence for the validity of different assessment methods.
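
As a rough sketch of the mechanics (hypothetical study numbers): at its simplest, a meta-analysis pools the validity coefficients from many studies, weighting by sample size. Real meta-analyses, such as Hunter-Schmidt, add corrections for unreliability, range restriction and so on, which is exactly where the later debate about "statistical wheezes" begins.

```python
import numpy as np

# (r, N) pairs: observed validity coefficient and sample size from each (made-up) study.
studies = [(0.18, 120), (0.32, 450), (0.25, 90), (0.41, 60), (0.22, 800)]

r = np.array([s[0] for s in studies])
n = np.array([s[1] for s in studies])

# Sample-size-weighted mean validity across studies.
weighted_mean_r = np.sum(n * r) / np.sum(n)
# Observed spread of results across studies (before any corrections are applied).
weighted_var = np.sum(n * (r - weighted_mean_r) ** 2) / np.sum(n)

print(f"sample-weighted mean validity: {weighted_mean_r:.2f}")
print(f"between-study spread (SD):     {np.sqrt(weighted_var):.2f}")
```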

Richard Anderson  18:09  
Okay, so given we know what we know from the meta-analysis research, I suppose we can generalize to say that this assessment method has solid validity. And as a practitioner, I guess we can use it with confidence?

Andrew Munro  18:28  
Yes and no, but mainly no. Meta-analytical studies certainly do provide an overview of which assessment methods deserve more or less attention. So we can rule out graphology as a selection method; no robust study at any time has found any validity for it. But, and this is the big but: meta-analysis suggests an assessment might work, and that's a far cry from claiming it actually does work within a practical setting. And here we're back to the issues of context, base rates and so on. The other issue for practitioners is which meta-analytical study do we sign up to? Do you know the BBC docu-sitcom The Thick Of It?

Richard Anderson  19:24  
Yeah, yeah, of course. Yeah.

Andrew Munro  19:26  
And there's a line from Malcolm Tucker - Malcolm's the political spin doctor - and he's saying to a civil servant, and I won't attempt the voice of the actor Peter Capaldi: you've been speaking to the wrong expert; you've got to ask the right expert. And there's a bit of "which expert?" in the world of meta-analysis.

Richard Anderson  19:51  
Okay, I get it. Yeah. I checked the reference that you sent me, the LinkedIn post by Paul Barrett. He rank-ordered the results from four different meta-analytical studies, basically to find conflicting summaries. As I understand it, because of the different assumptions of statistical methodology, they came up with quite different conclusions. It just seems really messy.

Andrew Munro  20:24  
It is, but it's largely of the psychometricians' own making, through a lack of transparency, and also through applying statistical wheezes that in principle indicate validity but in practice are a million miles from real-life application.

Richard Anderson  20:43  
It's really interesting stuff, I have to say. Okay, why don't we move on to methodology and metrics?

Andrew Munro  20:52  
Yep. Because if we don't have the relevant metrics, no magical methodology will pull our white rabbit from the hat.

Richard Anderson  21:01  
Exactly. So let's go back to what we were talking about around objective measures of work performance; they're often much more difficult to access, as we've previously touched on. So if an organization doesn't know who, within the staff base, is or isn't having a business impact - whether that's service responsiveness, productivity, sales, innovation, whatever - that seems to me to be a more fundamental problem.

Andrew Munro  21:34  
Of course. If an organization can't differentiate levels of performance and other success outcomes, its yardstick for validation becomes next to meaningless. Yeah.

Richard Anderson  21:46  
Okay, so just to check my understanding on this. We're going to identify a key success metric, or many metrics, and check the relationship between the metrics we've identified and the scores on the assessment that we want to validate. I'm assuming that if we find a decent relationship between the two, then there's going to be potential to improve performance on that success outcome, whether that's in selection or in learning and development practices. And then, of course, we apply that.

Andrew Munro  22:23  
Yeah, I mean, that's how the methodology works. Imagine a scenario: we want to validate a test that we think has potential to improve the effectiveness of surgical teams in the National Health Service. This is not trivial; this test has the potential to save patient lives. What metric would you draw on for test validation?

Richard Anderson  22:49  
Well, I guess in that example, maybe outcomes from operations over time, something like that - preferably across a range of hospitals and medical procedures.

Andrew Munro  23:06  
Yeah. But what if we find that the head surgeon of a hospital is so extraordinarily talented that he or she and their teams end up taking on the difficult cases that other surgeons at other hospitals don't go near? Their apparent success rates might be relatively lower vis-a-vis their peers in other hospitals, but only because they are so successful.

Richard Anderson  23:36  
I see the problem there.

Andrew Munro  23:39  
Or, just to keep the NHS theme going, what if we find that nursing teams which report higher error rates during operations turn out, in fact, to be the better teams? The better teams encourage honesty and acceptance of mistakes in order to learn. Conversely, the worst nursing teams cover up their mistakes and report lower error rates.

Richard Anderson  24:09  
Black box thinking. That reminds me of the Cobra effect, which no doubt you're aware of. I suppose it's a great example of the law of negative, unintended consequences. For anybody who isn't familiar with the story of the Cobra effect: it was during British rule in India, and the government were becoming increasingly concerned about the number of venomous cobras in, I think it was, Delhi. The government basically offered a bounty for anybody that could bring in a dead cobra, and obviously the consequence of that was you had a lot of entrepreneurial opportunists who began to breed cobras. Once the government realized what was going on, it ended the program, but the other consequence was that the cobra breeders were left with thousands of worthless snakes, so they freed the cobras, and the population grew. So it was infinitely worse than it was in the first place.

Andrew Munro  25:09  
It's a great example. We have to choose our metrics with great care as part of our validation projects.

Richard Anderson  25:20  
Absolutely. Okay, so let's assume we've run our validation study, and again, assuming we've got a decent data set and that the metrics we've decided on in terms of success are robust and defensible, how do we report back and use the findings for practical improvements in assessment and development?

Andrew Munro  25:46  
The standard format, which we've touched on briefly, is the correlation coefficient. This is the typical statistic which gets reported in test publisher manuals, as well as in research articles.

Richard Anderson  26:02  
Okay, so again, just to touch on that term, the correlation coefficient - we'll try and get it explained for the layman. What do we mean by the correlation coefficient, Andrew?

Andrew Munro  26:15  
Yeah, good question. I mean, this is validation unplugged. So the validity coefficient is an index running between zero and one. Zero means there isn't a relationship between test scores and whatever success criterion is being applied - work performance being the most common - through to 1, a perfect correlation. In the assessment space, we are typically looking at validity coefficients of .3 to .5.
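
In code terms (made-up numbers for ten hires), the validity coefficient is simply the Pearson correlation between test scores and whatever success criterion was chosen:

```python
import numpy as np

# Hypothetical data for 10 hires: test score at selection, performance rating a year later.
test_scores = np.array([52, 61, 45, 70, 58, 66, 49, 73, 55, 63])
performance = np.array([3.4, 2.9, 3.1, 4.1, 3.8, 3.2, 3.6, 4.0, 2.8, 3.5])

# The validity coefficient is the correlation between the two columns.
validity = np.corrcoef(test_scores, performance)[0, 1]
print(f"validity coefficient: {validity:.2f}")
# ~0.50 for this made-up sample, in the .3 to .5 range mentioned above.
```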

Richard Anderson  26:50  
Got it? Makes sense.

Andrew Munro  26:54  
A couple of risks with the correlation coefficient. The most obvious is that the number is a spurious number. A bit of a tangent, but bear with me, Richard: what do you think the correlation is between per capita cheese consumption and death rates through being tangled up in bedsheets?

Richard Anderson  27:20  
Okay. That's an interesting question. I think I might know where you're going. You know, intuitively, I would say zero. But then, unless we think that the cheese apparently gives us nightmares, and the nightmares create night-time panic and you become snarled up in the sheets, I suppose... perhaps a very, very small correlation.

Andrew Munro  27:47  
As it turns out, it's a hefty .94. Wow, an extraordinary result. And there's a great website - it's worth posting the link in the transcript - that reports more of these types of spurious correlations. They're great fun. Robert Matthews, a mathematics professor at Birmingham, said correlations are like coincidences: we would take them less seriously if we were more aware of how easily we find them. So first off, let's check we're not fooling ourselves by being fooled by randomness - correlations that are thrown up as statistically significant but are just a consequence of the games we play in statistics, and have no practical value.
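
A quick simulation (not the cheese data, just random noise) of how easily "impressive" correlations appear by chance: generate 50 completely unrelated series and search the pairs for the biggest correlation.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(42)
n_series, n_years = 50, 12          # e.g. 50 unrelated annual statistics over 12 years
data = rng.standard_normal((n_series, n_years))

# Scan every pair of series and keep the largest absolute correlation found.
best = max(
    (abs(np.corrcoef(data[i], data[j])[0, 1]), i, j)
    for i, j in combinations(range(n_series), 2)
)
print(f"largest correlation found between two purely random series: {best[0]:.2f}")
```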

Richard Anderson  28:39  
Lies, damn lies, and statistics.

Andrew Munro  28:42  
And the other giveaway is phony precision. If spurious correlations are too good to be true, phony precision is the red flag that something odd is going on.

Richard Anderson  28:56  
Okay. And you're sceptical of studies that report a correlation to more than two decimal places?

Andrew Munro  29:06  
Yeah, someone's playing the science game without understanding science. And it's the law of phony precision that caught out a very well-known US psychologist with her positivity ratio. Embarrassed by the finding that this was all flim-flam, she said, quote, "I didn't understand the maths." She still continues to promote the book, though.

And the final observation - and I know we are sort of plugging back in rather than staying unplugged - is that a correlation coefficient is a summary index. You report a validity coefficient of point five. So far, so good; it's a respectable figure. But what does the pattern of test scores mapped to the success criteria look like?

In life, "show me the money" is good advice. In the validity world, "visualize the data" is great advice. Literally display the pattern on a scatterplot to indicate the relationship between test scores on the x axis and the success criteria on the y axis. Is it a nice clustering of dots indicating a clear pattern? Or is it just a mess of plotted data?

And there's a terrific site on correlation coefficients that asks the question: what does a correlation of point five look like? The answer, once you plot all the permutations on a scattergram, is pretty much anything you like.
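
A small sketch (simulated data, assuming matplotlib is available) of the "visualize the data" advice: plot test scores against the success criterion for a sample whose true validity is .5 and see how noisy the cloud of dots really is.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
r = 0.5
n = 200

# Simulate test scores and a criterion whose population correlation is 0.5.
test = rng.standard_normal(n)
criterion = r * test + np.sqrt(1 - r**2) * rng.standard_normal(n)

plt.scatter(test, criterion, alpha=0.6)
plt.xlabel("test score (standardised)")
plt.ylabel("success criterion (standardised)")
plt.title(f"What r = {np.corrcoef(test, criterion)[0, 1]:.2f} actually looks like")
plt.show()
```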

Richard Anderson  30:53  
Right. So what, then, are the alternatives to the correlation coefficient?

Andrew Munro  31:00  
My preference is to apply good old-fashioned expectancy tables. Here we group our data points into quadrants - low and high test scores vis-a-vis low and high criterion scores - and we report as percentages. It sounds simplistic; it isn't. It's a variation of an approach the actuaries use all the time to forecast likely outcomes, and to my mind it is a much more direct and meaningful way to interpret validation results than the abstraction of a single number, the correlation coefficient.
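
A minimal sketch (simulated data, simple median splits) of the expectancy-table idea: quadrant the data into low/high test scores against low/high criterion scores and report straightforward percentages rather than a single coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
validity = 0.4
n = 1_000

# Simulate test scores and a criterion with a moderate underlying validity.
test = rng.standard_normal(n)
criterion = validity * test + np.sqrt(1 - validity**2) * rng.standard_normal(n)

# Median splits into low/high test and low/high criterion groups.
high_test = test >= np.median(test)
high_crit = criterion >= np.median(criterion)

for label, group in (("high test scorers", high_test), ("low test scorers", ~high_test)):
    pct = 100 * np.mean(high_crit[group])
    print(f"{label}: {pct:.0f}% end up in the high-performing half")
```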

Richard Anderson  31:42  
That's really interesting. Okay, so that, I think, is a good link to the final topic we said we'd discuss, and that's all about validity and its application in decision making for selection. So the question is: how do we translate the validity studies into a decision-making model that improves, for example, recruitment?

Andrew Munro  32:05  
A possible tangent - a tangent again - but a thought has just been triggered: are we selecting in for exceptional levels of success, or are we screening out to avoid damaging failure?

Richard Anderson  32:25  
Okay, well, what do you mean by that? Would you expand a little bit, please?

Andrew Munro  32:30  
There are some roles where more is more; sales is a good example. Every successful appointment you make has a direct impact on the company's bottom line, so we therefore want to select in the outstanding performers. There are some roles, however, where less is more. Take the head of safety at a nuclear processing company: brilliance isn't going to put much on the bottom line, but incompetence will have devastating consequences. Our validity research should guide our strategy - selecting in or screening out. And expectancy tables are way better at highlighting which selection strategy is optimal than a single index, the validity coefficient.

Richard Anderson  33:26  
And in your experience, then, how open would you say companies are in the way that they implement their assessment processes?

Andrew Munro  33:35  
Two strategies. One is explicit: there's a clear logic which is open, transparent and defensible, and there's a theory behind how we've used the validity research to shape our decision-making algorithm, to connect cause and consequence. The second is the mysterious secret sauce of proprietary intellectual property. This is becoming increasingly common, and linked to the development and use of AI in assessment.

Here the assessment isn't interested in theory; we're only interested in patterns or associations that connect test data to a metric of success. And the problem, without a model of cause and effect, is, number one, we're back to the randomness of number crunching, and secondly, potential bias from the data set that generated the numbers in the first place.

Richard Anderson  34:42  
Okay, and wasn't it Amazon? They got caught out with this whole approach. As I recall it, they brought in artificial intelligence to review job applicant resumes - CVs. The idea was to widen the talent pool by scanning the internet for suitable candidates. And I think the machine learning they used, which was screening out applicants, was trained on data from an overwhelming proportion of males, based on previous applicants, and as a result the new recruiting engine did not like women. And the project was abandoned.

Andrew Munro  35:26  
Yeah, yeah. Earlier this week, I saw a McKinsey report, and the researchers announced that companies are already using AI to create sustainable talent pipelines. At this stage, I would say, one, I doubt it very much, but do share a few examples. And secondly, if these companies are - without any kind of validation - they're being very, very brave, if not reckless.

Another example: one of my clients was using a well-known assessment application from one of the big global consultancies - no names, but lots of shame. And my client was concerned the results didn't quite feel right. That was her intuition. She's a highly experienced professional, she's worked with lots of assessment tools, and she had a bit of unease about the report outcomes.

How did it play out? Well, I said, trust your intuition. Ask the vendor for the original test data, the data before the black box weights, recalibrates and does its other magical stuff, and we can analyse what's going on. The firm refused to share the data. When an assessment firm isn't prepared to collaborate as part of an independent validation review, someone's fooling someone. It's a bit like the Wizard of Oz: don't look behind the curtain. Yeah, of course not - if it is only bluff and bananas behind the curtain.

Apart from the secrecy, my other reservation is the complexity of the algorithms hidden away inside the black box. One for another discussion, probably. But complexity is fragile; complicated algorithms break down very quickly.

And the other worrying aspect of the black box is the impact on diversity and inclusion. You mentioned Amazon. A few other firms are facing legal challenges over black-box decision making; more will in future. I have no doubt of that whatsoever.

Richard Anderson  37:53  
Yeah, I'm sure you're right. That was fascinating listening, Andrew. I'm going to attempt a bit of a summary of the conversation, if that's okay. So I think the key takeaways are, I suppose, number one: we seem to be making validity a lot more complicated than it needs to be. Unplugged, the question is: what is the specific claim made by an assessment, and what's the evidence to support that claim? For practitioners, this seems a much more helpful question than "show me the number", like a validity coefficient, for example.

Andrew Munro  38:38  
Yes, keep it grounded, keep it practical, rather than abstract.

Richard Anderson  38:44  
So, we've used unplugged as a term throughout this podcast, and here's another takeaway, I guess. If we're not clear on the definition of work success and the metrics that we use to validate the assessment, there's going to be a real risk: firstly, of simply selecting from the past - those that line managers have previously rated highly - and secondly, the hazard of the negative unintended consequences that we discussed before, if our metrics don't reflect the reality of success.

Andrew Munro  39:21  
And just to jump in on that one with another example, Richard: there's a well-known measure of dark side leadership, the Hogan Development Survey. This profiler identifies the negative traits - the dysfunctional, destructive behaviours of leadership. And it was once deployed in a validation exercise at West Point, the academy for future military leaders in the US. The research is in the public domain. And a bit of a surprising finding: several of the dark side dimensions - narcissism, being overly dramatic, being critical of others, being overly focused on rule compliance - actually had a positive effect - I'll repeat that, actually had a positive effect - on the cadets' leadership development over time. The authors conclude the results require some explanation. Well, they certainly do. When the so-called dark side turns out to be the bright side of success, something strange is going on in the validity world.

Richard Anderson  40:35  
Yeah. Interesting. Very strange. Okay, I guess the other observation would be the need for transparency and simplicity in the process of validation: how it's reported, and how the findings are then incorporated into organizational processes. We might run into problems legally and ethically if we're not going to ask what's in the black box.

Andrew Munro  41:02  
In three minutes, Richard, you've distilled our 30 minutes of wandering wonderings in validity world into a clear unplugged summary, or at least our version of an unplugged summary. I'm sure many of our listeners might disagree, but I think that's a good summary, Richard.

Richard Anderson  41:27  
Brilliant, and the 30 minutes that you mentioned was fantastic to listen to - a huge thank you, Andrew. A couple of things just before we bring things to a close. A transcript is going to be available, like it was last time, as part of this podcast; it will be incorporated in the blog post on our website, which will also have references to the various sources that Andrew has cited and discussed. Andrew, I must ask this: in the last podcast, you talked about the sequel to A to Z coming out soon. Is it available yet?

Andrew Munro  42:06  
Z to A - that one had to be pushed back a wee bit due to a couple of other projects. The plan is for autumn now, Richard. But thanks, thanks for the plug on Unplugged.

Richard Anderson  42:20  
A plug on Unplugged, there you go. Brilliant, Andrew. Well, thanks ever so much yet again. Really enjoyed the discussion, and thank you for joining me on Psyched for Business.

Andrew Munro  42:29  
You're very welcome, Richard. Thank you.

Richard Anderson  42:32  
Thanks, Andrew.

Voiceover  42:33  
Thanks for listening to Psyched for Business. For show notes, resources and more, visit evolveassess.com

Notes and references
1. Types of validity; http://psychyogi.org/types-of-validity/
2. “You’ve been speaking to the wrong expert. You’ve got to ask the right expert.” “The Thick Of It”; https://youtu.be/lADB9Qu53CY
3. Spurious Correlations. https://www.tylervigen.com/spurious-correlations
4. “What does a 0.5 correlation look like?”; https://janhove.github.io/teaching/2016/11/21/what-correlations-look-like
5. “If your ratio was greater than 2.9013 positive emotions to 1 negative emotion you were flourishing in life.”; https://www.theguardian.com/science/2014/jan/19/mathematics-of-happiness-debunked-nick-brown
6. “For example, let's say you're training an AI model to filter job candidates so you only need to interview a fraction of the applicants. Clearly, you want candidates that will do well in the job. So you get some numbers together on your old employees, and make a model that predicts which candidates will succeed. Great. First round of interviews and in front of you are 15 white men who mentioned golf — your CEO's favourite pastime — on their resume. Why? Well, those are the kinds of people who have been promoted over the past 50 years.”
Why AI is Arguably Less Conscious Than a Fruit Fly; https://www.webworm.co/p/insulttolife?
7. The State of Organizations 2023: Ten shifts transforming organizations; https://www.mckinsey.com/capabilities/people-and-organizational-performance/our-insights/the-state-of-organizations-2023#/
8. Leader development and the dark side of personality, P.D. Harms et al; The Leadership Quarterly, 2011
9. From A To Z And Back Again; https://www.amazon.co.uk/Back-Again-Adventures-Misadventures-Talent/dp/1914424204