If I asked you to rate your own communication skills, you’d probably say they’re above average. So would almost everyone else - which, statistically, can’t be true for all of us. This isn’t vanity; it’s that these skills are genuinely hard to see in ourselves. A musician knows instantly when they hit a wrong note. But when did a colleague last pull you aside and say, “You’re not actually listening in meetings - you’re just waiting to talk”?
This is the problem my team at Ignis AI is solving: how do you measure collaboration or communication in a way that is fair, consistent, and actually useful - not with a vague rating scale or a manager’s gut feeling, but with real evidence?
Most tests have one right answer. That model falls apart when you’re assessing human interaction skills. In our Ignis AI PowerSkillsAssessment™, a person watches a short video of a virtual team meeting where two colleagues are at odds, then responds to an open-ended question: as the team lead, how would you handle this?
There’s no single correct answer - but there are clearly better and worse ones. Someone who says “I’d just make a decision and move on” is demonstrating something very different from someone who recognizes the underlying tension, creates space for both perspectives, and thinks about preserving trust across the team. Both might sound reasonable on the surface. One reflects a much deeper understanding of how teams actually work.
Before AI touches a single response, we do the hard human work of defining what skill proficiency looks like at each level of someone’s career. For communication, someone early in their development might communicate clearly enough most of the time, but tend to miss what’s not being said - the anxiety or political tension underneath a disagreement. Someone more advanced does something qualitatively different: they listen in a way that surfaces what people aren’t saying, they reframe charged situations in ways that move the whole group forward, and they stay composed under real pressure.
These descriptions are written and validated by assessment scientists and subject matter experts. They define the target. Without this foundation, you’re not really measuring a skill - you’re measuring something that feels like one.
We use different methods for different skills, but the foundational one gives the AI two things: a detailed rubric describing what each score level means, and a set of real, previously scored responses as reference points. Language is ambiguous, and rubrics alone leave room for interpretation. Showing the AI concrete examples - “this response scored a 3, here’s why; this one scored a 1, here’s why” - grounds its scoring in actual human judgment rather than abstract description.
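To make that concrete, here is a minimal sketch of how a rubric-plus-exemplar prompt might be assembled. The rubric text, example responses, and function names are illustrative, not our production system:

```python
# Illustrative sketch: anchor an AI scorer with a rubric plus scored exemplars.
RUBRIC = """Score 1: States a decision without acknowledging the conflict.
Score 2: Acknowledges the conflict but addresses only one side.
Score 3: Surfaces both perspectives and proposes a path that preserves trust."""

EXEMPLARS = [
    {"response": "I'd just make a decision and move on.",
     "score": 1,
     "rationale": "Resolves nothing; ignores the underlying tension."},
    {"response": "I'd meet with each colleague, name the disagreement openly, "
                 "and agree on criteria the whole team can stand behind.",
     "score": 3,
     "rationale": "Creates space for both perspectives and protects trust."},
]

def build_scoring_prompt(new_response: str) -> str:
    """Combine the rubric with previously scored responses so the model's
    judgment is anchored in human-scored reference points, not just in
    the rubric's abstract language."""
    parts = ["You are scoring a workplace-communication response.",
             "Rubric:", RUBRIC, ""]
    for ex in EXEMPLARS:
        parts.append(f'Example (scored {ex["score"]}): "{ex["response"]}"')
        parts.append(f'Why: {ex["rationale"]}')
    parts.append(f'Now score this response on the same scale, with a rationale: '
                 f'"{new_response}"')
    return "\n".join(parts)

print(build_scoring_prompt(
    "I'd ask both colleagues to walk the team through their concerns."))
```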
We also run multiple independent scoring models on each response and compare their answers to reduce variance and increase confidence. Every score comes with a written rationale, which makes human review faster and builds a growing library of explained examples that continuously improves the system. Early results show our AI scores agree with human expert scores at a level consistent with high-quality professional assessment.
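A sketch of just the aggregation step might look like the following; the median-and-spread rule and the review threshold are assumptions for illustration, not our actual policy:

```python
import statistics

def aggregate_scores(model_scores: list[int], max_spread: int = 1) -> dict:
    """Combine scores from several independent scoring models.
    Close agreement increases confidence; wide disagreement routes
    the response to a human reviewer."""
    spread = max(model_scores) - min(model_scores)
    return {
        "score": statistics.median(model_scores),
        "spread": spread,
        "needs_human_review": spread > max_spread,
    }

print(aggregate_scores([3, 3, 2]))  # models agree -> accept with confidence
print(aggregate_scores([1, 3, 3]))  # models diverge -> flag for human review
```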
Creative thinking is a uniquely important skill, and in many respects a special one to measure. In our assessment, we measure several distinct types of creativity, some of which require specialized approaches. Using machine learning techniques, we can quantify the degree to which a person synthesizes multiple ideas and generates genuinely novel solutions. In a sense, we can assign a number to their ability to process information flexibly and think outside the box, making measurable what was once considered intangible.
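One common ML approach to quantifying novelty - offered here as an illustrative sketch, not a description of our exact method - is to embed responses as vectors and measure how far a new response sits from everything seen before:

```python
import numpy as np

def novelty_score(response_vec: np.ndarray, corpus_vecs: np.ndarray) -> float:
    """Score novelty as the cosine distance between a response's embedding
    and its nearest neighbor among previously seen responses: a response
    far from everything in the corpus reads as more novel."""
    sims = corpus_vecs @ response_vec / (
        np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(response_vec)
    )
    return float(1.0 - sims.max())

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 8))               # toy embeddings of prior responses
typical = corpus[0] + 0.01 * rng.normal(size=8)  # nearly a repeat of a known idea
unusual = rng.normal(size=8)                     # a direction unlike prior responses
print(f"typical: {novelty_score(typical, corpus):.3f}, "
      f"unusual: {novelty_score(unusual, corpus):.3f}")
```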
Human expertise guides and calibrates the AI. We keep people in the loop to ensure models stay aligned with what actually matters; without that oversight, models can learn shortcuts. For instance, it’s easy for a system to start rewarding longer answers, or answers with more sophisticated vocabulary, regardless of whether the thinking is any good.
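A simple audit for one such shortcut is to check how strongly scores correlate with response length. The sketch below (function name and toy data are illustrative) flags that pattern for human investigation:

```python
import numpy as np

def length_bias_check(responses: list[str], scores: list[float]) -> float:
    """Correlation between word count and score. A value near 1.0 suggests
    the scorer may be rewarding length itself - a shortcut worth
    investigating with targeted human review."""
    lengths = [len(r.split()) for r in responses]
    return float(np.corrcoef(lengths, scores)[0, 1])

responses = [
    "I'd just decide and move on.",
    "I'd name the tension, hear both sides, and agree on shared criteria.",
    "I would schedule a long series of meetings to discuss everything at "
    "length, repeatedly, until all parties are exhausted into agreement.",
]
print(length_bias_check(responses, [1.0, 3.0, 2.0]))
```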
So far, we’ve been talking about how individual responses get scored. But a score on a single task isn’t the same thing as someone’s underlying skill. In practice, people produce a range of responses—even within the same skill area. Context matters, prompts differ, and performance can fluctuate. Treating any one score as “the truth” would be both noisy and misleading.
To bridge that gap, we use latent-variable modeling. Instead of taking scores at face value, we treat them as observations generated by an underlying, unobserved proficiency. The model aggregates evidence across multiple tasks, accounts for relationships among skills, and infers that hidden variable - what the person is actually capable of, not just what they showed in one moment. This approach gives us more than a single point estimate. It produces a full probabilistic picture of someone’s skill. In other words, we move from a single number to a structured understanding.
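The core idea can be shown in toy form. The sketch below treats each task score as a noisy observation of a hidden proficiency and computes a posterior over it on a grid; the flat prior and Gaussian noise model are simplifying assumptions, not our production model:

```python
import numpy as np

def posterior_over_skill(task_scores, noise_sd=0.7):
    """Toy latent-variable inference: treat each task score as a noisy
    observation of a hidden proficiency theta, then compute a posterior
    over theta on a grid (flat prior, Gaussian noise). The output is an
    estimate *with* uncertainty, not a single point."""
    grid = np.linspace(0.0, 4.0, 81)   # candidate proficiency levels
    log_post = np.zeros_like(grid)     # flat prior over theta
    for s in task_scores:
        log_post += -0.5 * ((s - grid) / noise_sd) ** 2
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    mean = float((grid * post).sum())
    sd = float(np.sqrt(((grid - mean) ** 2 * post).sum()))
    return mean, sd

# Three scores on related tasks -> one proficiency estimate plus its uncertainty
print(posterior_over_skill([2, 3, 2]))
```

Real systems replace the toy likelihood with validated psychometric models, but the shape of the output is the same: a distribution over skill, not a lone number.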
That richer representation becomes especially valuable downstream. Whether you’re matching someone to a role, building a balanced team, or identifying development opportunities, it’s not just about “how good” someone is—it’s about how confidently you know it, and in what contexts that assessment holds.
We’ve run two validation studies. The first, with 353 U.S. adults, showed strong consistency across all seven skills we measure. Human reviewers found clear, meaningful differences between high and low scorers - especially in originality, adaptability, and integrating competing perspectives. The second study, with employees at two technology companies, reinforced those results. One result I find particularly telling: our seven skills turned out to be largely independent of each other. Being strong analytically doesn’t predict much about your collaboration skills. The one exception was leadership and communication, which were more tightly linked - because so much of effective leadership comes down to how well you communicate.
Collaboration, communication, leadership, creative thinking - these are the skills that matter most in an AI-driven world, and exactly the ones that have been written off as too fuzzy to measure. The result: talent decisions about these skills get left to interview intuitions and manager impressions, both of which carry well-documented biases.
Over more than a decade of research with the OECD, Harvard, and Microsoft - including a communication assessment that reached over 100,000 employees globally and predicted job performance better than traditional certification tests - my colleagues and I have helped establish that these skills are measurable. Not easy to measure. Not perfectly measurable. But measurable with real rigor, at scale, in ways that give people and organizations something they can actually act on.
People deserve to have their real strengths seen. We’re committed to building the tools that make that possible.
About the Author
Dr. Ilia Rushkin is VP of AI & Data Science at Ignis AI, where he leads the AI/ML and data science systems powering the PowerSkillsAssessment™. His prior work includes adaptive learning research at Harvard University with Microsoft and HarvardX, and a role as Principal AI/ML Engineer at BrainPOP; he holds multiple U.S. patents in AI-enabled assessment.