Today, when you think of AI, the first thing that comes to mind is ChatGPT, the recent chatbot launched by OpenAI. But that is neither the beginning nor the end. AI is also driving a revolution in data driven HR. AI in data driven HR can tell us whom to hire, whom to promote, what to focus on in our employee engagement strategies and even predict who is about to quit. It does this by modeling psychological processes, preferences and predispositions. The golden age of data driven HR is upon us: we are rapidly becoming much better at understanding what happens between your ears. In my recently defended doctoral dissertation I contribute to some of the psychometric models that predict attitudes and behavior on the work floor. Or should we say behind the work screen, as few of us share a work floor in 2023. As with any revolutionary new technology, there are pitfalls and dangers we have to learn to live with. The fundamental questions are “What could go wrong?” and “What are the privacy rights of employees?” and, as an HR department or HR consultant, “What do we have the right to measure?” and “Can we ethically let the AI decide?”. What follows is an adaptation of a chapter in my dissertation that dives deeply into the five main risks of AI in data driven HR. The full version can be found here.
TLDR:
The five risks of AI in data driven HR are:
1: Discrimination by AI
2: Opportunity inequality
3: Dehumanization
4: Mind Policing
5: Threat to privacy
Risk Nr. 1 of AI in data driven HR: Discrimination by AI
In 1991, at a high-water mark of pop culture and a lasting memory for many in my age cohort, Arnold Schwarzenegger said “hasta la vista baby” to the evil AI bot before overly dramatically terminating him. Rogue AIs set to destroy us are long-time Hollywood favorites, with movies from The Matrix to 2001: A Space Odyssey and many more. But the catastrophic effects of AI that are just around the corner are much more subtle, and require neither the computer to become sentient, nor any kind of ill intentions or rebellion.
Examples of AI going wrong:
In 2015 Amazon said “hasta la vista baby” to its CV vetting algorithm (Dastin, 2018) after it insisted on discriminating against women in the CV selection process, even after manual interventions to hide gender and correct for bias in the machine. This AI did not like women and would downgrade a CV when it could identify, via secondary clues, that the candidate was likely female.
COMPAS stands for Correctional Offender Management Profiling for Alternative Sanctions. It is a case management and decision support tool used by US courts to assess the likelihood of a defendant becoming a recidivist. Basically, the AI tells judges whether it thinks the individual in question would get in trouble again if the judge lets him go early or offers an alternative sanction (Brennan et al., 2009). Sounds great, but… investigative journalists at ProPublica analyzed the outcomes of the engine and found widespread machine bias against minorities, especially against Black people (Allen & Masters, 2021; ProPublica, 2020). If you are Black, the AI rates your chances of recidivism higher. Basically, the judge’s computer is racist.
Why does AI go wrong?
So how is this possible? How do neutral algorithms and data driven HR become engines of discrimination? It is because the machine looks for correlation, not causality. Causality is complicated to establish statistically; correlation is very straightforward. In machine learning specifically, we need massive amounts of data for the machine to find patterns and develop a predictive model. Typically, as in both Amazon’s CV selection AI and COMPAS, historical data is used. But history is tainted by inequality and discrimination. If we have been hiring mostly wealthy white males in the past, and wealthy white males have better education, better connections and more career opportunities, the AI will learn that being a wealthy white male is the proxy for success. So it will start looking for clues about who the wealthy white males are. In this way our next batch would contain even more wealthy white males, and the problem exacerbates itself. You may think the solution is as easy as hiding race, gender or anything else that could be discriminatory from the AI, but here the AI will outsmart you: it will find patterns in other clues by which to discriminate. For example, an HR recruitment engine was found to use Chicago postal codes to estimate effectiveness at a certain job. Chicago is a very segregated city, with different neighborhoods representing different ethnicities and socioeconomic classes. Obviously those from less fortunate neighborhoods have had fewer opportunities for education and career development than those from wealthy neighborhoods. So the AI, in the name of the numbers, will avoid those neighborhoods. Which of course isn’t fair: if I score well on objective criteria but come from a disadvantaged neighborhood, I should have at least the same likelihood of doing well at the job as someone from a more pampered background.
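To make the proxy problem concrete, here is a minimal sketch with entirely made-up data: the protected attribute is never shown to the model, yet a correlated proxy (a postal code) lets it reproduce the historical bias. All names and numbers are hypothetical, purely for illustration.

```python
# Minimal sketch (hypothetical data): even with the protected attribute removed
# from the features, a model can rediscover it through proxies such as postal code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical protected attribute (never given to the model).
group = rng.integers(0, 2, n)               # 0 = advantaged, 1 = disadvantaged
# Proxy feature: postal code clusters strongly with the protected attribute.
postal_code = group * 10 + rng.integers(0, 3, n)
skill = rng.normal(0, 1, n)                 # the thing we actually care about
# Historical "hired" label is tainted: skill matters, but so does group membership.
hired = (skill + 1.5 * (group == 0) + rng.normal(0, 1, n)) > 1.0

X = np.column_stack([postal_code, skill])   # note: no group column at all
model = LogisticRegression(max_iter=1000).fit(X, hired)
pred = model.predict(X)

# Demographic-parity check: predicted selection rate per protected group.
for g in (0, 1):
    print(f"group {g}: predicted hire rate = {pred[group == g].mean():.2f}")
# The gap between the two rates shows the proxy leaking historical bias.
```

A simple selection-rate comparison like the one at the end is often the first signal that a proxy, rather than merit, is doing the deciding.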
Now I hear you thinking, “well, we should obviously not allow the AI to judge based on the address”, but again, it is not that straightforward. What if, in our research, by creating a taxonomy of experienced utility, we are building weapons for mass discrimination in AI? Some of our affective preferences are cultural and may be related to ethnicity or social class. This would be reflected in a survey such as ours. We are psychometrically looking into the mind, and once that door is open, computers and machine learning will be used to optimize outcomes; that is unavoidable. The AI will then develop psychometric patterns or fingerprints of the profiles that have been successful in the past for whatever purpose the AI is employed. This means that it will discriminate on ethnicity, race, culture, social class, religion and even philosophical convictions.
How can we prevent discrimination by AI?
So how do we mitigate this problem? Well, the jury is still out, as Köchling & Wehner (2020) point out in their meta-analysis of 36 papers on discrimination by algorithms. There is no silver bullet, but the answer builds on three pillars: transparency, interpretability and explainability. We want to avoid any “black box” and create a “glass box”, as Roscher et al. (2020) illustrate. Transparency, interpretability and explainability are really about keeping a human at the helm. But will this remain realistic as automation and economic pressures push people away? I wonder.
Another possible approach to the problem is to look at the three computational steps and address each individually: input, processing and output. If the input is biased the machine will exacerbate this bias, so we should try to have non-biased inputs. However, curated datasets are costlier and smaller. This would also mean the AI cannot continuously learn about its environment, because the inputs have to be curated first. A simple input audit might look like the sketch below. For the processing we should follow Roscher’s advice regarding transparency and have an active role for humans in the process.
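As an illustration of what auditing the input step could look like, here is a minimal sketch (with hypothetical column names and made-up data) that compares historical outcome rates across groups before any model is trained, using the commonly cited “four-fifths” disparate-impact heuristic as a rough flag.

```python
# Minimal input-audit sketch (hypothetical column names): before training,
# compare historical outcome rates across groups and flag large gaps.
import pandas as pd

def audit_input_bias(df: pd.DataFrame, group_col: str, label_col: str, threshold: float = 0.8):
    """Return per-group positive-outcome rates and flag if the min/max ratio falls below threshold."""
    rates = df.groupby(group_col)[label_col].mean()
    ratio = rates.min() / rates.max()
    return rates, ratio, ratio < threshold

# Made-up historical hiring data.
history = pd.DataFrame({
    "gender": ["m", "m", "f", "f", "m", "f", "m", "m"],
    "hired":  [1,   1,   0,   1,   1,   0,   1,   0],
})
rates, ratio, flagged = audit_input_bias(history, "gender", "hired")
print(rates)
print(f"min/max ratio = {ratio:.2f}, biased input flagged: {flagged}")
```

Passing such a check does not prove the data is fair, of course; it only catches the crudest imbalances before they are baked into a model.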
And lastly, there may be a place for affirmative action in setting the objectives for the AI: maybe the AI should have quotas to fill for each characteristic that could be discriminated against. If the outputs are locked to certain quotas, the AI will adjust accordingly. However, is that fair? That is an ongoing, unresolved debate. Both sides of the argument invoke Rawls’ Theory of Justice (Rawls, 1999). “Rawlsian Affirmative Action” (Taylor, 2009) refers to the interpretation of Rawls’ modern liberal ideas in the context of affirmative action. Rawls is a highly influential philosopher in the American political and ethical zeitgeist. Samuel Freeman reads his views as follows:
“So-called “affirmative action,” or giving preferential treatment for socially disadvantaged minorities, is not part of FEO [Fair Equality of Opportunity] for Rawls, and is perhaps incompatible with it. This does not mean that Rawls never regarded preferential treatment in hiring and education as appropriate. In lectures he indicated that it may be a proper corrective for remedying the present effects of past discrimination. But this assumes it is temporary. Under the ideal conditions of a “well-ordered society,” Rawls did not regard preferential treatment as compatible with fair equality of opportunity. It does not fit with the emphasis on individuals and individual rights, rather than groups or group rights, that is central to liberalism.” (Freeman, 2007)
Suffice it to say we are not going to resolve the debate on affirmative action in this paper. What is important to note is that all academics and professionals dealing with AI and predictive modeling of behavior have to be aware of the prevalence of machine bias and to be well versed in its dynamics and remedies, even as those remedies are still being cooked up. The coming decades, with the rise of data driven HR, will bring an ongoing battle against discrimination by algorithms. We have to try not to make things worse with our work. Because if we let the machine loose on our minds, it will be “hasta la vista baby” for any hope of a fair society.
Risk Nr. 2 of AI in data driven HR: Opportunity inequality
Our work contributes to the emergent practice of data driven HR by providing a taxonomy and dimensions along which to analyze. Some describe the advent of data driven HR practices as an integral part of the so-called “Industry 4.0” (Sivathanu & Pillai, 2018). Also in 2018, the futurist Bernard Marr wrote a book on data driven HR (Marr, 2018). In it he identifies four purposes for the use of data:
“1: Using data to make better decisions
2: Using data to improve operations
3: Using data to better understand your employees
4: Monetization of data” (Marr, 2018)
Our work would be used in the third category, “to better understand your employees”, but subsequently also in the first one, “to make better decisions”. Right now decision making in HR is messy and inefficient: there is a myriad of selection criteria and clues to look for in a handful of attributes such as conscientiousness, motivation and intelligence. The fact that this is messy means that the outcomes are noisy. Let’s do a thought experiment. Say we are looking for conscientiousness and intelligence. Imagine three candidates: candidate A, who would score high on these attributes if we could measure them perfectly; candidate B, who would score a bit lower; and candidate C, the lowest. Based on an interview and the CV we may have a 50% chance of selecting candidate A, a 30% chance of selecting candidate B and a 20% chance of selecting candidate C (I’m making up reasonable numbers for the sake of the thought experiment). Suppose we get better at measuring, maybe with an IQ test and a Big 5 personality analysis. Then the percentages might shift to, say, A: 70%, B: 20% and C: 10%. Suppose we become really good at psychometric analysis and measure things near perfectly. Then we will hire A 100% of the time; B and C have no chance. If all companies do this, all companies will be going after the same employees and C will never get a job. Of course, this is a simplified example with arbitrary selection criteria, intelligence and conscientiousness. But we could also apply big data analysis to learn exactly what the ideal psychometric attributes of an employee are for a given function. Initially there may be some discrepancy between the algorithmic models, but as they get better they will become more and more uniform in identifying the ideal psychometric profile. For a while there will be a competitive advantage for the recruiters who have the best models. But like everything in tech, access to the technology will democratize in short order. Soon everyone will have the same excellent open source model. At some point only one specific profile can get a specific type of job. This would mean that B and C never get the job, only the A types.
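The thought experiment above can be simulated in a few lines. This is a toy Monte Carlo sketch with made-up aptitude scores; the noise levels stand in for anything from messy interviews to near-perfect psychometrics, and the point is simply that selection probabilities collapse onto candidate A as measurement noise shrinks.

```python
# Toy simulation of the thought experiment: as measurement noise shrinks,
# candidate A gets selected essentially every time.
import numpy as np

rng = np.random.default_rng(42)
true_scores = {"A": 1.0, "B": 0.7, "C": 0.4}   # hypothetical "true" aptitude
names = list(true_scores)
trials = 100_000

for noise in (1.0, 0.5, 0.1, 0.01):            # from messy interviews to near-perfect tests
    observed = np.array([
        true_scores[n] + rng.normal(0, noise, trials) for n in names
    ])
    winners = observed.argmax(axis=0)           # hire whoever scores highest
    freqs = {n: (winners == i).mean() for i, n in enumerate(names)}
    print(f"noise={noise}: " + ", ".join(f"{n}: {f:.0%}" for n, f in freqs.items()))
```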
I guess most of us recognize that at some point in our careers we have been offered an opportunity that was a bit of a jump for us, and that may have been offered because someone subjectively believed in us. Hopefully that belief became a self-fulfilling prophecy and we grew into the new role. This dynamic of imperfect selection scatters opportunity across everyone, and yes, sometimes we hire the wrong person for the job, but these exceptional opportunities also create growth and opportunity. If all of these decisions are made by data driven algorithms there will be massive opportunity inequality, and individuals will lose the freedom to try to “wing it” in different roles. This will push up the price of the A types and push down the salaries of B and C, causing more inequality and decreasing social mobility.
There is some hope in the market mechanism: if, in a competitive labor market, there is an incentive to identify alternative indicators of performance potential and possible development roadmaps for B and C types, then the recruiters who do so will have a competitive advantage.
A side effect of this will be that future participants in the labor market will train, or be trained, to profile themselves to match the desired profile of the algorithms. This would cause extensive social desirability bias in all psychometric tests. And if subjects are not answering the questions honestly at all, but rather trying to guess what the algorithm wants to hear, the tests lose all their value.
I would therefore argue that while we may be aided in the decision making process by data driven tools, we should allow some room for human intuition. Messy as it may be, it creates opportunities for individuals and companies, and will hopefully keep things humane for respondents and counter the gaming of the algorithm.
Of course subjective intuition is highly biased, and maybe letting the computer decide is more objective and more meritocratic. But maybe a little bit of chaos gives everyone a chance?
Risk Nr. 3 of AI in data driven HR: Dehumanization
It is a widespread urban legend that exceptional entrepreneurs often performed exceptionally badly at school (Denin, 2021). This is an indication that whatever we do in education may work less well for the edge cases than for the median case. Some people’s minds work differently, they think differently and they “tick” differently. Yet it is those exceptional people that change the world.
In our models we are modeling for the vast majority of people; p = 0.05 is our usual cutoff to call something statistically significant. But that doesn’t mean it applies to all people, especially not to exceptional people, the outliers. Arguably the human mind can never be understood quantitatively; to try to do so is to strip humanity of its beauty. The edge cases should be analyzed qualitatively, because their dimensions and parameters may be fundamentally different from those of the rest of the group.
But then, is it fair to throw all the “average Joes” by the bucket into our quantitative models? Maybe everyone is exceptional in some way; we are just not very good at identifying all the ways. So if we are going to analyze the outliers qualitatively, why do other people get reduced to a series of parameters?
We are herd animals, and, whether we are conscious of it or not, we are used to communicating, managing relationships and managing our position in a group. We expect our social groups, including our work teams, to communicate with us and to signal when there may be issues. Our brains have been trained for thousands of years in social communication. But now an algorithm trained in the last few months may define my future role in the social group, or even terminate my membership of the group. This feels unfair and inhuman, and damages my trust relationship with the social group. It will feel arbitrary because it is not in line with what our social senses tell us about our functioning in the group.
The more my relationship with the organization is managed by algorithms, the more I am alienated from the social fabric of the organization. The social utility I derive from the job will be damaged, as will my sense of loyalty and belonging.
So how can we mitigate the dehumanization caused by data driven HR practices? Again, transparency goes a long way. Lepri and his colleagues at MIT (Lepri et al., 2018) explored the requisites for fair, transparent and accountable algorithmic decision making. It is important that the rationale of the algorithm can be explained and understood. Ideally the subjects have access to the algorithms and their parameters at all times, so they can manage their performance in the system. When dealing with outcomes of the job this makes sense, and it essentially becomes gamified KPIs. Gamification has its challenges but may generally be considered effective at driving behavior (Chou, 2019).
However, when we are dealing with affective states and predispositions this becomes problematic, as we want to measure the real affective states and predispositions without subjects gamifying their parameters. In the context of affective states this would lead to absurd and terrifying scenarios of mind policing.
Risk Nr. 4 of AI in data driven HR: Mind Policing
I once bought a new car from a car dealer. The salesman was OK, a bit pushy and a bit annoying, but overall he did his job; I knew what I wanted, so it was an easy sale. At the end of the process he told me that I would receive a call from the brand polling the quality of his service, and that it was very important that I give him a five star rating, otherwise he would get in trouble. He literally said (translated from Spanish): “I would prefer you punch me in the face right now over you giving me anything other than 5 stars”. I felt a bit of reactance, i.e. a rebellious urge not to go along with it. Clearly such pressure to rate highly undermines the feedback mechanism and makes the net promoter score (NPS) useless. Should I tell the reviewer, if it is not a bot, that he pressured me into giving a high review? If it were my company I would hope this would happen. However, I still rely on this garage for warranty and maintenance, so I need to have a good relationship with them, and it is not in my best interest to flag this behavior. So I rated the OK service 5 stars. The event left me curious about how widespread this issue is with the NPS, and I nosed around a bit online. It turns out this is a widespread problem (Fisher & Kordupleski, 2019; Shevlin, 2021).
There is also an employee net promoter score (eNPS), which is used to gauge whether employees would recommend their employer to family and friends, or to anyone else. Another critique of the NPS and the eNPS, however, is that they don’t say why someone would recommend or dissuade (Stahlkopf, 2019). Here we hope that our Job Utility Survey can help, but by doing so it will inherit the problems around pressured reviewers. In a sense the social utility dimension of our survey is a review of the leader’s soft skills and ability to create a positive, supportive work environment, so the outcome of the survey would soon become one of the KPIs monitored by their Managers-once-Removed (MoRs) or by the algorithm that is evaluating the leader. When an employee scores low on transformational utility, it could be argued that there is a lack of transformational leadership. If the employee scores low on material utility, the manager could be blamed for doing a bad job at managing salary expectations and growth trajectories.
So generally the manager will be held accountable. Now think back to the car dealership. There is an important difference here: the manager has a formal power relationship over the employee. I rely on the dealership for service and warranty; the employee relies on the manager for his employment, career development, performance reviews and so on. It would be easy for the boss to pressure his team into certain answers. Yes, I know, the surveys are anonymous, but are they really? If you have fewer than ten direct reports, I bet you know who was the bastard that gave the bad rating.
There is an additional problem: honest answers to the survey are not compatible with our social interactions. I don’t tell the car dealer he is mildly annoying. You don’t tell your boss you don’t like working with him or her. So if the boss asks whether you like them personally, or whether you like working on the team, social etiquette dictates that you always say yes; only the most well-spoken feedback expert would dare to try to find a tactful way to communicate such a thing. Most people would just let it slide, especially if the dislike is personal and not function related.
So the manager will pressure the team into giving positive answers on all aspects that may reflect on their leadership. They will also start micromanaging the emotional state of employees, making sure everybody is happy, everyone gets along and any tensions are resolved. This will create an Orwellian hunt for negative emotions within the team.
Imagine coming in to work on a Monday morning and your boss saying “Are you happy? You have no reason to be unhappy right? I need you to smile!….”
And with that image in your head, we segue into the issue of privacy.
Risk Nr. 5 of AI in data driven HR: Threats to Privacy
“What Orwell failed to predict was that we’d buy the cameras ourselves, and that our biggest fear would be that nobody was watching.” Keith Lowell Jensen (Jensen, 2020)
I know it feels somewhat ironic to talk about privacy when most of us have unceremoniously given up our privacy rights to the digital realm. But still, there is a difference. The privacy was yours to give up and you traded it for a bunch of tools and access to your friends on social media. But do you have to give up your privacy to your employer? You can choose not to use social media, but can you choose not to be employed?
One day, when he wasn’t laying the foundations of our utilitarian philosophy, Jeremy Bentham designed a prison called the “Panopticon”. The design was set up so that the guards could easily look into all cells, but the inmates could not see whether they were being watched (Semple, 1993). As the inmates cannot see whether the guard is watching or not, they must assume at all times that the guard could be watching, and behave accordingly. For this reason the Panopticon would require fewer guards to operate. Around 300 prisons around the world have been built following this model. The social philosopher Michel Foucault, in “Discipline and Punish” (Foucault, 2012), uses the Panopticon as an analogy for all power relationships between social institutions and individuals. Foucault highlighted the transition from repressive power, the threat of punishment, to “dynamic normalization”, or, to put it in our jargon, the internalization of social norms. For Foucault, the possibility of social scrutiny, combined with the inability to know when you are being watched, is essential in this internalization process. He describes “le regard” (the gaze) from the institution as essential. The outcome is docile, compliant citizens. The downside is a lack of individuality, creativity, diversity and risk taking.
Now, would our surveys not become “le regard” in the relationship between the employee and the organization? Yes, surveys should be anonymous, but as mentioned before, anonymity is questionable in small teams.
The question arises, in this context: what may we ethically ask about? Arguably the employer should only be granted insight into behavior that is directly related to the job (Bhave et al., 2020). As an employer we may not ethically request information that is to be considered private. John Stuart Mill, the disciple of Jeremy Bentham, pointed out that part of the individual’s life is private and only subject to self management. Furthermore, the mind of the individual is private and may not be coerced or molded (Mill, 1978).
“Human nature is not a machine to be built after a model, and set to do exactly the work prescribed for it, but a tree, which requires to grow and develop itself on all sides, according to the tendency of the inward forces which make it a living thing.” John Stuart Mill
In our psychometric analysis we are peering into the minds of employees. Many of the questions in our survey are not directly behaviorally related to the function of the job. Many questions are affective in nature; they may start with “I like…”, “I enjoy…” and so on. Evidently employees should not be pressured into answering these questions.
So what does that mean for our surveys? Are they rendered useless? Well, no. First of all, for our research there are three key factors that mitigate the privacy concerns:
1: This research is not done in association with the employers. In fact, our sample is a random selection of employees across the world; we don’t even know who they work for, or whether more than one of them works for the same employer. Employers will never find out whether any of their employees answered these surveys.
2: Our dataset is rather big, with around 500 respondents. Responses are anonymised and aggregated, and no individual data is ever shared.
3: Respondents volunteered to submit their answers and were allowed to leave questions blank if they felt uncomfortable with a question. The only pressure upon them was the pressure to make 1 pound, which was the compensation we awarded for responding.
But the ultimate goal here is to make a tool that is useful for companies to manage their talent retention efforts. In that case any use should be subject to strict requisites to guarantee individuals’ privacy. These guarantees should cover three main aspects.
1: The process should be managed by external, professional HR consultants who are not part of the organization, who are well versed in privacy requisites, and who place ethical compliance above incidental pressure from their customer. It is their responsibility to set up the process so that no identifiable information reaches the customer (the management). And they may not report on teams smaller than a certain number, as this jeopardizes anonymity (see the sketch after this list).
2: Participation should be voluntary, both at the level of participating at all and at the level of individual questions.
3: The technological infrastructure on which the process runs must be best-of-class, privacy-by-design infrastructure that undergoes regular privacy threat modeling analysis, so as to minimize the risk of accidental, or not-so-accidental, unauthorized access to data. It should be operated by a third party data security expert.
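To illustrate the small-team rule from point 1, here is a minimal sketch with hypothetical column names and an assumed threshold: aggregated survey scores are only reported for teams with enough respondents, so individuals cannot be singled out.

```python
# Minimal sketch of the small-team suppression rule: only report aggregated
# survey scores for teams with at least MIN_TEAM_SIZE respondents.
import pandas as pd

MIN_TEAM_SIZE = 10  # assumed threshold; the right value depends on the context

def aggregate_survey(responses: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-team averages, suppressing teams below the threshold."""
    grouped = responses.groupby("team").agg(
        respondents=("score", "size"),
        avg_score=("score", "mean"),
    )
    # Suppress small teams entirely rather than masking individual cells.
    return grouped[grouped["respondents"] >= MIN_TEAM_SIZE]

# Made-up example: team "ops" has too few respondents and is dropped from the report.
responses = pd.DataFrame({
    "team":  ["sales"] * 12 + ["ops"] * 4,
    "score": [4, 5, 3, 4, 5, 4, 3, 5, 4, 4, 5, 3] + [2, 3, 2, 4],
})
print(aggregate_survey(responses))
```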
Synthesising reflections on the risks of AI in data driven HR
In cybersecurity we say that security is a journey, not a destination. We say this because 100% security can never be achieved and the environment is constantly changing. It is therefore important not only to do a deep analysis on a regular basis, but also to continuously monitor, reevaluate and reassess. The same is true for the ethical implementation of new technologies used to decide over people’s careers and lives. Threats mutate and new threats emerge. We therefore advise a structured, procedural way of implementing regular deep dives such as this one into the moral implications, together with continuous monitoring for early signals of emerging issues. In cybersecurity we do this structured risk analysis with SAMMY, the OWASP SAMM tool. In HR there is not yet a tool available to map the ethical risks of AI in data driven HR; someone should make it.
What else does Codific build with privacy by design principles?
Codific is a team of security software engineers that leverage privacy by design principles to build secure cloud solutions. We build applications in different verticals such as HR-tech, Ed-Tech and Med-Tech. Secure collaboration and secure sharing are at the core of our solutions.
Videolab is used by top universities, academies and hospitals to put the care in healthcare. Communication skills, empathy and other soft skills are trained by sharing patient consultation recordings for feedback.
SAMMY is a security posture management tool. It enables companies to formulate and implement a security assurance program tuned to the risks they are facing. That way we can help other companies build a simple and safe digital future. Obviously our own AppSec program, and SAMMY itself, are built on top of it.
We believe in collaboration and open innovation, we would love to hear about your projects and see how we can contribute in developing secure software and privacy by design architecture. Contact us.