From Correlation to Violation: Distinguishing Bias from Discrimination in the AI Act
Author:
Arnoud Engelfriet - Data Protection and AI/ML Lawyer & Computer Scientist
Author of The Annotated AI Act
Creator of the CAICO® course for AI Compliance Officers
Chief Knowledge Officer at ICTRecht
Abstract
The term 'bias' has become a ubiquitous reference point in discussions about Artificial Intelligence (AI), particularly within regulatory frameworks such as the European Union’s AI Act. It is frequently invoked as a threat to fairness, a cause of harm, and a defect to be mitigated. Yet despite its rhetorical prominence, the concept itself remains underexamined.
Within machine learning (ML), bias refers to a statistical pattern or deviation, often arising from data distributions or model architecture.
In law, by contrast, discrimination denotes a normative failure: the unjust or unlawful treatment of individuals based on protected characteristics.
These definitions are not interchangeable. Their conflation introduces a conceptual ambiguity with significant legal and regulatory consequences.
Conceptual foundations: Bias vs. Discrimination
In statistical and ML contexts, bias is a descriptive concept. It refers to systematic deviation from an expected value, a model’s tendency to favour certain outcomes, or an imbalance in training data that affects predictive performance. Bias, in this sense, is not inherently undesirable. Some forms of bias are necessary for generalization, and others can reflect real-world distributions. The challenge lies not in eliminating bias altogether, but in identifying which forms are normatively problematic and under what conditions.
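To make the statistical notion concrete, the following minimal Python sketch (the toy distribution, the trimming rule, and all numbers are illustrative assumptions, not taken from any real system) shows bias as a systematic deviation: an estimator whose average output lands above the true value no matter how often the experiment is repeated.

```python
# Minimal sketch: statistical bias as systematic deviation from an expected value.
# Illustrative only; the distribution and the trimming rule are assumptions.
import random

random.seed(42)
TRUE_MEAN = 10.0      # the quantity we are trying to estimate
SAMPLE_SIZE = 50
RUNS = 10_000

def biased_estimator(sample):
    # Deliberately skewed: drops the smallest observations,
    # analogous to data that under-represents part of a population.
    trimmed = sorted(sample)[len(sample) // 10:]
    return sum(trimmed) / len(trimmed)

estimates = []
for _ in range(RUNS):
    sample = [random.gauss(TRUE_MEAN, 2.0) for _ in range(SAMPLE_SIZE)]
    estimates.append(biased_estimator(sample))

bias = sum(estimates) / RUNS - TRUE_MEAN
print(f"estimated bias: {bias:+.3f}")  # consistently above zero: systematic, not random
```

The deviation is systematic rather than random, which is what distinguishes bias from noise; nothing in the sketch says anything about legality.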
Legal discrimination, by contrast, is a normative and institutional concept. It arises when individuals or groups are treated unequally in a way that violates established legal standards, such as the principles of equal treatment, dignity, or non-discrimination under European Union law. Discrimination is not a statistical pattern, but a legal violation. It presupposes a framework of protected attributes, an evaluation of harm, and a mechanism of redress. In this framework, the question is not whether a system is statistically skewed, but whether its outcomes produce unjustified or impermissible disadvantage.
The association of bias with discrimination can be traced to high-profile cases in which legal discrimination was facilitated, or at least plausibly explained, by the presence of statistical bias. One of the most frequently cited is the COMPAS system in the United States, which was found to assign higher recidivism risk scores to Black defendants despite comparable criminal histories.
In a different domain, Amazon’s experimental hiring algorithm was shown to penalize applications that included signals associated with women, such as attendance at all-women’s colleges. Such cases show that statistical bias can facilitate legal discrimination, but it remains problematic to mistake one for the other: doing so confuses the tools of diagnosis with the grounds for accountability.
The conceptual boundary between bias and discrimination is therefore critical.
A model may exhibit statistical bias without producing discriminatory effects, and conversely, a system may produce discriminatory outcomes while appearing statistically balanced. For example, a credit scoring model might systematically disadvantage applicants from a certain region without explicitly using geographic data, by relying on correlated features such as income or employment type. Whether this constitutes bias, discrimination, or both depends on the legal context, the protected grounds involved, and the interpretive choices made by regulators or courts.
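This proxy effect can be illustrated with a short Python sketch on synthetic data (the group labels, regions, income levels, and approval threshold are all invented for illustration, not drawn from any real system): the protected attribute is never an input to the scoring rule, yet approval rates diverge sharply between groups.

```python
# Minimal sketch of proxy bias: a protected attribute is never used as an input,
# yet a correlated feature reproduces the disparity. All numbers are illustrative assumptions.
import random

random.seed(0)

def applicant():
    group = random.choice(["A", "B"])          # protected attribute; never shown to the model
    p_region_x = 0.8 if group == "B" else 0.2  # assumed correlation between group and region
    region = "X" if random.random() < p_region_x else "Y"
    income = random.gauss(30_000 if region == "X" else 45_000, 5_000)
    return group, income

def credit_score(income):
    # The model only looks at income, an apparently neutral criterion.
    return 1 if income > 38_000 else 0

approvals = {"A": [], "B": []}
for _ in range(20_000):
    group, income = applicant()
    approvals[group].append(credit_score(income))

for g, outcomes in sorted(approvals.items()):
    print(f"group {g}: approval rate {sum(outcomes) / len(outcomes):.2%}")
# Group B is approved far less often although group membership is never an input;
# whether that is indirect discrimination depends on justification, not on the code.
```

Whether such a disparity amounts to discrimination cannot be read off the numbers alone; it turns on the legal questions of justification and proportionality addressed below.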
How the AI Act addresses Bias
The AI Act acknowledges bias as a key concern in its regulation of high-risk AI. Its treatment of the concept, however, remains conceptually diffuse and operationally underdeveloped. While the regulation repeatedly refers to the need to identify, prevent, or mitigate bias, it does so without offering a stable definition.
Bias is most directly addressed in Article 10, which outlines requirements for data and data governance. It mandates that training, validation, and testing datasets be “relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose,” and that they take into account “the characteristics or elements that are particular to the specific geographical, contextual, behavioural or functional setting” in which the system is intended to be used.
The intent is to reduce the likelihood that biased data will lead to distorted outcomes. However, this formulation frames bias as a problem of data quality, rather than one of legal harm. It does not distinguish between forms of statistical imbalance that are benign, those that are correctable, and those that constitute proxies for protected characteristics or systemic disadvantage. Other provisions (notably Article 15 on robustness and accuracy) similarly address bias.
By not distinguishing between types or sources of bias, the AI Act leaves open the interpretation that any deviation from parity, regardless of legal salience, constitutes a compliance failure. This ambiguity has practical consequences. Developers may overcorrect in pursuit of abstract statistical balance, while overlooking the need to examine whether the observed disparity maps onto legally protected grounds or structural patterns of exclusion. Conversely, systems that produce discriminatory outcomes may evade scrutiny if the underlying statistical relationships appear technically sound.
This regulatory framing reflects a broader tension within the AI Act. On the one hand, it aspires to safeguard Fundamental Rights by imposing obligations on high-risk systems. On the other, it operationalizes these safeguards primarily through technical and organizational controls, rather than through rights-based adjudication.
The result is a framework that treats bias as a quality assurance problem rather than as an expression of legal inequality.
Without clear criteria to distinguish harmful from harmless forms of bias, and without mechanisms to connect model behavior to normative principles, the AI Act may fail to detect or prevent the very harms it seeks to address. It risks regulating correlation while overlooking violation.
When Bias becomes Discrimination
To evaluate whether bias in an AI system constitutes a legal harm, it is necessary to examine the conditions under which statistical asymmetry crosses the threshold into discrimination. The legal significance of biased outcomes is not determined by their magnitude or frequency alone, but by their relationship to protected attributes, contextual justification, and the availability of redress mechanisms.
European anti-discrimination law distinguishes between direct and indirect discrimination.
Direct discrimination occurs when individuals are explicitly treated less favorably on the basis of a protected characteristic, such as gender, ethnicity, or religion. Indirect discrimination arises when an ostensibly neutral criterion disproportionately disadvantages members of a protected group, unless the criterion can be objectively justified by a legitimate aim and the means of achieving that aim are proportionate.
This doctrinal structure reflects a central legal insight: that inequality is often reproduced not through overt exclusion, but through rules and systems that appear neutral yet functionally disadvantage particular groups. Bias becomes discrimination when a model’s outputs produce outcomes that are either unjustifiable under this legal framework or systematically replicate historical patterns of exclusion.
For instance, a credit scoring model that incorporates postal codes may disproportionately disadvantage ethnic minorities, even if race itself is not an input variable. In such cases, the legal violation arises not from the presence of statistical skew per se, but from its disproportionate impact and the absence of adequate justification.
The absence of integrated discrimination standards within the AI Act complicates the role of auditors and compliance officers. Without a clear link to substantive legal norms, risk assessments may focus on technical anomalies rather than normative harms. Fairness metrics such as demographic parity or equalized odds are often deployed as proxies for anti-discrimination compliance, yet these metrics are neither necessary nor sufficient under EU law. Legal discrimination must be assessed contextually, with regard to justification, proportionality, and social meaning, criteria that no statistical formula can capture in isolation.
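To show why such metrics are diagnostic rather than dispositive, here is a minimal sketch (the prediction, label, and group values are hypothetical) that computes a demographic parity gap and equalized-odds gaps; the numbers it produces describe statistical disparity, not legal liability.

```python
# Minimal sketch of two common fairness metrics on hypothetical predictions.
# Passing or failing these checks does not by itself establish or rule out
# discrimination under EU law.

def rate(values):
    return sum(values) / len(values) if values else 0.0

def demographic_parity_gap(preds, groups):
    # Difference in positive-prediction rates between groups.
    by_group = {g: [p for p, gg in zip(preds, groups) if gg == g] for g in set(groups)}
    rates = [rate(v) for v in by_group.values()]
    return max(rates) - min(rates)

def equalized_odds_gaps(preds, labels, groups):
    # Differences in true-positive and false-positive rates between groups.
    gaps = {}
    for outcome in (1, 0):  # 1 -> TPR gap, 0 -> FPR gap
        rates = []
        for g in set(groups):
            subset = [p for p, y, gg in zip(preds, labels, groups) if gg == g and y == outcome]
            rates.append(rate(subset))
        gaps["TPR" if outcome == 1 else "FPR"] = max(rates) - min(rates)
    return gaps

# Hypothetical predictions, true outcomes, and group membership.
preds  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
labels = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

print("demographic parity gap:", demographic_parity_gap(preds, groups))
print("equalized odds gaps:", equalized_odds_gaps(preds, labels, groups))
```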
Thus, the transition from bias to discrimination is not automatic, but mediated by legal interpretation. It requires that statistical disparities be situated within normative frameworks capable of evaluating their legitimacy. Absent this linkage, the regulatory apparatus risks displacing the very forms of scrutiny it purports to enable.
Protecting individuals from algorithmic discrimination demands more than technical vigilance. It requires that systems of automation be made accountable to the legal principles that structure democratic societies.
Conclusion
The challenge addressed in this article is not the presence of algorithmic bias itself, but the persistent confusion between bias and discrimination within regulatory, technical, and legal discourse. As long as these terms are treated as interchangeable, the conceptual foundation for effective governance remains unstable.
Statistical bias is a property of data and models; discrimination is a legal wrong. Conflating the two obscures the normative thresholds that define when an outcome infringes on individual rights and when it merely reflects a technical imbalance.
Moving forward, regulatory clarity must begin with terminological precision. The AI Act and related instruments should explicitly distinguish statistical irregularity from legal discrimination and identify the conditions under which biased outputs trigger obligations under Fundamental Rights law. This demands a framework that links algorithmic decision-making to legal standards of justification, proportionality, and redress.
Equally important is the need for institutional alignment. The enforcement of non-discrimination principles in AI systems cannot be left to technical design alone. It must involve legal institutions capable of interpreting context, evaluating normative harm, and ensuring meaningful remedies. Fairness metrics, while useful as diagnostic tools, are not substitutes for legal judgment. They should support, rather than replace, normative reasoning.
Ultimately, what is needed is a shift in regulatory orientation: from treating bias as a generic risk to treating discrimination as a specific violation. This shift enables both greater conceptual discipline and more effective legal protection. It affirms that fairness in automated systems is not a mathematical output, but a constitutional demand.
Biography of the Guest Expert
Arnoud Engelfriet is a Dutch IT lawyer and computer scientist with over 30 years of experience exploring the complex intersection of law, data, and emerging technologies. Renowned for his ability to bridge complex legal frameworks with technical insight, Arnoud has been a leading voice on issues related to software law, AI, and data governance since the early 1990s.
With an academic background in both computer science and law, Arnoud has cultivated a rare dual perspective that informs his work on legal and technical challenges in digital environments. His legal qualifications include specialization in intellectual property and patent law, further enhanced by his certification as a Dutch and European patent attorney, credentials that laid the foundation for his career in high-stakes tech law and regulation.
Arnoud currently serves as Chief Knowledge Officer at ICTRecht, where he leads the firm’s Academy and designs training programs tailored to legal and business professionals navigating digital compliance. He is the creator of the CAICO® course for AI Compliance Officers and frequently lectures on IT and law at Vrije Universiteit Amsterdam. In addition to his academic and consulting roles, he is a prolific public educator, publishing a daily blog on IT law and emerging technologies.
His published works include ICT en Recht and AI and Algorithms, both of which explore the legal dimensions of AI transformation. He is the author of The Annotated AI Act, one of the leading commentaries on the EU AI Act. Earlier in his career, Arnoud spent a decade as IP Counsel at Royal Philips, where he advised on software licensing and intellectual property matters, cementing his reputation as a pioneering legal mind in the tech sector.