Countless state and federal regulations and statutes, not to mention the U.S. Constitution, prohibit discrimination against protected groups. Without new regulatory and statutory approaches, however, AI systems might slip discrimination past current laws through “proxy discrimination.” Today’s AI systems and algorithms are capable of dredging oceans of big data to find statistical proxies for protected characteristics and create algorithms that disparately impact protected groups. AI systems with such capabilities already exist in the fields of health and automobile insurance, lending, and criminal justice, among others. In Proxy Discrimination in the Age of Artificial Intelligence and Big Data, Anya E.R. Prince and Daniel Schwarcz address this particularly “pernicious” phenomenon of proxy discrimination. They argue that current anti-discrimination regimes, which simply deny AI systems the ability to use protected characteristics or their most intuitive proxies, will fail in the face of increasingly sophisticated AI systems.
Prince and Schwarcz provide a coherent definition of proxy discrimination by AI: the use of a variable whose statistical significance for prediction “derives from its correlation with membership in a suspect class.” For instance, consider a hiring algorithm for a job where a person’s height is relevant to job performance, but where the algorithm does not have access to height data. In attempting to account for height, the algorithm might discover the correlation between height and sex, and the correlations between sex and other data. Relying on that other data would be an example of proxy discrimination because its statistical significance derives from its correlation with sex, a protected characteristic.
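The height example can be made concrete with a small simulation. The following Python sketch is illustrative only and does not come from Prince and Schwarcz's article: the population, the numeric values, and the variable called `neutral` are all hypothetical assumptions. It builds synthetic data in which a facially neutral variable is related to height only through sex, then shows that the variable appears predictive of height across the whole population but loses that predictive value within each sex.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Build a hypothetical population of (sex, height_cm, neutral) rows.

    Assumption for illustration: both height and the facially neutral
    variable depend on sex, but not on each other.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sex = rng.choice(("F", "M"))
        height = rng.gauss(178 if sex == "M" else 165, 7)
        neutral = rng.gauss(1.0 if sex == "M" else 0.0, 1.0)
        rows.append((sex, height, neutral))
    return rows


rows = simulate()

# Across everyone, the neutral variable looks useful for predicting height.
overall = pearson([r[2] for r in rows], [r[1] for r in rows])

# Within each sex, that apparent usefulness largely disappears.
within = {
    sex: pearson(
        [r[2] for r in rows if r[0] == sex],
        [r[1] for r in rows if r[0] == sex],
    )
    for sex in ("F", "M")
}

print(f"overall correlation: {overall:.2f}")
for sex, value in within.items():
    print(f"correlation within sex {sex}: {value:.2f}")
```

In this sketch the neutral variable's predictive power "derives from its correlation with membership in a suspect class": an algorithm denied access to both height and sex could still lean on it, which is the pattern the authors describe.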
Prince and Schwarcz first delve into the pre-AI history of proxy discrimination, i.e., human actors intentionally using proxies to discriminate. This discussion is interesting and gives background on the modern legal principles regarding disparate impact discrimination. However, while human actors could of course use AI to discriminate by proxy on purpose, proxy discrimination by AI is more likely to be unintentional.
In some discussions of AI discrimination, statistics and legal principles become confusing and muddled, but one of Prince and Schwarcz’s great strengths is their clear and distinct presentation of both the statistical analyses, which are critical to identifying and addressing disparate impacts, and the legal principles. Their discussion of alternative regulatory and statutory strategies for achieving anti-discriminatory goals is effective. Prince and Schwarcz focus on three potential strategies: (1) allowing AI models to collect data on individuals’ protected characteristics so that this data can be reported to regulators and/or the public; (2) implementing “ethical algorithms” that use statistical methods to eliminate or correct for correlations between facially neutral characteristics and protected characteristics; and (3) prohibiting all forms of discrimination except forms that are specifically allowed.
Prince and Schwarcz give compelling reasons why, in the absence of such strategies, current laws will likely fail to prevent proxy discrimination by AI in many cases. Simply banning obvious and intuitive proxies for protected characteristics is likely to be ineffective because AI systems can find and use less obvious and intuitive proxies. Approaches based on traditional disparate impact liability are also weak because proxy discrimination is often carried out unintentionally by AI systems seeking to achieve legitimate objectives. In such situations, the creators of these systems can more easily escape liability by showing that their practices are in line with business necessity and by downplaying the availability of less discriminatory alternatives.
In other words, the current regime for addressing AI discrimination by proxy is likely to fall short of what society needs. However, Prince and Schwarcz persuasively argue that adopting new measures could better address the issue of proxy discrimination. For one, giving AI systems access to more data on protected characteristics, rather than less, could in some instances be beneficial because these systems could (1) more transparently report on their actual impacts to regulators or the public and (2) explicitly control for disparate impacts through statistical methods. There are, of course, potential risks and tradeoffs in giving increased access to protected data, such as valid privacy concerns that would need to be addressed.
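The reporting benefit can be illustrated with a short Python sketch. It is a hypothetical example, not a method taken from the article: the group labels "A" and "B", the decision counts, and the function names are all assumptions. It shows that once a system records each individual's protected characteristic alongside the decision made about that individual, per-group outcome rates and their ratio become straightforward to compute and report.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of favorable decisions for each group.

    `records` is an iterable of (group, decision) pairs, where `decision`
    is True for a favorable outcome (for example, a loan approval).
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, decision in records:
        totals[group] += 1
        if decision:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest group rate divided by the highest group rate.

    Returns None when no group received any favorable decision.
    """
    highest = max(rates.values())
    if highest == 0:
        return None
    return min(rates.values()) / highest


# Hypothetical decisions: 100 people in each of two groups.
decisions = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 30 + [("B", False)] * 70
)

rates = selection_rates(decisions)
ratio = impact_ratio(rates)

for group, rate in sorted(rates.items()):
    print(f"group {group}: favorable rate {rate:.2f}")
print(f"impact ratio: {ratio:.2f}")
```

Without the group label on each record, none of these figures could be produced, which is the tradeoff the authors identify: measuring a disparate impact requires the very data that current regimes often withhold from the system.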
AI systems are getting smarter, and regulatory and statutory systems will need to become smarter if they are to prevent instances of proxy discrimination against protected groups. The approaches discussed by Prince and Schwarcz show promise and are worth discussing as regulators and legislatures evaluate the regulation of artificial intelligence.
Jacob Ladd is an Associate Editor on the Michigan Technology Law Review.