Beyond punishing individual tech companies, we should be thinking about how to change the market system that rewards and empowers them.
Breaking up big tech has become a favorite rallying cry across the political aisle. From Elizabeth Warren to Donald Trump, there is a bipartisan sense that tech companies have become too powerful and gone unchecked in their excesses.
From data hacks to algorithmic echo chambers to the growing sense that today’s technology is less about serving us than selling us, public confidence in the security and handling of personal data is low. It’s tough to feel any sense of control as an ethical consumer in today’s tech landscape: the relentless cycle of boycotting a platform like Facebook or Uber, finding few appealing alternatives for the same services, and returning to the app after a few weeks is disheartening.
A quote from a 2017 Pew study on general attitudes toward cybersecurity is revealing:
“A majority of Americans (64%) have personally experienced a major data breach, and relatively large shares of the public lack trust in key institutions — especially the federal government and social media sites — to protect their personal information.”
Recent efforts have emerged around defining, measuring, and moderating electronic and algorithmic systems, but the tech industry is still grappling with the hard work of building meaningful frameworks for overseeing the production, processing, consumption, and sale of data. A persistent question I have is, “Why has the bipartisan public outcry against unrestrained technology focused so much on breaking up big tech instead of on policy change?”
What would it take for tech ethics to become a standard part of the way that software applications get built and maintained over time?
I wanted to explore the efforts already underway inside tech companies to protect consumer data, ask whether breaking up big tech would actually help reform the industry, and clarify the current state of data regulation policy.
A recent O’Reilly blog post co-authored by DJ Patil, the first Chief Data Scientist for the US government, addresses the problem of ethical design by putting pressure on tech companies to change their culture:
“Users want to engage with companies and organizations they can trust not to take unfair advantage of them. Users want to deal with companies that will treat them and their data responsibly, not just as potential profit or engagement to be maximized. Those companies will be the ones that create space for ethics within their organizations. We, the data scientists, data engineers, AI and ML developers, and other data professionals, have to demand change. We can’t leave it to people that “do” ethics. We can’t expect management to hire trained ethicists and assign them to our teams. We need to live ethical values, not just talk about them. We need to think carefully about the consequences of our work. We must create space for ethics within our organizations. Cultural change may take time, but it will happen — if we are that change. That’s what it means to do good data science.”
Surveying the front lines of data ethics, there is tremendous effort being put forward by social scientists and ethicists working alongside engineers and data scientists. There’s no shortage of ethical checklists for designing AI systems, just as there’s no shortage of increasingly intricate statistical metrics invented by engineers to capture “fairness.” A few standout efforts include AI Fairness 360 (AIF360), an open source library from IBM that implements fairness metrics; the algorithmic impact assessment protocols from the AI Now Institute; and deon, a command line tool for integrating ethics checklists into software projects. With all these tools emerging from best practices at different organizations, we gain a growing body of lessons that move us closer to precise and robust data governance frameworks.
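To make that concrete, here is a minimal sketch of the kind of check AIF360 enables: computing disparate impact over a toy set of decisions. The DataFrame and its column names are hypothetical stand-ins I made up for illustration; only the aif360 imports reflect the actual library.

```python
# Minimal sketch, not production code: measuring disparate impact with
# IBM's AI Fairness 360 (pip install aif360). The toy loan data and
# column names below are hypothetical.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Hypothetical decisions: `approved` is the outcome, `sex` the
# protected attribute (1 = privileged group, 0 = unprivileged group).
df = pd.DataFrame({
    "sex":      [1, 1, 1, 0, 0, 0],
    "approved": [1, 1, 0, 1, 0, 0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["approved"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)

# Ratio of favorable-outcome rates (unprivileged / privileged);
# 1.0 means parity. Here: (1/3) / (2/3) = 0.5.
print(metric.disparate_impact())
```

A metric like this is cheap to compute on its own; the hard part, as the next paragraph argues, is giving engineers a reason to wire it into production pipelines where it has consequences.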
However, these tools are being developed mostly divorced from any kind of value proposition — what is the current benefit to a mature software product that incorporates these tools? How do these tools integrate into the production-grade software that tech companies rely on?
The tool-building and advocacy of thoughtful, principled data practitioners, as in Google employees’ protest against Project Maven, Microsoft’s internal struggle over collaboration projects with ICE, and Salesforce’s internal review of its US government contracts, are powerful examples of principled tech organizing, but they do little to reliably constrain bad actors in the industry. The kind of corporate culture shift the blog post calls for requires a fundamental change in the incentives that drive products, beyond advocacy against individual companies.
What that says to me is that the work going into defining ethical principles is not yet meaningfully translating into their implementation in technology systems. Possible reasons include misaligned incentives between the engineers and the users of technology, a lack of open source support for incorporating data ethics principles into the most popular engineering packages, and fragmented data regulation policy.
Chris Wiggins, Chief Data Scientist at the New York Times, addresses this disparity between theory and practice in his fast.ai interview on tech ethics. He distinguishes between defining ethics and designing for ethics in technical systems, making the point that:
“Defining ethics involves identifying your principles. There is a granularity to ethics: we need principles to be granular enough to be meaningful, but not so granular that they are context-dependent rules which change all the time. For example, ‘don’t be evil’ is too broad to be meaningful.
Ethical principles are distinct from their implementation, and defining principles means being willing to commit to do the work to define more specific rules that follow from these principles, or to redefine them in a way more consistent with your principles as technology or context changes.”
— Chris Wiggins, Chief Data Scientist at the New York Times
Shifting from defining tech ethics to designing for tech ethics requires treating ethics not as a legal or sociological device but as a technology feature. Open source software, the lifeblood of most software companies in the world, is a major way that best practices in technology are disseminated at scale to developers. Companies copy open source code, customize it for their own needs, contribute back to the original projects, and engage in a fluid negotiation over the best ways to build software. Until these open source packages start shipping reliable and robust solutions to data ethics problems, the implementation of data ethics won’t scale.
Following this line of thinking, many of the foundational open source packages are now built and advanced by private tech companies like Facebook, Google, and Airbnb. These centralized points of software distribution are therefore implicitly guided by the goals of the tech titans. As big data demanded larger scaling solutions, LinkedIn released Kafka as a distributed message bus, Facebook’s Cassandra became a widely used large-scale database, and Google pushed out Kubernetes as a container orchestrator for building global-scale application services. All three have become industry standards, ubiquitous as requirements in job listings for modern engineering roles. Software development necessarily follows the pace and interests of the industry, which suggests that changing the direction of software development requires changing the direction of the industry at large. Enter GDPR and HIPAA.
Passed in 2016 and enforceable as of May 2018, GDPR is the most sweeping data regulation and protection policy in the world. With its power to impose hefty fines (up to 4% of global annual revenue or 20 million euros, whichever is higher) on companies that violate its tenets of the right to be forgotten, data portability, and data minimization for EU citizens, GDPR was a focusing event that caused companies worldwide to hastily redesign their tech stacks and internal processes.
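The “whichever is higher” structure is worth spelling out, because it is what makes GDPR bite for giants and startups alike. A toy calculation, with hypothetical revenue figures:

```python
def gdpr_max_fine_eur(global_annual_revenue_eur: float) -> float:
    """Upper tier of GDPR fines (Art. 83(5)): 4% of global annual
    revenue or EUR 20 million, whichever is higher."""
    return max(0.04 * global_annual_revenue_eur, 20_000_000)

# A hypothetical company with EUR 10B in revenue faces up to EUR 400M;
# a small startup still faces the EUR 20M floor.
print(gdpr_max_fine_eur(10_000_000_000))  # 400000000.0
print(gdpr_max_fine_eur(5_000_000))       # 20000000
```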
HIPAA played a similar role in the US: a body of legislation that reframed health data as a protected class and used heavy fines to push the US healthcare industry to prioritize data security and privacy. Both pieces of legislation were effective at putting protective guard rails around data because they were implemented as meaningful constraints on market systems. There were serious legal and financial consequences for violating these policies, and industries paid attention.
From a systems perspective, GDPR and HIPAA changed the profit/loss equation for private business, creating a surge of innovation in tools, software, and checklists for helping companies become GDPR and HIPAA compliant. Another way of putting it: government regulatory policy provided the definition of ethics, while private industry innovated on the design of ethical technology features inside those policy constraints.
When the effort to redesign systems for GDPR came up at a previous workplace, concepts like ad-hoc data deletion requests and data portability became important design principles for the first time. Figuring out how to accommodate personal data deletion and masking in our data warehouses became a serious architectural consideration precisely because of the legal pressure GDPR applied.
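To illustrate why this is an architecture problem rather than a one-line fix, here is a hedged sketch of a deletion-by-masking pass over a warehouse table. The schema, table names, and function names are all hypothetical, and a real implementation would also have to reach backups, replicas, logs, and downstream derived datasets.

```python
# Illustrative sketch only: honoring a "right to be forgotten" request
# by pseudonymizing a user's identifiers instead of hard-deleting rows,
# so aggregate analytics stay internally consistent. The table and
# column names are hypothetical.
import hashlib
import sqlite3  # stand-in for a real warehouse connection

def pseudonym(value: str, salt: str) -> str:
    """Deterministic, irreversible mask so joins still line up."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

def forget_user(conn: sqlite3.Connection, user_id: str, salt: str) -> None:
    masked = pseudonym(user_id, salt)
    # Null out direct identifiers, replace the join key with its mask.
    conn.execute(
        "UPDATE users SET email = NULL, full_name = NULL, user_id = ? "
        "WHERE user_id = ?",
        (masked, user_id),
    )
    conn.execute(
        "UPDATE events SET user_id = ? WHERE user_id = ?",
        (masked, user_id),
    )
    conn.commit()
```

Even this simplified version surfaces hard questions: where the salt is stored, how the change propagates to derived tables, and how to demonstrate to an auditor that the data is truly gone.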
The takeaway from this discussion is that blaming private companies exclusively for data ethics failures while also expecting them to lead the way in designing for ethics is an unsustainable position. Following massive public outcry over its data ethics failures, Facebook launched a large PR campaign around privacy protection and established an IRB-style review pipeline for its internal processes, now calling itself a “privacy-focused company.” How much of this rebranding is lip service remains an open question, especially as Facebook continues to mount a largely reactive platform response to the spread of doctored videos of Nancy Pelosi.
Technology boycotts and sustained public pressure are effective for calling out specific bad actors, but they are not a realistic strategy for stimulating change across the entire field. Many other tech companies use similar data targeting practices and carry as much bias exposure in their algorithms as Facebook does. This is also why calls to break up Facebook are unlikely to have much effect on data protection in the long term; other companies would likely step into the void left by Facebook’s absence. The breakup of AT&T in 1982 and the antitrust case against Microsoft in 2000 illustrate that enforcing antitrust law on tech monopolies does not necessarily lead to meaningfully competitive successors. The market is structured so that the builders of technology systems are incentivized to optimize for financial profit, technical efficiency, and human resources at increasing economies of scale. We cannot count on corporations to ethically innovate toward a moving target against their own interests; we need a federal data regulation policy.
The US has yet to match the level of data regulation that peers like the EU, Canada, and Australia have piloted, and it has been largely reactive in addressing data breaches, privacy and surveillance concerns, the buying and selling of personal data, biased AI, and accountability for technology failures. An April 2019 interview that Kara Swisher conducted with Nancy Pelosi on Recode/Decode revealed how little precise data regulation policy exists in the US Congress. When pushed on what specifically Democrats would do to combat data misuse, her response lacked specifics:
“At our federal level, people are working on … Committees of jurisdiction are working on privacy. We haven’t seen anything in writing — that is to say, for review yet. I’m sure they have it in writing someplace.”
— Nancy Pelosi, Speaker of the House of Representatives
Clearly, it’s not a priority for Congressional leadership. This is a massive wasted opportunity, since only federal legislation can regulate interstate commerce and impose restrictions on businesses’ ability to collect data on consumers. Left to state policy alone, data protection is limited to empowering consumer choice through “opt-out” guarantees. States like California have been proactive in passing legislation like the California Consumer Privacy Act of 2018, which gives citizens the “right to know” what data is collected about them as well as a “right to opt out” of the sale of their personal information. But this puts the burden of action and awareness on the consumer rather than affording a blanket set of protections to every digital citizen. A federal data policy codified into law could enforce a common set of standards across states that would remain stable across presidential administrations.
A strong first step would be recognizing data regulation as a federal priority and pressuring Congressional representatives to convene a subcommittee to draft and legislate data bills, rather than leaving individual legislators to bring up pet bills. With the 2020 US election coming up, we must carefully evaluate candidates on their stances toward federal data regulation alongside other big-ticket issues like the environment, healthcare, and foreign policy. As voters, citizens have a unique chance to drive stronger change than product boycotts or punishing individual tech companies can accomplish. As data producers and consumers, we must be critical and determined in building an ethical digital world that is accountable to laws, just like our physical one.
Feel free to connect with me on LinkedIn, Twitter, Github, or Medium!
References:
https://www.wired.com/story/what-microsofts-antitrust-case-teaches-us-about-silicon-valley/
https://www.businessinsider.com/att-breakup-1982-directv-bell-system-2018-02
https://www.pewinternet.org/2017/01/26/americans-and-cybersecurity/
https://www.fast.ai/2019/03/04/ethics-framework/
https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html
https://eugdpr.org/the-regulation/
https://www.oreilly.com/ideas/the-problem-with-building-a-fair-system
https://www.nytimes.com/2018/06/19/technology/tech-companies-immigration-border.html
https://gizmodo.com/google-plans-not-to-renew-its-contract-for-project-mave-1826488620
https://www.oreilly.com/ideas/doing-good-data-science
This writing reflects my personal views and research. It does not reflect the stances of any of my current or previous employers.