Sarah Lamdan is a Professor of Law at the City University of New York School of Law. Her book, Data Cartels: The Companies That Control and Monopolize Our Information, shows how unregulated information companies mine, commodify and sell our data, threatening the democratic sharing of knowledge.
In 2017, your work as a law librarian at the City University of New York led you to begin investigating what you call "data cartels." Why did you become interested in data analytics companies?
In 2017, I was sitting at my desk in the law library when someone sent me an Intercept article called These are the Technology Firms Lining Up to Build ICE's "Extreme Vetting" Program. The reporters, Sam Biddle and Spencer Woodman, had filed a Freedom of Information Act request to obtain a list of the companies in attendance at ICE's "investor day" for this massive digital surveillance system. I was surprised to see that LexisNexis and Thomson Reuters representatives were among the guests. I knew LexisNexis and Thomson Reuters as the companies that provide Lexis and Westlaw, the two main legal research platforms in the U.S. I used their products every day, and my main job was to teach students, teachers and lawyers how to use them. What could they possibly be contributing to a predictive policing and people-tracking program?
In your new book, Data Cartels: The Companies That Control and Monopolize Our Information, you write about how companies such as LexisNexis and Elsevier (the biggest academic publisher in the world) are moving into data analytics. Can you walk us through what's happening here?
With a little digging, I discovered that Thomson Reuters and RELX (the parent company of LexisNexis and Elsevier) were not just publishers, as I had thought. They may have been called publishers in the past, but now they are sprawling information corporations that sell both published information products (case law, academic journals, news stories, etc.) and personal data about all of us, collected from over 10,000 sources and updated in real time. What's more, the companies were transitioning away from publishing and towards data analytics, building predictive policing systems, academic analytics tools, and other products that sift through our personal data to rank us, track us, or assess our "risk."
I was shocked that I hadn’t known this before, and that this wasn’t common knowledge among librarians, who use Lexis, Westlaw, Elsevier, and other RELX/Thomson Reuters products every day. The more I learned about these companies and their business model, the more I felt the library community should know about what these companies are up to.
Companies that were once publishers (and that many of us still know as publishers) are pivoting towards creating and selling personal data products, or are expanding their business to include them. RELX is building up its data analytics business as its traditional publishing business wanes. In fact, MSCI (Morgan Stanley Capital International) changed RELX's designation from a "media" company to a "business services" company.
Companies like RELX and Thomson Reuters are so well equipped to make this transition because both have a huge amount of digitized information and personal data in their collections. They have access to volumes of news, legal information, financial data points, and academic knowledge and data. They also have billions of pieces of personal data. This means that they have plenty of material to feed into data processing systems. So, they can build predictive and prescriptive data analytics systems to make all sorts of guesses about our future behavior and success, and to tell institutions whom to trust, whom to hire, whom to deny services to, etc. Personal data-based assessments are very lucrative, which probably helps the companies maintain profits as the traditional publishing industry struggles.
What are the potential repercussions of this shift for academia?
The first is that, as our publishing companies focus more on building data analytics products, we must ensure that they continue to produce high-quality academic research publications. In a business model that focuses on data rather than academic research, information quantity is prized over information quality: the more information and personal data a data analytics company can amass, the more robust its data products will be. Also, the more a company focuses on new technologies, the less it might focus on its legacy publishing enterprises.
Another potential issue for academics is personal data collection. When we use these companies' products, we aren't just consumers; we are data providers. We give the companies information about our scholarship, who our associates are, what other articles we read, etc. We know Elsevier is using our data to assess our professional impact and the impact of our work. Similarly, Lexis's law product is building legal analytics that likely incorporate information about what its users read and cite. When Elsevier acquires companies like Interfolio, whose popular platform serves as a repository for records from academic hiring and tenure processes, we need to make sure that the information collected there is kept separate from RELX's other personal data products.
How can librarians, archivists and other academic staff safeguard personal data and other sensitive information while still making sure that individuals have open access?
Open access projects (and funding for those projects) give us the opportunity to create surveillance-free research infrastructure. In systems that don't require logins, researchers do not necessarily have to trade their personal data for access to journal articles and other research. Sufficient, sustained funding for open access infrastructure gives us an opportunity to build systems that are separate from personal data surveillance and data analytics products.
What can academic staff do to challenge what’s happening?
As academic staff, we are among these companies’ customers. Especially in the case of Elsevier, we are major consumers. As customers, we are in a position to voice our satisfaction with product features we like, and, conversely, to complain about product features we don’t like. For example, we can let Elsevier know that we don’t like spyware on our research platforms, or that we think the company should create better open access schemes for its journals. Because these companies dominate so much of the academic information market, we may not have a lot of choice about whether we use their products. But whether or not we can part with their products, as consumers, we can let companies know what we want them to do better.
Another thing we can do is to push for better oversight and regulation of both data brokers and publishers. We can see how our efforts in the open access movement have prompted the White House's Office of Science and Technology Policy to push for public access to taxpayer-supported research. We can ask our legislators and regulators to focus on improving open access and researchers' privacy. To this end, we can support and participate in organizations doing open access and researcher privacy work, like SPARC.