Say no to privately owned “AI” for biodiversity monitoring

Image caption: Artificial General Intelligence. “AI under the hood: AI, represented here by geometric matrices, has a go at generating cellular data. It represents a future whereby AI could, in theory, replicate or generate new organic structures used in areas of research such as medicine and biology.” Artist: Domhnall Malone

by Michael Catchen

“Once, people turned their thinking over to machines in the hope that this would set them free. But that only permitted other people with machines to enslave them.”

- Frank Herbert, Dune

We are in the middle of an AI hype wave. This is not the first AI hype wave, and I doubt it will be the last. During the 2014 wave, self-driving cars were imminent, and mass unemployment was to become the new normal as automation upended the economy.

That didn’t happen, of course. The 2014 AI hype wave was spawned by rapid advances on a handful of computer vision tasks (by AlexNet, among others). This was distorted into a bunch of bullshit peddled by tech marketers to sell increasingly far-fetched investments to venture capital firms. And venture capitalists love bullshit.

So, yet again, we are in an AI hype wave. This one was sparked late last year by ChatGPT, perhaps because it has been uniquely successful at giving the public access to cutting-edge models. Now, stapling “AI” onto whatever it is you do has become a reliable way to get gullible VCs to invest millions without a second thought.

“Artificial Intelligence” is a marketing term. Like all marketing terms, it doesn’t exist to communicate information; it exists to evoke emotion. “AI” conjures some mix of awe and fear: images of either a dystopian Terminator nightmare scenario or a post-scarcity utopia, or perhaps some mixture of the two, depending on one’s media diet.

What “AI” certainly does not evoke is algorithms endlessly churning away doing matrix multiplication. The transformer architecture, proposed six years ago, is the breakthrough idea at the center of this hype wave’s flagship products, from GPT to Midjourney. In short, recurrent neural networks (RNNs) were for a while the best way to handle information in sequences (like sentences), but transformers offered a vast improvement over RNN architectures, at the expense of way, way more computing power.

The key idea behind the “attention” mechanism that makes the transformer work is “what if we do a bunch of dot-products”, an operation that can be explained to a grade schooler. After the first few papers using transformers for language modeling, it became clear that the only limits on improved performance were more data and more computing resources.
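That “bunch of dot-products” fits in a few lines. Below is a minimal sketch of scaled dot-product attention in NumPy (function and variable names are my own, purely for illustration): each output row is a weighted average of the value vectors, with weights given by softmax-normalized query-key dot products.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q, K: (n, d) queries/keys; V: (n, d_v) values. Returns (n, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # every pairwise query-key dot product
    weights = softmax(scores)       # each row sums to 1
    return weights @ V              # weighted average of the value vectors

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = rng.normal(size=(3, n, d))
print(attention(Q, K, V).shape)  # → (4, 8)
```

Note that the `scores` matrix is n-by-n: computing attention over a sequence costs quadratically in its length, which is one reason all those dot-products get expensive fast.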

So, although it is only matrix operations, that many dot-products isn’t cheap. GPT-3 (the 2020 predecessor to ChatGPT) cost roughly $4.5 million to train, based on market rates for cloud GPUs. And, even though OpenAI has (despite its name) refused to disclose anything about the architecture or the data GPT-4 is trained on, competitive models are roughly an order of magnitude more expensive to train than GPT-3, with energy requirements (and therefore carbon emissions) growing in the same way.


The Role for Machine Learning in Biodiversity Monitoring

The UN’s Convention on Biological Diversity (CBD) COP15 established a new monitoring framework for Earth’s biodiversity. With the current hype wave playing out in the background, there was naturally a lot of interest in the utility of “AI” for biodiversity monitoring.

And yes, there are many potential use cases for machine-learning methods in biodiversity monitoring: satellite imagery, camera traps for occurrence data, LIDAR sensing for forestry, species distribution modeling, interaction prediction, forecasting, causal attribution, and more. These will be essential tools in developing a Global Biodiversity Observation System (GBiOS), which is itself essential to meeting the goals of the CBD.

But increased adoption of ML in biodiversity monitoring networks must be approached carefully.

Due to the exploding cost of training state-of-the-art models, the most influential research in ML is increasingly done by researchers employed by large tech companies. These companies have the resources to pay talented researchers and engineers far more than academic positions can, and far more to spend on state-of-the-art computing resources.

Right now, mainstream research in big tech shows little interest in applying ML to ecological questions. But if Google or Microsoft decided tomorrow to roll out state-of-the-art technology for camera traps, hardware and software combined, at a global scale, they would have far more resources to throw at it than the handful of us applied practitioners of ML in biodiversity science.

And would Microsoft make this product, along with the algorithm and the data it was trained on, publicly available out of the kindness of its heart? Or would it charge nations for access in order to turn a profit? The question is of course rhetorical, but the point stands: building an effective GBiOS requires commitments that simply cannot be met under the modus operandi of private companies.

Some companies are dabbling in environmental questions, and they are willing to play the game of partnering with scientists to better understand the scope of the problems. But make no mistake: businesses are driven only by profit. OpenAI was willing to participate in the norms of open science while developing GPT-2 and GPT-3, only to shut the door on any information about its newest models once it found a way to make money on technology that many people outside OpenAI helped build. Privately owned tools for biodiversity monitoring would fare no differently.

Biodiversity change is a global issue with no concern for political borders, yet modern geopolitical boundaries still shape how environmental policy and regulation are adopted. Privatization is a direct threat to the integrity of equitable Earth observation. Privately owned data and models could lead to a nightmare scenario where monitoring systems built on closed, geographically biased data are sold as the basis for conservation decisions made with no input from local communities.

Avoiding private ownership of a global biodiversity monitoring system is necessary if we are not to repeat the historical mistakes of conservation becoming a vehicle of neo-colonialism. It is therefore imperative that we commit to building an open-source, open-data GBiOS grounded in the FAIR principles (findable, accessible, interoperable, reusable).