Ethical and Sustainable Data By Sarah-Jane Smyth & Jasmine Grimsley

Sarah-Jane Smyth is CEO and founder of The London Data Company, a data consultancy focusing on data solutions for the betterment of society. She specialises in applying ethical and sustainable AI solutions for impactful insights. Sarah-Jane has extensive experience in business consultancy, digital transformation and delivering technology solutions for public and private sectors.
Jasmine Grimsley is Chief Data Officer at The London Data Company. Her previous roles include Head of Science and Research at the UK Health Security Agency, and Innovation and Horizon Scanning Lead at the Joint Biosecurity Centre. Jasmine has a PhD in Biomedical Sciences, Neuroscience from the University of Nottingham.
In this post, Sarah-Jane and Jasmine from The London Data Company explore some of the key issues in AI ethics, focussing on one of the less acknowledged elements of data ethics: sustainability. Big Data, they argue, poses a significant global warming threat. What steps can we take to ensure AI becomes environmentally sustainable?

By now, we have all become accustomed to the idea that our ability to harness data and develop AI solutions has a profound impact on businesses and society alike. This capability is propelling us, at pace, into what is often described as the modern-day equivalent of an Industrial Revolution. And just as with the previous Industrial Revolution, it is generating both optimism and fear.

For many years now, businesses have been increasingly tapping into the power of data to derive valuable insights that are specific to their needs. These insights have transformed their decision-making and their processes, and have propelled them forward. The recent surge in the use of generative AI has further accelerated this trend. However, it is important to recognise that while AI excels at optimising output and generating insights, it falls short in explaining the underlying rationale or evaluating associated risks and assumptions. It is not uncommon to come across stories where AI has been portrayed as a failure, often due to unrealistic expectations or because it was not trained for the specific use case. Within the AI and data industry, we need to collectively recognise the importance of explaining to users how AI generates its output and the level of certainty involved. We need to be building AI that delivers accessible and trusted insights. Currently, data leads, particularly those working with generative AI, appear to find it challenging to communicate the inner workings of their AI models effectively. Transforming raw data into trusted insights demands a strategic approach that goes beyond the boundaries of conventional analytics.

Here we explore some of the key aspects of AI ethics and take a deeper look at one of the less acknowledged elements of data ethics, sustainability, providing suggestions and tools to help you be part of the solution.

AI ethics is far more than control of personal information or representation of at-risk groups. Though these are crucial, to truly establish trust in the AI solutions we create, we must adopt a broader perspective on AI ethics. Issues such as bias, discrimination, and privacy breaches have rightly gained significant attention in recent years. As data practitioners, it is our responsibility to address these ethical challenges head-on and ensure that our models and algorithms are fair, transparent, and capable of being explained to the users of our solutions.

In addition to these concerns, we must also acknowledge the growing ethical threat posed by adversarial attacks. These attacks manipulate data to deceive AI systems, potentially compromising the accuracy and reliability of the insights generated. To address this challenge, it becomes imperative to build robustness into our AI systems. We need to test our solutions thoroughly and be confident that they are resilient to adversarial attacks, whilst also not prone to being over-trained. By doing so, we can maintain the integrity of organisational data and safeguard against unauthorised modifications.

We must also be cognizant of future-proofing our solutions, as ensuring their accuracy over time is itself an ethical concern. Many data scientists will have found themselves in a situation where they have built an ethical, explainable model that they are rightly proud of, only to hand it over to a customer who does not have the capability and/or capacity to maintain it effectively. The delivery of a data solution is not a one-time endeavour: most, if not all, require ongoing maintenance and updates to remain effective and, crucially, to ensure they do not drive misinformation. To address this ethical challenge, we need to prioritise the maintainability of our data solutions. This involves building in continuous monitoring, evaluation, and mechanisms for identifying areas that require improvement or updates. By doing so, we collectively ensure that our data solutions remain adaptable to changing requirements and evolving data quality. By striving for maintainability, we can safeguard the longevity and long-term value of our data solutions. Sustainable practices are crucial in this endeavour, as we must be aware of, and vocal about, the future impact of our AI solutions.

When we are designing ethical AI solutions and looking at that future impact, sustainability has to be a primary consideration. With more and more data being generated, processed, stored and backed up, the way we consider and plan for this impact in our AI designs will be crucial for future generations.

ENVIRONMENTALLY SUSTAINABLE AI

As data practitioners, it is imperative that we openly discuss and address the sustainability and environmental consequences of our AI solutions. The exponential growth of data storage and usage, especially with the rise of generative AI, poses a significant global warming threat (Douwes et al., 2021). The sheer volume of data worldwide is expanding rapidly, leading to a substantial increase in electricity consumption by data centres. Between 2005 and 2010, global electricity consumed by data centres surged by an alarming 56%, and by 2020 data centres accounted for 1% of global electricity consumption. It is estimated that this trend will persist, with data centres forecast to account for around 13% of global energy consumption by 2030 (Huang et al., 2020; Haddad et al., 2021; Jouhara and Meskimmon, 2014; Güğül et al., 2023). While the adoption of green energy by many data centres is commendable, sustainability in data extends well beyond the simple metric of how data centres are powered. We must also consider the impacts of extracting precious metals for hardware production, the land footprint, water consumption and the environmental impact of constructing and operating the growing number of data centres.

The exponential growth of data, data analytics and AI presents a major challenge in terms of the CO2 generated by data storage and processing. Figure 1 shows the dramatic growth in the volume of data created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025, in zettabytes (Taylor, 2022).

As of today, a significant amount of data remains unused, contributing to negative environmental impacts and escalating storage costs. This surplus data, known as dark data, continues to accumulate, primarily due to the vast amount generated by the Internet of Things and sensor devices. Astonishingly, it is estimated that up to 90% of data generated by IoT devices goes unused (Gimpel and Alter, 2021). Moreover, a substantial portion of this data, up to 60%, loses its value within milliseconds of its generation (Corallo et al., 2021). If not managed effectively in the future, the worldwide CO2 emissions resulting from storing dark data could exceed 5.26 million tons per year (Al Kez et al., 2022). It is worth noting that CO2 emissions are, again, not the only environmental concern: for dark data alone, the estimated water used for data centre cooling and the land footprint are also significant, amounting to 41.65 billion litres and 59.45 square kilometres respectively.

Hao (2019) highlighted the environmental risk of CO2 emissions generated by the use of AI technologies. It has been estimated that energy use is split between 10% on training a model and 90% on serving it. This makes it critical to consider the whole life cycle of a model when thinking about sustainability. A model may have a higher energy consumption during training, but it could still reduce total carbon emissions if it also cut serving energy by 20% (Patterson et al., 2021).

Patterson et al. (2021) estimated that the carbon emissions from training GPT-3 were 552 tCO2e; offsetting this would require 9,127 tree seedlings to be grown for 10 years (computed from the US EPA Greenhouse Gas Equivalencies Calculator). That is just for getting the model trained, not for serving it to the many of us who use it. By optimising energy consumption, reducing data redundancy, employing responsible data management practices, and being thoughtful when developing and maintaining models, organisations can help reduce the environmental impact of data.
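To make the life-cycle argument concrete, the arithmetic behind the figures quoted above can be sketched in a few lines of Python. This is an illustrative calculation only: the function names are our own, the 10%/90% split and the 20% serving cut are the figures quoted in the text, and the per-seedling offset is simply back-calculated from the 552 tCO2e and 9,127 seedlings quoted, not taken directly from the EPA calculator.

```python
def lifetime_energy(training, serving, training_multiplier=1.0, serving_cut=0.0):
    """Total life-cycle energy, in the same (arbitrary) units as the inputs.

    training_multiplier scales the training energy (2.0 = twice as costly to train);
    serving_cut is the fractional reduction in serving energy (0.20 = 20% less).
    """
    return training * training_multiplier + serving * (1.0 - serving_cut)


def break_even_training_multiplier(training, serving, serving_cut):
    """Largest training multiplier that still leaves life-cycle energy at or below baseline."""
    saved_in_serving = serving * serving_cut
    return (training + saved_in_serving) / training


# Baseline: 10 units of energy on training, 90 on serving (the split quoted in the text).
baseline = lifetime_energy(10, 90)

# A hypothetical model that costs twice as much to train but cuts serving energy by 20%.
candidate = lifetime_energy(10, 90, training_multiplier=2.0, serving_cut=0.20)

# How much more expensive training could become before the saving is wiped out.
limit = break_even_training_multiplier(10, 90, 0.20)

# Offset per seedling implied by the quoted figures (tCO2e per seedling over 10 years).
implied_offset_per_seedling = 552 / 9127

print(f"baseline={baseline:.0f}, candidate={candidate:.0f}, break-even multiplier={limit:.1f}")
print(f"implied offset per seedling: {implied_offset_per_seedling:.4f} tCO2e")
```

Under these assumptions, a model that is twice as expensive to train still uses less energy over its life cycle than the baseline, and training could become roughly 2.8 times as expensive before the 20% serving saving is cancelled out. The quoted seedling figure implies an offset of about 60 kg of CO2e per seedling over ten years.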

While the UK has no specific legislation on making AI sustainable, the Well-being of Future Generations (Wales) Act 2015 is legislation enacted in Wales that aims to promote sustainable development and ensure the well-being of future generations. This applies to data-driven solutions. It emphasises the importance of sustainable practices and decision-making that meet the needs of the present without compromising the ability of future generations to meet their own needs. It serves as a framework for creating a sustainable and inclusive future for the country.

While acknowledging that data can contribute to environmental damage, it is crucial to recognise the significant positive role that data-driven insights can play in addressing sustainability challenges. At LDCo, we firmly believe that as data practitioners, we have the power to be part of the solution, and we are vocal advocates for using data for good. By implementing sustainable practices across the entire data lifecycle, from collection through storage and processing to disposal, we can make a positive impact and contribute to a more sustainable future. By using the most efficient algorithms possible, minimising data collection, storing only the data we need, and using emerging technologies like edge analytics, we can reduce the environmental footprint of the projects we work on. Each of us has the potential to reduce the overall environmental impact of data storage and insights. Balancing the benefits of data-driven insights with responsible practices ensures that AI becomes part of the solution, not the problem. By sharing knowledge on methods for energy efficiency, establishing best practices, and promoting transparency about CO2 emissions, we can all harness the potential of data while minimising its environmental impact. We can make this explainable by reporting the CO2 impact of training our models and storing our data, using tools like this one: carboncalculator.ldco.ai/home
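As a rough illustration of what reporting the CO2 impact of a training run can involve, the sketch below multiplies the energy drawn by the hardware by a data centre overhead factor and by the carbon intensity of the electricity supply. It is a minimal example under stated assumptions, not the method used by the calculator linked above: the function name, the use of a power usage effectiveness (PUE) factor, and every number in the example are hypothetical placeholders that should be replaced with measured values.

```python
def training_emissions_kg(avg_power_kw, hours, pue, grid_kg_co2e_per_kwh):
    """Estimate the CO2e (in kg) emitted by a training run.

    avg_power_kw          -- average power drawn by the training hardware, in kW
    hours                 -- duration of the training run, in hours
    pue                   -- power usage effectiveness of the data centre (>= 1.0),
                             covering cooling and other overheads
    grid_kg_co2e_per_kwh  -- carbon intensity of the electricity supply
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    energy_kwh = avg_power_kw * hours * pue
    return energy_kwh * grid_kg_co2e_per_kwh


# Hypothetical run: 8 accelerators at 0.3 kW each for 48 hours,
# in a data centre with a PUE of 1.5 on a grid emitting 0.2 kg CO2e per kWh.
estimate = training_emissions_kg(
    avg_power_kw=8 * 0.3,
    hours=48,
    pue=1.5,
    grid_kg_co2e_per_kwh=0.2,
)
print(f"Estimated training emissions: {estimate:.1f} kg CO2e")
```

Publishing an estimate like this alongside a model, together with the assumptions behind it, is one simple way to make its environmental cost visible to the people who use it.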

We all have a role to play and a personal impact we can achieve by embracing responsible and ethical AI practices, prioritising maintainability, and integrating sustainability into our data strategies. By implementing robust frameworks for ethical AI development, conducting regular audits to identify and mitigate biases, and fostering diversity and inclusion in our data teams, we can build AI systems that promote social good while upholding ethical and sustainable standards. One way we have been able to support this at LDCo is by developing an AI health check, and we are excited about the good it does in as little as 10 days.

REFERENCES

Douwes, Constance, Philippe Esling, and Jean-Pierre Briot. A multiobjective approach for sustainable generative audio models. 2021. (hal-03296897)

Corallo et al. (2021). Understanding and defining dark data for the manufacturing industry. IEEE Trans. Eng. Manag. (2021), pp. 1-13

Gimpel, A. Alter. (2021). Benefit from the Internet of Things right now by accessing dark data. IT Professional, 23 (2) (2021), pp. 45-49

Güğül, Gül Nihal & Gökçül, Furkan & Eicker, Ursula, 2023. “Sustainability analysis of zero energy consumption data centres with free cooling, waste heat reuse and renewable energy systems: A feasibility study,” Energy, Elsevier, vol. 262(PB).

Hao, K. (2019, June 6). Training a single AI model can emit as much carbon as five cars in their lifetimes. MIT Technology Review. www.technologyreview.com/2019/06/06/239031/training-a-singleai-model-can-emit-as-much-carbon-as

Heidorn, Patrick. (2008). Shedding Light on the Dark Data in the Long Tail of Science. Library Trends. 57. 280-299. 10.1353/lib.0.0036.

Huang, Pei & Copertaro, Benedetta & Zhang, Xingxing & Shen, Jingchun & Löfgren, Isabelle & Rönnelid, Mats & Fahlen, Jan & Andersson, Dan & Svanfeldt, Mikael, 2020. “A review of data centres as prosumers in district energy systems: Renewable energy integration and waste heat reuse for district heating,” Applied Energy, Elsevier, vol. 258(C).

Jouhara, Hussam & Meskimmon, Richard, 2014. “Heat pipe based thermal management systems for energy-efficient data centres,” Energy, Elsevier, vol. 77(C), pages 265-270.

Al Kez, D., Foley, A. M., Laverty, D., Del Rio, D. F., & Sovacool, B. (2022). Exploring the sustainability challenges facing digitalization and internet data centres. Journal of Cleaner Production, 371, 133633.

Maroua Haddad & Jean-Marc Nicod & Marie-Cécile Péra & Christophe Varnier, 2021. “Stand-alone renewable power system scheduling for a green data centre using integer linear programming,” Journal of Scheduling, Springer, vol. 24(5), pages 523-541, October.

Patterson, David, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. “Carbon emissions and large neural network training.” arXiv preprint arXiv:2104.10350 (2021).

Taylor. (2022). The volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025. Statistics, Total data volume worldwide 2010-2025 | Statista
