Internet-Draft | Sustainability Arch Considerations | December 2024 |
Pignataro, et al. | Expires 30 June 2025 | [Page] |
This document describes several of the design tradeoffs for trying to optimize for sustainability along with the other common networking metrics such as performance and availability.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 30 June 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
Over the past decade, there has been increased awareness of the environmental sustainability impact produced by the widespread adoption of the Internet and internetworking technologies. The impact of Internet technologies has been overwhelmingly positive over the past years (e.g., providing alternatives to travel, enabling remote and hybrid work, enabling technology-based endangered species conservation, etc.), and there is still room for improvement.¶
At the same time, internetworking technologies themselves have a significant environmental footprint. Reducing this footprint and making network deployments environmentally more sustainable becomes a matter of increasing importance to network providers for various reasons, including the desire to reduce operational expenses associated with energy usage, regulatory pressures related to Net Zero mandates, and corporate citizenship demands from customers to become "greener".¶
Efforts to make internetworking more sustainable may in some cases conflict with other important goals, such as network availability, resilience against sudden spikes in traffic demand, and (at least in some cases) Quality of Experience. For example, a network operator might want to include excess capacity in their network to be able to absorb sudden spikes in demand and provision additional paths to increase resilience against failures. However, doing so may involve powering up additional ports and equipment, resulting in higher energy usage and increased greenhouse gas emissions, thus compromising sustainability goals. This indicates that there are certain choices that a network operator has to make, some of which may involve tradeoffs between goals.¶
This document describes some of the tradeoffs that could be involved while optimizing for sustainability in addition to or in lieu of traditional metrics such as performance or availability. Further, it discusses how Internet technologies can be used to help other fields become more sustainable.¶
Specifically, this document details environmental sustainability implications to Internet protocols, architectures, and technologies.¶
This document leverages the terminology and concepts defined in [I-D.pignataro-green-enviro-sust-terminology], and readers are expected to be familiar with those.¶
Traditionally, digital communication networks are optimized for a specific set of criteria that proxies for business metrics. A network operator providing services to their customers intends to maximize profits, by increasing top-line revenue and decreasing bottom-line associated costs. This directly translates to goals of optimizing performance and availability, while reducing various costs.¶
Most recently, various forces elevate the need for sustainability in networking technologies and architectures, to quantify and minimize negative environmental impact.¶
Optimizing only network availability (e.g., by having excess capacity and backup paths) or optimizing only performance (e.g., by increasing speeds selecting paths based on delays only) can seemingly be in opposition to optimizing sustainability objectives. For a given application, use-case, or vertical realization of technology, a Pareto-efficient choice can potentially improve sustainability without sacrificing availability or performance beyond the application tolerance. That is, a win-win.¶
Consequently, network architects and designers are presented with a set of new design tradeoffs: a multi-objective optimization that satisfies border requirements and global optima for availability, performance, and sustainability simultaneously. This is not unlike the doughnut economics model concept described in [Doughnut].¶
To understand this new model, we can analyze a simplified example. Assume the following topology, passing traffic from A to B:¶
Router 1 is directly connected to Router 2 through five parallel links, of 10 Gbps each. Router 1 can also reach Router 2 through Router 3 with 40 Gbps links between Router 1 and Router 3, and between Router 3 and Router 2. Let's assume that the capacity-planned traffic between A and B equals 15 Gbps.¶
In this scenario, a topology optimized for performance and availability/resiliency would have all links and routers on, and would likely forward traffic using two of the parallel links. Utilizing the path through Router 3 might lower performance, but it serves as a backup path.¶
On the other hand, when we add sustainability as a consideration, different options present themselves, each of which involves a tradeoff. One of the options is to remove from the topology Router 3 and associated links, and shutdown links and optics in two or three of the parallel links. This will allow to conserve energy that will no longer be required to operatore Router 3 and the affected links. Another option is to completely shutdown all the parallel links and route traffic through Router 3 (i.e., not maximizing performance alone, but maximizing at the time performance, availability and resiliency, and sustainability.) The choice between these two options, as well as the option to stick with the original topology, will depend on choices that the network operator will have to make, taking into account the aggregate sustainability metrics of network elements in each of the two topologies as well as the effect these choices will have on availability/resiliency which will be reduced as a result.¶
Another option is to use flexible Ethernet, where the five links combined are aggregated into one active virtual link which has 15 Gbps, and another inactive link of 35 Gbps of capacity -- although a physical link cannot be half-active and half-inactive as far as PHY and optics are concerned.¶
When we add sustainability considerations, resiliency is not the single objective to optimize.¶
There are many methods to improve network resiliency, including a design eliminating single-points-of-failure, performing software safe-release selections and upgrades, deploying network real-time testing systems (including operations, administration, and maintenance (OAM) tools, monitoring systems (e.g., [RFC8403]), chaos-based testing, and site reliability engineering (SRE) principles), and utilizing redundancy across network elements as well as across a topology. Each one of these methods incurs also a sustainability cost. Yet, the functions for resiliency improvement and sustainability cost for each of these methods are not the same. Considering sustainability means quantifying its impact in the decision of how to improve resiliency, and how much is needed.¶
Let's first explore redundancy. For example, consider the ratio of overall network capacity (in bandwidth, compute power, etc.) over the used network capacity, and let's call it "Redundancy Index". If this number is one, there's no redundancy; and as the ratio grows, so does the potentially unused capacity that could be utilized in a failure event. Similarly, consider the values of sustainability metrics for when the Redundancy Index is one and for when it is two. These border points might give an indication of the slope for each objective function.¶
Please note that "justified redundancy" may be higher than "adequate redundancy" when we manage to organize the redundancy in a multi-layer fashion: (1) capacity that is "always on" and always uses energy; (2) capacity that can turn on quickly when needed; (and possibly (3) capacity that is "on the shelf" (even in the box) but ready to be deployed quickly when needed.)¶
Network performance and Quality of Experience have always played an important role in the development of networking technology. Key parameters such as latency, jitter, and loss as well as their impact on the quality that the users of networked applications experience are well understood, the optimization of those parameters and the adherence to corresponding service level objectives being an important goal in most network deployments. However, the desire to improve sustainability and energy efficiency can conflict with those goals. For example, in order to ensure minimal latency, a network operator may need to provision additional paths that require additional ports that need to be powered, instead of relying on a topology with fewer links and nodes. Such a topology might result in greater power efficienc as a result more resources being shared, but it could also result in longer paths and an increased possibility for congestion, both of which would be detrimental to latency and associated Quality of Experience.¶
This implies a tradeoff between different goals. The challenge for operators lies in finding the sweet spot in which acceptable network performance is obtained and a point of diminishing returns is reached at which any incremental further performance improvement would come at the expense of significant deterioration in energy usage.¶
The networking industry is in the starting phases of addressing this objective. We are seeing a sprinkling of sustainability features across the networking stack and components of devices, whether it is on forwarding chips, power supplies, optics, and compute. Many of those optimizations and features are typically local in nature, and widely scattered across different elements of a network architecture. An opportunity for maximizing the positive environmental impact of these technologies calls for a more cohesive and complementary view that spans the complete product lifecycle for hardware and software, as well as how some of these features work in unison.¶
For example, features that provide energy saving modes for devices can be dynamically managed when the network utilization is such that performance would not significantly suffer. A core router, instead of becoming obsolete due to the need for higher throughput in the core, could become a future edge/access router. That is an example of reuse and repurpose, before recycling. There are benefits of macro-optimizations by clustering in specific features, versus micro-optimizing locally without awareness of the network context.¶
Many of the subtleties and nuances of the measurement of GHG and environmental impacts stem from the very important distinction between attributional and consequential models. Detailed definitions can be found at [UNEP-LCA].¶
While both models are quite different, they do use the same terms and frames of references, measures, and language. Without explicit clarifications, they are prone to confusion.¶
For example, measuring the carbon footprint attributed to a batch process or a workload based on its energy efficiency would not consider that the hardware is still there running. When is it most effective to charge battery-powered devices, during the night when there's less load, or during the day when there's solar energy? In other words, if a person who commutes by train to their office five days a week starts working from home two days a week, there could be an attributional reduction of GHG emissions, yet the train still continues running equally. However, if that person commutes by combustion-engine car alone, the consequences are different.¶
Considering the attributional versus consequential distinction, there are some implications and a potential corollaries:¶
For an environmental-impact analysis, it is critical to explicitly cite the model used, as well as clearly define the boundary.¶
The activities that we embark upon as internetworking and protocol designers - including the ones targeting reduction of negative environmental impacts - have an energy footprint of themselves.¶
"Do no harm" in the context of improving sustainability of networks is to look beyond bounded attributions and consider (both intended and unintended) consequences.¶
Deployment and operational aspects play a critical role in making networks more sustainable. A detailed explanation of that role, the associated challenges, as well as an outline of solution approaches is provided in [I-D.irtf-nmrg-green-ps]. Here are some areas in which network management can help make networks more sustainable; for a more extensive treatment, please refer to that document.¶
In addition to those aspects, perhaps the most important role of network management is to provide network operators with the necessary visibility into how and where power is used in their network. This is required in order to assess where the network stands in terms of sustainability. It also allows to track progress over time, compare different alternatives for their effectiveness, and generally to facilitate network sustainability optimization. Providing this visibility requires the definition of metrics and corresponding instrumentation of the network so that those metrics can be monitored, assessed, compared, and improved.¶
The architectural considerations for environmental sustainability cannot always be achieved at the same time and we expect the following high level phases:¶
Visibility: In this phase we focus on the measurement and collection of metrics.¶
Insights and Recommendations: In this phase we focus on deriving insights and providing recommendations that can be acted upon manually over large time scales.¶
Self-Optimization via Automation: In this phase we build awareness into the systems to automatically recognize opportunities for improvement and implement them.¶
Visibility represents collecting and organizing data in a standard vendor agnostic manner. The first step in improving our environmental impact is to actually measure it in a clear and consistent manner. The IETF, IRTF and the IAB have a long history of work in this field, and this has greatly helped with the instrumentation of network equipment in collecting metrics for network management, performance, and troubleshooting. On the environmental-impact side though, there has been a proliferation of a wide variety of vendor extensions based on these standards. Without a common definition of metrics across the industry and widespread adoption we will be left with ill-defined, potentially redundant, proprietary, or even contradicting metrics. Similarly, we also need to work on standard telemetry for collecting these metrics so that interoperability can be achieved in multi-vendor networks.¶
Once the metrics have been collected, categorized, and aggregated in a common format, it would be straightforward to visualize these metrics and allow consumers to draw insights into their GHG and energy impact. The visualizations could take the form of high-level dashboards that provide aggregate metrics and potentially some form of maturity continuum. We think this can be accomplished using reference implementations of the standards developed in "Phase 1: Visibility". We do expect vendors and other open projects to customize this and incorporate specific features. This will allow identifying sources of environmental impact and address any potential issues through operational changes, creation of best-practices, and changes towards a greener, more environmentally friendly equipment, software, platforms, applications, and protocols.¶
Manually making changes as mentioned in "Phase 2: Insights and Recommendations" works for changes needed on large timescales but does not scale to improvements on smaller scales (i.e., it is impractical in many levels for an operator to be looking at a dashboard monitoring usage and making changes in real-time 24x7). There is a need to provision some amount of self-awareness into the network itself, at various layers, so that it can recognize opportunities for improvement and make those changes and measure the effects by closing the loop. The goals of the consumers can be stated in a declarative fashion, and the networks can continually use mechanisms such as machine learning (ML), deep learning (DL), and artificial intelligence (AI) with an additional goal to optimize for improvements in the environmental impact. These include, for example:¶
Discovery and advertisement of networking characteristics that have either direct or indirect environmental impact,¶
greener networking protocols that can move traffic onto more energy efficient paths, directing topological graphs to optimize environmental impacts, and¶
protocols that can instruct equipment to move under-utilized links and devices into low-energy modes.¶
The three phases run in an iterative fashion, such that after phases 1, 2, and 3 are completed for an interation, there will be an added awareness of what (else) to collect back to phase 1.¶
Further, sustainability-aware self-optimization is something to explore in Autonomic Networking.¶
Sustainable practices offer many environmental, economic, and social benefits, and security is a route to sustainability rather than a hurdle to clear.¶
The creation of sustainability features for an element or a system should not weaken or compromise their security posture, nor should it increase the surface of attack or create attack vectors.¶
Sustainability metrics and data models ought to describe how to secure the sustainability data exposed and surfaced through telemetry.¶
Sustainability control capabilities, as for example for power management, should consider potential attacks leveraging those controls. Setting a device on low-power or power-save modes during peak traffic can be a denial-of-service attack vector, negatively impacting end-to-end services.¶
The development of security features should, in turn, balance the environmental impact and sustainability considerations detailed in this document.¶
Computational increase on cryptographic operations can result in higher power use. Since generally the increase of energy required is not linear with the increase of computational complexity, there's a desire to satisfy security requirements while minimizing environmental impact.¶
Proof-of-Work schemes' and AI models' energy consumption also grows non-linearly as a function of the precision achieved. In these, perfect is the enemy of good, and bounding precision through specifications supports sustainable compute considerations.¶
This document is created greatly leveraging ideas and text from [I-D.cparsk-eimpact-sustainability-considerations], and consequently acknowledges all the many contributions that improved it.¶