Internet-Draft Lossless WAN Use Cases and Requirements March 2024
Huang, et al. Expires 5 September 2024
Workgroup: Networking
Internet-Draft: draft-huang-rtgwg-wan-lossless-uc-00
Published: March 2024
Intended Status: Standards Track
Expires: 5 September 2024
Authors: H. Huang, Ed. (Huawei); T. He (China Unicom); T. Zhou (Huawei)

Use Cases and Requirements for Implementing Lossless Techniques in Wide Area Networks

Abstract

This document outlines the use cases and requirements for implementing lossless data transmission techniques in Wide Area Networks (WANs), motivated by the increasing demand for high-bandwidth and reliable data transport in applications such as high-performance computing (HPC), genetic sequencing, and audio/video production. It discusses the challenges that existing data transport protocols face in WAN environments and proposes requirements for enhancing lossless transmission capabilities to support emerging data-intensive applications.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 5 September 2024.

1. Introduction

Big data is a foundation of innovation across numerous fields. From high-performance computing (HPC) in scientific research to advances in genetic sequencing and the production of high-definition multimedia content, the need for rapid, reliable, and lossless data transmission across wide area networks (WANs) has never been more critical. Traditional network protocols, designed before these data demands emerged, struggle to keep up, particularly when it comes to ensuring zero data loss over long distances.

This document focuses on the pressing need for lossless data transmission techniques in WANs, driven by the requirements of data-intensive applications that form the backbone of scientific, medical, and creative industries. For example, the Energy Sciences Network (ESnet) [ESnet] supports vast amounts of scientific data movement that underpin groundbreaking research. Similarly, in the healthcare sector, the explosion of data from genetic sequencing calls for unprecedented levels of data transmission reliability and efficiency. The media and entertainment industry also faces challenges in moving large volumes of raw content over a stable network rather than manually transporting physical storage media.

These scenarios underscore a growing disconnect between the capabilities of existing WAN protocols and the evolving demands of modern applications. The challenges of ensuring zero-loss transmission in an infrastructure not originally designed for such demands highlight the need for new solutions.

This document aims to shed light on the necessity for advanced lossless transmission technologies in WANs. By identifying the limitations of current network protocols and outlining the requirements for new developments, we hope to pave the way for a new generation of WANs. These networks will not only meet the current demands of data-intensive applications but will also support the next wave of digital innovation.

2. Use Cases

The necessity for implementing lossless data transmission techniques in Wide Area Networks (WANs) is underscored by several critical application areas. These use cases highlight the imperative for reliable, high-speed data transfer capabilities to support the demanding requirements of modern data-intensive operations.

2.1. High-Performance Computing (HPC) Services for Scientific Research

High-Performance Computing (HPC) services are fundamental to scientific advancements, where collaborative efforts across various geographical regions are commonplace. For instance, the study of PSII proteins, which are crucial for understanding how water molecules split to produce oxygen, generates between 30 and 120 high-resolution images per second during experiments. This results in 60-100 GB of data every five minutes, necessitating rapid and lossless data transfer from the National Renewable Energy Laboratory's equipment back to analysis labs such as the Lawrence Berkeley National Laboratory. The efficiency and reliability of WANs in this context are essential for seamless collaboration between scientists in different domains, enabling them to share and analyze large datasets effectively.
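
As a rough check on the figures above, the sustained rate needed to keep pace with 60-100 GB every five minutes can be worked out directly. The sketch below assumes decimal gigabytes (10^9 bytes) and ignores protocol overhead; the function name is illustrative and not part of this document.

```python
def required_gbps(gigabytes: float, seconds: float) -> float:
    """Sustained rate in Gbit/s needed to move `gigabytes` within `seconds`.

    Assumes decimal units (1 GB = 10**9 bytes) and no protocol overhead.
    """
    return gigabytes * 8 / seconds


WINDOW_S = 5 * 60  # five-minute acquisition window quoted in the text

low = required_gbps(60, WINDOW_S)    # lower bound of the quoted data volume
high = required_gbps(100, WINDOW_S)  # upper bound of the quoted data volume
print(f"{low:.2f} to {high:.2f} Gbit/s sustained")
```

Under these assumptions the experiment needs roughly 1.6 to 2.7 Gbit/s of sustained goodput, so any loss-induced drop in throughput quickly turns into a backlog at the instrument.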

2.2. Rapid Transmission Services for Genetic Sequencing for Timely Medical Services

The field of genetic sequencing has seen exponential growth, driven by the decreasing costs and widespread application of sequencing technologies. This growth is matched by rapidly increasing data volumes, which require efficient and lossless transmission to cloud or private data centers for analysis. For example, sequencing a single human genome produces 100 GB to 200 GB of data. With daily data production rates reaching 6 TB to 12 TB and annual data management needs surpassing 1.6 PB, the demand for high-speed, reliable data transfer is evident. The limited efficiency of existing network transfers is a significant bottleneck, extending the turnaround times for sequencing services and delaying the delivery of precision medicine.
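
To make the turnaround-time concern concrete, the sketch below estimates how long a 200 GB genome takes to transfer at a few effective throughputs, and what sustained rate the quoted daily volumes imply. The throughput values in the loop are hypothetical and chosen only for illustration; decimal units are assumed and protocol overhead is ignored.

```python
def transfer_seconds(gigabytes: float, goodput_gbps: float) -> float:
    """Time in seconds to move `gigabytes` at an effective rate of `goodput_gbps`."""
    return gigabytes * 8 / goodput_gbps


def daily_rate_gbps(terabytes_per_day: float) -> float:
    """Average rate in Gbit/s needed to move a daily volume within 24 hours."""
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / 86_400 / 1e9


# Hypothetical effective throughputs (Gbit/s); not measurements from this document.
for goodput in (0.1, 1.0, 10.0):
    minutes = transfer_seconds(200, goodput) / 60
    print(f"200 GB at {goodput:>4} Gbit/s: {minutes:8.1f} min")

# Daily volumes quoted in the text: 6 TB and 12 TB.
for volume_tb in (6, 12):
    print(f"{volume_tb} TB/day needs {daily_rate_gbps(volume_tb):.2f} Gbit/s on average")
```

The averages hide burstiness: if sequencers produce data in batches, the peak rate required is higher than the daily mean.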

2.3. Stable Transmission Services for Large-Scale Audio/Video Data Migration

The competitive landscape of the audio and video industry, coupled with the shift towards cloud-based post-production processes, necessitates the transfer of large volumes of raw footage across WANs. Traditional methods of data transportation, involving physical media and manual transfer, are not only time-consuming but also inefficient. For instance, film crews generating 2TB of data daily resort to physically moving storage media to processing locations, a process that significantly delays the production cycle and weakens market responsiveness. The requirement for a network infrastructure capable of handling such extensive data transfers quickly and without loss is critical for maintaining the pace of production and ensuring the quality of the final multimedia content.

3. Problem Analysis and Goal

3.1. Problem Analysis

The primary objective in the realm of Wide Area Networks (WANs) is to provide long-term, stable, and high-capacity network services that can accommodate the sudden surges in data transmission demands, essential for data migration across diverse geographical locations. This goal is predicated on leveraging the inherent statistical multiplexing advantage of IP networks, which allows for cost-effective bandwidth allocation and enhanced overall network throughput. The ability to meet these data transmission requirements efficiently is crucial for supporting the backbone of today’s data-driven applications, ranging from scientific research to global financial transactions and multimedia content delivery.

Despite the advantages of statistical multiplexing in IP networks, such as cost reduction and throughput optimization, this model makes it difficult to provide absolute resource guarantees and, consequently, zero packet loss. The practice of overprovisioning bandwidth, common among service providers, does not equate to lossless data transmission, which is a critical shortfall when compared to dedicated optical networks or resources with hard isolation.

3.1.1. Impact of Packet Loss

In the data migration scenarios outlined above (high-performance computing services, genetic sequencing, and audio/video data migration), traditional transport protocols such as TCP or RDMA [RoCEv2] are commonly used. However, both are adversely affected by packet loss, especially over long-haul transmissions.

For TCP, CUBIC, a loss-based congestion control algorithm, sees a throughput decline of up to 89.9% with just 2% packet loss when the Round-Trip Time (RTT) is 30 ms. BBR, a TCP congestion control algorithm based on measured bandwidth and delay, also suffers significantly when packet loss exceeds 5%, with throughput plummeting when packet loss reaches 20%. BBR's retransmission cost under these conditions is also notably high: with slight packet loss (<1%), its retransmission rate is 6 to 10 times higher than that of CUBIC, and under severe packet loss the rate can increase exponentially.
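
The sensitivity of loss-based TCP to packet loss can be illustrated with the well-known approximation by Mathis et al., in which steady-state throughput scales as MSS / (RTT * sqrt(p)). Note that this model describes Reno-style congestion avoidance, not CUBIC or BBR, so it is not expected to reproduce the percentages quoted above; it only shows the general trend. The 1460-byte MSS is an assumption, and the 30 ms RTT is taken from the text.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state throughput of Reno-style TCP (Mathis et al.).

    throughput ~= (MSS / RTT) * C / sqrt(p), with C = sqrt(3/2).
    Not a model of CUBIC or BBR; used here only to show the trend.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


MSS_BYTES = 1460  # assumed Ethernet-sized segment
RTT_S = 0.030     # 30 ms RTT, as in the text

for p in (1e-5, 1e-4, 1e-3, 2e-2):
    mbps = mathis_throughput_bps(MSS_BYTES, RTT_S, p) / 1e6
    print(f"loss {p:.5f}: about {mbps:8.2f} Mbit/s per flow")
```

Because throughput falls with the square root of the loss rate and inversely with RTT, the same loss rate costs far more on a long-haul path than inside a data center.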

RDMA, often used within data centers for inter-node data access over UDP, relies on a Go-Back-N retransmission mechanism. Its throughput decreases dramatically at packet loss rates above 0.1%, and a 2% packet loss rate effectively reduces throughput to zero. To keep throughput unaffected, the packet loss rate must be kept below 1 in 100,000 (0.001%).
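
Why Go-Back-N degrades so sharply over long paths can be seen from the textbook efficiency model, in which every lost packet forces retransmission of the whole window in flight: efficiency = (1 - p) / (1 + (N - 1) * p), where N is the number of packets in flight. This is a simplified sketch and not the source of the figures above. The 10 Gbit/s rate and 4096-byte packet size are assumptions; the 30 ms RTT matches the TCP example.

```python
import math


def packets_in_flight(rate_bps: float, rtt_s: float, packet_bytes: int) -> int:
    """Number of packets needed to fill the path for one RTT (rounded up)."""
    return math.ceil(rate_bps * rtt_s / (8 * packet_bytes))


def go_back_n_efficiency(loss_rate: float, window_packets: int) -> float:
    """Textbook Go-Back-N efficiency: each loss costs a full window of resends."""
    return (1 - loss_rate) / (1 + (window_packets - 1) * loss_rate)


# Assumed link parameters (hypothetical, for illustration only).
RATE_BPS = 10e9       # 10 Gbit/s
RTT_S = 0.030         # 30 ms WAN round-trip time
PACKET_BYTES = 4096   # assumed RDMA packet size

window = packets_in_flight(RATE_BPS, RTT_S, PACKET_BYTES)
for p in (1e-5, 1e-3, 2e-2):
    eff = go_back_n_efficiency(p, window)
    print(f"loss {p:.5f}, window {window} pkts: efficiency {eff:.4f}")
```

In this model the penalty per loss grows with the window, and the window grows with RTT, which is why a loss rate that is tolerable inside a data center becomes crippling across a WAN.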

These challenges underscore a critical gap in the current capabilities of IP networks to support the demanding requirements of modern, data-intensive applications. The inability to ensure zero packet loss across WANs not only impacts application performance but also limits the potential for innovation and collaboration across key sectors reliant on rapid and reliable data transmission.

3.2. Goal

The overarching goal in evolving Wide Area Networks (WANs) to serve the aforementioned use cases is to enable lossless (zero-packet-loss) transmission services for the seamless migration of data across geographical areas. As the volume, velocity, and variety of digital data grow rapidly, ensuring lossless transmission during inter-regional migration becomes indispensable. This is critically important for applications and operations that rely on the integrity and timeliness of data, such as AI/HPC computing and data backup and recovery.

4. Challenges and Requirements

The quest for lossless data transmission in Wide Area Networks (WANs) faces significant challenges, notably elephant flows: large, bursty data transfers that can cause instantaneous congestion and packet loss within network device queues. This not only increases application latency but also reduces throughput, adversely affecting application performance. In data centers, lossless technologies such as Priority-based Flow Control [PFC], Explicit Congestion Notification [RFC3168], and DCQCN [DCQCN] are deployed to enhance the performance of such applications.

However, applying these data center-oriented lossless techniques to WANs encounters obstacles due to the larger scale and longer RTTs inherent in WAN environments, which give rise to new challenges and corresponding requirements.

These challenges underscore the need for tailored solutions that address the unique demands and conditions of WANs. By adapting and innovating on existing lossless transmission technologies from data center networks, the goal of achieving zero packet loss in WANs becomes attainable, paving the way for enhanced data mobility and application performance.
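
One way to see why longer RTTs matter is to compare the amount of data in flight on a data center path and on a WAN path. The sketch below computes the bandwidth-delay product for both; the 100 Gbit/s rate and the 10 microsecond data center RTT are assumptions chosen for illustration, while the 30 ms WAN RTT matches the example in Section 3.1.1.

```python
def in_flight_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight during one round-trip time."""
    return rate_bps * rtt_s / 8


RATE_BPS = 100e9    # assumed link rate: 100 Gbit/s
DC_RTT_S = 10e-6    # assumed data center RTT: 10 microseconds
WAN_RTT_S = 30e-3   # WAN RTT: 30 ms

dc = in_flight_bytes(RATE_BPS, DC_RTT_S)
wan = in_flight_bytes(RATE_BPS, WAN_RTT_S)
print(f"data center: {dc / 1e3:.0f} kB in flight")
print(f"WAN:         {wan / 1e6:.0f} MB in flight")
print(f"ratio:       {wan / dc:.0f}x")
```

Any feedback signal that takes about one RTT to act on the sender, whether a pause frame or a congestion notification, has to contend with this much data already in flight, so buffer sizing and reaction times that work in a data center do not carry over directly to a WAN.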

5. Security Considerations

TBD.

6. IANA Considerations

TBD.

7. Informative References

[RFC3168]
Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001, <https://www.rfc-editor.org/rfc/rfc3168>.
[RoCEv2]
"Supplement to InfiniBand Architecture Specification Volume 1 Release 1.2.2 Annex A17: RoCEv2 (IP Routable RoCE)", n.d.
[DCQCN]
Zhu, Y., et al., "Congestion Control for Large-Scale RDMA Deployments", ACM SIGCOMM 2015, <https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p523.pdf>.
[PFC]
"IEEE Standard for Local and metropolitan area networks--Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks--Amendment 17: Priority-based Flow Control", n.d.
[ESnet]
"Energy Sciences Network", n.d.

Acknowledgements

TBD.

Contributors

TBD.

Authors' Addresses

Hongyi Huang (editor)
Huawei
Beijing
China

Tao He
China Unicom

Tianran Zhou
Huawei