Network File System Version 4                                  T. Haynes
Internet-Draft                                              T. Myklebust
Intended status: Standards Track                             Hammerspace
Expires: 8 June 2025                                     5 December 2024

           Add LAYOUT_WCC to NFSv4.2's Flex File Layout Type
                     draft-ietf-nfsv4-layoutwcc-05

Abstract

The Parallel Network File System (pNFS) Flexible File Layout allows for
a file's metadata (MDS) and data (DS) to be on different servers. It
does not provide a mechanism for the data server to update the metadata
server about changes to the data part of the file. The client has
knowledge of such updates, but lacks the ability to update the metadata
server. This document presents a refinement to RFC8435 to allow the
client to update the metadata server about changes on the data server.

Note

This note is to be removed before publishing as an RFC.

Discussion of this draft takes place on the NFSv4 working group mailing
list (nfsv4@ietf.org), which is archived at
https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group
information can be found at
https://datatracker.ietf.org/wg/nfsv4/about/.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is at
https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

This Internet-Draft will expire on 8 June 2025.

Copyright Notice

Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Revised BSD License text as described in Section 4.e of the Trust Legal
Provisions and are provided without warranty as described in the
Revised BSD License.

Table of Contents

1.  Introduction
  1.1.  Definitions
  1.2.  Requirements Language
2.  Weak Cache Consistency (WCC)
3.  Operation 77: LAYOUT_WCC - Layout Weak Cache Consistency
  3.1.  ARGUMENT
  3.2.  RESULT
  3.3.  DESCRIPTION
  3.4.  Implementation
    3.4.1.  Examples of when to use LAYOUT_WCC
    3.4.2.  Examples of what to send in the LAYOUT_WCC
  3.5.  Allowed Errors
  3.6.  Extension of Existing Implementations
  3.7.  Flex Files Layout Type
4.  Extraction of XDR
  4.1.  Code Components Licensing Notice
5.  Security Considerations
6.  IANA Considerations
7.  References
  7.1.  Normative References
  7.2.  Informative References
Appendix A.  Acknowledgments
Authors' Addresses

1. Introduction

In the Network File System version 4 (NFSv4) with a Parallel NFS (pNFS)
Flexible File Layout (see Section 12 of [RFC8435]) server, there is no
mechanism for the data servers to update the metadata servers when the
data portion of the file is modified. The metadata server needs this
knowledge to correspondingly update the metadata portion of the file.
If the client is using NFSv3 as the protocol with the data server, it
can leverage weak cache consistency (WCC) to inform the metadata server
of the attribute changes. In this document, we introduce a new
operation called LAYOUT_WCC to NFSv4.2 which allows the client to
periodically report the attributes of the data files to the metadata
server.

Using the process detailed in [RFC8178], the revisions in this document
become an extension of NFSv4.2 [RFC7862]. They are built on top of the
external data representation (XDR) [RFC4506] generated from [RFC7863].

1.1. Definitions

See Section 1.1 of [RFC8435] for a fuller set of definitions.

(file) data: that part of the file system object that contains the data
to be read or written. It is the contents of the object rather than the
attributes of the object.

data server (DS): a pNFS server that provides the file's data when the
file system object is accessed over a file-based protocol.

(file) metadata: the part of the file system object that contains
various descriptive data relevant to the file object, as opposed to the
file data itself. This could include the time of last modification,
access time, EOF position, etc.

metadata server (MDS): the pNFS server that provides metadata
information for a file system object.

storage device: the target to which clients may direct I/O requests
when they hold an appropriate layout. Note that each data server is a
storage device but that some storage devices are not data servers.
(See Section 2.1 of [RFC8434] for a discussion on the difference
between a data server and a storage device.)

weak cache consistency (WCC): In NFSv3, WCC allows the client to check
for file attribute changes before and after an operation (See Section
2.6 of [RFC1813]).

1.2. Requirements Language

The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT',
'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 'MAY', and
'OPTIONAL' in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.

2. Weak Cache Consistency (WCC)

A layout type for pNFS enables the metadata server to tell the client
both the storage protocol and location of data to be used to
communicate with the storage devices. The Flex Files Layout Type (in
[RFC8435]) details how NFSv3 data servers can be accessed. The client
is only allowed to perform NFSv3 READ (see Section 3.3.6 of [RFC1813]),
WRITE (see Section 3.3.7 of [RFC1813]), and COMMIT (see Section 3.3.21
of [RFC1813]) operations on the file handles provided in the layout.
I.e., the client is only allowed to use NFSv3 operations which directly
act on the data portion of the data file.

Because there is no control protocol (see [RFC8434]) possible with all
data servers, NFSv3 is used as the control protocol. As such, the NFSv3
CREATE (see Section 3.3.8 of [RFC1813]), GETATTR (see Section 3.3.1 of
[RFC1813]), and SETATTR (see Section 3.3.2 of [RFC1813]) are operations
commonly used by the metadata server. I.e., the metadata server is only
allowed to use NFSv3 operations which directly act on the metadata
portion of the data file.

GETATTR allows the metadata server to mainly retrieve the mtime (modify
time), ctime (change time), and atime (access time).
The metadata server can use this information to determine if the client
modified the file whilst it held an iomode of LAYOUTIOMODE4_RW (see
Section 3.3.20 of [RFC8881]). Then it can determine the time_modify
(see Section 5.8.2.43 of [RFC8881]), time_metadata (see Section
5.8.2.42 of [RFC8881]), and time_access (see Section 5.8.2.37 of
[RFC8881]) for the metadata file, i.e., the information to return to
clients in an NFSv4.2 GETATTR response.

For example, the metadata server would issue an NFSv3 GETATTR to the
data server. This query is most likely triggered in response to an
NFSv4 GETATTR issued by a client to the metadata server. Not only are
these NFSv3 GETATTRs to the data server individually expensive, but the
data server can also become inundated by a storm of such requests.

NFSv3 solved a similar issue by having the READ and WRITE operations
employ a post-operation attribute to report the weak cache consistency
(WCC) data (See Section 2.6 of [RFC1813]). Each NFSv3 operation
corresponds to one round trip between the client and server. So a WRITE
followed by a GETATTR would require two round trips. In that scenario,
the attribute information retrieved is considered to be strict
server-client consistency. For NFSv4, the WRITE and GETATTR can be
issued together inside a compound, which only requires one round trip
between the client and server. This is also considered to be strict
server-client consistency. In essence, the NFSv4 READ and WRITE
operations drop the post-operation attributes, allowing the client to
decide if it needs that information.

Whilst NFSv4 got rid of the requirement for WCC information to be
supplied by the WRITE or READ operations, the introduction of pNFS
re-introduces the same problem. The metadata server has to communicate
with the data server in order to get at the data which could be
provided by a WCC model.

With the flexible file layout type, the client can leverage the NFSv3
WCC to service the proxying of times (See Section 4 of
[I-D.ietf-nfsv4-delstid]). But the granularity of this data is limited.
With client side mirroring (See Section 8 of [RFC8435]), the client has
to aggregate the N mirrored files in order to send one piece of
information instead of N pieces of information. Also, the client is
limited to sending that information only when it returns the
delegation.

In this document, we present a new NFSv4.2 operation called LAYOUT_WCC,
which allows the client to update the metadata server with information
from the data server. The client is responsible for taking the NFSv3
WCC information (which is returned by the 3 operations it is allowed to
use) and passing it back to the metadata server in the NFSv4.2
attributes. The metadata server MAY then avoid costly NFSv3 GETATTR
calls to the data servers. As this is a weak model, the metadata server
MAY make such calls anyway in order to strengthen the model.

3. Operation 77: LAYOUT_WCC - Layout Weak Cache Consistency

3.1. ARGUMENT

   /// struct LAYOUT_WCC4args {
   ///         stateid4        lowa_stateid;
   ///         layouttype4     lowa_type;
   ///         opaque          lowa_body<>;
   /// };

stateid4 is defined in Section 3.3.12 of [RFC8881]. layouttype4 is
defined in Section 3.3.13 of [RFC8881].

3.2. RESULT

   /// struct LAYOUT_WCC4res {
   ///         nfsstat4        lowr_status;
   /// };

nfsstat4 is defined in Section 3.2 of [RFC8881].

3.3. DESCRIPTION

The current filehandle and the lowa_stateid identify the particular
layout for the LAYOUT_WCC operation. The lowa_type indicates how to
unpack the layout type specific payload inside the lowa_body field. The
lowa_type is defined to be a value from the IANA registry for 'pNFS
Layout Types Registry'.

The lowa_body will contain the data file attributes. The client will be
responsible for mapping the NFSv3 post-operation attributes to those in
a fattr4.

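As an illustration only, the client-side mapping described above could be sketched as follows, using the attribute pairs listed in Table 1 of Section 3.4.2. The names Fattr3, V3_TO_V4, and wcc_to_fattr4 are hypothetical and do not come from this document or from any NFS implementation. A plain dictionary stands in for an XDR-encoded fattr4, and rendering the uid and gid as decimal strings is an assumption about how they are stringified.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Fattr3:
    """Hypothetical holder for the NFSv3 post-operation attributes in Table 1."""
    size: int
    used: int
    mode: int
    uid: int
    gid: int
    atime: Tuple[int, int]  # (seconds, nseconds)
    mtime: Tuple[int, int]
    ctime: Tuple[int, int]


# NFSv3 attribute name -> NFSv4.2 attribute name, following Table 1.
V3_TO_V4 = {
    "size": "size",
    "used": "space_used",
    "mode": "mode",
    "uid": "owner",
    "gid": "owner_group",
    "atime": "time_access",
    "mtime": "time_modify",
    "ctime": "time_metadata",
}


def wcc_to_fattr4(attrs: Fattr3) -> Dict[str, object]:
    """Build a dictionary keyed by NFSv4.2 attribute name from NFSv3 WCC data."""
    out: Dict[str, object] = {}
    for v3_name, v4_name in V3_TO_V4.items():
        value = getattr(attrs, v3_name)
        if v3_name in ("uid", "gid"):
            # Assumption: the numeric id is sent as a decimal string.
            value = str(value)
        out[v4_name] = value
    return out
```

A real client would then XDR-encode the result as a fattr4 (attribute bitmap plus packed values) before placing it in the layout type specific payload; that encoding step is omitted here.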
Just as the post-operation attributes may be ignored by the client, the
server may ignore the attributes inside the LAYOUT_WCC. But the server
can also use those attributes to avoid querying the data server for the
data file attributes. Note that as these attributes are optional and
there is nothing the client can do if the server ignores one, there is
no need to return a bitmap4 of which attributes were accepted in the
result of the LAYOUT_WCC.

3.4. Implementation

3.4.1. Examples of when to use LAYOUT_WCC

The only way for the metadata server to detect modifications to the
data file is to probe the data servers via a GETATTR. It can compare
the mtime results across multiple calls to detect an NFSv3 WRITE
operation by the client. Likewise, the atime results indicate the
client having issued an NFSv3 READ operation. As such, the client can
leverage the LAYOUT_WCC operation whenever it believes that the
metadata server would need to refresh the attributes of the data files.

While the client can send a LAYOUT_WCC at any time, there are times it
will want to do this operation in order to avoid having the metadata
server issue NFSv3 GETATTR requests to the data servers:

*  Whenever it sends a GETATTR for any of the following attributes:
   size (see Section 5.8.1.5 of [RFC8881]), space_used (see Section
   5.8.2.25 of [RFC8881]), change (see Section 5.8.1.4 of [RFC8881]),
   time_access (see Section 5.8.2.37 of [RFC8881]), time_metadata (see
   Section 5.8.2.42 of [RFC8881]), and time_modify (see Section
   5.8.2.43 of [RFC8881]).

*  Whenever it reports an NFS4ERR_ACCESS error via LAYOUTRETURN or
   LAYOUTERROR - it could have already gotten the NFSv3 uid and gid
   values back in the WCC of the WRITE, READ, or COMMIT operation which
   got the error. Thus it could report that information back to the
   metadata server, saving it from querying that information via an
   NFSv3 GETATTR.

*  Whenever it sends a SETATTR to refresh the proxied times (See
   Section 4 of [I-D.ietf-nfsv4-delstid]) - the metadata server is
   going to want to correlate these times in order to detect later
   modification to the data file.

3.4.2. Examples of what to send in the LAYOUT_WCC

The NFSv3 attributes returned in the WCC of WRITE, READ, and COMMIT are
a subset of what can be transmitted as NFSv4 attributes. The mapping of
NFSv3 to NFSv4 attributes is shown in Table 1. The LAYOUT_WCC MUST
provide all of these attributes to the metadata server.

Both the uid and gid are stringified into their respective attributes
of owner and owner_group. The reason to provide these two attributes is
that, in case of NFS4ERR_ACCESS, the metadata server can compare what
it expects the values of the uid and gid of the data file to be versus
the actual values. It can then repair the permissions as needed or
modify the expected values it has cached.

   +=================+===================+
   | NFSv3 Attribute | NFSv4.2 Attribute |
   +=================+===================+
   | size            | size              |
   +-----------------+-------------------+
   | used            | space_used        |
   +-----------------+-------------------+
   | mode            | mode              |
   +-----------------+-------------------+
   | uid             | owner             |
   +-----------------+-------------------+
   | gid             | owner_group       |
   +-----------------+-------------------+
   | atime           | time_access       |
   +-----------------+-------------------+
   | mtime           | time_modify       |
   +-----------------+-------------------+
   | ctime           | time_metadata     |
   +-----------------+-------------------+

        Table 1: NFSv3 to NFSv4.2 Attribute Mappings

3.5. Allowed Errors

The LAYOUT_WCC operation can raise the errors in Table 2. When an error
is encountered, the metadata server can decide to ignore the entire
operation or, depending on the layout type specific payload, to apply a
portion of the payload.

Note that there are no new errors introduced for the LAYOUT_WCC
operation and the errors in Table 2 are each defined in Section 15.1 of
[RFC8881]. Table 2 can be considered as an extension of Section 15.2 of
[RFC8881].

   +============+====================================================+
   | Operation  | Errors                                             |
   +============+====================================================+
   | LAYOUT_WCC | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR,             |
   |            | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION,          |
   |            | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED,              |
   |            | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, |
   |            | NFS4ERR_INVAL, NFS4ERR_ISDIR, NFS4ERR_MOVED,       |
   |            | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP,             |
   |            | NFS4ERR_NO_GRACE, NFS4ERR_OLD_STATEID,             |
   |            | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_REP_TOO_BIG,    |
   |            | NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_REQ_TOO_BIG, |
   |            | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT,   |
   |            | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS,               |
   |            | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED,    |
   |            | NFS4ERR_WRONG_TYPE                                 |
   +------------+----------------------------------------------------+

              Table 2: Operations and Their Valid Errors

3.6. Extension of Existing Implementations

The new LAYOUT_WCC operation is OPTIONAL for both NFSv4.2 ([RFC7863])
and the flexible file layout type ([RFC8435]).

3.7. Flex Files Layout Type

   /// struct ff_data_server_wcc4 {
   ///         deviceid4            ffdsw_deviceid;
   ///         stateid4             ffdsw_stateid;
   ///         nfs_fh4              ffdsw_fh_vers<>;
   ///         fattr4               ffdsw_attributes;
   /// };
   ///
   /// struct ff_mirror_wcc4 {
   ///         ff_data_server_wcc4  ffmw_data_servers<>;
   /// };
   ///
   /// struct ff_layout_wcc4 {
   ///         ff_mirror_wcc4       fflw_mirrors<>;
   /// };

The flex file layout type specific results MUST correspond to the
ff_layout4 data structure as defined in Section 5.1 of [RFC8435].
There MUST be a one-to-one correspondence between:

*  ff_data_server4 -> ff_data_server_wcc4

*  ff_mirror4 -> ff_mirror_wcc4

*  ff_layout4 -> ff_layout_wcc4

Each ff_layout4 has an array of ff_mirror4, each of which has an array
of ff_data_server4. Based on the current filehandle and the
lowa_stateid, the server can match the reported attributes.

But the positional correspondence between the elements is not
sufficient to determine the attributes to update. Consider the case
where a layout had three mirrors and two of them had updated
attributes, but the third did not. A client could decide to present all
three mirrors, with one mirror having an attribute mask with no
attributes present. Or it could decide to present only the two mirrors
which had been changed.

In either case, the combination of ffdsw_deviceid, ffdsw_stateid, and
ffdsw_fh_vers will uniquely identify the attributes to be updated. All
three arguments are required. A layout might have multiple data files
on the same storage device, in which case the ffdsw_deviceid and
ffdsw_stateid would match, but the ffdsw_fh_vers would not.

The ffdsw_attributes are processed similarly to the obj_attributes in
the SETATTR arguments (See Section 18.34 of [RFC8881]).

4. Extraction of XDR

This document contains the external data representation (XDR) [RFC4506]
description of the new LAYOUT_WCC operation. The XDR description is
embedded in this document in a way that makes it simple for the reader
to extract into a ready-to-compile form. The reader can feed this
document into the following shell script to produce the
machine-readable XDR description of the new operation:

   #!/bin/sh
   grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'

That is, if the above script is stored in a file called 'extract.sh',
and this document is in a file called 'spec.txt', then the reader can
do:

   sh extract.sh < spec.txt > layout_wcc.x

The effect of the script is to remove leading white space from each
line, plus a sentinel sequence of '///'. XDR descriptions with the
sentinel sequence are embedded throughout the document.

Note that the XDR code contained in this document depends on types from
the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]). This includes
both nfs types that end with a 4, such as offset4, length4, etc., as
well as more generic types such as uint32_t and uint64_t.

While the XDR can be appended to that from [RFC7863], the various code
snippets belong in their respective areas of that XDR.

4.1. Code Components Licensing Notice

Both the XDR description and the scripts used for extracting the XDR
description are Code Components as described in Section 4 of 'Legal
Provisions Relating to IETF Documents' [LEGAL]. These Code Components
are licensed according to the terms of that document.

5. Security Considerations

There are no new security considerations beyond those in [RFC7862].

6. IANA Considerations

This section is to be removed before publishing as an RFC.

There are no IANA considerations for this document.

7. References

7.1. Normative References

[I-D.ietf-nfsv4-delstid] Haynes, T. and T. Myklebust, "Extending the
   Opening of Files in NFSv4.2", Work in Progress, Internet-Draft,
   draft-ietf-nfsv4-delstid-08, 2 October 2024.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
   March 1997.

[RFC4506] Eisler, M., Ed., "XDR: External Data Representation
   Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May 2006.

[RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor
   Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, November 2016.

[RFC7863] Haynes, T., "Network File System (NFS) Version 4 Minor
   Version 2 External Data Representation Standard (XDR) Description",
   RFC 7863, DOI 10.17487/RFC7863, November 2016.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119
   Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[RFC8178] Noveck, D., "Rules for NFSv4 Extensions and Minor Versions",
   RFC 8178, DOI 10.17487/RFC8178, July 2017.

[RFC8434] Haynes, T., "Requirements for Parallel NFS (pNFS) Layout
   Types", RFC 8434, DOI 10.17487/RFC8434, August 2018.

[RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible File
   Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018.

[RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS)
   Version 4 Minor Version 1 Protocol", RFC 8881,
   DOI 10.17487/RFC8881, August 2020.

7.2. Informative References

[LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents",
   November 2008.

[RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3
   Protocol Specification", RFC 1813, DOI 10.17487/RFC1813, June 1995.

Appendix A. Acknowledgments

Dave Noveck, Tigran Mkrtchyan, and Rick Macklem provided reviews of the
document.

Authors' Addresses

Thomas Haynes
Hammerspace
Email: loghyr@gmail.com

Trond Myklebust
Hammerspace
Email: trondmy@hammerspace.com