5-6  FILE PORTING AND FTP 
 *************************
 Files may be transferred between machines:

    1) Over some network, e.g. using the FTP protocol (TCP/IP) or
       Kermit (usually over a serial line, but not necessarily). 

    2) Over some network, archived by a program that adds some 
       file-system specific info, e.g. ZIP (VMS), BACKUP (VMS).

       This is important for files created on a record-oriented
       file-system, where files have an internal structure,
       and the operating system keeps the relevant info.

    3) Using standard ANSI magnetic tape

 Porting formatted files between different machines using FTP is usually 
 no problem: the FTP protocol automatically performs the few needed 
 conversions (ASCII/EBCDIC, record structure conventions).  

 Unformatted files are very machine dependent, and the FTP protocol 
 doesn't support the required conversions, so porting them between 
 different machines may be very difficult. 

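 Some relevant information for the purpose of file porting (the last 
 two columns give the default record type of unformatted files and 
 the size in bytes of the record control field):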

    Hardware            Floats      Endianity   Unformatted   Control 
    -----------------   ------      ---------   ------------  -------
    Sun UNIX            IEEE        BIG         Variable        4
    IRIX                IEEE        BIG         Variable        4
    CRAY                CRAY        BIG         
    DEC VAX             DEC         LITTLE      Segmented       2+2
    DEC ALPHA           IEEE+DEC    LITTLE      Variable        4
    IBM PC compatibles  IEEE        LITTLE      
    IBM mainframes      IBM         BIG         

 DEC compilers provide good options for the conversion of unformatted 
 files between different platforms.  Sun provides conversion software.  
 These machines can be used as "conversion platforms" for others.
 However, the best methods are:

   1) Modify the program that produced the unformatted file 
      to produce a formatted one, and run it on the original 
      machine or similar one. 

   2) Write a program that will read the unformatted file and
      write an equivalent formatted one, ON A MACHINE LIKE THE 
      ONE THAT WROTE IT, thus avoiding the machine-specific 
      complications discussed below.

      The translation program may use code excerpts from the 
      original program, or be based on some knowledge of the 
      unformatted file's structure.
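
 As an illustration of method 2, here is a minimal sketch of such a
 translation routine.  It is not taken from any particular program:
 the name UNF2F, the limit MAXN and the output format are assumptions
 made for the example, and it only handles the simple case where
 every record holds the same number N of default REAL values.  Both
 units must already be open, the first as UNFORMATTED and the second
 as FORMATTED.

```fortran
! Sketch: copy an unformatted sequential file to a formatted one.
! Assumes every record holds exactly N default REALs (N <= MAXN).
! UNF2F and MAXN are hypothetical names chosen for this example.
      subroutine unf2f(lunin, lunout, n, nrec)
      integer lunin, lunout, n, nrec
      integer MAXN
      parameter (MAXN = 1000)
      real buf(MAXN)
      integer i
! NREC returns the number of records copied (0 if N is out of range).
      nrec = 0
      if (n .lt. 1 .or. n .gt. MAXN) return
 10   continue
! The native unformatted READ deals with the record control fields.
      read (unit=lunin, end=20) (buf(i), i = 1, n)
! Nine significant digits are written for each value.
      write (unit=lunout, fmt='(4E17.9)') (buf(i), i = 1, n)
      nrec = nrec + 1
      goto 10
 20   continue
      end
```

 Because the routine runs on the machine that wrote the file, it needs
 no knowledge of endianity or of the record control fields; only the
 layout of the variables inside each record has to be known.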

 Some people use the XDR routine library to solve the problem of 
 porting unformatted files.  They write and read files using the XDR 
 routines instead of Fortran I/O statements, but of course this is 
 not standard Fortran, and makes the programs less portable.

 By the way, HDF files are self-documenting and should be read with 
 HDF routines.  



 A short digression on FTP
 -------------------------
 The File Transfer Protocol (FTP) is usually used interactively by 
 invoking a program with that name. 

 Many of the transfer options proposed by Postel and Reynolds in RFC959 
 were not implemented, and FTP programs can properly handle only text 
 file transfers.  Binary transfers are properly handled only in the 
 simplest case, between two byte-oriented (e.g. UNIX) file-systems.

 FORTRAN requires record-oriented files.  On byte-oriented systems the 
 FORTRAN compiler has to support this requirement itself, so it 
 produces and reads files with variable-length records.

 However, binary FTP transfers between a record-oriented system (e.g. VMS) 
 and a byte-oriented one are not supported, and all or some of the control 
 information of each record is discarded in one direction, and is passed 
 without proper translation in the other.

 FTP shortcomings can be worked around by suitable modifications in the 
 FORTRAN source code.  When writing files intended to be transferred 
 from a record-oriented system to a byte-oriented one, a count-field 
 value can be prefixed to each record.  In the other direction a routine 
 that understands the foreign record format should be used for reading.



 FTP of archived files
 ---------------------
 Archiving programs like the VMS version of ZIP (used with "-V") and
 the VMS BACKUP program store some control information of the file. 
 When the file is restored that information can be used.

 This is useful when transferring files between two VMS machines,
 via a UNIX one.



 Porting formatted files
 -----------------------
 This is relatively simple.  Possible problems are:

    1) Different character codes (EBCDIC on IBM mainframes,
       ASCII on all others).

    2) File type translations (Variable-size-records on VMS,
       some Stream type on almost all others). 

 Direct FTP can take care of both these problems.  Character codes are 
 transformed into a standard character set (standard 8-bit Network 
 Virtual Terminal-ASCII) before transmission and are transformed again
 to the local character set upon reception.

 Similarly, records are translated to a standard form (stream CR/LF) 
 before transmission and transformed to the local structure upon reception.
 
 It is recommended to use formatted files to transfer information 
 between different systems.  The disadvantages are that formatted 
 files are larger, and that some precision may be lost in the 
 binary/decimal radix conversions unless enough digits are written.
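
 The precision loss depends on the edit descriptor used.  The small
 sketch below (the name RTRIP is made up for this example) writes a
 REAL to an internal file with a given format and reads it back, so
 the effect of the decimal round trip can be checked.  It assumes
 IEEE single precision, for which nine significant digits are enough
 to recover the original value.

```fortran
! Sketch: send X through a decimal representation and back.
! RTRIP is a hypothetical helper; FMT must hold one real edit
! descriptor in parentheses, at most 32 characters wide.
      subroutine rtrip(x, fmt, y)
      real x, y
      character*(*) fmt
      character*32 buf
! Binary to decimal, as a formatted WRITE would do it.
      write (buf, fmt) x
! Decimal back to binary, as the receiving machine would do it.
      read (buf, fmt) y
      end
```

 With X = 1.0/3.0, the format '(E16.9)' returns Y equal to X, while
 '(F8.3)' returns 0.333, which differs from X.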



 Porting unformatted files
 -------------------------
 Here the problems start:

    1) Different endianity (DEC machines and PCs are little
       endian, all others are big endian).

    2) Different integer sizes / float formats (integers have
       the same general format, most floats are now IEEE).

    3) Different character codes (EBCDIC on IBM mainframes,
       ASCII on all others).

    4) File type translations (Variable-size-records on VMS,
       some Stream type on others). 

    5) All the above problems are solvable in principle, 
       but if you don't know the layout of variables in 
       the unformatted file, you would have to guess it
       using too little information, with unsafe results.

       The required knowledge can be found in the source 
       of the program that wrote it, or in notes left by 
       the programmer(s).

 Problems #1-3 make porting unformatted files content-dependent,
 i.e. you need to know the contents of a file in order to port it.
 In the general case each variable has to be converted separately,
 so the converting program has to know in detail the layout of
 variables in the file.

 Provided you know the internal structure of the file, porting 
 unformatted files is less frightening than the above list of 
 problems suggests.  For example, UNIX workstations are compatible 
 except for the endianity problem.
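
 The endianity problem itself is just a reversal of byte order.  A
 minimal sketch for one value, assuming the default INTEGER occupies
 4 bytes and the compiler provides the MVBITS bit-manipulation
 routine (MIL-STD-1753, standard since Fortran 90); the name SWAP4
 is chosen only for this example:

```fortran
! Sketch: reverse the byte order of a 4-byte integer.
! SWAP4 is a hypothetical name; default INTEGER is assumed 4 bytes.
      integer function swap4(n)
      integer n
      swap4 = 0
! MVBITS(FROM, FROMPOS, LEN, TO, TOPOS) copies LEN bits.
      call mvbits(n, 24, 8, swap4,  0)
      call mvbits(n, 16, 8, swap4,  8)
      call mvbits(n,  8, 8, swap4, 16)
      call mvbits(n,  0, 8, swap4, 24)
      end
```

 For example SWAP4(1) is 16777216, and applying SWAP4 twice returns
 the original value.  A REAL*4 can probably be handled the same way
 after moving its bit pattern into an INTEGER (e.g. with EQUIVALENCE),
 provided both machines use the same float format.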



 Reading general binary files from Fortran
 -----------------------------------------
 Sometimes you want to read the content of a file "as it is", and 
 bypass the logical structure. 

 The record-oriented Fortran I/O routines must consider files either 
 as formatted or unformatted, and in both cases they treat one or more 
 bytes at the end of each record as control information, not as data. 

 You may need this ability when you want to process non-ASCII files,
 e.g. files in one of the many graphics formats, or unformatted files 
 written on another machine.

 There is no portable solution to this problem.  Some possible 
 solutions are:

    1) Some compilers support a special OPEN keyword: 

       STREAM       (VMS, Digital UNIX) 
       BINARY       (MS Powerstation)
       TRANSPARENT 

    2) Some UNIX compilers allow you to open a file 
       in DIRECT access mode, with RECL=1. 
       You can then read each byte by specifying 
       its location in the file.  

       Compilers supporting a DELETE statement for
       direct files (VMS with the default /VMS option, 
       Digital UNIX with the -vms option) expect a
       special flag located inside the record. 

       DEC uses the first byte of the record with 
       value equal to '@' (or NUL ASCII value 0).  
       To do the trick, the support for the DELETE 
       statement has to be disabled on VMS by the 
       /NOVMS compiler option; on Digital UNIX the
       compiler option -vms should not be used. 

       On VMS you should use RECL=2, as RMS assumes
       all records are word aligned.  The file
       attributes have to be modified by: 

         SET FILE/ATTRIBUTES=(RFM:FIX,LRL:2,MRS:2)

       When OPENing the file use: RECORDTYPE='FIXED'. 

    3) VMS offers in addition many special techniques: 

       o  Mapping the file to a memory area, 
          e.g. a common block, and reading it. 

       o  Low-level routines: RMS block-mode.

       o  Getting the file size, declaring it to 
          contain fixed-size records, and reading
          it with a buffering routine.
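
 Method 2 can be sketched as follows.  The routine names OPNBYT and
 GETBYT are made up for this example, and the sketch assumes that the
 compiler measures RECL in bytes for unformatted direct-access files;
 some compilers count in 4-byte units unless told otherwise, so check
 your compiler's documentation first.

```fortran
! Sketch: byte-by-byte access through a direct-access file.
! OPNBYT and GETBYT are hypothetical names; RECL is assumed to be
! measured in bytes.  FORM defaults to UNFORMATTED for direct access.
      subroutine opnbyt(lun, name, ios)
      integer lun, ios
      character*(*) name
      open (unit=lun, file=name, access='direct', recl=1, iostat=ios)
      end

! Return in IVAL the value (0-255) of byte number POS (1 = first).
      subroutine getbyt(lun, pos, ival, ios)
      integer lun, pos, ival, ios
      character*1 c
      read (unit=lun, rec=pos, iostat=ios) c
      if (ios .eq. 0) ival = ichar(c)
      end
```

 A nonzero IOS from GETBYT usually means POS is beyond the end of the
 file.  Reading one byte per READ is slow, so this approach suits
 small files or occasional inspection of a few bytes.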



 Porting from a typical UNIX to Digital UNIX
 -------------------------------------------
 This is an easy case.  The DEC Fortran compiler supports options
 that make such porting easy (again, provided you know the internal
 structure of the file).

 Transfer the file to the DUNIX machine (I didn't use FTP as our 
 machines here share filesystems, but I think that FTP wouldn't 
 make a difference).

 The following conversion program assumes:

   1)  All variables are REAL*4 (can be modified)
   2)  There is no record with more than MAXREC values

      program convuf
C     Convert a big-endian UNIX unformatted file of REAL*4 data
C     to a formatted file (uses DEC Fortran OPEN extensions).
      integer    MAXREC, BYTE2REAL
      parameter  (MAXREC = 100000, BYTE2REAL = 4)
      real       data(MAXREC)
      integer    count1, count2, i
C     ------------------------------------------------------------------
C     CONVERT='BIG_ENDIAN' requests byte swapping on input;
C     RECORDTYPE='STREAM' is used so the count fields are read as data.
      open (unit =       10,
     &      file =       'unixfile',
     &      status =     'OLD', 
     &      form =       'UNFORMATTED',
     &      convert =    'BIG_ENDIAN',
     &      recordtype = 'STREAM')
      open (unit =       11,
     &      file =       'decfile',
     &      status =     'NEW',
     &      form =       'FORMATTED') 
C     ------------------------------------------------------------------
C     Each record is: 4-byte count, the data, the same 4-byte count.
C     NOTE: there is no check that count1/BYTE2REAL <= MAXREC.
100   continue
      read(unit=10, end=999) count1, 
     &                       (data(i), i = 1, count1/BYTE2REAL), 
     &                       count2
C     write (*,*) '... ', count1, count2
      if (count1 .eq. count2) then
        write(unit=11,fmt=*) (data(i), i = 1, count1/BYTE2REAL)
      else
C       Mismatched counts mean the record structure was misread.
        write (*,*) ' something is wrong '
        write (*,*) ' prefix count is:   ', count1
        write (*,*) ' suffix count is:   ', count2
        stop ' '
      endif
      goto 100
C     ------------------------------------------------------------------
999   write (*,*) ' end of file reached '
      close (10)
      close (11)
      end

 The docs are not clear about the "RECORDTYPE" OPEN keyword, and 
 the DEC Fortran 90 docs are self-contradictory on this point.

 It seems that the keyword was once meant to support text files 
 with records delimited by CR/LF, but evolved to support 
 non-record-oriented files.



 Endianity conversion
 --------------------


 Integer/Float format conversion
 -------------------------------


 Control information conversion
 ------------------------------
 If you have the program source you can do it with a few modifications;
 in the general case you'll need a conversion program. 

 1) Unformatted file from VMS to UNIX:

    On VMS you can use unformatted variable records if your records 
    are no longer than 32764 bytes, specify RECORDTYPE='VARIABLE' 
    in the OPEN statement, as the default for unformatted I/O is 
    'SEGMENTED'.

    FTP discards the 2-byte count-field of the variable records, but
    you can re-prefix (and re-suffix) the record length to the data 
    in each WRITE statement:

      INTEGER       RECLEN
      REAL          X, Y, Z
      ......................
      RECLEN = SIZEOF(X) + SIZEOF(Y) + SIZEOF(Z)
      WRITE (10) RECLEN, X, Y, Z, RECLEN

    If your unformatted records have to be longer, write each record 
    in parts, each one smaller than 32764 bytes, and write the record 
    length at the beginning of the first part and at the end of 
    the last part.

    A less recommended option is to use the C Run-Time-Library function 
    'write', which converts VAX/ALPHA little-endian longwords (4 bytes) 
    to big-endian.  However, 'write' doesn't add the prefix and suffix 
    count-fields, and it creates a stream/LF file, an unsuitable type.
    Include the unixio.h and file.h standard headers; they contain the 
    function prototype and associated argument constants. 


 2) Unformatted file from UNIX to VMS:

    By default, FTP will create fixed-length records 512 bytes long.
    If all the records have the same known length, you may change the
    formal record-length to that value, without changing anything in
    the file.

    Use either:   SET FILE/ATTRIBUTES=(LRL:size) filespec   
    or Joe Meadows' FILE utility, then use OPEN with RECORDTYPE='FIXED' 
    and RECL=size, and read and ignore the count field (first 4 bytes).

    If the records are not the same length, you'll need a routine that
    can reconstruct the original structure.
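
    A minimal sketch of the core of such a routine is given below.
    It assumes the raw file contents are already in a CHARACTER*1
    array, and that the UNIX compiler framed each record with a
    4-byte big-endian byte count before and after the data (as the
    Sun and IRIX entries in the table above suggest).  The names
    BECNT and NXTREC are made up for this example, and records are
    assumed to be shorter than 2 gigabytes.

```fortran
! Sketch: decode a 4-byte big-endian count from 4 raw bytes.
! BECNT and NXTREC are hypothetical names chosen for this example.
      integer function becnt(b)
      character*1 b(4)
      becnt = ichar(b(1))
      becnt = becnt*256 + ichar(b(2))
      becnt = becnt*256 + ichar(b(3))
      becnt = becnt*256 + ichar(b(4))
      end

! Locate the record whose prefix count starts at BUF(POS).
! On success OK is true, the data are BUF(FIRST:FIRST+NBYTES-1)
! and POS is advanced to the prefix count of the next record.
      subroutine nxtrec(buf, nbuf, pos, first, nbytes, ok)
      integer nbuf, pos, first, nbytes
      character*1 buf(nbuf)
      logical ok
      integer becnt, last
      external becnt
      ok = .false.
! Even an empty record needs 8 bytes for its two count fields.
      if (pos .lt. 1 .or. pos + 7 .gt. nbuf) return
      nbytes = becnt(buf(pos))
      if (nbytes .lt. 0 .or. nbytes .gt. nbuf) return
! LAST is the position of the suffix count.
      last = pos + 4 + nbytes
      if (last + 3 .gt. nbuf) return
! The suffix count must repeat the prefix count.
      if (becnt(buf(last)) .ne. nbytes) return
      first = pos + 4
      pos = last + 4
      ok = .true.
      end
```

    Starting with POS = 1 and calling NXTREC until OK is false walks
    through all the records; the data bytes of each record still
    need the endianity and format conversions discussed above.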


 EBCDIC/ASCII conversion
 -----------------------


 File type conversion
 --------------------


 FTP options
 -----------
 FTP transfer options can be divided into 5 categories; most
 of them are unimplemented:

    File structure (e.g. stru file)
    -------------------------------
      file         Byte-oriented file system
      record       Record-oriented file system
      page         ----
      mount        ----
      vms          [On VMS/MULTINET] Preserve all VMS characteristics
                     automatically negotiated.

    Transfer type (e.g. type ascii)
    ------------------------------- 
      ascii            For text files, default.
      ebcdic           For IBM mainframes
      backup           [On VMS/MULTINET] for VMS/BACKUP files
      binary           Same as IMAGE
      image            For unformatted data files and executables.
      local-byte-size  
      logical-byte     Same as LOCAL-BYTE-SIZE  
      tenex            

    Transfer mode (e.g. mode stream)
    --------------------------------
      stream       Usual mode
      compressed   Supported by TGV/MULTINET only?
      block        

    Form formats
    ------------
      non-print 
      telnet format effectors
      carriage control (ASA)

    Auxiliary
    ---------
      record-size       [On VMS/MULTINET] 
      site rms recsize  [On VMS/MULTINET] 
      block             [On VMS/MULTINET] 
      case

