CodeAnalyzer is a generic engine for analyzing and formatting REXX code. This is a tool for the REXX programmer who works with large, complex programs.

It is a work in progress and contributions are welcome. I place the code in the public domain insofar as I have any right to do so.

Doug Rickman

MSFC/NASA

Doug.Rickman@msfc.nasa.gov

August 20, 2003

Good luck! Remember the problem with knowing what you are doing is that you have deluded yourself. 2/17/94

Introduction:

Capabilities

Known Limitations:

Operation:

To execute from the command line

CodeAnalyzer.cmd PROGRAM

where "PROGRAM" is the REXX code to be analyzed. CodeAnalyzer posts progress information to the screen and creates a file with the extension ".AnalOCode.txt" in the directory of PROGRAM. The distribution archive provides examples of both outputs. The screen output is provided in the file NormalRun.log. An example of the ".AnalOCode.txt" output is provided in the file CodeAnalyzer.AnalOCode.txt. Both are from a run where "PROGRAM" was CodeAnalyzer.cmd.

Algorithm:

The principal actions of the program are initiated in the subroutine MAIN. The steps are

  1. Read the raw source file into memory. [rc = ReadRawSource(in)]
In concept there appears to be an ambiguity in REXX interpreters about how comments and quotes nest inside each other. I have chosen to assume that bounding comments are to be found first.
  2. Find the comments in the raw source. [rc = MapNMaskCommentsNLiterals('RAW_C',)]
  3. Find the literal strings in the raw source. [rc = MapNMaskCommentsNLiterals('RAW_L',)]
It is now possible to reformat the raw code into a consistent pattern.
  4. Make the logical lines. [rc = MakeLogicalLines()]
  5. Find the comments in the logical lines. [rc = MapNMaskCommentsNLiterals('LOGICAL_C','MAP')]
  6. Find the literal strings in the logical lines. [rc = MapNMaskCommentsNLiterals('LOGICAL_L','MAP')]
There is a copy of the ith new, clean line in the variable LogicalLineI.i. LogicalLine1.i holds that logical line with comments removed. LogicalLine2.i holds that logical line with comments and quotes removed. SourceIndex.i is the line number in the raw source for the ith logical line. The position and contents of the comments and literal strings for the line are in the compound variables "Comment." and "Literal.".
  7. Labels and directives are then found. [rc = FindLabelsNDirectives()]
  8. The list of known functions, i.e. the DLL libraries, is loaded. [rc = LoadKnownFunctions()]
  9. The list of default conditions is loaded. [rc = LoadDefaultConditions()]
This list (ANY, ERROR, FAILURE, HALT, SYNTAX, etc.) is not used by the existing code. It is expected to be used in the subroutine SignalAnalysis.
  10. Find all references to functions and subroutines. [rc = FindCalls2Subroutines()]
This is the heart of the logic mapping operation. Information is stored in the compound variable "FRef.".
  11. Write the contents of the reference tables. [rc = WriteFRefTable()]
  12. Map the relationship between subroutines. [rc = SubroutineAnalyzer(in)]
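
The "comments first, then literals" order can be illustrated with a minimal sketch. The helper names MaskComments and MaskLiterals are hypothetical and are not the routines inside CodeAnalyzer; the sketch assumes a single line, nested comments, and that masked text is replaced by blanks of equal length so character positions stay valid.

```rexx
/* Sketch only: MaskComments and MaskLiterals are hypothetical helpers,  */
/* not the routines used inside CodeAnalyzer.                            */
cOpen  = '/' || '*'   /* delimiters are built by concatenation so they   */
cClose = '*' || '/'   /* never appear inside a literal in this file      */
line = 'say "it is"' cOpen 'don''t' cOpen 'nested' cClose 'stop' cClose 'x = 1'

noComments = MaskComments(line)        /* pass 1: comments to blanks     */
noLiterals = MaskLiterals(noComments)  /* pass 2: literals to blanks     */
say line
say noComments
say noLiterals
exit

MaskComments: procedure expose cOpen cClose
  parse arg text
  out = ''
  depth = 0                            /* current comment nesting level  */
  i = 1
  do while i <= length(text)
    two = substr(text, i, 2)
    select
      when two == cOpen then do        /* a comment opens                */
        depth = depth + 1
        out = out || '  '
        i = i + 2
      end
      when depth > 0 & two == cClose then do   /* a comment closes       */
        depth = depth - 1
        out = out || '  '
        i = i + 2
      end
      when depth > 0 then do           /* inside a comment: blank it     */
        out = out || ' '
        i = i + 1
      end
      otherwise                        /* ordinary code: keep it         */
        out = out || substr(text, i, 1)
        i = i + 1
    end
  end
  return out

MaskLiterals: procedure
  parse arg text
  out = ''
  quote = ''                           /* quote character now open       */
  do i = 1 to length(text)
    ch = substr(text, i, 1)
    select
      when quote == '' & (ch == "'" | ch == '"') then do
        quote = ch                     /* a literal opens                */
        out = out || ' '
      end
      when quote \== '' then do        /* inside a literal: blank it     */
        if ch == quote then quote = ''
        out = out || ' '
      end
      otherwise out = out || ch
    end
  end
  return out
```

The sample line carries an apostrophe inside a comment. If literals were searched for first, that apostrophe would be taken as the start of a string, which is one reason to settle on a fixed order.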

Rationale

Much of the existing code reflects my desire to extend the analytical part of the program. For example, I would like to be able to find all variables used in a specific subroutine and compare that to the map of subroutines and their exposed variable lists. I have also tried to consider the future needs that might arise as the program is extended.

Major Variables:

Created in ReadRawSource().

 data. - Original source code.

Created in MapNMaskCommentsNLiterals().

 dataEdited1. - Source after replacing all comments with blanks.

 dataEdited2. - Source after blanking comments and literal strings.

Created in MakeLogicalLines().

 LogicalLineI. = Original source code.

 LogicalLine1. = Comments are blanked out.

 LogicalLine2. = Comments and literal strings are blanked out.

 SourceIndex.j = First line in original source of logical line j.

 Comment.i.0 = Number of comments in line i.

Comment.i._Str.k = Character position for start of comment k in line i.

Comment.i._End.k = Character position for end of comment k in line i.

Comment.i._Txt.k = Text of comment k in line i.  

 Literal.i.0 = Number of literals in line i.

Literal.i._Str.k = Character position for start of literal k in line i.

Literal.i._End.k = Character position for end of literal k in line i.

Literal.i._Typ.k = Type of literal k in line i (S|D - single or double).

 Literal.i._Txt.k = Text of literal k in line i.

Notes -

LogicalLines are lines after editing out of continuations, semicolons and blank lines.
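
As an illustration of how the "Comment." table relates to LogicalLineI. and LogicalLine1., here is a minimal sketch that fills the entries for one line and then walks them. The finder loop is an assumption made for the example (it expects balanced, non-nested comments) and is not the code CodeAnalyzer uses. Whether _Txt includes the comment delimiters is also an assumption.

```rexx
/* Sketch only: a simple finder that assumes comments in the line are    */
/* balanced and not nested. CodeAnalyzer builds these tables itself.     */
cOpen  = '/' || '*'
cClose = '*' || '/'
i = 1
LogicalLineI.i = 'x = 1' cOpen 'first' cClose 'y = 2' cOpen 'second' cClose
LogicalLine1.i = LogicalLineI.i
Comment.i.0 = 0

cStart = pos(cOpen, LogicalLineI.i)
do while cStart > 0
  cEnd = pos(cClose, LogicalLineI.i, cStart) + 1
  k = Comment.i.0 + 1
  Comment.i._Str.k = cStart
  Comment.i._End.k = cEnd
  Comment.i._Txt.k = substr(LogicalLineI.i, cStart, cEnd - cStart + 1)
  /* blank the same span in the comment-free copy of the line */
  LogicalLine1.i = overlay('', LogicalLine1.i, cStart, cEnd - cStart + 1)
  Comment.i.0 = k
  cStart = pos(cOpen, LogicalLineI.i, cEnd + 1)
end

/* Walk the table the way a later analysis step might */
do k = 1 to Comment.i.0
  say 'Comment' k 'in line' i 'spans' Comment.i._Str.k 'to' Comment.i._End.k
  say '  text:' Comment.i._Txt.k
end
say 'Without comments:' LogicalLine1.i
```

The "Literal." table can be walked the same way, with _Typ giving the quote style.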

Created in FindLabels().

 Label.i = Line# || Type ("STRING"|"SYMBOL") || FunctionName

Notes - Since most programs may not name the initial routine I have chosen to refer to the initial routine by the label "ProgramBegan".

Created in FunctionAnalysis().

 FRef.i.0 = Number of functions referenced in line i.

FRef.i._Str.k = Position of first character of name of kth function referenced in line i.

FRef.i._End.k = Position of last character of name of kth function referenced in line i.

FRef.i._Txt.k = Text string (name) of kth function referenced in line i.

 FRef.i._Typ.k = Type of function, kth function referenced in line i.

 FRef.i._Open.k = Position of "(" for kth function referenced in line i.

 FRef.i._Close.k = Position of ")" for kth function referenced in line i.

 FRef.i._Knd.k = Nature of reference, subroutine call or function.

Notes -

1. Positions are relative to the first non-blank character in the line.

 2. FRef.line._Open.k = FRef.line._Close.k when the reference is done using the CALL instruction and there are no arguments passed.

 3. For CALL instructions FRef.i._Open.k and FRef.i._Close.k give positions of first and last characters of argument string. If there are no arguments FRef.i._Close.k = FRef.i._Open.k.

 4. For a CALL (variable) FRef.i._Str.k and FRef.i._End.k are the same as FRef.i._Open.k and FRef.i._Close.k.

 5. FRef.i._Knd.k = "CALL" | "FUNCTION"

 6. For CALL instructions FRef.i._Open.k and FRef.i._Close.k are computed after all comment blocks have been deleted.
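
A minimal sketch of reading one "FRef." entry follows. The entry is filled by hand for a made-up line, so the values are assumptions for illustration only. It covers the "FUNCTION" kind, where _Open and _Close are the positions of the parentheses. For the "CALL" kind the notes above describe different meanings for those two fields.

```rexx
/* Sketch only: the FRef. entries are filled by hand for one line.       */
i = 1
k = 1
/* Note 1: positions are relative to the first non-blank character */
code = strip("    rc = SysFileTree(mask, 'files.', 'F')", 'L')

FRef.i.0        = 1
FRef.i._Txt.k   = 'SysFileTree'
FRef.i._Knd.k   = 'FUNCTION'
FRef.i._Str.k   = pos(FRef.i._Txt.k, code)
FRef.i._End.k   = FRef.i._Str.k + length(FRef.i._Txt.k) - 1
FRef.i._Open.k  = pos('(', code, FRef.i._End.k)
FRef.i._Close.k = lastpos(')', code)

/* Recover the name and the argument text from the stored positions */
fName   = substr(code, FRef.i._Str.k, FRef.i._End.k - FRef.i._Str.k + 1)
argText = substr(code, FRef.i._Open.k + 1, FRef.i._Close.k - FRef.i._Open.k - 1)
say FRef.i._Knd.k 'reference to' fName
say 'Arguments:' argText
```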

Extending the Program

Design considerations

Reformatting

If your interest is in reformatting REXX code look into the subroutine MAIN. By the line "say 'Finished finding literals in logical lines.'" all comments, line continuations and quoted strings have been identified and a consistent, though unformatted, line of code created. There is a copy of the new, clean line in the variable LogicalLineI. LogicalLine1. holds the logical line with comments removed. LogicalLine2. holds the logical line with comments and quotes removed. The contents of comments and quotes are available. To see how, look at the code following the comment "Debug aid and illustration ..." that immediately follows the above indicated line.

Logically, a line reformat operation would be inserted in this location. Remember, if the reformat operation modifies the line numbers the "Comment." and "Literal." indices will have to be updated.

Other

The program currently does not handle a comment between the function's name and the opening parenthesis. To do so, reference Comment.i._Str. and Comment.i._End. and see if one of the comments fills the space.
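
One possible shape for that check is sketched below. CommentFillsGap is a hypothetical helper, not part of CodeAnalyzer. It assumes a single comment fills the gap and that the comment positions and the position of the end of the name share the same origin.

```rexx
/* Sketch only: CommentFillsGap is a hypothetical helper, and the        */
/* Comment. entries are filled by hand for one line.                     */
cOpen  = '/' || '*'
cClose = '*' || '/'
i = 1
code = 'x = length' || cOpen 'why' cClose || '(name)'

Comment.i.0      = 1
Comment.i._Str.1 = pos(cOpen, code)
Comment.i._End.1 = pos(cClose, code) + 1
nameEnd = pos('length', code) + length('length') - 1

p = CommentFillsGap(i, nameEnd)
found = 0
if p > 0 then
  if substr(code, p, 1) == '(' then found = 1
if found then say 'Call with a comment before the parenthesis at position' p
else say 'No comment between the name and a parenthesis'
exit

/* Returns the position just past a comment that starts immediately      */
/* after the name, or 0 when no such comment exists.                     */
CommentFillsGap: procedure expose Comment.
  parse arg i, nameEnd
  do k = 1 to Comment.i.0
    if Comment.i._Str.k = nameEnd + 1 then return Comment.i._End.k + 1
  end
  return 0
```

Several comments in a row would need the helper to be applied repeatedly, starting each time from the position it returned.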

Subroutine Discussions

GenericError

CodeAnalyzer incorporates the subroutine GenericError. This routine provides the programmer with information in the event of failures. In addition to the line number of the failure, CONDITION: SYNTAX:, INSTRUCTION:, SIGNAL:, DESCRIPTION: and STATUS:, it gives the current value of strings and the subroutine history to the point of failure. I find these last two details very helpful in debugging code.

Operation of the subroutine is controlled by the variable "GenericErrorQuiet". This variable is set on entry to CodeAnalyzer. For a discussion of other options see the comments in the subroutine.

In order for the subroutine history to be available the variable SUBROUTINEHISTORY is created. On entry to each subroutine the name of the subroutine is prepended to the variable. On leaving, the leading word of the variable is removed. When PROCEDURE instructions are used the SUBROUTINEHISTORY variable must be exposed.
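
The bookkeeping described above can be sketched as follows. The routine names Outer and Inner are made up for the example, and storing each name with a trailing "()" is an assumption based on the sample output further down.

```rexx
/* Sketch only: Outer and Inner are hypothetical routines.               */
SubroutineHistory = ''
call Outer
say 'History after return: "' || SubroutineHistory || '"'
exit

Outer: procedure expose SubroutineHistory
  SubroutineHistory = 'Outer()' SubroutineHistory       /* prepend on entry   */
  call Inner
  SubroutineHistory = delword(SubroutineHistory, 1, 1)  /* drop leading word  */
  return

Inner: procedure expose SubroutineHistory
  SubroutineHistory = 'Inner()' SubroutineHistory       /* prepend on entry   */
  say 'Most recent first:' SubroutineHistory
  SubroutineHistory = delword(SubroutineHistory, 1, 1)  /* drop leading word  */
  return
```

Without the EXPOSE on each PROCEDURE, every routine would see its own empty copy of the variable and the history would be lost.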

A simple illustration of a GenericError output follows. The text in red is the information provided by the subroutine. The cause of the failure in this case was leaving a VisProREXX call in the program when it was not being run under VisProREXX.

Read 3462 lines from source file, D:\source\VisProSource\CodeAnalyzer\CodeAnalyzer.cmd.

Finished finding comments in source.

Finished finding literals in source.

Finished making 2957 logical lines.

Finished finding comments in logical lines.

Finished finding literals in logical lines.

Finished finding labels and directives.

Finished loading table of known functions.

Finished loading table of default conditions.

Finished finding calls to subroutines.

Finished writing function references table.

Finished with the subroutine analyzer.

Analysis of the code: call _VPAppExit /* This will force an exit without opening a panel. */

LIT "_VPAppExit" = _VPAPPEXIT

LIT "This" = THIS

LIT "will" = WILL

LIT "force" = FORCE

LIT "an" = AN

LIT "exit" = EXIT

LIT "without" = WITHOUT

LIT "opening" = OPENING

LIT "a" = A

VAR "panel." = PANEL.

LIT "panel" = PANEL

A serious REXX ERROR has occurred! I do not know what.

Other information for a programmer's use:

The line that generated this error is: 227

" call _VPAppExit /* This will force an exit without opening a panel. */ "

Subroutine history to point of failure, most recent first:

CommandLineExit()

Begin_Program()

Condition: SYNTAX

Instruction: SIGNAL

Description:

Status: OFF

RC: 43 SYS0043: Drive %1 cannot locate a specific area or track on the disk.