This seemingly complex communication pattern mimics one that can occur in an application because of timing variations on each processor. If the message sent by process 2 to process 1 is short, but still long enough to require a rendezvous protocol, there can be a significant delay before process 1 receives it, even though the matching receive has already been posted. Explore the possibilities by trying messages of various lengths.
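One way to organize such an exploration is to sweep message lengths across the point where the library switches protocols. The sketch below is a plain C model, not part of the assignment's solution, and it makes no MPI calls. The names `classify_message`, `sweep_lengths`, and `eager_limit` are hypothetical, and the single-threshold rule is an assumption: each MPI implementation chooses its own limit, and some use more than two protocols.

```c
#include <stddef.h>

/* Protocol an MPI library might choose for a point-to-point message. */
typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } protocol_t;

/*
 * Assumed model: messages of at most eager_limit bytes are sent eagerly,
 * longer ones use rendezvous. The limit is implementation specific; the
 * value passed in here is a guess to be checked against measurements.
 */
static protocol_t classify_message(size_t nbytes, size_t eager_limit)
{
    return (nbytes <= eager_limit) ? PROTO_EAGER : PROTO_RENDEZVOUS;
}

/*
 * Fill lengths[] with doubling message sizes from min_bytes up to
 * max_bytes. Returns the number of entries written (at most capacity).
 */
static size_t sweep_lengths(size_t min_bytes, size_t max_bytes,
                            size_t *lengths, size_t capacity)
{
    size_t n = 0;

    if (min_bytes == 0)
        min_bytes = 1;  /* doubling from zero would never advance */

    for (size_t len = min_bytes; len <= max_bytes && n < capacity; len *= 2) {
        lengths[n++] = len;
        if (len > max_bytes / 2)
            break;      /* the next doubling would exceed max_bytes */
    }
    return n;
}
```

For example, `sweep_lengths(1, 1 << 20, lengths, 32)` produces doubling lengths from 1 byte to 1 MiB. Timing the exchange at each length should show a jump where the real implementation changes protocol, if it is sensitive to message length at all.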
The code for this assignment is relatively short but hard to describe in words; feel free to look at the solution if you have trouble understanding the description above. A good way to look at this example is to use nupshot logging (see the profiles).
Some MPI implementations will be relatively insensitive to the size of the short message. See the comments and profiles for an example of an implementation that is sensitive.
Think about how you might "trick" the MPI implementation that you use into delaying the completion of a send or a receive by involving several processes. How would you test your hypothesis?