Collective communication involves a group of processes within a specified communicator.

  • every process in the communicator needs to call the collective function
  • collective functions are heavily optimized by MPI implementations, so prefer them over hand-rolled equivalents built from point-to-point calls
  • collective calls are matched solely by communicator and calling order (there are no tags)

MPI_Reduce()

MPI_Reduce takes an array of input elements from each process, combines them element-wise with a reduction operation, and sends the single resulting array to the root process.

header

int MPI_Reduce(
   void*          send_data,     // in
   void*          recv_data,     // out
   int            count,         // in
   MPI_Datatype   datatype,      // in
   MPI_Op         operator,      // in
   int            root,          // in
   MPI_Comm       comm           // in
);
  • send_data is an array of elements (or an element) of type datatype that each process wants to reduce
  • recv_data is only relevant on the process with rank root, where it receives the reduced result
    • its size is sizeof(datatype) * count
    • even so, every process must still pass an actual argument for recv_data, even if it’s just NULL
  • operator (of type MPI_Op) is the reduction operation
    • custom operators can be created with MPI_Op_create()
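
Putting these arguments together, a global sum of one value per process might look like the following sketch (the variable names local_value and global_sum are illustrative, not from the notes):

#include <mpi.h>
#include <stdio.h>

int main(void) {
   MPI_Init(NULL, NULL);

   int rank;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   int local_value = rank + 1;   // each process contributes one int
   int global_sum  = 0;          // only meaningful on the root (rank 0)

   // every process calls MPI_Reduce; only rank 0 receives the result
   MPI_Reduce(&local_value, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

   if (rank == 0)
      printf("global sum = %d\n", global_sum);

   MPI_Finalize();
   return 0;
}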

collective operations

operation value   meaning
MPI_MAX           maximum
MPI_MIN           minimum
MPI_SUM           sum
MPI_PROD          product
MPI_LAND          logical and
MPI_BAND          bitwise and
MPI_LOR           logical or
MPI_BOR           bitwise or
MPI_LXOR          logical exclusive or
MPI_BXOR          bitwise exclusive or
MPI_MAXLOC        maximum and location of maximum
MPI_MINLOC        minimum and location of minimum
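
MPI_MAXLOC and MPI_MINLOC reduce (value, index) pairs rather than plain values, and MPI provides paired datatypes such as MPI_DOUBLE_INT for them. A minimal sketch, assuming we want the largest value together with the rank that owns it (the struct and variable names are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(void) {
   MPI_Init(NULL, NULL);

   int rank;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   // MPI_DOUBLE_INT matches a (double value, int index) pair
   struct { double value; int rank; } local, global;
   local.value = 1.0 / (rank + 1);   // some per-process value (illustrative)
   local.rank  = rank;

   MPI_Reduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);

   if (rank == 0)
      printf("max = %f on rank %d\n", global.value, global.rank);

   MPI_Finalize();
   return 0;
}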

each process makes only a single call to MPI_Reduce - the function itself distinguishes between the roles of the different processes (there is no separate receive call)

other caveats

  • the arguments passed by each process must be “compatible”
  • for example, if one process passes in 0 as the root and another passes in 1, then the calls to MPI_Reduce are erroneous and the program is likely to hang or crash
  • despite the fact that it may seem natural, passing the same buffer as both send_data and recv_data (i.e. aliasing the arguments) is illegal in MPI, and the result of such a call is unpredictable
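
If the root does want to reuse its own send buffer as the receive buffer, the portable way is MPI_IN_PLACE; a minimal sketch (variable names are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(void) {
   MPI_Init(NULL, NULL);

   int rank;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   int x = rank + 1;   // each process's contribution (illustrative)

   if (rank == 0)
      // root: MPI_IN_PLACE tells MPI to take the root's input from the receive buffer
      MPI_Reduce(MPI_IN_PLACE, &x, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
   else
      // non-root: the receive buffer is unused, so NULL is fine
      MPI_Reduce(&x, NULL, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

   if (rank == 0)
      printf("sum = %d\n", x);

   MPI_Finalize();
   return 0;
}

Note that only the root passes MPI_IN_PLACE; the other processes call MPI_Reduce as usual.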

MPI_Bcast

MPI_Bcast sends data belonging to a single process to all of the processes in the communicator.

syntax

int MPI_Bcast(
  void*          data_p,    // in/out
  int            count,     // in
  MPI_Datatype   datatype,  // in
  int            root,      // in
  MPI_Comm       comm       // in
);
  • although the root process and receiver processes do different jobs, they all call the same MPI_Bcast function.
  • when the root process calls MPI_Bcast, the data in data_p is sent to all of the other processes
  • when a receiver process calls MPI_Bcast, its data_p variable is filled in with the data from the root process.
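
For example, a value known only to the root (say, one read from input) can be shared with every process, as in this sketch (the variable n and the value 100 are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(void) {
   MPI_Init(NULL, NULL);

   int rank;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   int n = 0;
   if (rank == 0)
      n = 100;   // only the root knows the value before the broadcast

   // every process calls MPI_Bcast; afterwards they all hold n == 100
   MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

   printf("rank %d: n = %d\n", rank, n);

   MPI_Finalize();
   return 0;
}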

MPI_Allreduce

An MPI_Allreduce is conceptually an MPI_Reduce followed by an MPI_Bcast - the data is processed and the result is distributed to all the processes.

syntax

int MPI_Allreduce(
	void*        input_data_p,  // in
	void*        output_data_p, // out
	int          count,         // in
	MPI_Datatype datatype,      // in
	MPI_Op       operator,      // in
	MPI_Comm     comm           // in
);
  • the argument list is the same as that of MPI_Reduce, except that there is no root (destination) argument, since all the processes receive the result
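
A minimal sketch of a global sum in which every process keeps the result (variable names are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(void) {
   MPI_Init(NULL, NULL);

   int rank;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   int local = rank + 1;   // each process's contribution (illustrative)
   int total = 0;

   // unlike MPI_Reduce, every process ends up with the reduced value
   MPI_Allreduce(&local, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

   printf("rank %d sees total = %d\n", rank, total);

   MPI_Finalize();
   return 0;
}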