Using mpi4py for Parallel Computing in Python on Supercomputers

Utpal Kumar · 5 minute read

This article discusses the mpi4py module, a Python wrapper for MPI used for parallel computing in supercomputing environments. It provides higher-level constructs for parallel programming and offers convenience and performance benefits for developing and scaling parallel Python programs.

Introduction

MPI (Message Passing Interface) is a widely used standard for parallel computing, commonly used in supercomputing environments. The mpi4py module is a Python wrapper for MPI that allows users to write parallel Python programs that can run on a cluster of computers.

Usage

To use mpi4py for supercomputing in Python, first ensure that an MPI implementation is installed on your system and that the mpi4py package is installed and accessible to your Python environment. You can then import the mpi4py module in your Python script and use MPI functions to distribute computations across multiple processors or nodes.

A typical usage pattern involves initializing MPI, creating a communicator to manage communication between processes, and then dividing the work to be done across the processes. Communication between processes can be achieved through sending and receiving messages, using MPI’s send and recv functions.
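As a minimal sketch of point-to-point messaging (assuming the script is launched with at least two processes), the lowercase send and recv methods exchange arbitrary picklable Python objects:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Rank 0 sends a small Python object to rank 1
    payload = {"step": 1, "value": 3.14}
    comm.send(payload, dest=1, tag=11)
elif rank == 1:
    # Rank 1 blocks until the matching message arrives
    payload = comm.recv(source=0, tag=11)
    print("Rank 1 received:", payload)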

Additionally, mpi4py provides several higher-level constructs for parallel programming, including MPI-based parallel data structures and collective operations for broadcasting, reduction, and scatter-gather.
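As a brief illustration of one such collective operation, the following sketch broadcasts a configuration object from the root to every process (the config dictionary here is just a made-up example):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Only the root defines the object; bcast copies it to every rank
config = {"tolerance": 1e-6} if rank == 0 else None
config = comm.bcast(config, root=0)
print(f"Rank {rank} sees tolerance {config['tolerance']}")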

Overall, using mpi4py for supercomputing in Python involves understanding MPI’s basic concepts and functions, as well as taking advantage of mpi4py’s convenience and performance benefits for developing and scaling parallel Python programs.

Installation of the mpi4py Python library

python -m pip install mpi4py

See the pip project description for more details.
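Once installed, a quick way to verify that mpi4py can see your MPI implementation is the hello-world check bundled with the package:

mpiexec -n 4 python -m mpi4py.bench helloworld

Each of the four processes should print a greeting with its rank, confirming that both MPI and mpi4py are working.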

Example

Here’s an example of using mpi4py to distribute a simple computation across multiple processors:

from mpi4py import MPI

# Initialize MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Define the work to be done
data = 100
if rank == 0:
    print("Starting with data = ", data)

# Divide the work across processes: only the root needs to build the send list
if rank == 0:
    chunks = [data / size] * size
else:
    chunks = None
local_data = comm.scatter(chunks, root=0)

# Perform computation
local_result = local_data ** 2

# Gather results
results = comm.gather(local_result, root=0)

# Print final results
if rank == 0:
    total_result = sum(results)
    print("Result: ", total_result)

In this example, we initialize MPI, build the list of chunks on the root process, divide the work across processes using scatter, square each chunk locally, gather the results using gather, and print the final result on the root process.

When running this script with multiple processes, each process will compute its own local_result, which will then be gathered on the root process, where the final result is computed and printed.
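To run this first example with, say, 4 processes (assuming the script is saved as scatter_square.py, a name chosen here for illustration):

mpiexec -n 4 python scatter_square.py

With 4 processes, each rank receives 100 / 4 = 25.0, squares it to 625.0, and the root prints Result: 2500.0. Note that this toy result depends on the number of processes, since each chunk is squared separately.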

Example of a parallel reduction

This example demonstrates how to use mpi4py to perform a parallel reduction operation, which is useful for computations such as finding the sum, minimum, or maximum of a large dataset:

from mpi4py import MPI
import numpy as np

# Initialize MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Generate random data on the root process only
data_size = 100000
if rank == 0:
    data = np.random.rand(data_size)
else:
    data = None

# Divide the data across processes (assumes data_size is divisible by size)
local_data_size = data_size // size
local_data = np.zeros(local_data_size)
comm.Scatter(data, local_data, root=0)

# Perform local computation
local_sum = np.sum(local_data)

# Perform reduction operation to get final result
global_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)

# Print final result on root process
if rank == 0:
    print("Total sum: ", global_sum)

In this example, we generate a large dataset on the root process and divide it across processes using Scatter (the uppercase, buffer-based variant, which works efficiently with NumPy arrays). Each process computes the local sum of its portion of the data using NumPy’s sum function. We then perform a reduction operation using reduce to obtain the total sum of the data, which is printed on the root process.

Note that in this example, we use the op argument of reduce to specify that we want to perform a sum reduction operation. Other reduction operations, such as MPI.MIN or MPI.MAX, can also be used depending on the desired computation.
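As a small sketch of one of these alternatives, replacing MPI.SUM with MPI.MAX returns the largest local value contributed by any rank (the random sample here is only for illustration):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank computes the maximum of its own random sample
local_max = float(np.max(np.random.rand(1000)))

# MPI.MAX keeps the largest value across all ranks
global_max = comm.reduce(local_max, op=MPI.MAX, root=0)
if rank == 0:
    print("Global maximum:", global_max)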

Compute the value of pi using Monte Carlo simulation

One can use a similar strategy to compute the value of pi with a Monte Carlo simulation:

from mpi4py import MPI
import random

# Initialize MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Set number of samples and initialize the local count to 0
num_samples = 1000000
count = 0

# Compute number of samples for each process
samples_per_proc = num_samples // size

# Generate random points and check if they are inside the unit circle
for i in range(samples_per_proc):
    x = random.uniform(-1, 1)
    y = random.uniform(-1, 1)
    if x**2 + y**2 <= 1:
        count += 1

# Reduce the counts across all processes to get total count
total_count = comm.reduce(count, op=MPI.SUM, root=0)

# Compute and print the value of pi on the root process
if rank == 0:
    total_samples = samples_per_proc * size  # actual number of points drawn
    pi = 4 * total_count / total_samples
    print("Estimated value of pi: ", pi)

In this script, we first initialize MPI, set the number of samples, and initialize the local count to 0. We then compute the number of samples for each process and generate random points within the unit square. For each point, we check whether it falls inside the unit circle and increment the count if it does. We then use the reduce function to sum the counts across all processes. Finally, we compute and print the estimated value of pi on the root process using the formula pi = 4 * total_count / total_samples.

To run this script with, say, 4 processes, you can use the following command in the terminal:

mpiexec -n 4 python pi_estimate.py

Conclusion

The mpi4py module is a Python wrapper for MPI, a widely used standard for parallel computing, frequently used in supercomputing environments. With mpi4py, users can write parallel Python programs that run on a cluster of computers, provided that MPI is installed on the system and the mpi4py package is accessible to the Python environment. The module provides several higher-level constructs for parallel programming, including collective operations for broadcasting, reduction, and scatter-gather. The article walks through three examples: distributing a simple computation across multiple processors, performing a parallel reduction, and estimating the value of pi with a Monte Carlo simulation.



