mpify package¶
Module contents¶
-
mpify.ranch(nprocs, fn, *args, caller_rank=0, gather=True, ctx=None, need='', imports='', **kwargs)[source]¶ Execute fn(*args, **kwargs) distributedly in nprocs processes. User can serialize over objects and functions, spell out import statements, manage execution context, gather results, and the parent process can participate as one of the workers.
If caller_rank is 0 <= caller_rank < nprocs, only nprocs - 1 processes will be forked, and the caller process will be a worker to run its share of fn(..).
If caller_rank is
None, nprocs processes will be forked.Inside each worker process, its relative rank among all workers is set up in os.environ[‘LOCAL_RANK’], and the total number of workers is set up in os.environ[‘LOCAL_WORLD_SIZE’], both as strings.
Then import statements in imports, followed by any objects/functions in need, are brought into the python global namespace.
Then, context manager ctx is applied around the call fn(*args, **kwargs).
Return value of each worker can be gathered in a list (indexed by the process’s rank) and returned to the caller of ranch().
- Parameters
nprocs (
int) – Number of processes to fork. Visible as a string in os.environ[‘LOCAL_WORLD_SIZE’] in all worker processes.fn (
Callable) – Function to execute on the worker pool*args – Positional arguments by values to fn(*args….)
**kwargs – Named parameters to fn(x=…, y=….)
caller_rank (
int) –Rank of the parent process.
0 <= caller_rank < nprocsto join,Noneto opt out. Default to0.In distributed data parallel, 0 means the leading process.
gather (
bool) – ifTrue, ranch will return a list of return values from each worker, indexed by their ranks. IfFalse, and if ‘caller_rank’ is not None (meaning parent process is a worker), ranch() will return whatever the parent process’ fn(…) returns.ctx (
Optional[AbstractContextManager]) – User defined context manager to be used in a ‘with’-clause around the ‘fn(…)’ call in worker processes. Subclassed from AbstractContextManager, ctx needs to define ‘__enter__()’ and ‘__exit__()’ methods.need (
str) – Space-separated names of objects/functions to be serialized over to the subprocesses.imports –
A multiline string of import statements to execute in the subprocesses before fn() execution. Supported formats:
import x, y, z as zoo
from A import x
from A import z as zoo
from A import x, y, z as zoo
Not supported: from A import (x, y)
- Returns
None, or list of results from worker processes, indexed by their LOCAL_RANK:[res_0, res_1, .... res_{nprocs-1}]
-
class
mpify.TorchDDPCtx(*args, world_size=None, base_rank=0, use_gpu=True, addr='127.0.0.1', port=29500, num_threads=1, **kwargs)[source]¶ Bases:
contextlib.AbstractContextManagerA context manager to set up and tear down a PyTorch distributed data parallel process group. os.environ[‘LOCAL_RANK’] must be defined prior to __enter__().
- Parameters
world_size (
Optional[int]) – total number of members in the DDP groupbase_rank (
int) – the starting, lowest rank value of among the forked local processesuse_gpu (
bool) – if True, will set the default CUDA device base on os.environ[‘LOCAL_RANK’]addr (
str) – see PyTorch distributed data parallel documentation.port (
int) – see PyTorch distributed data parallel documentation.num_threads (
int) – see PyTorch distributed data parallel documentation.
-
mpify.in_torchddp(nprocs, fn, *args, world_size=None, base_rank=0, ctx=None, use_gpu=True, need='', imports='', **kwargs)[source]¶ A convenience routine to prepare a context manager for PyTorch Distributed Data Parallel group setup/teardown, then calls ranch() to fork and execute fn(*args, **kwargs)
- Parameters
nprocs (
int) – Number of local processes to forkfn (
Callable) – the functions and its arguments*args – the functions and its arguments
**kwargs – the functions and its arguments
world_size (
Optional[int]) – total number of members in the entire PyTorch DDP groupbase_rank (
int) – the lowest, starting rank of in the local processesctx (
Optional[TorchDDPCtx]) – by default will use mpify.TorchDDPCtx to set up torch distributed group, but user can override it with their own if necessary.use_gpu (
bool) – a hint to suggest using GPU if available.need (
str) – names of local objects to serialize over, comma-separatedimports (
str) – multi-line import statements, to apply in each forked process.
- Returns
The result of fn(*args, **kwargs) in the rank base_rank execution.