Numba parallel¶
[1]:
import numpy as np
from numba import prange, njit, guvectorize
Let's first get some test resources. The names and the structure of the examples are taken from the calculation of the expected value function in respy. The original function can be found here.
[2]:
wages = np.ones((100, 4))
nonpecs = np.ones((100, 4))
continuation_values = np.ones((100, 4))
period_draws_emax_risk = np.ones((50, 4))
delta = 0.95
Parallelization of @jit functions¶
numba offers automatic parallelization of jit functions. This can happen either implicitly on array operations or explicitly with the keyword argument parallel=True and, e.g., parallel loops with prange. The resources for this can be found here.
[3]:
@njit(parallel=True)
def parralel_loop(wages, nonpecs, continuation_values, draws, delta):
num_states, n_ch = wages.shape
n_draws, n_choices = draws.shape
out = 0
for k in prange(num_states):
for i in prange(n_draws):
for j in prange(n_choices):
out += (
wages[k, j] * draws[i, j]
+ nonpecs[k, j]
+ delta * continuation_values[k, j]
)
return out
Diagnostics¶
When calling an explicitly parallelized function, numba tries to split the computation into separate kernels that run on multiple threads. The optimization behavior can be inspected with func.parallel_diagnostics(level=4). The level can vary from one to four. The resources for this can be found here.
[4]:
# An example of the two things above:
parralel_loop(
wages, nonpecs, continuation_values, period_draws_emax_risk, delta
)
parralel_loop.parallel_diagnostics(level=4)
================================================================================
Parallel Accelerator Optimizing: Function parralel_loop, <ipython-
input-3-cf75dd448160> (1)
================================================================================
Parallel loop listing for Function parralel_loop, <ipython-input-3-cf75dd448160> (1)
-------------------------------------------------------------------------|loop #ID
@njit(parallel=True) |
def parralel_loop(wages, nonpecs, continuation_values, draws, delta): |
num_states, n_ch = wages.shape |
n_draws, n_choices = draws.shape |
out = 0 |
for l in prange(num_states):-----------------------------------------| #2
for i in prange(n_draws):----------------------------------------| #1
for j in prange(n_choices):----------------------------------| #0
out += ( |
wages[l, j] * draws[i, j] |
+ nonpecs[l, j] |
+ delta * continuation_values[l, j] |
) |
|
return out |
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
Parallel region 0:
+--2 (parallel)
+--1 (parallel)
+--0 (parallel)
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel region 0:
+--2 (parallel)
+--1 (serial)
+--0 (serial)
Parallel region 0 (loop #2) had 0 loop(s) fused and 2 loop(s) serialized as part
of the larger parallel loop (#2).
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found
Instruction hoisting:
loop #2:
Failed to hoist the following:
dependency: out_4 = out.3
--------------------------------------------------------------------------------
Parallelization of @guvectorize functions¶
When using @guvectorize, you can define functions on multiple arrays, which can then be parallelized across the entries of the arrays with target="parallel". Details on @guvectorize can be found here.
[5]:
@guvectorize(
["f8[:], f8[:], f8[:], f8[:, :], f8, f8[:]"],
"(n_choices), (n_choices), (n_choices), (n_draws, n_choices), () -> ()",
nopython=True,
target="parallel",
)
def calculate_expected_value_functions(
wages, nonpecs, continuation_values, draws, delta, expected_value_functions
):
n_draws, n_choices = draws.shape
expected_value_functions[0] = 0
for i in range(n_draws):
max_value_functions = 0
for j in range(n_choices):
value_function = (
wages[j] * draws[i, j]
+ nonpecs[j]
+ delta * continuation_values[j]
)
if value_function > max_value_functions:
max_value_functions = value_function
expected_value_functions[0] += max_value_functions
expected_value_functions[0] /= n_draws
The statement target="parallel" does not mean that the code inside the @guvectorize function is itself parallelized. One can rule out this possibility by diagnosing the function with the tools described above and confirming that it reports no parallelization. Thus, to my knowledge, there is no explicit way to fix a particular parallelization structure; one can only design the code such that the intended parallelization happens when the @guvectorize function is called.