Numba parallel

[1]:
import numpy as np
from numba import prange, njit, guvectorize

Let's first get some test resources. The names and the structure of the examples are taken from the calculation of the expected value function in respy. The original function can be found here.

[2]:
wages = np.ones((100, 4))
nonpecs = np.ones((100, 4))
continuation_values = np.ones((100, 4))
period_draws_emax_risk = np.ones((50, 4))
delta = 0.95

Parallelization of @jit functions

numba offers automatic parallelization of jit functions. With the keyword argument parallel=True, supported array operations are parallelized implicitly, and loops can be parallelized explicitly by replacing range with prange. The resources for this can be found here.

[3]:
@njit(parallel=True)
def parralel_loop(wages, nonpecs, continuation_values, draws, delta):
    num_states, n_ch = wages.shape
    n_draws, n_choices = draws.shape
    out = 0
    for k in prange(num_states):
        for i in prange(n_draws):
            for j in prange(n_choices):
                out += (
                    wages[k, j] * draws[i, j]
                    + nonpecs[k, j]
                    + delta * continuation_values[k, j]
                )

    return out

Diagnostics

When calling an explicitly parallelized function, numba tries to split the work into separate calculations that can run on multiple threads. The optimization behavior can be inspected by using func.parallel_diagnostics(level=4). The function has to be compiled first, for example by calling it once.

The level ranges from one to four, with higher levels printing more detail. The resources for this can be found here.

[4]:
# Call the function once to compile it, then print the diagnostics.
parralel_loop(
    wages, nonpecs, continuation_values, period_draws_emax_risk, delta
)
parralel_loop.parallel_diagnostics(level=4)

================================================================================
 Parallel Accelerator Optimizing:  Function parralel_loop, <ipython-
input-3-cf75dd448160> (1)
================================================================================


Parallel loop listing for  Function parralel_loop, <ipython-input-3-cf75dd448160> (1)
-------------------------------------------------------------------------|loop #ID
@njit(parallel=True)                                                     |
def parralel_loop(wages, nonpecs, continuation_values, draws, delta):    |
    num_states, n_ch = wages.shape                                       |
    n_draws, n_choices = draws.shape                                     |
    out = 0                                                              |
    for l in prange(num_states):-----------------------------------------| #2
        for i in prange(n_draws):----------------------------------------| #1
            for j in prange(n_choices):----------------------------------| #0
                out += (                                                 |
                    wages[l, j] * draws[i, j]                            |
                    + nonpecs[l, j]                                      |
                    + delta * continuation_values[l, j]                  |
                )                                                        |
                                                                         |
    return out                                                           |
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
Parallel region 0:
+--2 (parallel)
   +--1 (parallel)
      +--0 (parallel)


--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel region 0:
+--2 (parallel)
   +--1 (serial)
      +--0 (serial)



Parallel region 0 (loop #2) had 0 loop(s) fused and 2 loop(s) serialized as part
 of the larger parallel loop (#2).
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found

Instruction hoisting:
loop #2:
  Failed to hoist the following:
    dependency: out_4 = out.3
--------------------------------------------------------------------------------

Parallelization of @guvectorize functions

When using @guvectorize, you define a function on the core dimensions of several arrays. The function is then broadcast over the remaining entries of the arrays, and this broadcast can be parallelized with target="parallel". Details on @guvectorize can be found here.

[5]:
@guvectorize(
    ["f8[:], f8[:], f8[:], f8[:, :], f8, f8[:]"],
    "(n_choices), (n_choices), (n_choices), (n_draws, n_choices), () -> ()",
    nopython=True,
    target="parallel",
)
def calculate_expected_value_functions(
    wages, nonpecs, continuation_values, draws, delta, expected_value_functions
):
    n_draws, n_choices = draws.shape

    expected_value_functions[0] = 0

    for i in range(n_draws):

        max_value_functions = 0

        for j in range(n_choices):
            value_function = (
                wages[j] * draws[i, j]
                + nonpecs[j]
                + delta * continuation_values[j]
            )

            if value_function > max_value_functions:
                max_value_functions = value_function

        expected_value_functions[0] += max_value_functions

    expected_value_functions[0] /= n_draws

The argument target="parallel" does not state that the code inside the @guvectorize function is itself parallelized. One can rule this out if the diagnostics described above report no parallelization for the function. Thus, to my knowledge, there is no explicit way to fix a parallelization structure. One can only design the code such that the intended parallelization happens when the @guvectorize function is called.