improver.utilities.neighbourhood_tools module

Provides tools for neighbourhood generation

boxsum(data, boxsize, cumsum=True, **pad_options)

Fast vectorised approach to calculating neighbourhood totals.

This function makes use of the summed-area table method. An input array is accumulated top to bottom and left to right. This accumulated array can then be used to efficiently calculate the total within a neighbourhood about any point. An example input data array:

| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |

is accumulated to become:

| 1 | 2  | 3  | 4  | 5  |
| 2 | 4  | 6  | 8  | 10 |
| 3 | 6  | 9  | 12 | 15 |
| 4 | 8  | 12 | 16 | 20 |
| 5 | 10 | 15 | 20 | 25 |

If we wish to calculate the total in a 3x3 neighbourhood about some point (*) of our array we use the following points:

| 1 (C) | 2  | 3     | 4 (D)  | 5  |
| 2     | 4  | 6     | 8      | 10 |
| 3     | 6  | 9 (*) | 12     | 15 |
| 4 (A) | 8  | 12    | 16 (B) | 20 |
| 5     | 10 | 15    | 20     | 25 |

And the calculation is:

Neighbourhood sum = C - A - D + B
= 1 - 4 - 4 + 16
= 9

This is the value we would expect for a 3x3 neighbourhood in an array filled with ones.
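
The arithmetic above can be reproduced with plain NumPy. This is only a sketch of the worked example, not the library's code; the variable names (table, C, A, D, B) are chosen here to mirror the labels in the diagram.

```python
import numpy as np

# 5x5 array of ones, matching the worked example.
data = np.ones((5, 5), dtype=int)

# Accumulate top to bottom (axis 0), then left to right (axis 1).
table = data.cumsum(axis=0).cumsum(axis=1)

# Corner values for the 3x3 neighbourhood centred on the point (*),
# which sits at zero-based index [2, 2] and covers rows 1..3, columns 1..3.
C = table[0, 0]  # above and to the left of the neighbourhood
A = table[3, 0]  # bottom row of the neighbourhood, one column to its left
D = table[0, 3]  # one row above the neighbourhood, its rightmost column
B = table[3, 3]  # bottom-right corner of the neighbourhood

neighbourhood_sum = C - A - D + B
print(neighbourhood_sum)  # 9
```

Summing the slice data[1:4, 1:4] directly gives the same total, which is a convenient check when experimenting with other array contents.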

Parameters
  • data (ndarray) – The input data array.

  • boxsize (Union[int, Tuple[int, int]]) – The size of the neighbourhood. Must be an odd number.

  • cumsum (bool) – If False, assume the input data is already cumulative. If True (default), calculate cumsum along the last two dimensions of the input array.

  • pad_options (Any) – Additional keyword arguments passed to the numpy.pad function. If given, the data is padded before summing, so the returned result has the same shape as the input array.

Return type

ndarray

Returns

Array containing the calculated neighbourhood total.

Raises
  • ValueError – If boxsize is not of an integer type.

  • ValueError – If any member of boxsize is not an odd number.
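
To make the parameters and error conditions above concrete, here is a minimal NumPy sketch of a zero-padded neighbourhood total. The function boxsum_sketch is hypothetical and written for this illustration; it assumes zero padding and is not the library implementation, which may differ in detail.

```python
import numpy as np


def boxsum_sketch(data, boxsize):
    """Hypothetical stand-in for a zero-padded boxsum; not library code."""
    boxsize = np.atleast_1d(boxsize)
    if not issubclass(boxsize.dtype.type, np.integer):
        raise ValueError("boxsize must be of an integer type")
    if np.any(boxsize % 2 != 1):
        raise ValueError("every member of boxsize must be odd")
    n_rows, n_cols = int(boxsize[0]), int(boxsize[-1])
    half_r, half_c = n_rows // 2, n_cols // 2

    # Zero-pad the last two axes, with one extra row/column at the top/left.
    pad = [(0, 0)] * (data.ndim - 2) + [(half_r + 1, half_r), (half_c + 1, half_c)]
    table = np.pad(data, pad, mode="constant").cumsum(axis=-2).cumsum(axis=-1)

    # Combine the four corners of every neighbourhood at once: B - D - A + C.
    return (
        table[..., n_rows:, n_cols:]      # B: bottom-right corner
        - table[..., :-n_rows, n_cols:]   # D: above, right-hand column
        - table[..., n_rows:, :-n_cols]   # A: bottom row, to the left
        + table[..., :-n_rows, :-n_cols]  # C: above and to the left
    )


data = np.arange(20).reshape(4, 5)
totals = boxsum_sketch(data, 3)
print(totals.shape)  # (4, 5)
```

Because the sketch pads with zeros, points on the edge of the array receive the total of only those neighbours that lie inside the array.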

pad_and_roll(input_array, shape, **kwargs)

Pads the last len(shape) axes of the input array and passes the result to rolling_window, creating ‘neighbourhood’ views of the data of the given shape as the last axes of the returned array. Collapsing over the last len(shape) axes of the result gives an array with the same shape as the original input array.

Parameters
  • input_array (ndarray) – The dataset of points to pad and create rolling windows for.

  • shape (Tuple[int, int]) –

    Desired shape of the neighbourhood. E.g. if a neighbourhood width of 1 around the point is desired, this shape should be (3, 3):

    X X X
    X O X
    X X X
    

    Where O is the central point and each X is a neighbouring point.

  • kwargs (Any) – Additional keyword arguments passed to the numpy.pad function.

Return type

ndarray

Returns

Array containing views of input_array. The trailing len(shape) dimensions match the shape given in the input arguments; the leading dimensions depend on the shape of the input array.
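
The pad-then-window idea can be sketched with NumPy alone. The example below uses numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) as a stand-in for rolling_window, and NaN padding is an assumption made for this illustration; the library's own padding defaults are not stated here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(20, dtype=float).reshape(4, 5)
shape = (3, 3)

# Pad each of the last two axes by half the neighbourhood size on both sides.
pad_width = [(s // 2, s // 2) for s in shape]
padded = np.pad(data, pad_width, mode="constant", constant_values=np.nan)

# One (3, 3) window per original point; the windows are views, not copies.
windows = sliding_window_view(padded, shape)
print(windows.shape)  # (4, 5, 3, 3)

# Collapsing over the last len(shape) axes recovers the input shape.
neighbourhood_max = np.nanmax(windows, axis=(-2, -1))
print(neighbourhood_max.shape)  # (4, 5)
```

Padding with NaN and reducing with a NaN-aware function means edge points are computed from the neighbours that actually exist, rather than from artificial padded values.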

pad_boxsum(data, boxsize, **pad_options)

Pad an array to a shape suitable for boxsum.

Note that padding is not symmetric: there is an extra row/column at the top/left (as required for calculating the boxsum).

Parameters
  • data (ndarray) – The input data array.

  • boxsize (Union[int, Tuple[int, int]]) – The size of the neighbourhood.

  • pad_options (Any) – Additional keyword arguments passed to the numpy.pad function.

Return type

ndarray

Returns

Array padded to a shape suitable for boxsum.
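
The asymmetric padding described above can be illustrated as follows. The pad widths used here (half the box size plus one at the top/left, half the box size at the bottom/right) are an assumption consistent with the note about the extra row/column, not a quotation of the library code.

```python
import numpy as np

data = np.ones((4, 5))
boxsize = 3
half = boxsize // 2

# Assumed widths: half + 1 before (top/left), half after (bottom/right).
pad_width = [(half + 1, half), (half + 1, half)]
padded = np.pad(data, pad_width, mode="constant")

print(data.shape)    # (4, 5)
print(padded.shape)  # (7, 8)
```

The extra leading row and column give the summed-area calculation a corner to subtract for neighbourhoods that start at the first row or column of the original data.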

rolling_window(input_array, shape, writeable=False)

Creates a rolling window neighbourhood of the given shape from the last len(shape) axes of the input array. Avoids creating a large output array by constructing a non-contiguous view mapped onto the input array.

Parameters
  • input_array (ndarray) – An array from which rolling window neighbourhoods will be created.

  • shape (Tuple[int, int]) – The neighbourhood shape e.g. if the neighbourhood size is 3, the shape would be (3, 3) to create a 3x3 array around each point in the input_array.

  • writeable (bool) – If True the returned view will be writeable. This will modify the input array, so use with caution.

Return type

ndarray

Returns

“Views” into the data, where each view represents a neighbourhood of points.

Raises
  • ValueError – If input_array has fewer dimensions than shape.

  • RuntimeError – If any dimension of shape is larger than the corresponding dimension of input_array.
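
The behaviour of a rolling-window view, including the caution about writeable views, can be explored with NumPy's own numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 and later). It is used here as an analogue for illustration only; rolling_window itself may be implemented differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(20).reshape(4, 5)

# Without padding, only points with a complete 3x3 neighbourhood get a window.
windows = sliding_window_view(data, (3, 3))
print(windows.shape)  # (2, 3, 3, 3)

# The result is a read-only, non-contiguous view onto the same memory.
shares = np.shares_memory(windows, data)
contiguous = windows.flags["C_CONTIGUOUS"]
print(shares, contiguous)  # True False

# A writeable view writes straight through to the input array, and the
# change shows up in every overlapping window.
writable = sliding_window_view(data, (3, 3), writeable=True)
writable[0, 0, 1, 1] = -1  # centre of the first window, i.e. data[1, 1]
print(data[1, 1], windows[0, 1, 1, 0])  # -1 -1
```

Since neighbouring windows overlap, a single write through a writeable view changes several neighbourhoods at once, which is why the read-only default is the safer choice.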