cde.gmix module

This module provides methods for a mixture of Gaussian distributions where the weights, means, and standard deviations are known. The accompanying module cde.kmn provides helper functions for learning these parameters using a neural network.

cde.gmix.cdf(y: float, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray)

The estimated cumulative distribution function (CDF) of the mixture, evaluated at the given target value y.

Parameters

y (float) – The target value where the cumulative density will be estimated.
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.

Returns

The cumulative density estimation for each set of predicted parameters.

Return type

np.ndarray
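
As an illustration of what this quantity is, the CDF of a Gaussian mixture is the weighted sum of the component CDFs. The sketch below is not the cde.gmix implementation; mixture_cdf is a hypothetical helper, and it assumes the parameter arrays have shape (n_rows, n_components) with each row of weights summing to 1.

```python
import math
import numpy as np

# Vectorised standard-library error function (avoids a SciPy dependency).
_erf = np.vectorize(math.erf)


def mixture_cdf(y, weights, means, sigmas):
    """Hypothetical helper, not part of cde.gmix.

    Assumes arrays of shape (n_rows, n_components); returns one CDF
    value per row of parameters.
    """
    # Gaussian CDF: 0.5 * (1 + erf((y - mu) / (sigma * sqrt(2))))
    z = (y - means) / (sigmas * math.sqrt(2.0))
    component_cdfs = 0.5 * (1.0 + _erf(z))
    # Weighted sum over the components of each row.
    return np.sum(weights * component_cdfs, axis=1)


weights = np.array([[0.5, 0.5], [1.0, 0.0]])
means = np.array([[-1.0, 1.0], [0.0, 5.0]])
sigmas = np.array([[1.0, 1.0], [2.0, 1.0]])
result = mixture_cdf(0.0, weights, means, sigmas)
```

Both rows describe distributions that are symmetric about 0 (the second because all of its weight sits on the component centred at 0), so result is 0.5 in each position.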

cde.gmix.make_centers(data: numpy.ndarray, n_centers: int, extend_lower: float = 0, extend_upper: float = 0, method: str = 'jenks')

Return an array of kernel centers given some target data. There are currently two supported methods for creating the array. The first method uses the Jenks natural breaks algorithm and the second simply divides the space up uniformly.

If you suspect your training data will not adequately cover the range of values in the target distribution, you can optionally extend the lower and upper bounds by a proportion \(p\) of the data range (a floating point number between 0 and 1, given by extend_lower and extend_upper respectively). The lower bound becomes:

\(lower\_bound = \min(data) - p \cdot (\max(data) - \min(data))\)

and the upper bound becomes:

\(upper\_bound = \max(data) + p \cdot (\max(data) - \min(data))\)

The uniform method splits the range from \(min\) to \(max\) using np.linspace() and then returns the midpoints between the breaks.

The jenks method uses the split points returned by the algorithm directly as the kernel centers, including the extended lower and upper bounds. It may be more precise to use the midpoints between the breaks, but in practice this should not make much difference for many applications. If desired, the midpoints can easily be computed from the returned results.

Parameters

data – A 1-dimensional array of data used to identify the kernel centers.
n_centers – The number of centers to return.
extend_lower – The proportion to extend the lower bound by.
extend_upper – The proportion to extend the upper bound by.
method – The method used to find kernel centers from the given data. The choices are jenks or uniform.

Returns

An array representing the center point of each kernel.
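
The bound extension and the uniform recipe described above can be sketched as follows. This is a minimal re-implementation for illustration, not the library's code: uniform_centers and midpoints are hypothetical names, and the sketch assumes that n_centers + 1 break points are generated and that the extension is applied before the range is split.

```python
import numpy as np


def uniform_centers(data, n_centers, extend_lower=0.0, extend_upper=0.0):
    """Hypothetical sketch of the documented 'uniform' recipe."""
    data = np.asarray(data, dtype=float)
    span = data.max() - data.min()
    # Extend each bound by a proportion of the data range.
    lower_bound = data.min() - extend_lower * span
    upper_bound = data.max() + extend_upper * span
    # Assumption: n_centers + 1 breaks yield n_centers midpoints.
    breaks = np.linspace(lower_bound, upper_bound, n_centers + 1)
    return (breaks[:-1] + breaks[1:]) / 2.0


def midpoints(centers):
    """Midpoints between consecutive values (one fewer than the input)."""
    centers = np.asarray(centers, dtype=float)
    return (centers[:-1] + centers[1:]) / 2.0


data = np.array([0.0, 2.0, 10.0])
centers = uniform_centers(data, n_centers=4, extend_lower=0.1, extend_upper=0.1)
```

Here the data range is 10, so the bounds become -1 and 11, the breaks are [-1, 2, 5, 8, 11], and centers is [0.5, 3.5, 6.5, 9.5]. The midpoints helper shows one way to post-process the break points returned by the jenks method, at the cost of returning one fewer center.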

cde.gmix.mean(weights: numpy.ndarray, means: numpy.ndarray)

Return the mean value of the density estimation using the given parameters.

Parameters

weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.

Returns

The mean values for the given parameters.

Return type

np.ndarray
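
For reference, the mean of a Gaussian mixture is the weight-averaged component mean, which is why no sigmas are needed. The snippet below is a sketch under the assumption that both arrays have shape (n_rows, n_components); mixture_mean is a hypothetical name, not the library function.

```python
import numpy as np


def mixture_mean(weights, means):
    """Hypothetical helper: sum_k w_k * mu_k for each row of parameters."""
    return np.sum(weights * means, axis=1)


weights = np.array([[0.25, 0.75], [0.5, 0.5]])
means = np.array([[0.0, 4.0], [-2.0, 2.0]])
mixture_means = mixture_mean(weights, means)
```

The first row gives 0.25 * 0 + 0.75 * 4 = 3, and the second row is symmetric about 0.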

cde.gmix.median(weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, search_lower_bound: float = -1000, search_upper_bound: float = 1000)

Return the median value of the density estimation using the given parameters.

Parameters

weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.
search_lower_bound – The lower bound for the search.
search_upper_bound – The upper bound for the search.

Returns

The median values for the given parameters.

Return type

np.ndarray

cde.gmix.mode(weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, method: str = 'cnewton', threshold: float = 0.25, **kwargs)

cde.gmix.pdf(y: float, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray)

The probability density function of the Gaussian mixture at a single point y.

Parameters

y (float) – The value where the density will be estimated.
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.

Returns

The estimated density for each set of parameters.

Return type

np.ndarray
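
The density of a Gaussian mixture at y is the weighted sum of the component normal densities. The following sketch illustrates that formula with plain NumPy; it is not taken from the library, mixture_pdf is a hypothetical name, and the (n_rows, n_components) array shape is an assumption.

```python
import numpy as np


def mixture_pdf(y, weights, means, sigmas):
    """Hypothetical helper: weighted sum of normal densities per row."""
    z = (y - means) / sigmas
    # Normal density: exp(-z^2 / 2) / (sigma * sqrt(2 * pi))
    component_pdfs = np.exp(-0.5 * z**2) / (sigmas * np.sqrt(2.0 * np.pi))
    return np.sum(weights * component_pdfs, axis=1)


weights = np.array([[1.0, 0.0], [0.5, 0.5]])
means = np.array([[0.0, 3.0], [-1.0, 1.0]])
sigmas = np.array([[1.0, 1.0], [1.0, 1.0]])
densities = mixture_pdf(0.0, weights, means, sigmas)
```

The first row reduces to a standard normal evaluated at its mean, so its density is 1 / sqrt(2 * pi), roughly 0.3989.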

cde.gmix.plot(weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, row_names: Optional[Union[list, numpy.ndarray]] = None, x_min: Optional[float] = None, x_max: Optional[float] = None, x_ticks: Optional[List[float]] = None, n_points: int = 1000, show_mixture: bool = False, show_mean: bool = False, show_median: bool = False, show_mode: bool = False, percent_interval: Optional[float] = 0.9, ncol: int = None, figure_size: Tuple[float, float] = None)

cde.gmix.ppf(p: float, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, search_lower_bound: float = -1000, search_upper_bound: float = 1000)

The percent point function. There is no analytic solution to finding the ppf so a heuristic search is performed following the recipe in this Stack Exchange solution.

Parameters

p (float (between 0 and 1)) – The desired probability.
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.
search_lower_bound (numeric) – The lower bound for the search.
search_upper_bound (numeric) – The upper bound for the search.

Returns

The value y such that the cumulative probability of the mixture at y, \(P(Y \le y)\), is approximately p.

Return type

float
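
To make the role of the search bounds concrete, here is a sketch of a bounded numerical inversion of the mixture CDF. It uses plain bisection, which is a stand-in chosen for illustration and not necessarily the recipe the library follows. The names bisection_ppf and mixture_cdf_1d and the tol parameter are hypothetical, and the sketch assumes 1-dimensional parameter arrays describing a single mixture.

```python
import math
import numpy as np


def mixture_cdf_1d(y, weights, means, sigmas):
    """CDF of one mixture at scalar y (1-D parameter arrays assumed)."""
    z = (y - means) / (sigmas * math.sqrt(2.0))
    erf_z = np.array([math.erf(v) for v in z])
    return float(np.sum(weights * 0.5 * (1.0 + erf_z)))


def bisection_ppf(p, weights, means, sigmas,
                  search_lower_bound=-1000.0, search_upper_bound=1000.0,
                  tol=1e-9):
    """Hypothetical sketch: invert the CDF by bisection within the bounds."""
    weights, means, sigmas = (np.asarray(a, dtype=float)
                              for a in (weights, means, sigmas))
    lo, hi = float(search_lower_bound), float(search_upper_bound)
    # The CDF is monotone, so halving the bracket converges on the quantile.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf_1d(mid, weights, means, sigmas) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q975 = bisection_ppf(0.975, [1.0], [0.0], [1.0])
median = bisection_ppf(0.5, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
```

With a single standard normal component, q975 is close to the familiar 1.96, and the symmetric two-component mixture has a median close to 0. In a search of this kind the answer can only be found if it lies between the two bounds, which is why they should be widened when the target values fall outside the default range of -1000 to 1000.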

cde.gmix.quantile(p: float, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, search_lower_bound: float = -1000, search_upper_bound: float = 1000)

This is just an alias of cde.gmix.ppf().

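Parameters

p (float (between 0 and 1)) – The desired probability.
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.
search_lower_bound – The lower bound for the search.
search_upper_bound – The upper bound for the search.

Returns

The same value as cde.gmix.ppf().

Return type

float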

cde.gmix.random(n: int, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, rng: numpy.random.mtrand.RandomState = None, method: str = 'inverse_transform')