cde.gmix module
This module provides methods for a mixture of
Gaussian distributions where the weights, means, and standard deviations are
known. The accompanying module cde.kmn provides helper functions for
learning these parameters using a neural network.
-
cde.gmix.cdf(y: float, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray)
The estimated cumulative probability (the value of the cumulative distribution function) at the given target value y.
- Parameters
y (float) – The target value at which the cumulative probability will be estimated.
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.
- Returns
The estimated cumulative probability for each set of predicted parameters.
- Return type
np.ndarray
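The CDF of a Gaussian mixture is the weighted sum of its component CDFs, \(F(y) = \sum_k w_k \Phi\left((y - \mu_k)/\sigma_k\right)\), where \(\Phi\) is the standard normal CDF. The following is a minimal sketch of that formula, not the cde.gmix implementation. The name mixture_cdf is hypothetical. The sketch assumes that weights, means, and sigmas are 2-D arrays of shape (n_sets, n_components), with one row per set of predicted parameters, and that each row of weights sums to 1.

```python
import numpy as np
from scipy.stats import norm


def mixture_cdf(y, weights, means, sigmas):
    """Hypothetical sketch: CDF of a Gaussian mixture at a scalar y.

    Assumes weights, means and sigmas have shape (n_sets, n_components);
    returns one cumulative probability per row (parameter set).
    """
    weights = np.atleast_2d(weights)
    means = np.atleast_2d(means)
    sigmas = np.atleast_2d(sigmas)
    # Evaluate each component's CDF, weight it, and sum over components.
    return np.sum(weights * norm.cdf(y, loc=means, scale=sigmas), axis=1)


w = np.array([[0.5, 0.5]])
mu = np.array([[-1.0, 1.0]])
sd = np.array([[1.0, 1.0]])
print(mixture_cdf(0.0, w, mu, sd))  # mixture is symmetric about 0, so approximately [0.5]
```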
-
cde.gmix.make_centers(data: numpy.ndarray, n_centers: int, extend_lower: float = 0, extend_upper: float = 0, method: str = 'jenks')
Return an array of kernel centers given some target data. Two methods are currently supported for creating the array: the first uses the Jenks natural breaks algorithm, and the second divides the range uniformly. If you suspect your training data will not adequately cover the range of values in the target distribution, you can optionally extend the lower and upper bounds by a given proportion \(p\) (extend_lower or extend_upper respectively, specified as a floating-point number between 0 and 1). The lower bound becomes:
\(\mathrm{lower\_bound} = \min(\mathrm{data}) - p \cdot (\max(\mathrm{data}) - \min(\mathrm{data}))\)
and the upper bound becomes:
\(\mathrm{upper\_bound} = \max(\mathrm{data}) + p \cdot (\max(\mathrm{data}) - \min(\mathrm{data}))\)
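As a worked illustration of the two formulas, the helper below computes the extended bounds. It uses separate proportions for the lower and upper ends, as extend_lower and extend_upper allow. The helper extend_bounds is hypothetical and not part of cde.gmix. For data spanning 2 to 12 with extend_lower = 0.1 and extend_upper = 0.2, the bounds become \(2 - 0.1 \cdot 10 = 1\) and \(12 + 0.2 \cdot 10 = 14\).

```python
import numpy as np


def extend_bounds(data, extend_lower=0.0, extend_upper=0.0):
    """Hypothetical helper: widen [min(data), max(data)] by the given proportions of the range."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    span = hi - lo
    # Each bound moves outward by its own proportion of the full data range.
    return lo - extend_lower * span, hi + extend_upper * span


lower, upper = extend_bounds([2.0, 5.0, 12.0], extend_lower=0.1, extend_upper=0.2)
print(lower, upper)  # range is 10, so the bounds are 2 - 1 = 1 and 12 + 2 = 14
```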
The uniform method splits the range from \(\min\) to \(\max\) into evenly spaced breaks using np.linspace() and then returns the midpoints between consecutive breaks. The jenks method uses the break points returned by the algorithm directly as the kernel centers, including the extended lower and upper bounds. Using the midpoints between the Jenks breaks might be slightly more precise, but in practice it should make little difference for many applications. If desired, the midpoints can easily be computed from the returned array.
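The following is a minimal sketch of the uniform method as described. It assumes, without confirmation from the source, that n_centers + 1 breaks are generated so that exactly n_centers midpoints are returned. The names midpoints and uniform_centers are hypothetical. The bounds are passed in directly and could, for example, be the extended bounds defined above.

```python
import numpy as np


def midpoints(breaks):
    """Midpoints between consecutive break points (one fewer value than given)."""
    breaks = np.asarray(breaks, dtype=float)
    return (breaks[:-1] + breaks[1:]) / 2.0


def uniform_centers(lower, upper, n_centers):
    """Hypothetical sketch of the uniform method.

    Assumption: n_centers + 1 evenly spaced breaks from lower to upper,
    so that their midpoints yield exactly n_centers kernel centers.
    """
    breaks = np.linspace(lower, upper, n_centers + 1)
    return midpoints(breaks)


print(uniform_centers(0.0, 10.0, 5))  # [1. 3. 5. 7. 9.]
```

The same midpoints helper can be applied to the array returned by the jenks method if midpoints between its breaks are preferred. Note that it returns one fewer value than it receives.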
- Parameters
data (np.ndarray) – A 1-dimensional array of data used to identify the kernel centers.
n_centers (int) – The number of centers to return.
extend_lower (float) – The proportion by which to extend the lower bound.
extend_upper (float) – The proportion by which to extend the upper bound.
method (str) – The method used to find kernel centers from the given data: either 'jenks' or 'uniform'.
- Returns
An array representing the center point of each kernel.
-
cde.gmix.mean(weights: numpy.ndarray, means: numpy.ndarray)
Return the mean value of the density estimation using the given parameters.
- Parameters
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
- Returns
The mean values for the given parameters.
- Return type
np.ndarray
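The mean of a Gaussian mixture is the weighted average of the component means, \(\sum_k w_k \mu_k\). The sigmas do not enter the formula, which is consistent with mean() taking only weights and means. The following is a minimal sketch that assumes 2-D arrays with one row per parameter set. The name mixture_mean is hypothetical.

```python
import numpy as np


def mixture_mean(weights, means):
    """Hypothetical sketch: mixture mean sum_k w_k * mu_k, one value per row."""
    weights = np.atleast_2d(weights)
    means = np.atleast_2d(means)
    return np.sum(weights * means, axis=1)


print(mixture_mean([[0.25, 0.75]], [[0.0, 4.0]]))  # [3.]
```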
-
cde.gmix.median(weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, search_lower_bound: float = -1000, search_upper_bound: float = 1000)
Return the median value of the density estimation using the given parameters.
- Parameters
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.
search_lower_bound (float) – The lower bound for the search.
search_upper_bound (float) – The upper bound for the search.
- Returns
The median values for the given parameters.
- Return type
np.ndarray
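Unlike the mean, the median of a mixture has no closed form. It is the point \(m\) where the mixture CDF equals 0.5, which is why median() accepts search bounds. The following sketch finds that point by bisection for a single parameter set, using only the standard library. The function names are hypothetical, and the default bounds simply mirror search_lower_bound and search_upper_bound. Bisection is a stand-in for whatever search cde.gmix actually performs.

```python
import math


def mixture_cdf_1d(y, weights, means, sigmas):
    """CDF of one Gaussian mixture (1-D parameter sequences) via math.erf."""
    return sum(
        w * 0.5 * (1.0 + math.erf((y - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, sigmas)
    )


def mixture_median(weights, means, sigmas, lower=-1000.0, upper=1000.0, tol=1e-10):
    """Hypothetical sketch: bisect [lower, upper] for the point where the CDF reaches 0.5."""
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        # The CDF is increasing, so keep the half-interval that still brackets 0.5.
        if mixture_cdf_1d(mid, weights, means, sigmas) < 0.5:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


print(mixture_median([0.5, 0.5], [0.0, 10.0], [1.0, 1.0]))  # approximately 5.0
```

If the true median lies outside [lower, upper], bisection converges to one of the bounds, so the search bounds should comfortably cover the support of the target.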
-
cde.gmix.mode(weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, method: str = 'cnewton', threshold: float = 0.25, **kwargs)
-
cde.gmix.pdf(y: float, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray)
The probability density function of the Gaussian mixture at a single point y.
- Parameters
y (float) – The value where the density will be estimated.
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.
- Returns
The estimated density for each set of parameters.
- Return type
np.ndarray
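The mixture density is \(f(y) = \sum_k w_k \frac{1}{\sigma_k \sqrt{2\pi}} \exp\left(-\frac{(y - \mu_k)^2}{2\sigma_k^2}\right)\). The following is a NumPy-only sketch of that formula, not the cde.gmix implementation. The name mixture_pdf is hypothetical. The sketch assumes weights, means, and sigmas are 2-D arrays of shape (n_sets, n_components), one row per parameter set.

```python
import numpy as np


def mixture_pdf(y, weights, means, sigmas):
    """Hypothetical sketch: Gaussian mixture density at scalar y, one value per row."""
    weights = np.atleast_2d(weights)
    means = np.atleast_2d(means)
    sigmas = np.atleast_2d(sigmas)
    z = (y - means) / sigmas
    # Normal density of each component, scaled by its weight and summed over components.
    component = np.exp(-0.5 * z ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
    return np.sum(weights * component, axis=1)


print(mixture_pdf(0.0, [[1.0]], [[0.0]], [[1.0]]))  # standard normal at 0, about 0.3989
```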
-
cde.gmix.plot(weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, row_names: Optional[Union[list, numpy.ndarray]] = None, x_min: Optional[float] = None, x_max: Optional[float] = None, x_ticks: Optional[List[float]] = None, n_points: int = 1000, show_mixture: bool = False, show_mean: bool = False, show_median: bool = False, show_mode: bool = False, percent_interval: Optional[float] = 0.9, ncol: int = None, figure_size: Tuple[float, float] = None)
-
cde.gmix.ppf(p: float, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, search_lower_bound: float = -1000, search_upper_bound: float = 1000)
The percent point function, i.e., the inverse of the cumulative distribution function. The ppf of a Gaussian mixture has no analytic solution, so a heuristic search between search_lower_bound and search_upper_bound is performed, following the recipe in this Stack Exchange solution.
- Parameters
p (float (between 0 and 1)) – The desired probability.
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.
search_lower_bound (float) – The lower bound for the search.
search_upper_bound (float) – The upper bound for the search.
- Returns
The value \(y\) such that \(P(Y \le y) = p\).
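One way to perform such a search is to apply a bracketing root-finder to \(F(y) - p\). The following sketch substitutes Brent's method (scipy.optimize.brentq), which is not necessarily the Stack Exchange recipe that cde.gmix follows. The name mixture_ppf is hypothetical. The sketch handles a single parameter set given as 1-D arrays, and its defaults mirror search_lower_bound and search_upper_bound.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def mixture_ppf(p, weights, means, sigmas, lower=-1000.0, upper=1000.0):
    """Hypothetical sketch: invert one mixture's CDF with Brent's method.

    weights, means and sigmas are 1-D arrays describing a single parameter set.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)

    def excess(y):
        # Mixture CDF minus the target probability; its root is the ppf.
        return float(np.sum(weights * norm.cdf(y, loc=means, scale=sigmas))) - p

    # brentq requires excess(lower) and excess(upper) to have opposite signs.
    return brentq(excess, lower, upper)


print(mixture_ppf(0.975, [1.0], [0.0], [1.0]))  # about 1.96 for a standard normal
```

brentq raises a ValueError when \(F(\text{lower}) - p\) and \(F(\text{upper}) - p\) have the same sign. This happens when the requested quantile lies outside the search bounds.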
-
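cde.gmix.quantile(p: float, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, search_lower_bound: float = -1000, search_upper_bound: float = 1000)
An alias of cde.gmix.ppf(); the parameters and return value are the same.
- Parameters
p (float (between 0 and 1)) – The desired probability.
weights (np.ndarray) – The weights for each Gaussian in the mixture.
means (np.ndarray) – The mean values for each Gaussian in the mixture.
sigmas (np.ndarray) – The sigma values for each Gaussian.
search_lower_bound (float) – The lower bound for the search.
search_upper_bound (float) – The upper bound for the search.
- Returns
The value \(y\) such that \(P(Y \le y) = p\).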
-
cde.gmix.random(n: int, weights: numpy.ndarray, means: numpy.ndarray, sigmas: numpy.ndarray, rng: numpy.random.mtrand.RandomState = None, method: str = 'inverse_transform')