machine_learning.t_stochastic_neighbour_embedding

t-distributed stochastic neighbor embedding (t-SNE)

For more details, see: https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding

Functions

apply_tsne(→ numpy.ndarray)

Apply t-SNE for dimensionality reduction.

collect_dataset(→ tuple[numpy.ndarray, numpy.ndarray])

Load the Iris dataset and return features and labels.

compute_low_dim_affinities(→ tuple[numpy.ndarray, ...)

Compute low-dimensional affinities (Q matrix) using a Student-t distribution.

compute_pairwise_affinities(→ numpy.ndarray)

Compute high-dimensional affinities (P matrix) using a Gaussian kernel.

main(→ None)

Run t-SNE on the Iris dataset and display the first 5 embeddings.

Module Contents

machine_learning.t_stochastic_neighbour_embedding.apply_tsne(data_matrix: numpy.ndarray, n_components: int = 2, learning_rate: float = 200.0, n_iter: int = 500) → numpy.ndarray

Apply t-SNE for dimensionality reduction.

Args:

data_matrix: Original dataset (features).
n_components: Target dimension (2D or 3D).
learning_rate: Step size for gradient descent.
n_iter: Number of iterations.

Returns:

ndarray: Low-dimensional embedding of the data.

>>> features, _ = collect_dataset()
>>> embedding = apply_tsne(features, n_components=2, n_iter=50)
>>> embedding.shape
(150, 2)
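
The learning_rate and n_iter arguments suggest an iterative gradient-descent update of the embedding. The block below is a minimal sketch of that idea, not necessarily the loop this module runs. It assumes plain gradient descent on the standard t-SNE gradient, with no momentum and no early exaggeration. The names tsne_gradient_sketch, tsne_descent_sketch and the seed parameter are hypothetical.

```python
import numpy as np


def tsne_gradient_sketch(p_matrix: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Hypothetical helper: gradient of KL(P || Q) with respect to the embedding."""
    # Pairwise differences y_i - y_j, shape (n_samples, n_samples, n_components)
    diffs = embedding[:, None, :] - embedding[None, :, :]
    # Student-t numerators (1 + ||y_i - y_j||^2)^-1 with a zeroed diagonal
    numerators = 1.0 / (1.0 + np.sum(diffs**2, axis=2))
    np.fill_diagonal(numerators, 0.0)
    q_matrix = numerators / numerators.sum()
    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * numerator_ij * (y_i - y_j)
    weights = (p_matrix - q_matrix) * numerators
    return 4.0 * np.sum(weights[:, :, None] * diffs, axis=1)


def tsne_descent_sketch(
    p_matrix: np.ndarray,
    n_components: int = 2,
    learning_rate: float = 200.0,
    n_iter: int = 500,
    seed: int = 0,  # assumption: fixed seed for a reproducible start
) -> np.ndarray:
    """Hypothetical helper: plain gradient descent from a small random start."""
    rng = np.random.default_rng(seed)
    embedding = rng.normal(scale=1e-4, size=(p_matrix.shape[0], n_components))
    for _ in range(n_iter):
        embedding -= learning_rate * tsne_gradient_sketch(p_matrix, embedding)
    return embedding
```

The gradient vanishes when the low-dimensional affinities already equal P, which is a convenient sanity check for any implementation of this update.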

machine_learning.t_stochastic_neighbour_embedding.collect_dataset() → tuple[numpy.ndarray, numpy.ndarray]

Load the Iris dataset and return features and labels.

Returns:

tuple[ndarray, ndarray]: Feature matrix and target labels.

>>> features, targets = collect_dataset()
>>> features.shape
(150, 4)
>>> targets.shape
(150,)

machine_learning.t_stochastic_neighbour_embedding.compute_low_dim_affinities(embedding_matrix: numpy.ndarray) → tuple[numpy.ndarray, numpy.ndarray]

Compute low-dimensional affinities (Q matrix) using a Student-t distribution.

Args:

embedding_matrix: Low-dimensional embedding of shape (n_samples, n_components).

Returns:

tuple[ndarray, ndarray]: (Q probability matrix, numerator matrix).

>>> y = np.array([[0.0, 0.0], [1.0, 0.0]])
>>> q_matrix, numerators = compute_low_dim_affinities(y)
>>> q_matrix.shape
(2, 2)
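
To make the Student-t step concrete, here is a minimal sketch under the usual t-SNE definition: the numerator for a pair is 1 / (1 + squared distance), the diagonal is zeroed, and Q is the numerator matrix divided by its total. The function name student_t_affinities_sketch is hypothetical, and the module's own implementation may differ in detail.

```python
from __future__ import annotations

import numpy as np


def student_t_affinities_sketch(
    embedding: np.ndarray,
) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: return (Q matrix, numerator matrix)."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    squared_norms = np.sum(embedding**2, axis=1)
    squared_distances = (
        squared_norms[:, None] + squared_norms[None, :] - 2.0 * embedding @ embedding.T
    )
    # Clip tiny negative values caused by floating-point error
    squared_distances = np.maximum(squared_distances, 0.0)
    # Student-t kernel with one degree of freedom
    numerators = 1.0 / (1.0 + squared_distances)
    np.fill_diagonal(numerators, 0.0)  # a point is not its own neighbour
    q_matrix = numerators / numerators.sum()
    return q_matrix, numerators


y = np.array([[0.0, 0.0], [1.0, 0.0]])
q_matrix, numerators = student_t_affinities_sketch(y)
```

For these two points the squared distance is 1, so each off-diagonal numerator is 0.5 and each off-diagonal entry of Q is 0.5.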

machine_learning.t_stochastic_neighbour_embedding.compute_pairwise_affinities(data_matrix: numpy.ndarray, sigma: float = 1.0) → numpy.ndarray

Compute high-dimensional affinities (P matrix) using a Gaussian kernel.

Args:

data_matrix: Input data of shape (n_samples, n_features).
sigma: Gaussian kernel bandwidth.

Returns:

ndarray: Symmetrized probability matrix.

>>> x = np.array([[0.0, 0.0], [1.0, 0.0]])
>>> probabilities = compute_pairwise_affinities(x)
>>> float(round(probabilities[0, 1], 3))
0.25
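
The value 0.25 can be reproduced by hand. The sketch below assumes a single global sigma, a Gaussian kernel normalised over all pairs, and the symmetrization (P + Pᵀ) / (2n). That normalisation is an assumption chosen because it matches the doctest above. The function name gaussian_affinities_sketch is hypothetical.

```python
import numpy as np


def gaussian_affinities_sketch(
    data_matrix: np.ndarray, sigma: float = 1.0
) -> np.ndarray:
    """Hypothetical helper: symmetrized Gaussian affinities, one global sigma."""
    n_samples = data_matrix.shape[0]
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    squared_norms = np.sum(data_matrix**2, axis=1)
    squared_distances = (
        squared_norms[:, None]
        + squared_norms[None, :]
        - 2.0 * data_matrix @ data_matrix.T
    )
    # Clip tiny negative values caused by floating-point error
    squared_distances = np.maximum(squared_distances, 0.0)
    affinities = np.exp(-squared_distances / (2.0 * sigma**2))
    np.fill_diagonal(affinities, 0.0)  # a point is not its own neighbour
    affinities /= affinities.sum()  # assumption: normalise over all pairs
    return (affinities + affinities.T) / (2.0 * n_samples)


x = np.array([[0.0, 0.0], [1.0, 0.0]])
p_matrix = gaussian_affinities_sketch(x)
```

With two points at distance 1, both off-diagonal kernel values equal exp(-0.5), so each becomes 0.5 after normalisation, and (0.5 + 0.5) / (2 × 2) gives 0.25.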

machine_learning.t_stochastic_neighbour_embedding.main() → None

Run t-SNE on the Iris dataset and display the first 5 embeddings.

>>> main()
t-SNE embedding (first 5 points):
[[...