t-SNE

t-SNE (t-Distributed Stochastic Neighbor Embedding) is a non-linear dimensionality reduction method. Unlike PCA, which projects the data linearly onto the directions of greatest variance, t-SNE is designed to preserve local neighbourhoods: samples that are similar in the original space stay close together in the 2D map, while dissimilar samples are pushed apart. t-SNE is useful for:

Spotting non-linear clusters that PCA misses
Quickly checking for natural groupings
Communicating data structure to non-technical stakeholders

t-SNE is for visualization, not modelling. The 2D coordinates are not comparable across runs, because different random initializations produce different layouts. Standard t-SNE also learns no mapping that can be applied to new samples, so don’t use t-SNE coordinates as features for prediction.
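For intuition about what “preserving local neighbourhoods” means, the sketch below computes the two ingredients of the standard t-SNE objective: 2D affinities q_ij from a Student-t kernel (the “t” in t-SNE) and the Kullback–Leibler divergence KL(P‖Q) that the optimiser minimises. It assumes P is already a symmetric matrix of high-dimensional affinities that sums to 1; in standard t-SNE those come from Gaussian neighbourhoods whose widths are set by the perplexity. This is an illustrative pure-Python sketch with hypothetical function names, not the implementation this product uses.

```python
import math


def student_t_affinities(Y):
    """2D affinities q_ij as defined in standard t-SNE.

    Each pair (i, j) gets weight 1 / (1 + ||y_i - y_j||^2), a Student-t kernel
    with one degree of freedom; weights are then normalised over all pairs.
    """
    n = len(Y)
    weights = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                weights[i][j] = 1.0 / (1.0 + sq_dist)
                total += weights[i][j]
    return [[w / total for w in row] for row in weights]


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) over off-diagonal pairs: the cost t-SNE minimises."""
    n = len(P)
    return sum(
        P[i][j] * math.log((P[i][j] + eps) / (Q[i][j] + eps))
        for i in range(n)
        for j in range(n)
        if i != j and P[i][j] > 0
    )


# Two tight pairs of points, far apart from each other.
good_layout = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
# Same coordinates, but the pairs are split up.
bad_layout = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

P = student_t_affinities(good_layout)  # stand-in for the high-dimensional affinities
print(kl_divergence(P, student_t_affinities(good_layout)))  # 0.0: neighbourhoods match
print(kl_divergence(P, student_t_affinities(bad_layout)))   # > 0: neighbourhoods broken
```

The heavy tail of the Student-t kernel is what lets moderately dissimilar samples sit far apart in the 2D map, which is why separate groups tend to show up as visibly separate clusters.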

Configuring a t-SNE run

In the Configure run card on the analysis detail page, select t-SNE.

t-SNE configuration showing method selection, run name, and perplexity slider

Parameters

Parameter	Default	Range	Description
Perplexity	30.0	5.0 to 100.0	Controls how t-SNE balances local vs. global structure. Roughly the number of effective neighbours each point takes into account: lower values focus on fine local structure, higher values take in broader structure.
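In the standard t-SNE formulation, perplexity is 2 raised to the Shannon entropy (in bits) of a point’s Gaussian neighbourhood distribution, and t-SNE tunes each point’s Gaussian width σ until that value matches the perplexity you set. The pure-Python sketch below shows this per-point search; the function names are hypothetical and this is not necessarily how this product computes it. It also shows why perplexity reads as an effective neighbour count: with n equally weighted neighbours, the perplexity equals n.

```python
import math


def conditional_probs(sq_dists, sigma):
    """p_{j|i} for one point i, given squared distances to every other point."""
    # Subtract the smallest distance before exponentiating, for numerical stability.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """2 ** (Shannon entropy in bits): the effective number of neighbours."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, tol=1e-5, max_iter=200):
    """Binary-search the Gaussian width sigma so the neighbourhood hits `target`."""
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        sigma = (lo + hi) / 2.0
        perp = perplexity(conditional_probs(sq_dists, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # neighbourhood too wide: shrink sigma
        else:
            lo = sigma  # neighbourhood too narrow: widen sigma
    return sigma


# Squared distances from one point to 50 others.
sq_dists = [float(k) for k in range(1, 51)]
sigma = sigma_for_perplexity(sq_dists, target=10.0)
print(round(perplexity(conditional_probs(sq_dists, sigma)), 3))  # 10.0
```

In this formulation perplexity can never exceed the number of other points, so it must stay below your sample count; this is one reason small datasets call for small perplexity values.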

Choosing perplexity

Sample count	Suggested perplexity
30-100	5-15
100-500	20-40
500-2000	30-50
2000+	50-80
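As a compact restatement, the table can be written as a small lookup function. `suggested_perplexity` is a hypothetical helper, not part of the product. It resolves the shared boundaries (100, 500 and 2000 samples) upward into the higher row, and because the table gives no guidance below 30 samples, it raises an error there.

```python
def suggested_perplexity(n_samples):
    """Hypothetical helper: return the (low, high) perplexity band from the table.

    Sample counts on a shared boundary (e.g. exactly 100) fall into the higher row.
    """
    if n_samples < 30:
        raise ValueError("the table gives no suggestion below 30 samples")
    if n_samples < 100:
        return (5, 15)
    if n_samples < 500:
        return (20, 40)
    if n_samples < 2000:
        return (30, 50)
    return (50, 80)


print(suggested_perplexity(250))  # (20, 40)
```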

If the result looks like a single blob, lower the perplexity. If it shatters into many tiny groups, raise it. Try a few values: there’s no single “right” answer.

Click Launch run. The status moves through “Queueing”, “Analysing” and “Done”.

Reading the results

t-SNE run detail showing 2D embedding scatter plot coloured by a property

Embedding plot

A 2D scatter plot in which each point is one spectrum, with Dim 1 on the horizontal axis and Dim 2 on the vertical axis. The axes themselves have no inherent meaning in t-SNE; only the relative distances between nearby points matter:

Points close together = similar spectra
Points far apart = dissimilar spectra
Distinct clusters = likely natural groupings
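One way to check how well a map respects local structure is to measure, for each sample, how many of its k nearest neighbours in the original spectra are still among its k nearest neighbours in the 2D embedding. The sketch below is a simplified neighbourhood-preservation score in that spirit (related to, but not the same as, the trustworthiness metric). It assumes you have the spectra `X` and the embedding coordinates `Y` as lists of equal-length tuples; the function names are hypothetical and this is not a feature of this product. Because it only looks at nearest neighbours, it ignores distances between far-apart clusters, which t-SNE does not preserve.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of point i, excluding i itself."""
    ranked = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return {j for _, j in ranked[:k]}


def neighbourhood_overlap(X, Y, k=5):
    """Mean fraction of each sample's k nearest neighbours in X (original spectra)
    that are also among its k nearest neighbours in Y (2D embedding).

    1.0 means every local neighbourhood is preserved; a random layout scores
    about k / (n - 1) on average.
    """
    n = len(X)
    shared = sum(len(knn(X, i, k) & knn(Y, i, k)) for i in range(n))
    return shared / (n * k)


# Ten "spectra" along a line in 3D, and two candidate 2D maps.
X = [(float(i), 0.0, 0.0) for i in range(10)]
faithful = [(float(i), 0.0) for i in range(10)]              # same ordering
scrambled = [(float((3 * i) % 10), 0.0) for i in range(10)]  # ordering shuffled
print(neighbourhood_overlap(X, faithful, k=2))   # 1.0
print(neighbourhood_overlap(X, scrambled, k=2))  # below 1.0
```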

Colour by

Pick a property to colour the points by. If samples segregate into distinct regions of the map when coloured by that property, your spectra likely carry a strong signal related to it.
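For a categorical property, “segregates into distinct regions” can be quantified as the fraction of each sample’s nearest embedding neighbours that share its property value, compared with what random placement would give. The sketch below assumes 2D coordinates `Y` and one label per sample in `labels`; both function names are hypothetical, and a continuous property would need a different check.

```python
import math
from collections import Counter


def neighbour_label_agreement(Y, labels, k=5):
    """Mean fraction of each point's k nearest 2D neighbours that share its label."""
    n = len(Y)
    total = 0.0
    for i in range(n):
        # Rank every other point by its 2D distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(Y[i], Y[j]))
        total += sum(labels[j] == labels[i] for j in others[:k]) / k
    return total / n


def chance_agreement(labels):
    """Expected agreement if labels were scattered at random over the map."""
    n = len(labels)
    counts = Counter(labels)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


cluster_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
cluster_b = [(x + 10.0, y + 10.0) for x, y in cluster_a]
Y = cluster_a + cluster_b
labels = ["A"] * 5 + ["B"] * 5
print(neighbour_label_agreement(Y, labels, k=3))  # 1.0
print(chance_agreement(labels))                   # 0.444... (4/9)
```

A score well above the chance level suggests the property is strongly reflected in the spectra. Since the layout changes from run to run, it is worth checking the score on more than one run.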

Common patterns

Pattern	Likely meaning
A single blob	No strong structure (or perplexity too high)
Clear separated clusters	Natural groups in your data
Long curved manifold	Continuous variation along some property
Smooth colour gradient for a property	Property correlates well with spectral features

Limitations

Stochastic: every run produces a different layout, even with the same parameters
No global meaning: distances between far-apart clusters are not meaningful
Slow on large datasets: 10,000+ samples take noticeably longer
Sensitive to parameters: small perplexity changes can produce very different maps

For these reasons, use t-SNE alongside PCA, not as a replacement. PCA gives you reproducible global structure; t-SNE gives you visual cluster intuition.

Comparing t-SNE runs

Each t-SNE run starts from a different random initialization, so embeddings are independent across runs. When you compare runs, each run’s embedding is shown side by side. Don’t expect a point to land in the same place in run A as in run B; only the cluster structure is meaningful.
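If you assign a cluster label to each sample in two runs (by eye, or with any clustering method applied to each embedding), you can compare the runs’ cluster structure without referring to positions at all. The sketch below uses the plain (unadjusted) Rand index: the fraction of sample pairs that both runs treat the same way, either grouped together in both or separated in both. `rand_index` is a hypothetical helper, not something the comparison view computes, and its result does not depend on how clusters are named or where they sit on the map.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    A pair agrees when both runs put the two samples in the same cluster,
    or both runs put them in different clusters. Cluster names do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


run_a = ["left", "left", "right", "right"]
run_b = [2, 2, 1, 1]  # same grouping, different names and positions
run_c = [1, 2, 1, 2]  # different grouping
print(rand_index(run_a, run_b))  # 1.0
print(rand_index(run_a, run_c))  # 0.333...
```

The plain Rand index tends to look optimistic when there are many small clusters, so the adjusted Rand index, which corrects for chance agreement, is often preferred for more careful comparisons.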
