From the Vega Example Gallery
A histogram subdivides a numerical range into bins and counts the number of data points within each bin. The resulting bar chart provides a discrete estimate of the probability density function.
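The binning step can be sketched in a few lines of Python. This is only an illustration of the idea, not the gallery's Vega spec: the function name `histogram` and its `start`, `stop`, and `step` parameters are assumptions made for this example.

```python
def histogram(values, start, stop, step):
    """Count values per bin and convert the counts to a density estimate.

    Bin i covers [start + i*step, start + (i+1)*step); a value equal to
    `stop` is placed in the last bin. All names here are hypothetical.
    """
    n_bins = round((stop - start) / step)
    counts = [0] * n_bins
    for v in values:
        if start <= v <= stop:
            i = min(int((v - start) // step), n_bins - 1)
            counts[i] += 1
    total = sum(counts)
    # Dividing by (total * step) makes the bar areas sum to 1.
    density = [c / (total * step) if total else 0.0 for c in counts]
    return counts, density


counts, density = histogram([0.5, 1.5, 1.7, 3.0], start=0, stop=3, step=1)
```

Dividing each count by the total count times the bin width is what turns raw counts into a discrete density estimate.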
This example demonstrates a histogram over a numerical range, with a segment to show the prevalence of null values.
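One way to prepare data for such a chart is to separate null values from valid ones before binning, so the nulls can be drawn as their own segment. The sketch below assumes nulls appear as `None` or NaN; the helper name `split_nulls` is hypothetical.

```python
import math


def split_nulls(values):
    """Return (valid_values, null_count), treating None and NaN as null."""
    valid = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return valid, len(values) - len(valid)


valid, null_count = split_nulls([1, None, 2.5, float("nan"), None])
```

The valid values can then be binned as usual, while `null_count` sets the height of the separate null bar.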
Visual comparison of estimated probability distributions for a sample of numeric values: a normal (Gaussian) distribution parameterized by the mean and standard deviation, and a kernel density estimate. This example supports estimates of either probability density functions (pdf) or cumulative distribution functions (cdf), using Vega’s density transform.
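The two estimates being compared can be written out directly. The following is a minimal sketch using a Gaussian kernel with a hand-picked bandwidth; it does not reproduce Vega's density transform, and the function names and sample values are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the same normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def kde_pdf(x, sample, bandwidth):
    """Kernel density estimate: the average of Gaussian bumps, one per point."""
    return sum(normal_pdf(x, v, bandwidth) for v in sample) / len(sample)


sample = [1.0, 2.0, 2.5, 3.0, 7.0]          # made-up data
mu, sigma = mean(sample), stdev(sample)      # parameters of the normal fit
grid = [i * 0.5 for i in range(0, 17)]       # x positions 0.0 .. 8.0
normal_curve = [normal_pdf(x, mu, sigma) for x in grid]
kde_curve = [kde_pdf(x, sample, bandwidth=0.8) for x in grid]
```

Plotting `normal_curve` against `kde_curve` shows where the parametric fit and the data-driven estimate disagree, for instance around the isolated point at 7.0.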
A box plot summarizes a distribution of quantitative values using a set of summary statistics. Here, the boxes show the interquartile range (IQR), with the white bar indicating the median value. In this example, the thin lines (“whiskers”) span the minimum and maximum values; other conventions, such as whiskers extending 1.5 * IQR from each end of the box, are often used as well. See the violin plot example for an alternative approach.
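The summary statistics behind a box plot can be computed as follows. This sketch assumes the "inclusive" quartile method of Python's `statistics.quantiles`, which may differ from the quartile definition Vega uses; the function name `box_stats` is hypothetical. For the 1.5 * IQR convention, it uses the common variant in which each whisker stops at the most extreme data point inside the fence.

```python
from statistics import quantiles


def box_stats(values):
    """Return the quartiles, IQR, and two whisker conventions for a sample."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers spanning the full extent of the data.
        "minmax_whiskers": (min(values), max(values)),
        # Whiskers ending at the furthest points within 1.5 * IQR of the box.
        "tukey_whiskers": (
            min(v for v in values if v >= lo_fence),
            max(v for v in values if v <= hi_fence),
        ),
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

With the outlier 30 in the sample, the min/max whiskers reach 30 while the 1.5 * IQR whiskers stop at 8, which is the practical difference between the two conventions.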
A violin plot visualizes a distribution of quantitative values as a continuous approximation of the probability density function, computed using kernel density estimation (KDE). The densities are additionally annotated with the median value and interquartile range, shown as black lines. Violin plots can be more informative than classical box plots because they show the shape of the distribution, such as multiple modes, rather than only summary statistics.
A plot of the top-k film directors by aggregate worldwide gross. The spec aggregates gross across all directors, ranks them, and filters to only the top results, using the 'window' transform.
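The aggregate, rank, and filter sequence might look like this outside of Vega. The sketch assumes the data is a list of `(director, gross)` pairs and that rows with a missing director or gross are skipped; the function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def top_k_directors(films, k):
    """Sum gross per director, rank by total, and keep the top k."""
    totals = defaultdict(float)
    for director, gross in films:
        if director is not None and gross is not None:
            totals[director] += gross
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Rank 1 is the highest total; keep only ranks up to k.
    return [
        (rank, director, total)
        for rank, (director, total) in enumerate(ranked, start=1)
        if rank <= k
    ]


films = [("A", 10), ("B", 5), ("A", 3), ("C", 7), (None, 100)]
top2 = top_k_directors(films, k=2)
```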
A plot of the top-k film directors, plus all other directors, by aggregate worldwide gross. Unlike the previous example, this chart includes a category of all other directors aggregated together. The visualization spec first computes aggregates for all directors and ranks them. It then copies these ranks back to the source data using a lookup transform, and determines which directors belong in the “other” category before performing a final aggregation.
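The two-pass structure described here (aggregate and rank, look the ranks up for each source row, then aggregate again) can be sketched as below. The `(director, gross)` input shape, the function name, and the "All Others" label are assumptions for this example rather than details taken from the spec.

```python
def top_k_with_other(films, k, other_label="All Others"):
    """Aggregate gross for the top-k directors plus one combined category."""
    # Pass 1: total gross per director, then rank directors by total.
    totals = {}
    for director, gross in films:
        totals[director] = totals.get(director, 0) + gross
    ranked = sorted(totals, key=totals.get, reverse=True)
    rank_of = {director: i for i, director in enumerate(ranked, start=1)}

    # Pass 2: look up each source row's rank, relabel rows outside the
    # top k, and aggregate again by the resulting category.
    result = {}
    for director, gross in films:
        category = director if rank_of[director] <= k else other_label
        result[category] = result.get(category, 0) + gross
    return result


films = [("A", 10), ("B", 5), ("A", 3), ("C", 7), ("D", 1)]
summary = top_k_with_other(films, k=2)
```

Copying the rank back to the source rows before the final aggregation is what lets the "other" category be summed from the original records instead of being dropped.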
A binned scatterplot is a more scalable alternative to the standard scatter plot. The data points are grouped into bins, and an aggregate statistic is used to summarize each bin. Here we use a circular area encoding to depict the count of records, visualizing the density of data points. For higher bin counts color might instead be used, though with some loss of perceptual comparison accuracy.
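A rough sketch of the grouping and the area encoding follows, assuming fixed-width square bins. The function names and parameters are illustrative. Because circle area grows with the square of the radius, the radius is scaled by the square root of the count so that area stays proportional to the number of records.

```python
import math
from collections import Counter


def bin_2d(points, x_step, y_step):
    """Count points per 2D bin, keyed by each bin's lower-left corner."""
    counts = Counter()
    for x, y in points:
        key = (math.floor(x / x_step) * x_step, math.floor(y / y_step) * y_step)
        counts[key] += 1
    return counts


def radius_for(count, max_count, max_radius):
    """Radius of a circle whose area is proportional to count."""
    return max_radius * math.sqrt(count / max_count)


points = [(1, 1), (2, 3), (12, 4), (11, 9), (25, 25)]
bins = bin_2d(points, x_step=10, y_step=10)
radii = {key: radius_for(c, max(bins.values()), 10) for key, c in bins.items()}
```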
A contour plot depicts the density of data points using a set of discrete levels. Akin to contour lines on topographic maps, each contour boundary is an isoline of constant density. Kernel density estimation is performed to generate a continuous approximation of the sample density.
A wheat plot is an alternative to standard dot plots and histograms that incorporates aspects of both. The x-coordinate of a point is based on its exact value. The y-coordinate is determined by grouping points into histogram bins, then stacking them based on their rank order within each bin. While not scalable to large numbers of data points, wheat plots allow inspection of (and interaction with) individual points without overplotting.
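The layout rule can be written compactly: keep each point's exact value for x, and use its rank within its histogram bin for y. This is a minimal sketch with fixed-width bins; `wheat_layout` and `step` are hypothetical names.

```python
import math
from collections import defaultdict


def wheat_layout(values, step):
    """Return (x, y) pairs: x is the exact value, y the rank within its bin."""
    bins = defaultdict(list)
    for v in values:
        bins[math.floor(v / step)].append(v)
    layout = []
    for b in sorted(bins):
        # Stack points in rank order, starting at 1 for the smallest value.
        for rank, v in enumerate(sorted(bins[b]), start=1):
            layout.append((v, rank))
    return layout


layout = wheat_layout([1.2, 0.4, 1.9, 0.1, 2.5], step=1)
```

Each bin's stack height equals its histogram count, while the x positions preserve the individual values, which is how the plot combines aspects of both chart types.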
Rather than showing a continuous probability distribution, Hypothetical Outcome Plots (or HOPs) visualize a set of draws from a distribution, where each draw is shown as a new plot in either a small multiples or animated form.
This example – inspired by The New York Times – displays random draws for a simulated time-series of values (these could be sales or employment statistics). The noise signal determines the amount of random variation added to the signal. The trend signal determines the strength of a linear trend, where zero corresponds to no trend at all (a flat uniform distribution). When the noise is high enough, draws from a distribution without any underlying trend may cause us to “hallucinate” interesting variations. Viewing the different frames may help viewers get a more intuitive sense of random variation.
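To make the roles of the two signals concrete, here is one possible way to generate such draws. The model is an assumption for illustration only (a linear trend plus Gaussian noise) and is not the formula used in the Vega spec; the function and parameter names are hypothetical.

```python
import random


def hypothetical_outcomes(n_frames, n_points, trend, noise, seed=0):
    """Generate n_frames independent draws of a noisy time series.

    Assumed model: value at step t is trend * t plus Gaussian noise with
    standard deviation `noise`. Each returned list is one frame.
    """
    rng = random.Random(seed)  # seeded so the frames are reproducible
    return [
        [trend * t + rng.gauss(0, noise) for t in range(n_points)]
        for _ in range(n_frames)
    ]


# With trend=0, any apparent pattern in a frame is produced by noise alone.
frames = hypothetical_outcomes(n_frames=5, n_points=12, trend=0.0, noise=1.0)
```

Stepping through `frames` one at a time, or drawing them side by side as small multiples, gives the animated or small-multiples presentation that HOPs rely on.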