Post

S2E3 - Aggregates and Facets

Demystify Vega-Lite Examples in this step-by-step rebuild 🕊️🧙🏼‍♂️✨

S2E3 - Aggregates and Facets

💌 PBIX file available at the end of the article 1 Enjoy!


 Intro

Welcome to Season 2 of the Vega-Lite walkthrough series. In this season, we will go step-by-step through many of the Vega-Lite Examples - learning loads of techniques and tips along the way. Enjoy and welcome! 🕊️



 Data

All data used in this series can be found in the Vega github repo:  
  Official Vega & Vega-Lite Data Source Repo  

Dataset:
NameOriginHorsepowerYear
chevrolet chevelle malibuUSA1301970-01-01
buick skylark 320USA1651970-01-01
plymouth satelliteUSA1501970-01-01
amc rebel sstUSA1501970-01-01
ford torinoUSA1401970-01-01
ford galaxie 500USA1981970-01-01
chevrolet impalaUSA2201970-01-01

. . .

bold text

 Concept

Here is the concept we will rebuild and better understand. The full script can be expanded below.

Important Note: Viewing Vega/Vega-Lite Outputs
When opening in the Vega Online Editor, remember to delete the raw path url before /data/. Examples:

Github pages:
• “data”: {“url”: “https://raw.githubusercontent.com/vega/vega/refs/heads/main/docs/data/cars.json”}

For online editor:
• “data”: {“url”: “data/cars.json}

For PowerBI:
• “data”: {“name”: “dataset”}

 Reveal Vega-Lite Spec:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Multi-series line chart with labels and interactive highlight on hover.  We also set the selection's initial value to provide a better screenshot",
  "width": 600,
  "height": 300,
  /* when opening in the Vega Online Editor, remember to delete the raw path url before /data/ */
  /* • eg: for online editor | "data": {"url": "data/stocks.csv"} */
  /* • eg: for PowerBI | "data": {"name": "dataset"} */
  "data": {"url": "data/cars.csv"},
  // transforms
  "transform": [
    {
      "aggregate": [
        {
          "op": "distinct",
          "field": "Name",
          "as": "distinct_count"
        },
        {
          "op": "count",
          "field": "__row_number__",
          "as": "normal_count"
        },
        {
          "op": "mean",
          "field": "Horsepower",
          "as": "mean_calc"
        },
        {
          "op": "median",
          "field": "Horsepower",
          "as": "median_calc"
        },
        {
          "op": "min",
          "field": "Horsepower",
          "as": "min_calc"
        },
        {
          "op": "max",
          "field": "Horsepower",
          "as": "max_calc"
        },
        {
          "op": "missing",
          "field": "Name",
          "as": "missing_calc"
        },
        {
          "op": "valid",
          "field": "Name",
          "as": "valid_calc"
        }
      ],
      "groupby": [
        "Origin"
      ]
    },
    {
      "calculate": "round(datum.mean_calc)", "as": "mean_calc"
    }
  ],
  "hconcat": [
    {
      "title": "NORMAL COUNT",
      "width": 200,
      "layer": [
        {
          "mark": {
            "type": "bar"
          }
        },
        {
          "mark": {
            "type": "text",
            "fontWeight": 900,
            "fontSize": 14,
            "dy": -10
          }
        }
      ],
      "encoding": {
        "y": {
          "field": "normal_count",
          "type": "quantitative"
        },
        "x": {
          "field": "Origin",
          "type": "nominal"
        },
        "text": {
          "field": "normal_count",
          "type": "quantitative"
        }
      }
    },
    {
      "title": "DISTINCT COUNT",
      "width": 200,
      "layer": [
        {
          "mark": {
            "type": "bar"
          }
        },
        {
          "mark": {
            "type": "text",
            "fontWeight": 900,
            "fontSize": 14,
            "dy": -10
          }
        }
      ],
      "encoding": {
        "y": {
          "field": "distinct_count",
          "type": "quantitative"
        },
        "x": {
          "field": "Origin",
          "type": "nominal"
        },
        "text": {
          "field": "distinct_count",
          "type": "quantitative"
        }
      }
    },   
    {
      "title": "MEAN",
      "width": 200,
      "layer": [
        {
          "mark": {
            "type": "bar"
          }
        },
        {
          "mark": {
            "type": "text",
            "fontWeight": 900,
            "fontSize": 14,
            "dy": -10
          }
        }
      ],
      "encoding": {
        "y": {
          "field": "mean_calc",
          "type": "quantitative"
        },
        "x": {
          "field": "Origin",
          "type": "nominal"
        },
        "text": {
          "field": "mean_calc",
          "type": "quantitative",
          "format": ".0~f"
        }
      }
    },
    {
      "title": "MEDIAN",
      "width": 200,
      "layer": [
        {
          "mark": {
            "type": "bar"
          }
        },
        {
          "mark": {
            "type": "text",
            "fontWeight": 900,
            "fontSize": 14,
            "dy": -10
          }
        }
      ],
      "encoding": {
        "y": {
          "field": "median_calc",
          "type": "quantitative"
        },
        "x": {
          "field": "Origin",
          "type": "nominal"
        },
        "text": {
          "field": "median_calc",
          "type": "quantitative",
          "format": ".0~f"
        }
      }
    },
  ],
  "config": {"view": {"stroke": null}}
}

. . .

 Build

Step 1: What is the Aggregate transformation

In simple terms, the aggregate transform alters your dataset and summarises your data table by the groups defined in the aggregate groupby property.

There are wide range of aggregate operations available. We won’t look at all of them today, but we will focus on a few key operaitons… just enough to wet our whistles, such as:

  • count
  • distinct
  • sum
  • mean
  • median

Step 2: Observe Aggregate transform behaviours

The first thing that will help us understand how Transforms work in Vega-Lite is to observe their behaviours on our dataset.

Let’s create a VL spec with no encoding, we will pay particular attention to the DATA VIEWER window in the Editor:

Single Aggregate Operation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
{
  "data": {
    "url": "data/cars.json"
  },
  // transform array
  "transform": [
    {
      // aggregation propery
      "aggregate": [
        {
          "op": "count",            // the aggregate operation
          "field": "Name",          // the field we are aggregating
          "as": "standard_count"    // the name of our aggregate 'measure'
        }
      ],
      "groupby": []                 // the groupby property (currently empty)
    }
  ],
	"title": "Aggregates",
	"layer": [
	{
		"mark": {
		"type": "bar"
		}
	}
	]
}

In the Data Viewer, you will see the

standard_count
406

Aggregates

As we’ve used the aggregate’s count operation and we aren’t grouping any fields, we return the result 406, as there are 406 rows in our dataset.

Taking it a step further, we can actually take this a step further an introduce multiple aggregations in a single dataset.


Multiple Aggregate Operations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34

{
  "data": {
    "url": "data/cars.json"
  },
  // transform array
  "transform": [
    {
      // aggregation propery
      "aggregate": [
        {
          "op": "count",                // standard count Op
          "field": "Name",              // [Name] field
          "as": "standard_count"        // aggregate 1: count
        },
		{
          "op": "distinct",             // distinct count Op
          "field": "Name",              // [Name] field
          "as": "distinct_count"        // aggregate 2: distinct count
        }
      ],
      "groupby": []                     // no groupby (yet)
    }
  ],
	"title": "Aggregates",
	"layer": [
	{
		"mark": {
		"type": "bar"
		}
	}
	]
}

In our Data Viewer, we now see the following results:

standard_countdistinct_count
406311

Let’s go even further, and add both mean and median operations:

1
2
3
4
5
6
7
8
9
10
11
12
"transform": [
    {
        "op": "mean",
        "field": "Horsepower",
        "as": "mean_horsepower"
    },
    {
        "op": "median",
        "field": "Horsepower",
        "as": "median_horsepower"
    }
]
standard_countdistinct_countmean_horsepowermedian_horsepower
406311105.0895


Inputting GroupBy field

Now, we want to use the groupby property to summarise our result by the Origin field:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44

{
  "data": {
    "url": "data/cars.json"
  },
  // transform array
  "transform": [
    {
      // aggregation propery
      "aggregate": [
        {
          "op": "count",
          "field": "Name",
          "as": "standard_count"	// aggregate 1
        },
		{
          "op": "distinct",
          "field": "Name",
          "as": "distinct_count"	// aggregate 2
        },
		{
          "op": "mean",
          "field": "Horsepower",
          "as": "mean_horsepower"	// aggregate 3
        },
		{
          "op": "median",
          "field": "Horsepower",
          "as": "median_horsepower"	// aggregate 4
        }
      ],
      "groupby": ["Origin"]			//<-- group by property
    }
  ],
	"title": "Aggregates",
	"layer": [
	{
		"mark": {
		"type": "bar"
		}
	}
	]
}

And now we begin to see our data outputs taking shape: Note: each aggregate result summarises effortlessly against the [Origin] field

Originstandard_countdistinct_countmean_horsepowermedian_horsepower
USA254191119.9106
Europe736180.977
Japan795979.875


Step 3: Small Multiples

So far so good, and not a DAX measure in sight 🤓

What would be pretty cool now, is if we could separate our four aggregations into four separate charts, in the style of small multiples. It’s tantalisingly easy… follow me!

There are several ways to achieve the affect of small multiples using combinations of view layering and concatenations:

For ease and flexibility, I’ll showcase repeat here. But the PBIX 1 attached contains all varieties 🧙🏼‍♂️

I’ll write the before and after JSONs for comparison. The main difference is we wrap our mark and encoding block withing a spec block:

repeat and spec
1
2
3
4
{
  "repeat": [],
  "spec": { "mark":{}, "encoding":{} }
}

Pre-small multiples

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
{ 
  "data": {"url": "https://raw.githubusercontent.com/vega/vega/refs/heads/main/docs/data/cars.json"},
  "transform": [
    {
      "aggregate": [
        {"op": "count", "field": "Name", "as": "standard_count"},
        {"op": "distinct", "field": "Name", "as": "distinct_count"},
        {"op": "mean", "field": "Horsepower", "as": "mean_horsepower"},
        {"op": "median", "field": "Horsepower", "as": "median_horsepower"}
      ],
      "groupby": ["Origin"]
    }
  ],
   "title": {"text": "Aggregates and Facets", "anchor": "middle"},
  "width": 150,
  "mark": {"type": "bar"},
  "encoding": {
    "x": {
		"field": "Origin", 
		"type": "nominal"
	},
    "y": {
		"field": "mean_horsepower", 
		"type": "quantitative"
	}
  }
}

Post-small mulitples

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44

{
  "data": {"url": "data/cars.json"},
  "transform": [
	// Our four aggregate transforms
    {
      "aggregate": [
        {"op": "count", "field": "Name", "as": "standard_count"},
        {"op": "distinct", "field": "Name", "as": "distinct_count"},
        {"op": "mean", "field": "Horsepower", "as": "mean_horsepower"},
        {"op": "median", "field": "Horsepower", "as": "median_horsepower"}
      ],
      "groupby": ["Origin"]
    }
  ],
  // start the repeat operation
  "repeat": [
    "standard_count",
    "distinct_count",
    "mean_horsepower",
    "median_horsepower"
  ],
  "spacing": 10,    // the spacing between charts
  "title": {"text": "Aggregates and Facets", "anchor": "middle"},
  "spec": {                             // wrap the mark properties in spec
    "width": 95,
    "height": 85,
    "mark": "bar",
    "encoding": {
      "x": {
        "field": "Origin",              // x-axis is unchange
        "sort": {"field": "Origin"},
        "axis": {"labels": true}
      },
      "y": {
        "field": {"repeat": "repeat"},  // y-axis uses the repeat fields instead of calculated fields
        "axis": {"labels": true},
        "type": "quantitative"
      },
    }
  },
  "resolve": {"scale": {"y": "shared"}} //
}

We now get this magical result:


Step 4: Finishing touches

We are close to perfection, we just want to add a splash of colour. We’ve been here before, just a simple "color": {"field": "Origin"} property in our encoding block.


Chef’s Kiss et voilà! 🧑🏽‍🍳


. . .

  En Fin, Serafin

Thank you for staying to the end of the article… I hope you find it useful 😊. See you soon, and remember… #StayQueryous!🧙‍♂️🪄

  PBIX 💾

🔗 Repo: Github Repo PBIX Treasure Trove

"Buy Me A Coffee"

. . .


Footnotes

This post is licensed under CC BY 4.0 by the author.