
S2E5 - Filter Methods

Demystify Vega-Lite Examples in this step-by-step rebuild 🕊️🧙🏼‍♂️✨

💌 PBIX file available at the end of the article. Enjoy!


 Intro

Welcome to Season 2 of the Vega-Lite walkthrough series. In this season, we will go step-by-step through many of the Vega-Lite Examples - learning loads of techniques and tips along the way. Enjoy and welcome! 🕊️



 Data

All data used in this series can be found in the Vega GitHub repo:
  Official Vega & Vega-Lite Data Source Repo  

Today’s dataset is provided by yours truly (see PowerQuery custom function fn_generate_randomised_data)

. . .

 Concept

In this episode we will focus on filtering methods.

Important Note: Viewing Vega/Vega-Lite Outputs
When opening a spec in the Vega Online Editor, remember to delete the raw GitHub path that comes before data/ in the URL. Examples:

GitHub pages:
• "data": {"url": "https://raw.githubusercontent.com/vega/vega/refs/heads/main/docs/data/stocks.csv"}

For online editor:
• "data": {"url": "data/stocks.csv"}

For PowerBI:
• "data": {"name": "dataset"}
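
As a minimal sketch of where that data block sits, here is a complete spec written for the online editor. It assumes stocks.csv exposes date, price and symbol columns; the mark and encodings are illustrative choices and not part of the note above.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Editor-relative data path; one line per stock symbol.",
  "data": {"url": "data/stocks.csv"},
  "mark": "line",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "price", "type": "quantitative"},
    "color": {"field": "symbol", "type": "nominal"}
  }
}
```

For Power BI, swap the data line for the {"name": "dataset"} form and point the encodings at your own fields.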


. . .

 Build

What we are building here is a solid understanding of the many ways we can filter marks and datasets directly in a Vega-Lite spec.

The concept is fairly simple, and we can reuse the following methods anywhere within our Deneb/Vega/Vega-Lite visuals.

Step 1: How do we filter?

Filters can be applied with the Filter Transform block. You’ll be familiar with this from our previous tutorials. Here is our example:


{
  "data": {"name": "dataset"},

  /* transform block */
  //-----------------
  "transform": [{
      "filter": {
        "field": "year_num", //-- field to filter on
        "equal": 2025 //-- numerical value
      }}],
  //-----------------
  
  "layer": [{"mark": {...}}],
  "encoding": {...}
}

For a wordy explanation, we could say:

The Filter Transform must have a filter property. This property is described as the predicate and it determines the manner or method in which we perform the filter operation. A predicate is a logical condition that returns a Boolean value (true or false).

In other words:

A predicate is a test that says whether each row of data should be kept or removed.

So we can think of it like this:
Keep only the rows where this condition is true.
For example: keep only the rows of data where the value in the [category] field equals “Laptops”.

{
  "data": {"name": "dataset"},
  /* transform block */
  //-----------------
  "transform": [{               // transform 
      "filter": {               //-- filter operation
        "field": "category",    //-- field to filter on
        "equal": "Laptops"      //-- predicate
      }}],
  //-----------------
  "layer": [{"mark": {...}}],
  "encoding": {...}
}
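
To see the predicate at work outside Power BI, here is a self-contained sketch for the online editor. The inline rows, the subcategory names and the sales field are made up for illustration; only the filter block comes from the example above.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Bar chart that keeps only rows whose category equals Laptops.",
  "data": {
    "values": [
      {"category": "Laptops", "subcategory": "Ultrabooks", "sales": 120},
      {"category": "Laptops", "subcategory": "Gaming Laptops", "sales": 90},
      {"category": "PCs", "subcategory": "Mini PCs", "sales": 60},
      {"category": "Phones", "subcategory": "Smartphones", "sales": 150}
    ]
  },
  "transform": [
    {"filter": {"field": "category", "equal": "Laptops"}}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "subcategory", "type": "nominal"},
    "y": {"field": "sales", "type": "quantitative"}
  }
}
```

With these four rows, only the two Laptops rows reach the bar mark; the PCs and Phones rows are removed before anything is drawn.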


Step 2: Filter Transformation Types

Predicate Types

There are four types of predicate:

  1. Field Predicate
  2. Vega Expression
  3. Selection Predicate
  4. Logical composition (a combination of the above using and, or, not)

Field Predicates:

Observe the table below, which lists the field predicates and their descriptions:

predicate   description
equal       equal to
lt          less than
lte         less than or equal to
gt          greater than
gte         greater than or equal to
range       within an inclusive [minimum, maximum] array
oneOf       matches any value in an array (behaves like the IN operator)


example with and & gte & lte:
"transform": [{
  "filter": {
    "and": [
      { "field": "year_num", "gte": 2026 },
      { "field": "year_num", "lte": 2027 }
    ]
  }}]
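
The and operator is not the only way to compose predicates: or and not nest in the same manner. The sketch below keeps rows that are Laptops, or whose year_num is not earlier than 2026. The inline rows and the sales field are made up for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Keeps Laptops rows from any year, plus every row from 2026 onward.",
  "data": {
    "values": [
      {"category": "Laptops", "year_num": 2025, "sales": 120},
      {"category": "PCs", "year_num": 2025, "sales": 60},
      {"category": "PCs", "year_num": 2026, "sales": 80},
      {"category": "Phones", "year_num": 2027, "sales": 150}
    ]
  },
  "transform": [
    {
      "filter": {
        "or": [
          {"field": "category", "equal": "Laptops"},
          {"not": {"field": "year_num", "lt": 2026}}
        ]
      }
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "year_num", "type": "ordinal"},
    "y": {"field": "sales", "type": "quantitative"},
    "color": {"field": "category", "type": "nominal"}
  }
}
```

With these rows, only the 2025 PCs row is removed: it is neither a Laptops row nor from 2026 or later.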


example with oneOf:
"transform": [{
   "filter": {
     "field": "subcategory",
     "oneOf": [
       "Smartphones",
       "Mini PCs",
       "Gaming PCs"
     ]
  }}]
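
The table also lists range, which the examples above do not use. As a sketch, it can express the earlier gte/lte pair as a single predicate. The inline rows and the sales field are made up for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Keeps rows whose year_num lies between 2026 and 2027, both ends included.",
  "data": {
    "values": [
      {"year_num": 2025, "sales": 100},
      {"year_num": 2026, "sales": 140},
      {"year_num": 2027, "sales": 90},
      {"year_num": 2028, "sales": 160}
    ]
  },
  "transform": [
    {"filter": {"field": "year_num", "range": [2026, 2027]}}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "year_num", "type": "ordinal"},
    "y": {"field": "sales", "type": "quantitative"}
  }
}
```

Only the 2026 and 2027 rows survive, the same result as combining gte and lte inside an and block.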


Vega-Expressions

An alternative method, and one I find preferable for both ease and aesthetic reasons, is to use Vega Expressions. They feel more like your DAX, M-code, or SQL statements, albeit with a sprinkling of JSON syntax.

Field Predicate vs Vega-Expression

Let’s make a comparison between the two filter methods…

Field Predicate:
// predicate method
"transform": [{
"filter": { 
  "and": [
    { "field": "year_num", "gte": 2026 },
    { "field": "year_num", "lte": 2027 }
  ]
}}]

— VS —

Vega-Expression:
// vega-expression method
"transform": [
  {
    "filter": "datum.year_num >= 2026 && datum.year_num <= 2027"
  }
]

What do you think? There is a slickness to the Vega-Expression method that is most satisfying, isn’t there? 🥲
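
Expressions can also stand in for the oneOf and range predicates. The sketch below uses the indexof and inrange functions from the Vega expression language, one filter after the other. The inline rows and the sales field are invented for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Expression versions of oneOf (indexof) and range (inrange).",
  "data": {
    "values": [
      {"subcategory": "Smartphones", "year_num": 2026, "sales": 150},
      {"subcategory": "Mini PCs", "year_num": 2025, "sales": 60},
      {"subcategory": "Gaming PCs", "year_num": 2027, "sales": 90},
      {"subcategory": "Tablets", "year_num": 2026, "sales": 70}
    ]
  },
  "transform": [
    {"filter": "indexof(['Smartphones', 'Mini PCs', 'Gaming PCs'], datum.subcategory) >= 0"},
    {"filter": "inrange(datum.year_num, [2026, 2027])"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "subcategory", "type": "nominal"},
    "y": {"field": "sales", "type": "quantitative"}
  }
}
```

The first filter drops Tablets (not in the list) and the second drops the 2025 Mini PCs row, leaving Smartphones and Gaming PCs.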


Step 3: Operators

You will notice that Vega Expressions allow us to use Operators. Operators are a kind of shorthand and make complex code easier to read and write.

🔧 Logical Operators (for combining conditions)

Operator   Meaning   Example
&&         AND       a > 5 && b < 10
||         OR        a < 5 || b == 20
!          NOT       !(a == 'PCs')


🔍 Comparison Operators (for evaluating values)

Operator   Meaning                 Example
==         Equals                  datum.type == 'A'
!=         Not equal               datum.status != 'closed'
>          Greater than            datum.value > 100
>=         Greater than or equal   datum.score >= 80
<          Less than               datum.age < 18
<=         Less than or equal      datum.size <= 50


🧮 Arithmetic Operators (for math in expressions)

Operator   Meaning              Example
+          Addition             datum.x + datum.y
-          Subtraction          datum.sales - datum.returns
*          Multiplication       datum.price * 1.1
/          Division             datum.total / datum.count
%          Modulo (remainder)   datum.id % 2 == 0


🧠 Ternary Operator (if-else shorthand)

Syntax                                    Meaning
condition ? valueIfTrue : valueIfFalse    Example: datum.score > 50 ? 'Pass' : 'Fail'


Here is how these operators look in real-world calculations:


  "transform": [

    // multiplication (*)
    {"calculate": "(datum.price * datum.quantity)", "as": "fx_multiply"},

    // division (/)
    {"calculate": "(datum.price / datum.quantity)", "as": "fx_divide"},

    // modulo (%)
    {"calculate": "(datum.price % datum.quantity)", "as": "fx_modulo"},

    // greater than (>)
    {"calculate": "(datum.fx_modulo > 1)", "as": "fx_greater_than"},

    // equals (==)
    {"calculate": "(datum.category_name == 'PCs')", "as": "fx_equals"},

    // not equal (!=)
    {"calculate": "(datum.category_name != 'PCs')", "as": "fx_not_equals"},

    // OR (||)
    {"calculate": "(datum.category_name == 'PCs' || datum.category_name == 'Laptops')", "as": "fx_or"},

    // IF-ELSE - AND (&&)
    {"calculate": "(datum.price >= 1000 && datum.quantity > 2 ? 'pricey' : null)", "as": "fx_condition"}

  ]
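
Because transforms run in the order they are listed, a calculated field can feed a later filter. The sketch below reuses the fx_multiply and fx_condition calculations from the block above and then filters on the result. The inline rows are made up for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Calculate a flag first, then keep only the rows flagged as pricey.",
  "data": {
    "values": [
      {"category_name": "PCs", "price": 1200, "quantity": 3},
      {"category_name": "Laptops", "price": 1500, "quantity": 1},
      {"category_name": "PCs", "price": 400, "quantity": 5}
    ]
  },
  "transform": [
    {"calculate": "datum.price * datum.quantity", "as": "fx_multiply"},
    {
      "calculate": "datum.price >= 1000 && datum.quantity > 2 ? 'pricey' : null",
      "as": "fx_condition"
    },
    {"filter": "datum.fx_condition == 'pricey'"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "category_name", "type": "nominal"},
    "y": {"field": "fx_multiply", "type": "quantitative"}
  }
}
```

Only the first row passes both halves of the && test, so the chart shows a single PCs bar.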



. . .

  En Fin, Serafin

Thank you for staying to the end of the article… I hope you find it useful 😊. See you soon, and remember… #StayQueryous!🧙‍♂️🪄

  PBIX 💾

🔗 Repo: Github Repo PBIX Treasure Trove

. . .



This post is licensed under CC BY 4.0 by the author.