Introduction

Clustering offers a “bird’s-eye view” of the data in the collections. Instead of finding individual objects through Searching & Filtering, you see the aggregated counts for each field (material, technique, style, etc.) across the object records matching your query. A cluster query accepts most of the same parameters as search and filter, where applicable. This is useful for exploring the data and getting a sense of the scale of different aspects of the collection, and it is particularly useful for data visualisation.

Clustering can return counts across all the authority-controlled fields at once. If you already have a specific field in mind (for example, you want to find out which materials were most used in Venice in the 18th century, according to our object records), you can specify that field and get back only its counts.

The available fields for clustering are:

* Material - Which materials were used in making an object.
* Technique - Which techniques were used in an object's production.
* Place - Where an object was made, or a place it depicts or is associated with.
* Person - Who made an object, is depicted on it, or is associated with it.
* Maker - Who made an object.
* Depicts - Who or what is depicted on an object.
* Depicts (Person) - Who is depicted on an object.
* Associated (Person) - Who is associated with an object.
* Category - Which category an object is assigned to.
* Collection - Which collection in the museum an object belongs to.
* Event - Which event is depicted on or associated with an object.
* Gallery - Which gallery an object is on display in.
* Object Type - What type of object it is.
* Organisation - Which organisation created, is depicted on, or is associated with an object.
* Style - Which artistic style an object is considered to have.
* Accession Year - The year an object was accessioned into the museum collection.

There are two ways to request individual facets: one also returns the meta information about your search results, while the other returns only the relevant term counts. The examples below show both in use.

Note

It should be noted that the writer of this API documentation is a particular enthusiast for treemap data visualisations (ideally interactive; they currently appear only as fixed images below), but is willing to be convinced that there are other ways to visualise cluster data, which may even be better. Pull requests either to improve the treemaps or to replace them with something better are welcome.

Note

Please note that whilst data visualisations can be an excellent way to see trends in data and spark new hypotheses for research, caution must be taken not to read too much into the figures. The data the visualisations are built on will have absences and biases, and will forever remain incomplete.

Cluster Specific API Parameters

As mentioned, you can use all the normal search and filter parameters to reduce your results to the subset you are interested in. One additional parameter is available for clustering:

  • cluster_size - Use this instead of page_size to specify the number of results (up to a maximum of 100). If not set, the default size of 20 is used.
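
As a rough sketch of how cluster_size combines with the usual search and filter parameters, the helper below builds a cluster request URL. The function name cluster_url, the clamping of out-of-range sizes, and the lower bound of 1 are illustrative assumptions rather than part of the API; the base URL and parameter names are those used in the examples on this page.

```python
from urllib.parse import urlencode

# Base URL pattern taken from the examples on this page.
BASE = "https://api.vam.ac.uk/v2/objects/clusters"

def cluster_url(field, cluster_size=20, **filters):
    """Build a cluster search URL for one field (hypothetical helper)."""
    # Clamp to the documented maximum of 100; the minimum of 1 is an assumption.
    size = max(1, min(int(cluster_size), 100))
    # Filter parameters come first, cluster_size last.
    params = dict(filters, cluster_size=size)
    return f"{BASE}/{field}/search?{urlencode(params)}"

print(cluster_url("material", cluster_size=25))
print(cluster_url("material", cluster_size=500, id_place="x29237"))
```

The resulting string can be passed straight to requests.get, as in the field examples further down the page.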

Note

The following code creates the treemap visualisations using the Vega data visualisation library. Send a pull request if you know a better way of doing this, especially if you can think of a way to encode a colour range, to make sure the text label fits within its box, or to show a representative image in the box.

from IPython.display import display

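def Vega(spec):
    # Send the spec to the notebook front end as a Vega v5 MIME bundle.
    bundle = {}
    bundle['application/vnd.vega.v5+json'] = spec
    display(bundle, raw=True)

def treemap(clusters, cluster_name, colour="blue"):
  # One leaf node per cluster term, each attached to a single root node (id 0).
  clusters_json = [
    {"id": index + 1, "name": [x["value"], "%d objects" % x["count"]], "parent": '0', "value": x["count"]}
    for index, x in enumerate(clusters)
  ]
  # The root node carries the chart title and has no parent.
  clusters_json.insert(0, {"id": 0, "value": 0, "name": cluster_name})
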
  Vega({
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "description": "Hierarchical Data Layout",
  "width": 1200,
  "height": 800,
  "padding": 2.5,
  "autosize": "none",
  "data": [
    {
      "name": "tree",
      "values": clusters_json,
      "transform": [
        {
          "type": "stratify",
          "key": "id",
          "parentKey": "parent"
        },
        {
          "type": "treemap",
          "field": "value",
          "sort": {"field": "value", "order": "descending"},
          "round": True,
          "size": [{"signal": "width"}, {"signal": "height"}]
        }
      ]
    },
    {
      "name": "nodes",
      "source": "tree",
      "transform": [{ "type": "filter", "expr": "datum.children" }]
    },
    {
      "name": "leaves",
      "source": "tree",
      "transform": [{ "type": "filter", "expr": "datum.parent == 0" }]
    }
  ],

  "scales": [
    {
      "name": "color",
      "type": "ordinal",
      "domain": {"data": "nodes", "field": "value"},
      "range": [ colour ]
    },
    {
      "name": "size",
      "type": "ordinal",
      "domain": [0, 1, 2, 3],
      "range": [256, 10, 20, 14]
    },
    {
      "name": "opacity",
      "type": "ordinal",
      "domain": [0, 1, 2, 3],
      "range": [0.15, 0.5, 0.8, 1.0]
    }
  ],

  "marks": [
    {
      "type": "rect",
      "from": {"data": "nodes"},
      "interactive": False,
      "encode": {
        "enter": {
          "fill": {"scale": "color", "field": "value"}
        },
        "update": {
          "x": {"field": "x0"},
          "y": {"field": "y0"},
          "x2": {"field": "x1"},
          "y2": {"field": "y1"}
        }
      }
    },
    {
      "type": "rect",
      "from": {"data": "leaves"},
      "encode": {
        "enter": {
          "stroke": {"value": "#fff"}
        },
        "update": {
          "x": {"field": "x0"},
          "y": {"field": "y0"},
          "x2": {"field": "x1"},
          "y2": {"field": "y1"},
          "fill": {"value": "transparent"},
          "href": {"value": "https://collections.vam.ac.uk/"}
        },
        "hover": {
          "fill": {"value": "green"}
        }
      }
    },
    {
      "type": "text",
      "from": {"data": "leaves"},
      "interactive": False,
      "encode": {
        "enter": {
          "font": {"value": "Helvetica Neue, Arial"},
          "align": {"value": "center"},
          "baseline": {"value": "middle"},
          "fill": {"value": "#000"},
          "text": {"field": "name"},
          "fontSize": {"scale": "size", "field": "depth"},
          "fillOpacity": {"scale": "opacity", "field": "depth"}
        },
        "update": {
          "x": {"signal": "0.5 * (datum.x0 + datum.x1)"},
          "y": {"signal": "0.5 * (datum.y0 + datum.y1)"}
        }
      }
    }
  ]
}
)

Field Clustering

Materials

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/material/search?cluster_size=25')
# In real use, add error handling here (for example, for a failed request or no results) rather than passing the response straight to treemap()
treemap(req.json(), "Materials", "#8bcf89")
../_images/clustering_8_0.png
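
The error handling mentioned in the comment above could look something like the following sketch. The function name check_clusters and the choice of exceptions are assumptions made for illustration; the only thing taken from this page is that each cluster term carries a "value" and a "count", which is what treemap() reads. The sample data is made up.

```python
def check_clusters(status_code, payload):
    """Return the list of cluster terms, or raise if the response is unusable."""
    if status_code != 200:
        raise RuntimeError(f"Cluster request failed with HTTP status {status_code}")
    if not isinstance(payload, list):
        raise ValueError("Expected a JSON list of cluster terms")
    # Keep only well-formed entries carrying the two keys treemap() reads.
    terms = [t for t in payload if isinstance(t, dict) and "value" in t and "count" in t]
    if not terms:
        raise ValueError("No cluster terms returned for this query")
    return terms

# Intended use with the request above (not executed here):
#   req = requests.get(url, timeout=10)
#   treemap(check_clusters(req.status_code, req.json()), "Materials", "#8bcf89")

# Made-up sample data, for illustration only.
sample = [{"value": "silk", "count": 3}, {"value": "oak", "count": 1}]
print(check_clusters(200, sample))
```

Taking the status code and parsed payload as arguments, rather than the response object, keeps the check independent of the HTTP library in use.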

Techniques

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/technique/search?cluster_size=25')
treemap(req.json(), "Top 25 Techniques", "#f58518")
../_images/clustering_10_0.png

Person

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/person/search?cluster_size=25')
treemap(req.json(), "Top 25 Person", "#83bcb6")
../_images/clustering_12_0.png

Note

Removing the very large ‘Unknown’ group is left as an exercise for the reader.
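
One possible approach to that exercise is to filter the cluster list before passing it to treemap(). This is a minimal sketch: the helper name drop_terms is hypothetical, it assumes the group is labelled exactly "Unknown" (ignoring case) in the "value" key, and the sample data is made up.

```python
def drop_terms(clusters, unwanted=("unknown",)):
    """Return the clusters whose label is not in `unwanted` (case-insensitive)."""
    blocked = {label.strip().lower() for label in unwanted}
    return [c for c in clusters if str(c["value"]).strip().lower() not in blocked]

# Made-up sample data, for illustration only.
sample = [
    {"value": "Unknown", "count": 10},
    {"value": "Example Maker", "count": 2},
]
print(drop_terms(sample))

# Intended use with the request above (not executed here):
#   treemap(drop_terms(req.json()), "Top 25 Person", "#83bcb6")
```

Because the filter runs after the request, the chart will show one fewer box than the cluster_size asked for whenever the group is present.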

Maker

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/maker/search?cluster_size=25')
treemap(req.json(), "Top 25 Makers", "#83bcb6")
../_images/clustering_15_0.png

Depicts

Not a commonly used field, hence the low numbers.

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/depicts/search?cluster_size=25')
treemap(req.json(), "Top 25 Depictions", "#83bab6")
../_images/clustering_17_0.png

Depicts (Person)

Not a commonly used field, hence the low numbers.

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/depicts_actor/search?cluster_size=25')
treemap(req.json(), "Top 25 Actors (Person, People or Organisation)", "#83bcb6")
../_images/clustering_19_0.png

Associated Actor (Person, People or Organisation)

Not a commonly used field, hence the low numbers.

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/associated_actor/search?cluster_size=25')
treemap(req.json(), "Top 25 Associated Actors (Person, People, Organisation)", "#83bcb6")
../_images/clustering_21_0.png

Category

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/category/search?cluster_size=25')
treemap(req.json(), "Top 25 Categories", "#eebede")
../_images/clustering_23_0.png

Style

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/style/search?cluster_size=25')
treemap(req.json(), "Top 25 Styles", "#bab0ac")
../_images/clustering_25_0.png

Accession Year

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/accession_year/search?cluster_size=25')
treemap(req.json(), "Top 25 Accession Years", "#ff9d98")
../_images/clustering_27_0.png

Place

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/place/search?cluster_size=25')
treemap(req.json(), "Top 25 Places", "#908cc2")
../_images/clustering_29_0.png

Object Type

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/object_type/search?cluster_size=25')
treemap(req.json(), "Top 25 Object Types", "#96cfcf")
../_images/clustering_31_0.png

Collection

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/collection/search')
treemap(req.json(), "Collections", "#77bdc7")
../_images/clustering_33_0.png

Event

Not a commonly used field, hence the low numbers.

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/event/search?cluster_size=25')
treemap(req.json(), "Top 25 Events", "#f57724")
../_images/clustering_35_0.png

Organisation

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/organisation/search?cluster_size=25')
treemap(req.json(), "Top 25 Organisations", "#1d8640")
../_images/clustering_37_0.png

Cluster Exploration Examples

Materials used in Venice in the 18th century

To return to the example at the start, let’s see which materials were used in objects from (or depicting, or associated with) Venice between 1700 and 1800.

Note

At present the id_place parameter cannot be restricted to return only objects “made in this place” (excluding objects that depict the place or are associated with it). This is a feature we will consider adding in future versions of the API, or it may be implemented via a model such as Linked Art.

import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/material/search?id_place=x29237&year_made_from=1700&year_made_to=1800')
treemap(req.json(), "Materials", "#aaccdd")
../_images/clustering_40_0.png

Objects containing plastic accessioned in the C20th

import requests
import altair as alt
import pandas as pd
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/accession_year/search?id_material=AAT14570&year_accessioned_from=1900&year_accessioned_to=1999&cluster_size=100')
# Each row is one accession year with the count of matching objects.
accessions_df = pd.DataFrame(req.json())

bars = alt.Chart(accessions_df).mark_bar().encode(
    x='count:Q',
    y="value:O"
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3
).encode(
    text='count:Q'
)

(bars + text).properties(height=900, title="Objects containing plastic accessioned to the V&A in the C20th")
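
Yearly counts can be noisy, so it may help to group them into decades before charting. The sketch below is one way to do that in plain Python. The function name counts_by_decade is hypothetical, it assumes each "value" in the accession_year response can be parsed as an integer year, and the sample data is made up.

```python
from collections import Counter

def counts_by_decade(clusters):
    """Sum cluster counts into decades, e.g. 1965 is counted under 1960."""
    totals = Counter()
    for term in clusters:
        decade = int(term["value"]) // 10 * 10
        totals[decade] += term["count"]
    # Return the decades in chronological order.
    return dict(sorted(totals.items()))

# Made-up sample data, for illustration only.
sample = [
    {"value": 1965, "count": 4},
    {"value": 1961, "count": 1},
    {"value": 1978, "count": 2},
]
print(counts_by_decade(sample))
```

The result maps each decade to its total, which can be turned into a DataFrame and charted in the same way as the yearly counts.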

See more worked examples on the Data Exploration site.
