Introduction¶
Clustering offers a “bird's-eye view” of the data in the collections. Instead of finding individual objects through Searching & Filtering, you see aggregated counts for each field (material, technique, style, etc.) across the object records that match your query. A cluster query accepts most of the same parameters as search and filter, where applicable. This is useful for exploring the data and getting a sense of the scale of different aspects of the collection, and it is particularly useful for data visualisation.
Clustering can return results across all the authority-controlled fields at once. Alternatively, if you already have a specific field in mind (for example, you want to find out which materials were most used in Venice in the 18th century, according to our object records), you can request only that field's counts.
The available fields for clustering are:
* Material - Which materials are used in making an object.
* Technique - Which techniques are used in an object's production.
* Place - Where an object was made (or a place it depicts or is associated with).
* Person - Who made the object, or is depicted on it, or is associated with an object.
* Maker - Who made an object.
* Depicts - Who or what is depicted on an object.
* Depicts (Person) - Who is depicted on an object.
* Associated (Person) - Who is associated with an object.
* Category - What category an object is assigned to.
* Collection - Which collection in the museum does an object belong to.
* Event - What event is depicted or associated with an object.
* Gallery - Which gallery is the object on display in.
* Object Type - What type of object is this.
* Organisation - Which organisation created, is depicted on, or is associated with an object.
* Style - What artistic style is the object considered to have.
* Accession Year - What year was the object accessioned to the museum collection.
There are two ways to request individual facets: one also returns the meta information about your search results, while the other returns only the relevant term counts. The examples below show both in use.
Note
The writer of this API documentation is a particular enthusiast for treemap data visualisations (ideally interactive, though they currently appear only as fixed images below), but is willing to be convinced that there are other ways to visualise cluster data, which may even be better. Pull requests that improve the treemaps or replace them with something better are welcome.
Note
Please note that whilst data visualisations can be an excellent way to see trends in data and spark new hypotheses for research, take care not to read too much into the figures. The data the visualisations are built on has absences and biases and will always remain incomplete.
Cluster Specific API Parameters¶
As mentioned, you can use all the normal search and filter parameters to reduce your results to the subset you are interested in. One additional parameter is available specifically for clustering:
cluster_size - Use this instead of page_size to specify the number of results (up to a maximum of 100). If not set, the default size of 20 is used.
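As a minimal sketch of how cluster_size combines with the usual filter parameters, the helper below builds a cluster URL for one field and rejects sizes above the documented maximum. The function name `cluster_url` and the lower bound of 1 are assumptions made for illustration; only the URL pattern and the limit of 100 come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.vam.ac.uk/v2/objects/clusters"
MAX_CLUSTER_SIZE = 100  # documented upper limit for cluster_size


def cluster_url(field, cluster_size=None, **filters):
    """Build the cluster search URL for one field (hypothetical helper).

    `filters` are ordinary search/filter parameters such as id_place.
    """
    params = dict(filters)
    if cluster_size is not None:
        # The lower bound of 1 is an assumption; the docs only state the maximum.
        if not 1 <= cluster_size <= MAX_CLUSTER_SIZE:
            raise ValueError("cluster_size must be between 1 and %d" % MAX_CLUSTER_SIZE)
        params["cluster_size"] = cluster_size
    url = "%s/%s/search" % (BASE_URL, field)
    query = urlencode(params)
    return url + "?" + query if query else url


print(cluster_url("material", cluster_size=25))
# prints https://api.vam.ac.uk/v2/objects/clusters/material/search?cluster_size=25
```

Leaving cluster_size out of the URL lets the API fall back to its default of 20.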
Note
The following code is used to create the treemap visualisation using the Vega data visualisation library. Reveal if you want to see the details, send a pull request if you know a better way of doing this, and especially if you can think of something to encode to show a colour range, or to make sure the text label fits within the box, or to show a representative image in the box.
from IPython.display import display

def Vega(spec):
    # Hand the spec to the notebook front end as a Vega v5 MIME bundle
    bundle = {}
    bundle['application/vnd.vega.v5+json'] = spec
    display(bundle, raw=True)

def treemap(clusters, cluster_name, colour="blue"):
    # One leaf row per cluster term, all parented to a single root row (id 0)
    clusters_json = [{"id": index+1, "name": [x["value"], "%d objects" % x["count"]], "parent": '0', "value": x["count"]} for index, x in enumerate(clusters)]
    clusters_json.insert(0, {"id": 0, "value": 0, "name": cluster_name})
    Vega({
"$schema": "https://vega.github.io/schema/vega/v5.json",
"description": "Hierarchical Data Layout",
"width": 1200,
"height": 800,
"padding": 2.5,
"autosize": "none",
"data": [
{
"name": "tree",
"values": clusters_json,
"transform": [
{
"type": "stratify",
"key": "id",
"parentKey": "parent"
},
{
"type": "treemap",
"field": "value",
"sort": {"field": "value", "order": "descending"},
"round": True,
"size": [{"signal": "width"}, {"signal": "height"}]
}
]
},
{
"name": "nodes",
"source": "tree",
"transform": [{ "type": "filter", "expr": "datum.children" }]
},
{
"name": "leaves",
"source": "tree",
"transform": [{ "type": "filter", "expr": "datum.parent == 0" }]
}
],
"scales": [
{
"name": "color",
"type": "ordinal",
"domain": {"data": "nodes", "field": "value"},
"range": [ colour ]
},
{
"name": "size",
"type": "ordinal",
"domain": [0, 1, 2, 3],
"range": [256, 10, 20, 14]
},
{
"name": "opacity",
"type": "ordinal",
"domain": [0, 1, 2, 3],
"range": [0.15, 0.5, 0.8, 1.0]
}
],
"marks": [
{
"type": "rect",
"from": {"data": "nodes"},
"interactive": False,
"encode": {
"enter": {
"fill": {"scale": "color", "field": "value"}
},
"update": {
"x": {"field": "x0"},
"y": {"field": "y0"},
"x2": {"field": "x1"},
"y2": {"field": "y1"}
}
}
},
{
"type": "rect",
"from": {"data": "leaves"},
"encode": {
"enter": {
"stroke": {"value": "#fff"}
},
"update": {
"x": {"field": "x0"},
"y": {"field": "y0"},
"x2": {"field": "x1"},
"y2": {"field": "y1"},
"fill": {"value": "transparent"},
"href": {"value": "https://collections.vam.ac.uk/"}
},
"hover": {
"fill": {"value": "green"}
}
}
},
{
"type": "text",
"from": {"data": "leaves"},
"interactive": False,
"encode": {
"enter": {
"font": {"value": "Helvetica Neue, Arial"},
"align": {"value": "center"},
"baseline": {"value": "middle"},
"fill": {"value": "#000"},
"text": {"field": "name"},
"fontSize": {"scale": "size", "field": "depth"},
"fillOpacity": {"scale": "opacity", "field": "depth"}
},
"update": {
"x": {"signal": "0.5 * (datum.x0 + datum.x1)"},
"y": {"signal": "0.5 * (datum.y0 + datum.y1)"}
}
}
}
]
}
)
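The first two lines of treemap reshape the cluster records into the flat id/parent rows that the stratify transform reads. The sketch below isolates that step as a plain function so the shape is easier to inspect; the name `to_stratify_rows` and the sample records are made up for illustration and are not real API output.

```python
def to_stratify_rows(clusters, cluster_name):
    """Return a root row plus one child row per cluster record."""
    # Root node: id 0, no parent, labelled with the chart title
    rows = [{"id": 0, "value": 0, "name": cluster_name}]
    for index, cluster in enumerate(clusters, start=1):
        rows.append({
            "id": index,
            # Two-line label: the term, then its object count
            "name": [cluster["value"], "%d objects" % cluster["count"]],
            "parent": "0",  # every leaf hangs off the root
            "value": cluster["count"],
        })
    return rows


# Illustrative records only; real records come from the API response
sample = [{"value": "oak", "count": 12}, {"value": "silk", "count": 7}]
for row in to_stratify_rows(sample, "Materials"):
    print(row)
```

Each cluster becomes a leaf whose area is proportional to its count, which is why the treemap transform sizes boxes by the value field.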
Field Clustering¶
Materials¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/material/search?cluster_size=25')
# In real use, add error handling (for example, for an empty result) before passing the response to treemap()
treemap(req.json(), "Materials", "#8bcf89")
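One possible shape for that error handling is sketched below. It assumes a cluster response is a JSON list of records with "value" and "count" keys, which is what treemap reads, and that an empty list means no results. The helper name `check_clusters` is hypothetical.

```python
def check_clusters(payload):
    """Return payload unchanged if it is a non-empty list of cluster records."""
    if not isinstance(payload, list) or not payload:
        raise ValueError("no cluster results returned")
    for record in payload:
        # treemap() reads exactly these two keys from every record
        if not isinstance(record, dict) or "value" not in record or "count" not in record:
            raise ValueError("unexpected cluster record: %r" % (record,))
    return payload


# Usage with the request above (not executed here):
#   req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/material/search?cluster_size=25')
#   req.raise_for_status()  # raise on HTTP 4xx/5xx responses
#   treemap(check_clusters(req.json()), "Materials", "#8bcf89")

try:
    check_clusters([])
except ValueError as err:
    print(err)  # prints: no cluster results returned
```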

Techniques¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/technique/search?cluster_size=25')
treemap(req.json(), "Top 25 Techniques", "#f58518")

Person¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/person/search?cluster_size=25')
treemap(req.json(), "Top 25 People", "#83bcb6")

Note
Removing the very large ‘Unknown’ group is left as an exercise for the reader.
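For readers who would rather skip the exercise, here is one possible approach: filter the records before passing them to treemap. It assumes the group's "value" is the text "Unknown" (compared case-insensitively here); check the exact label in the actual response. The records in the example are invented.

```python
def drop_unknown(clusters, label="unknown"):
    """Return the cluster records whose value is not the given label."""
    return [
        record for record in clusters
        if str(record["value"]).strip().lower() != label.lower()
    ]


# Illustrative records only, not real API output
sample = [{"value": "Unknown", "count": 500}, {"value": "Example Maker", "count": 30}]
print(drop_unknown(sample))
# prints [{'value': 'Example Maker', 'count': 30}]
```

Requesting one more result than you need (for example cluster_size=26) keeps the chart at 25 boxes after the filter.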
Maker¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/maker/search?cluster_size=25')
treemap(req.json(), "Top 25 Makers", "#83bcb6")

Depicts¶
Not a commonly used field, hence the low numbers.
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/depicts/search?cluster_size=25')
treemap(req.json(), "Top 25 Depictions", "#83bab6")

Depicts (Person)¶
Not a commonly used field, hence the low numbers.
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/depicts_actor/search?cluster_size=25')
treemap(req.json(), "Top 25 Actors (Person, People or Organisation)", "#83bcb6")

Associated Actor (Person, People or Organisation)¶
Not a commonly used field, hence the low numbers.
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/associated_actor/search?cluster_size=25')
treemap(req.json(), "Top 25 Associated Actors (Person, People, Organisation)", "#83bcb6")

Category¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/category/search?cluster_size=25')
treemap(req.json(), "Top 25 Categories", "#eebede")

Style¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/style/search?cluster_size=25')
treemap(req.json(), "Top 25 Styles", "#bab0ac")

Accessioned¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/accession_year/search?cluster_size=25')
treemap(req.json(), "Top 25 Accession Years", "#ff9d98")

Place¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/place/search?cluster_size=25')
treemap(req.json(), "Top 25 Places", "#908cc2")

Object Type¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/object_type/search?cluster_size=25')
treemap(req.json(), "Top 25 Object Types", "#96cfcf")

Collection¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/collection/search')
treemap(req.json(), "Collections", "#77bdc7")

Event¶
Not a commonly used field, hence the low numbers.
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/event/search?cluster_size=25')
treemap(req.json(), "Top 25 Events", "#f57724")

Organisation¶
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/organisation/search?cluster_size=25')
treemap(req.json(), "Top 25 Organisations", "#1d8640")

Cluster Exploration Examples¶
Materials used in Venice in the 18th century¶
To return to the example at the start, let’s see which materials are used in objects from (or depicting, or associated with) Venice between 1700 and 1800.
Note
At present the id_place parameter cannot be restricted to return only objects “made in this place” (excluding objects that depict or are associated with the place). This is a feature we will consider adding in future versions of the API, or implementing via a model such as Linked Art.
import requests
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/material/search?id_place=x29237&year_made_from=1700&year_made_to=1800')
treemap(req.json(), "Materials", "#aaccdd")

Objects containing plastic accessioned in the C20th¶
import requests
import altair as alt
import pandas as pd
req = requests.get('https://api.vam.ac.uk/v2/objects/clusters/accession_year/search?id_material=AAT14570&year_accessioned_from=1900&year_accessioned_to=1999&cluster_size=100')
materials_df = pd.DataFrame(req.json())
bars = alt.Chart(materials_df).mark_bar().encode(
x='count:Q',
y="value:O"
)
text = bars.mark_text(
align='left',
baseline='middle',
dx=3
).encode(
text='count:Q'
)
(bars + text).properties(height=900, title="Objects containing plastic accessioned to the V&A in the C20th")
See more worked examples in the Data Exploration site.