Ben Lorenz-Meyer
For this project, I explored, cleaned, and visualized a dataset of artists born between 1900 and 1920 from Tate, a collection of thousands of artworks from across the world. I wanted to see which countries the artists most commonly came from, to learn a bit about what the Tate collection “prefers.” I cleaned the dataset with OpenRefine and visualized it as a bar chart in Flourish. By far the most common country of birth was the UK, followed by the U.S. and then countries such as France, Italy, and Poland. This finding gives a fairly strong picture of which artists the Tate collection might have preferred to include, at least within this dataset.
The full dataset, with over 3,000 rows, can be found here, but I used only the first 400 rows.
This dataset contains information about artists’ names, genders, lifetimes, and places of birth and death, as well as a link to each artist’s page on the Tate website. A small but interesting quirk of the dataset is that many countries of birth are written in the country’s primary language (such as Deutschland for Germany), though still in the Latin alphabet. The dataset was mostly clean, but to analyze how often each country of birth appears among the artists, I needed to clean it a little using OpenRefine.
To start, I sorted the data by artist’s date of birth, which felt like a more natural order than sorting by id. I then split the place of birth column, which originally held the city and country of birth separated by a comma, into two columns (one for the city and one for the country) so that the countries stood on their own. I renamed the two columns “CityOfBirth” and “CountryOfBirth” to make them more specific. Finally, I created a facet by null to remove the rows with no information on city or country of birth, and exported the result as a CSV file.
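As a rough sketch of what these cleaning steps do, here is the same idea in plain Python rather than OpenRefine. The input column names (“yearOfBirth”, “placeOfBirth”) and the sample rows are assumptions made up for illustration, not values taken from the dataset; only “CityOfBirth” and “CountryOfBirth” come from the steps above.

```python
import csv
import io

# Made-up sample rows for illustration only. The input column names
# (yearOfBirth, placeOfBirth) are assumptions about the export format.
SAMPLE_CSV = '''id,name,yearOfBirth,placeOfBirth
3,Artist C,1915,"Paris, France"
1,Artist A,1902,"London, United Kingdom"
2,Artist B,1910,
4,Artist D,1908,"Berlin, Deutschland"
5,Artist E,1905,Polska
'''


def clean_rows(csv_text):
    """Split place of birth into city and country, drop incomplete rows,
    and sort the remaining rows by year of birth."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        place = (row.get("placeOfBirth") or "").strip()
        # Split "City, Country" at the first comma.
        city, _, country = place.partition(",")
        city, country = city.strip(), country.strip()
        # Skip rows that are missing either the city or the country.
        if not city or not country:
            continue
        new_row = {k: v for k, v in row.items() if k != "placeOfBirth"}
        new_row["CityOfBirth"] = city
        new_row["CountryOfBirth"] = country
        cleaned.append(new_row)
    # Order by year of birth instead of by id.
    cleaned.sort(key=lambda r: int(r["yearOfBirth"]))
    return cleaned


cleaned = clean_rows(SAMPLE_CSV)
for row in cleaned:
    print(row["yearOfBirth"], row["name"], row["CityOfBirth"], row["CountryOfBirth"])
```

In this sketch a row with only one value in the place of birth column (no comma) is dropped as well, since the split would leave the second column empty.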
For my visualization and analysis, a simple bar chart of counts was the best way to show how often each country appears. I chose Flourish because it makes it very easy to build such a chart from a .csv file. For this chart, I set the labels to the country of birth column and the values to the year of birth column (an arbitrary choice, since I only needed a numerical column). I then set the aggregation mode to count and sorted by that count, so that the most common countries were at the top and the least common at the bottom (the original order was all over the place, which made relative frequency hard to read). Lastly, I added reasonable chart and axis titles, and the chart was done. I kept the design very simple, changing only the font of the title, because I was going for a clean look and the defaults already worked well for that.
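The count aggregation and sorting described here can be sketched in a few lines of Python. This only illustrates the idea and is not how Flourish works internally; the list of countries is made up for the example.

```python
from collections import Counter

# Made-up country-of-birth values for illustration only.
countries = [
    "United Kingdom", "United States", "United Kingdom", "France",
    "United Kingdom", "United States", "Polska",
]


def count_countries(values):
    """Count how often each country appears, most common first."""
    return Counter(values).most_common()


ranked = count_countries(countries)

# Print a crude text bar chart, most common country at the top.
for country, n in ranked:
    print(f"{country:15} {'#' * n} {n}")
```

Counting rows per label is why the choice of value column does not matter: each row contributes one to its country’s total regardless of what the value is.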
In terms of website design, I based my subdomain on my original “lorenzmeyerb…” and added “midterm” to the start. I then installed WordPress to build the site. The default WordPress theme, “Twenty Twenty-Five,” already had the clean look I was going for, so the only change I made was to the header at the top left of the page, from “my blog” to “my midterm.”
An embed of the bar chart is shown below:
This visualization shows clearly that the vast majority of artists in the Tate collection (at least those born between 1900 and 1920) were from Europe and the U.S., which makes the collection seem fairly Eurocentric given how heavily those parts of the globe are represented. However, the dataset I used had only 400 rows while the full dataset has over 3,000, so rows further down in the full dataset may include more artists from outside Europe and the U.S. This analysis could therefore be unintentionally cherry-picking and making the collection seem more Eurocentric than it is.
This analysis is a good example of the Digital Arts and Humanities process, in which the history of, and information about, the artists and artworks in a collection is represented visually, in this case by a bar chart.