Who made this? Why did they make it? How old is it? Where did it come from?
These are just a few questions the average visitor to any museum might have when viewing an artifact. Luckily for the visitor, objects in museums tend to be accompanied by helpful little tags, answering all of these questions and more. Though this is sufficient for the typical visitor viewing an artifact in a museum, what is a layman to do when faced with an unknown object he found in a field? How can that person answer the questions above without a helpful little tag?
An expert might draw on knowledge of similar objects. To emulate this, and to answer "who" and "when", I created a dataset containing images of artifacts in the Metropolitan Museum of Art together with all of their associated data. I used this dataset to train classifiers that predict Culture and Creation Date/Date Range from images of objects.
The Metropolitan Museum of Art recently updated its online image collection and released around 400,000 images as OASC (Open Access Scholarly Content). The Met appears to have assigned ID numbers to 571,722 artifacts in total. Not all of these artifacts have images, and some have more than one. Because some artifacts account for several of the ~400,000 released images, there will be fewer than 400,000 OASC artifacts with images.
I scraped the Met website using code from this GitHub repository. For every ID between 1 and 571,722, I fetched the page at "http://www.metmuseum.org/art/collection/search/ID". If the page had the OASC tag and contained an image, I wrote the relevant fields to a JSON file. This takes about 3 seconds per ID, so checking all 571,722 IDs would take about 20 days. At the time of this analysis, I had queried 300,000 IDs and collected approximately 80,000 image/JSON pairs.
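A minimal sketch of the scraping loop described above, in Python. It is not the code from the repository mentioned above: `looks_usable` is a hypothetical placeholder (the actual page markup for the OASC tag and the image is not described here), and only the ID is recorded rather than the parsed fields. The `fetch` function is injectable so the loop can be exercised without network access, and `estimated_days` reproduces the runtime arithmetic from the text.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Optional

# URL pattern quoted in the text; {} is replaced by the numeric ID.
URL_TEMPLATE = "http://www.metmuseum.org/art/collection/search/{}"


def fetch_page(object_id: int) -> Optional[str]:
    """Download the collection page for one ID, or return None on any network error."""
    try:
        with urllib.request.urlopen(URL_TEMPLATE.format(object_id), timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return None


def looks_usable(html: str) -> bool:
    # HYPOTHETICAL heuristic: placeholder substring checks for the OASC tag and an image.
    return "OASC" in html and "<img" in html


def scrape(ids: Iterable[int],
           fetch: Callable[[int], Optional[str]] = fetch_page,
           delay: float = 0.0) -> Iterator[dict]:
    """Yield one record per page that passes the usability check."""
    for object_id in ids:
        html = fetch(object_id)
        if html is not None and looks_usable(html):
            # A real scraper would extract Culture, Date, image URL, etc. here.
            yield {"id": object_id}
        if delay:
            time.sleep(delay)  # be polite to the server


def write_jsonl(records: Iterable[dict], path: str) -> int:
    """Append records as JSON lines, flushing each so a multi-day run keeps its progress."""
    n = 0
    with open(path, "a", encoding="utf-8") as out:
        for rec in records:
            out.write(json.dumps(rec) + "\n")
            out.flush()
            n += 1
    return n


def estimated_days(n_ids: int = 571_722, seconds_per_id: float = 3.0) -> float:
    """Back-of-envelope runtime: IDs times seconds per ID, converted to days."""
    return n_ids * seconds_per_id / 86_400
```

Because `scrape` is a generator, a full run such as `write_jsonl(scrape(range(1, 571_723)), "met_oasc.jsonl")` writes results as they arrive instead of holding them all in memory; the output filename is only illustrative.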
I processed the Culture field by keeping the first run of capitalized words, ignoring parenthetical comments and directional words such as North and East.
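One way to implement the normalization rule above is sketched below in Python. The set of ignored directional words is an assumption (the text only says "North, East, etc."), and `frequent_cultures` applies the "more than 500 instances" cutoff used for the list that follows.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# ASSUMED set of directional qualifiers to skip; the text names only North and East.
DIRECTIONS = {"North", "South", "East", "West",
              "Northern", "Southern", "Eastern", "Western", "Central"}

# Tokens are words starting with a letter, or single punctuation characters.
_TOKEN = re.compile(r"[^\W\d_][\w'-]*|[^\w\s]")


def normalize_culture(raw: str) -> Optional[str]:
    """Return the first run of capitalized words, e.g. 'North Italian (Venice)' -> 'Italian'."""
    text = re.sub(r"\([^)]*\)", " ", raw)  # drop parenthetical comments
    run = []
    for tok in _TOKEN.findall(text):
        if tok in DIRECTIONS:
            continue  # skip directional words without ending the run
        if tok[0].isupper():
            run.append(tok)
        elif run:
            break  # a lowercase word or punctuation ends the first run
    return " ".join(run) or None


def frequent_cultures(raw_values: Iterable[str], min_count: int = 500) -> dict:
    """Count normalized cultures and keep those with more than min_count instances."""
    counts = Counter(c for c in map(normalize_culture, raw_values) if c)
    return {c: n for c, n in counts.most_common() if n > min_count}
```

For example, "Greek, Attic" becomes "Greek" because the comma ends the run, while "possibly Spanish" becomes "Spanish" because leading lowercase words are skipped.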
This resulted in the following list of cultures with more than 500 instances:
There were two different types of date field contained in my data. The first was a categorical date field; date categories and their counts are displayed below:
The artifacts I collected are not evenly distributed across categories: the majority (49,925 of 82,028, about 61%) date from between 1600 and 1900.
The second type of date field is a description of the estimated creation date. It is written in natural language and usually describes either a single date or a range of dates. To parse these dates, I used the yearrange parser from this repository.
Example fields from my dataset and parsed values are shown below:
The yearrange parser failed on some values in my dataset. It was not designed to handle B.C. dates or years with fewer than three digits, so I only considered dates whose start year is 1000 or later. Parsing yields a start date, an end date, and an optional circa field, which is present whenever the description implies uncertainty (for example, "ca.", "(?)", or "probably").
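To make the parsing step concrete, here is a simplified stand-in written in Python. It is not the yearrange parser and handles far fewer phrasings; the regular expressions and the list of uncertainty markers are assumptions based on the examples in the text. It applies the same restrictions described above: no B.C. dates and no start years before 1000.

```python
import re
from typing import NamedTuple, Optional

# ASSUMED uncertainty markers, based on the examples given in the text.
_CIRCA = re.compile(r"\bca\.|\bcirca\b|\(\?\)|\bprobably\b|\bpossibly\b", re.IGNORECASE)
# A 3- or 4-digit year, optionally followed by "-" or "–" and a (possibly abbreviated) end year.
_RANGE = re.compile(r"(\d{3,4})\s*(?:[-–]\s*(\d{1,4}))?")


class DateRange(NamedTuple):
    start: int
    end: int
    circa: bool


def parse_date(text: str, min_year: int = 1000) -> Optional[DateRange]:
    """Parse e.g. 'ca. 1750–75'; return None for B.C., unparseable, or too-early dates."""
    if "b.c" in text.lower():
        return None
    m = _RANGE.search(text)
    if m is None:
        return None  # centuries ("18th century") and similar phrasings are not handled
    start_str, tail = m.group(1), m.group(2)
    start = int(start_str)
    if start < min_year:
        return None
    end = start
    if tail:
        if len(tail) < len(start_str):
            # Abbreviated end year: "1750–75" -> 1775; "1795-05" rolls over to 1805.
            end = int(start_str[: len(start_str) - len(tail)] + tail)
            while end < start:
                end += 10 ** len(tail)
        else:
            end = int(tail)
            if end < start:
                return None  # reversed range: treat as unparseable
    return DateRange(start, end, bool(_CIRCA.search(text)))
```

The word boundaries in `_CIRCA` keep strings such as "Africa." from being misread as "ca.".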
I performed t-SNE on both culture and date range, using 5,000 randomly selected images and 200 features from the 4096-dimensional output of the last convolutional layer of VGG-16. The resulting plots are shown below, with culture on the left and date on the right. As a future extension, I would like to make these plots interactive, so that the image space can be explored by viewing the image corresponding to any given point.
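A sketch of the t-SNE setup, assuming a NumPy array of precomputed VGG-16 features (one row per image) and assuming the 200 features were chosen at random, which the text does not state explicitly. The embedding uses scikit-learn's `TSNE`; the settings shown are illustrative rather than the ones used for the plots.

```python
import numpy as np


def subsample(features: np.ndarray, n_images: int = 5000, n_features: int = 200,
              seed: int = 0):
    """Randomly pick images (rows) and feature dimensions (columns) without replacement."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(features.shape[0], size=min(n_images, features.shape[0]), replace=False)
    cols = rng.choice(features.shape[1], size=min(n_features, features.shape[1]), replace=False)
    # np.ix_ selects the sub-matrix formed by the chosen rows and columns.
    return features[np.ix_(rows, cols)], rows


def tsne_2d(x: np.ndarray, seed: int = 0) -> np.ndarray:
    """Embed the sampled features in 2-D (requires scikit-learn)."""
    from sklearn.manifold import TSNE  # imported lazily so subsample() works without it
    return TSNE(n_components=2, init="pca", random_state=seed).fit_transform(x)
```

The returned `rows` indices can be used to look up each point's culture or date label when coloring the two plots.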
I classified culture and date category for all images in my dataset, using multiple one-vs.-all SVM classifiers trained on the output of the final convolutional layer of VGG-16. (Within each task the classes are mutually exclusive: every image belongs to exactly one culture and one date category.)
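The one-vs.-all scheme can be illustrated with a small NumPy sketch: one linear SVM per class, each trained to separate that class from all others by full-batch subgradient descent on the L2-regularized hinge loss, with prediction taken as the class whose SVM gives the highest score. This is a generic illustration, not necessarily how the classifiers above were trained; the hyperparameters are arbitrary assumptions. (In scikit-learn, `LinearSVC` uses one-vs.-rest by default for multiclass problems.)

```python
import numpy as np


def train_one_vs_all(x: np.ndarray, y: np.ndarray, n_classes: int,
                     lam: float = 1e-3, epochs: int = 200, lr: float = 0.1) -> np.ndarray:
    """Train one linear SVM per class; returns weights of shape (n_classes, d + 1)."""
    n, d = x.shape
    xb = np.hstack([x, np.ones((n, 1))])  # append a constant bias feature
    w = np.zeros((n_classes, d + 1))
    for k in range(n_classes):
        t = np.where(y == k, 1.0, -1.0)  # +1 for class k, -1 for all others
        for _ in range(epochs):
            margins = t * (xb @ w[k])
            active = margins < 1  # samples inside the margin contribute to the hinge loss
            # Subgradient of lam/2 * ||w||^2 + mean(max(0, 1 - t * w.x)); bias is regularized too.
            grad = lam * w[k] - (t[active][:, None] * xb[active]).sum(axis=0) / n
            w[k] -= lr * grad
    return w


def predict(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Pick the class whose binary SVM gives the largest decision value."""
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    return np.argmax(xb @ w.T, axis=1)
```

Because the classes are mutually exclusive, taking the argmax over the per-class scores always yields exactly one predicted label per image.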
My Culture SVMs achieve 53.3% accuracy on my test set, and the Date SVMs achieve 43.8%. The confusion matrices for the 17-class Culture categorization and the 9-class Date categorization are displayed below.
Rows give the true label for an image, and columns give the predicted label. The culture confusion matrix looks promising: in the "Cypriot" row, most of the confusion falls on other ancient Mediterranean cultures such as "Etruscan", "Greek", and "Roman", and "British" is often confused with other European cultures.
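For reference, a confusion matrix with this orientation (rows are true labels, columns are predictions) and the accuracy derived from it can be computed as below, assuming integer class labels. Accuracy is the diagonal sum over the total, so a value of 0.533 corresponds to 53.3%. Row normalization is an optional extra that makes rows comparable when classes have different sizes.

```python
import numpy as np


def confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int) -> np.ndarray:
    """Count (true, predicted) pairs: rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated index pairs
    return cm


def accuracy(cm: np.ndarray) -> float:
    """Fraction of correct predictions: diagonal sum over total count."""
    return float(np.trace(cm) / cm.sum())


def row_normalize(cm: np.ndarray) -> np.ndarray:
    """Divide each row by its total so each row shows where that true class was predicted."""
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros(cm.shape), where=totals > 0)
```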
I also trained a CNN that takes a 128x128x3 image as input and outputs one of 17 classes corresponding to the culture. I trained on approximately 5000 images, with at least 600 images from each culture, using a batch size of 100 for 65 epochs. Accuracy reached approximately 40%, and it might have continued to improve with longer training.
I also attempted regression on the start and end dates obtained by parsing the date description. As with classification, I used features from the final convolutional layer of VGG-16, and fit a multivariate linear regression to predict start and end dates. To evaluate it, I computed the average interval overlap: the percent overlap between the true and estimated date intervals, averaged over images. The average interval overlap was 15.07%.
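A NumPy sketch of the regression and the evaluation metric. The regression is ordinary least squares with two outputs (start and end) and an intercept. The exact definition of "percent of overlap" is not given above, so the metric here is an assumption: intersection over union of the two intervals, counting years inclusively, so that a single-year date has length 1.

```python
import numpy as np


def fit_linear(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of y (n x 2: start, end) on x plus an intercept column."""
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(xb, y, rcond=None)
    return coef  # shape (d + 1, 2)


def predict_linear(coef: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Predict [start, end] per row, sorting so that start <= end."""
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    return np.sort(xb @ coef, axis=1)


def interval_overlap(true: np.ndarray, pred: np.ndarray) -> float:
    """ASSUMED metric: mean intersection-over-union of inclusive [start, end] intervals."""
    lo = np.maximum(true[:, 0], pred[:, 0])
    hi = np.minimum(true[:, 1], pred[:, 1])
    inter = np.clip(hi - lo + 1, 0, None)  # zero when the intervals are disjoint
    # Span from earliest start to latest end equals the union whenever inter > 0.
    union = np.maximum(true[:, 1], pred[:, 1]) - np.minimum(true[:, 0], pred[:, 0]) + 1
    return float(np.mean(inter / union))
```

For example, a true interval of 1700 to 1799 and a prediction of 1750 to 1849 share 50 of the 150 years they span, giving an overlap of one third.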
I would like to improve the date parser to handle dates earlier than A.D. 1000. I would also like to make use of the "circa" field, for example by extending the interval by an amount based on the number of significant figures in the date. In other words, "ca. 1000" likely implies more uncertainty than "ca. 1955", and I would like to account for this.
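One possible form of the circa adjustment proposed above, sketched in Python. Everything here is hypothetical: the padding rule (half of 10 raised to the number of trailing zeros, with a minimum of 5 years) is an illustrative choice, not something tested on the dataset.

```python
def trailing_zeros(year: int) -> int:
    """Count trailing zeros, a rough proxy for how coarsely a year was given."""
    z = 0
    while year and year % 10 == 0:
        year //= 10
        z += 1
    return z


def widen_circa(start: int, end: int, circa: bool,
                scale: float = 0.5, min_pad: int = 5) -> tuple:
    """HYPOTHETICAL rule: pad each endpoint by scale * 10**zeros years, at least min_pad."""
    if not circa:
        return start, end
    pad_start = max(min_pad, int(scale * 10 ** trailing_zeros(start)))
    pad_end = max(min_pad, int(scale * 10 ** trailing_zeros(end)))
    return start - pad_start, end + pad_end
```

Under this rule "ca. 1955" becomes 1950 to 1960, while "ca. 1000" becomes 500 to 1500, reflecting the larger uncertainty of a rounder number.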
I would like to finish downloading all items with images and the OASC tag; the data on the GitHub page below will likely be updated in the next few days. Currently I use only a single image per item, even though some items have many. Including all images for an item, rather than just the primary image, may help.
There is a lot more work which could be done on both classification and date regression.
Finally, I would like to know how well humans perform on the same task. I would like to use this dataset to run a task on Amazon Mechanical Turk to measure human accuracy and identify typical human confusions.
I have created a novel dataset of ~80,000 images with information on culture, date category, and date interval. I achieved 53% accuracy on 17-way culture classification and 43% accuracy on 9-way date-category classification, and estimated the date interval from an image with 15% average interval overlap.
The full dataset of images and json files can be downloaded here. This currently has data for 82,238 items.