A page from the Domesday Book (CE 1086), courtesy of Anna Powell-Smith’s Open Domesday project. |
Before 6:30pm on Tuesday, 2/27, you should submit to Canvas a document in pdf or docx format, answering all the numbered questions, which are summarized in the activity deliverables section.
This activity will prompt you to think about how digital archives – the fodder for lots of humanistic geospatial research – are constructed. In general, you will:
In his essay on the “stuff” of archives, Martin Manalansan observes at least three functions (2014:94) of an archive:
In what ways do these functions of archives – which overlap with one another – translate into digital terms?
In an essay reflecting on digital archives, Jake Hodder and David Beckingham (2022:1300-1301) argue that the search tools we use today are not simply venues for looking at digital (and digitized) materials, but that they structure how you search and even inform how you think.
Read the excerpt below:
If you have used a digital archive – or platforms like Google Books or JSTOR – you will be familiar with the search box that invites us to enter our ‘key terms’. It has the look and feel of a finding aid that we might use to identify a call number in a physical archive. But, as Ted Underwood (2014) argues, the underlying technology and philosophical principles are vastly different. The search bar does not help us navigate the arrangement of the archive; it allows us to circumvent it. Digital platforms privilege searching over browsing, and that searching has more in common with data mining than document retrieval. The more precise the phrasing, the more efficiently digital search can personalise our results. This is part of the nature of computer search or ‘information retrieval’ – it is very effective at identifying exact terms and can do so across millions of data points in seconds.
When we receive our search results, fragments of historical information are recombined from multiple collections and places with little fidelity to original order or provenance. As Sassoon (cited in Sternfeld, 2011:565) notes, digital archives return results as ‘a databank of orphans which have been removed from their transactional origins and evidence of authorial intent’. Digital archives, then, offer us an unprecedented means to quickly find exactly what we were already looking for. But rarely do we know what we are looking for, even when we think we do. And even more rarely do we know the precise, historical wording that would find it. So, we search by trial and error. We enter a term as a proxy for a broader theme. We refine it. We search again. If we are lucky, we might be able to tie our research question to a distinct and historically stable watchword that reveals dozens of new sources.
This process of recombination, as Hodder and Beckingham call it, yields an entirely new kind of archive. When you’re doing any kind of research on/with the internet, but especially historically focused research, you position yourself within and in between these hybrid archives of new and old.
Through recombination, you’ll come across a variety of sources, which can mostly be sorted into two categories: primary and secondary. Per this Tufts research guide, they can be distinguished as such:
When you search through these “digital archives,” how do you do it? Where do you go? What kinds of terms do you use? What kinds of sources are you seeking?
Here are a few tips for searching these often unruly digital archives.
Develop search terms for keywords and subjects. A set of foundational keywords and terms should guide any search you do. They should be firm enough to define your research topic, but flexible enough to transfer across different search databases. Depending on the database, you might want to search by subject heading or keyword:
Aspect | Keywords | Subjects |
---|---|---|
Definition | natural language words describing your topic - good to start with | pre-defined “controlled vocabulary” words used to describe the content of each item (book, journal article) in a database |
Flexibility | more flexible to search by - can combine together in many ways | less flexible to search by - need to know the exact controlled vocabulary term |
Search Scope | database looks for keywords anywhere in the record - not necessarily connected together | database looks for subjects only in the subject heading or descriptor field, where the most relevant words appear |
Result Precision | may yield many irrelevant results | results usually very relevant to the topic |
Refinement | may yield too many or too few results | if too many results - also uses subheadings to focus on one aspect of the broader subject |
See below for a few example topics, using the simple template provided by Tufts’ Hirsh Library. How would you fill in the rows?
Term type | Rise of the Peloponnesian League | Geography of John Smiths | Reconstructing ancient climate data | Gender and the Salem witch trials |
---|---|---|---|---|
Subjects | ??? | ??? | ??? | ??? |
Keywords | ??? | ??? | ??? | ??? |
Use search operators. Search operators help constrain and specify your search terms. For example, if you’re researching the Peloponnesian League, but not necessarily the Peloponnesian War, you could construct a search phrase that excludes results focused on the War itself. In another case, you might be researching a person whose name is quite common across history, and need to add qualifying terms to your search; e.g., you don’t want to include a ton of hits for John Smith (explorer) during your research into John Smith (abolitionist). See the table below – adapted from this guide from Hirsh Library – for a description of each kind of search operator.
Operator | Description | Example |
---|---|---|
Wildcard | A character is used within or at the end of the word to substitute for one character or no characters. | colo?r retrieves documents with the words color and colour. |
Truncator | Retrieves any number of characters (including no characters) after the word stem. | witch$ retrieves documents with the word witch, as well as witchcraft, witchy, etc. |
Boolean Operators | Retrieves searches by combining terms with the use of words AND, NOT, and OR. Use OR to gather synonyms, use NOT to eliminate, & use AND to require both (or all) factors. | map OR atlas, peloponnesian NOT war, john smith AND abolitionist. |
Bound Phrase | Retrieves searches for an exact match of the text with the use of quotation marks around the terms. | "sanborn atlas" |
These search operators will not always be the same across all databases. For example, Google uses a unique set of operators for its search engine.
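To make the operator table concrete, here is a minimal Python sketch that emulates the wildcard, truncator, bound-phrase, and Boolean behaviors with regular expressions. The function names (`term_to_regex`, `matches`) and the sample titles are hypothetical, and real databases implement these operators in their own ways, so read this as an illustration of the logic rather than how any particular search engine works.

```python
import re

def term_to_regex(term):
    """Translate one search term into a regex, using the table's conventions.

    Assumed syntax (taken from the table, not from any specific database):
      ?      stands for one character or no character
      $      stands for any number of characters after the word stem
      "..."  marks a bound phrase that must match exactly
    """
    if term.startswith('"') and term.endswith('"'):
        body = re.escape(term[1:-1])            # bound phrase: match literally
    else:
        body = re.escape(term)
        body = body.replace(r"\?", r"\w?")      # wildcard
        body = body.replace(r"\$", r"\w*")      # truncator
    return re.compile(rf"\b{body}\b", re.IGNORECASE)

def matches(term, text):
    """True if the term is found anywhere in the text."""
    return bool(term_to_regex(term).search(text))

# Invented titles standing in for database records
titles = [
    "The Peloponnesian League and Spartan alliances",
    "A history of the Peloponnesian War",
    "Sanborn atlas of Boston",
    "Witchcraft and gender at Salem",
    "Colour in early modern maps",
]

# Boolean operators map onto Python's own and / or / not
league_only = [t for t in titles
               if matches("peloponnesian", t) and not matches("war", t)]
maps_or_atlases = [t for t in titles
                   if matches("map$", t) or matches("atlas", t)]
witch_stem = [t for t in titles if matches("witch$", t)]
exact_phrase = [t for t in titles if matches('"sanborn atlas"', t)]

for label, hits in [("peloponnesian NOT war", league_only),
                    ("map$ OR atlas", maps_or_atlases),
                    ("witch$", witch_stem),
                    ('"sanborn atlas"', exact_phrase)]:
    print(label, "->", hits)
```

Note that a plain keyword like `witch` would not match "Witchcraft" in this sketch, which is exactly the gap the truncator is meant to close.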
But don’t get too distracted. Speaking from experience, it’s easy to be sucked into “the archive.” The philosopher Jacques Derrida calls this impulse archive fever. He says we are “in need of archives,” that we “have a compulsive, repetitive, and nostalgic desire for the archive, an irrepressible desire to return to the origin, a homesickness, a nostalgia for the return to the most archaic place of absolute commencement” (1996:91). But I digress!
If you ever feel like you are losing track of your purpose, return to your foundational search terms. What brought you here in the first place?
Before moving on to the next section on geospatial archives, answer the questions below. Think of them as an exercise in preparation for this term’s final project.
1. Pick a humanistic question (e.g., a topic) that you could answer using geospatial methods, and write it down here. (If you’re stumped, think about the exercises we’ve completed so far. [If you’re totally stumped, just use one of the example topics listed above.]) |
2. Develop a set of subjects and search terms that would help you search for this topic. Search, refine, and search again. Write the subjects and search terms here. |
3. In a few sentences (3 should suffice), summarize what you found about your chosen topic, including where you searched and at least one compelling primary source. |
What is a “geospatial archive”? What are primary and secondary geospatial sources, and where can you find them?
To answer this question, it’s useful to go back to one of the earliest sources of geospatial data that remains easily accessible today: William the Conqueror’s Domesday Book, a survey of England and parts of Wales undertaken in C.E. 1086.
After the Norman Conquest of 1066, King William the Conqueror sent his agents “to measure and catalogue the realm he had won” (Rogers 2017:50). Dallas Rogers calls the survey a “mediating technology” that allowed the King to establish and enforce a land taxation system. It’s not that different from other tools, like urban tax atlases or modern GIS databases for real estate, which ultimately serve the administrative purpose of collecting tax.
The Domesday Book from 1086 (top); an urban atlas from 1898 (middle); and the City of Boston’s modern tax parcel viewer in ArcGIS Online (bottom). |
The Domesday Book is an example of an extremely well-preserved census, or an official count of population. In the U.S., we use censuses to tabulate demographic data as well as determine political representation in the House of Representatives.
The OpenDomesday project has not only made the original manuscript Domesday folios available for free online, but has also mapped them and, in doing so, produced a really excellent historical gazetteer. Pretty neat.
In a sense, any modern spatial data object – that is, any geojson or shp file – is both a primary and secondary source. Such objects derive from some sort of primordial source document – a map, plan, image, or census – but they also serve as the primary source for any spatial data that derives from them.
Take, for example, this spatial data of Boston’s historic shoreline. Lacking a time machine, the makers of this data did not go out and survey the historic shoreline themselves, but they weren’t just making educated guesses to produce their map, either: they created it by digitally tracing, among other objects, Osgood Carleton’s 1796 plan of Boston. In turn, you could use it to make something like this digital map of additions to Boston’s shoreline between 1795 and 1995.
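One way to keep that chain of derivation visible is to record it in the data itself. The Python sketch below builds a single GeoJSON feature whose properties note the source document it was traced from. The coordinates, the property names (`derived_from`, `method`), and their values are placeholders invented for illustration; they are not taken from the actual shoreline dataset.

```python
import json

# Hypothetical, simplified shoreline segment.
# GeoJSON orders coordinates as [longitude, latitude]; these are placeholders.
shoreline = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[-71.060, 42.360], [-71.055, 42.362], [-71.050, 42.361]],
    },
    "properties": {
        "name": "Historic shoreline (illustrative segment)",
        # Assumed field names for recording provenance
        "derived_from": "Osgood Carleton, plan of Boston, 1796",
        "method": "digitally traced from a scanned plan",
    },
}

collection = {"type": "FeatureCollection", "features": [shoreline]}

# Serialize to the text you would save in a .geojson file
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Anyone who later derives a new map from this file can then cite both the file and the plan behind it.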
While there is no overarching, canonical system for geospatial data archiving, there are all sorts of spatial data libraries that you can use to search and retrieve geospatial data. We typically call these data portals or data clearinghouses.
Here are a few general-purpose portals:
And here are more that focus specifically on censuses:
4. Find a spatial dataset (e.g., a shp file, a csv file containing geographic information like lat/long/address) that could help answer the research topic you chose in question 1. If you can’t find anything, in 1-2 sentences, consider how you would go about composing your own spatial dataset to answer this topic. Where would you start? |
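If you end up composing your own dataset, a plain csv with latitude and longitude columns is often enough to start. As a rough sketch, the Python below reads such a table and converts each row to a GeoJSON point. The column names (`name`, `lat`, `lon`) and the two sample rows are assumptions made up for this example.

```python
import csv
import io
import json

# Stand-in for a csv file on disk; the rows and column names are invented
sample_csv = io.StringIO(
    "name,lat,lon\n"
    "Place A,42.3601,-71.0589\n"
    "Place B,42.4075,-71.1190\n"
)

def rows_to_geojson(handle):
    """Convert csv rows with lat/lon columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(handle):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON expects [longitude, latitude], the reverse of "lat/long"
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}

points = rows_to_geojson(sample_csv)
print(json.dumps(points, indent=2))
```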
Let’s dive into some historical population data in ArcGIS Pro. Tufts GIS has collected lots of useful spatial datasets on the M: Drive. One of them contains historic population data for China, going back to 1953.
Last week, in Activity 02, we worked with historic population data for the entire continent of Africa. However, that data was not produced by a census – it was a set of estimated values that a researcher created with a set of mathematical models.
In this activity, we’re actually using data that came from a Census. Of course, as discussed above,
In your File Explorer, navigate to…
M:\Country\China\ChinaDataCenter\ChinaHistoricalCountyCensus
… and open the document titled README. If you double-click it, it will probably open, by default, in Microsoft Edge (🤢). You can right-click ➡️ “Open with” to open it in your web browser of choice (and if Microsoft Edge is your web browser of choice, you don’t need to tell me, that’s between you and god).
README files are a conventional kind of metadata file. Aptly named, they typically include information that explains what’s in a given directory.
Open this README file and answer the question below.
5. What were the reference (e.g., primary) sources for the 1953 and 1964 county- and province-level spatial data boundaries? |
Let’s compare these two censuses. Follow these steps to get the necessary data into your project.
- In your H: drive, make a workspace for this week’s activity.
- In the Catalog pane, navigate to M: ➡️ Country ➡️ China ➡️ ChinaDataCenter ➡️ ChinaHistoricalCountyCensus ➡️ 1953.
- Add china53 and prov53 to the project.
- Do the same for the 1964 data, adding china64 and prov64 to the project.
- Download the countries data from Natural Earth: https://www.naturalearthdata.com/downloads/10m-cultural-vectors/
- Move it out of your downloads folder and into a sensible location in your workspace.
- Add the countries data to your project.
Once you have the data in your project, organize it:
- If World Topographic Map and Hillshade are turned on, turn them off.
- Rename the layers:
  - china53 ➡️ Counties 1953
  - prov53 ➡️ Provinces 1953
  - china64 ➡️ Counties 1964
  - prov64 ➡️ Provinces 1964
  - countries ➡️ Countries
- Order the layers as follows:
  - Provinces 1953
  - Counties 1953
  - Provinces 1964
  - Counties 1964
  - Countries
- Group your 1953 provinces and counties together by selecting both of the 1953 layers ➡️ Right-click ➡️ “Group”. This will allow you to toggle these layers on and off in groups.
- Do the same for the 1964 layers, then name the groups 1953 and 1964.
Now that your data is organized, symbolize it. You know by now that there are a few ways to access the Symbology pane, including 1) right-clicking a layer ➡️ “Symbology,” and 2) clicking on the “Feature Layer” tab in the ribbon at the top of ArcGIS Pro while the layer you want to symbolize is selected.
First, symbolize the Countries layer to be unintrusive, so that it clearly scans as a background (remember figure-ground?). I’d recommend a gray or tan fill, with boundaries of 0.5 width in a darker shade of the fill color.
To get that “just right” color, you’ll probably have to click into the “Color Properties” tab of the “Color” dropdown.
- Symbolize both Province layers so that the color is transparent (e.g., “No color”) and the outline is a dark gray with a width of 0.5.
- Symbolize the Counties 1953 layer, following these parameters:
  - Field: A53002
  - Normalization: <percentage of total>
  - Choose a sensible color ramp for this numerical, quantitative data (e.g., sequential, classed, single-hue)
Click the gear next to “Color scheme” and select “Apply to fill and outline.” This will automatically apply the fill color for each feature in your data to the outline as well, de-emphasizing the internal county boundaries and giving the data a smoother appearance. This also allows us to cartographically distinguish the provinces from the counties.
- Symbolize the Counties 1964 layer the same way, using the field A64002. However, instead of choosing the Natural Breaks method, choose Manual Interval and set the class breaks to the exact same breaks as your 1953 layer. Your maps need to share class breaks in order to be comparable across time, and this is one fairly straightforward way to generate them.
If you’ve grouped layers and symbolized everything properly, you should see something like the example gif when you toggle the 1953 layer on and off (the gif is displayed in natural breaks, so yours will probably look different).
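To see why shared class breaks matter, here is a small Python sketch of the same idea outside ArcGIS Pro. It normalizes each county value as a percentage of that year's total, then assigns both years to classes using one set of manually defined breaks. The county values and break points are invented for illustration and are not drawn from the China census data; the function names are hypothetical as well.

```python
from bisect import bisect_left

def percent_of_total(values):
    """Normalize each value as a percentage of the sum of all values."""
    total = sum(values)
    return [100 * v / total for v in values]

def classify(values, breaks):
    """Assign each value a class index.

    Each break is treated as the upper bound (inclusive) of its class,
    so a value equal to a break falls in the lower class.
    """
    return [bisect_left(breaks, v) for v in values]

# Invented counts for four hypothetical counties, in the same order each year
pop_1953 = [120, 300, 80, 500]
pop_1964 = [90, 420, 60, 430]

# One shared set of manual breaks (upper bounds, in percent), used for both years
shared_breaks = [10.0, 25.0, 40.0, 100.0]

classes_1953 = classify(percent_of_total(pop_1953), shared_breaks)
classes_1964 = classify(percent_of_total(pop_1964), shared_breaks)

for i, (a, b) in enumerate(zip(classes_1953, classes_1964)):
    change = "same class" if a == b else f"moved from class {a} to class {b}"
    print(f"county {i}: {change}")
```

Because both years use `shared_breaks`, a county that changes color has actually changed its share of the total. If each year had its own natural breaks, a color change could simply reflect the breaks moving.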
We’ve got our data displayed nicely – you could even make a new “Map” tab, like we did in Activity 02, to make a side-by-side comparison of the two years – but what do the fields we’re mapping, A53002 and A64002, actually mean? And what else is in the attribute tables of these two datasets?
In the Catalog pane of ArcGIS Pro, navigate back to M: ➡️ Country ➡️ China ➡️ ChinaDataCenter ➡️ ChinaHistoricalCountyCensus.
You should see an Excel file (.xls) titled Variables_index. This file contains metadata for this historic demographic data.
Right-click the Variables_index file ➡️ “Show in File Explorer.” Then, open the file in Microsoft Excel and answer the questions below using the metadata.
6. What variable do the field names A53002 and A64002 describe? |
7. What other variable could you use to compare change between 1953 and 1964? |
Before moving to the next section, turn labels on for both Province layers:
- Set the label expression to $feature.EPROV (that’s “English province name”)
The default labels are pretty big, so after configuring the labels, I clicked into the Symbol tab of the label properties, unfolded “Appearance,” and reduced the font size from 10 pt to 8 pt.
Take some time to observe this data, toggling layers on and off and comparing them to one another, before answering the question below.
8. In no more than 3 sentences, describe what you can learn from comparing the geographical changes over time as represented in this data; for example, you could describe provincial boundary changes, broader changes in the geopolitical landscape, or regional shifts in population. If you’re feeling ambitious, you could even read up on the Paracel Islands, Tibet/Xizang, or the hukou system – all of which experienced radical changes between the years represented in this data, 1953-1964 – and consider how they are (or are not) represented in this data and its attribute table. But, again: no more than 3 sentences. |
9. Take a screenshot of your ArcGIS Pro interface, showing the symbolized layers. |
Before 6:30pm on Tuesday, 2/27, you should submit to Canvas a document in pdf or docx format, answering all the numbered questions, which are summarized below:
1. Pick a humanistic question (e.g., a topic) that you could answer using geospatial methods, and write it down here. (If you’re stumped, think about the exercises we’ve completed so far. [If you’re totally stumped, just use one of the example topics listed above.]) |
2. Develop a set of subjects and search terms that would help you search for this topic. Search, refine, and search again. Write the subjects and search terms here. |
3. In a few sentences (3 should suffice), summarize what you found about your chosen topic, including where you searched and at least one compelling primary source. |
4. Find a spatial dataset (e.g., a shp file, a csv file containing geographic information like lat/long/address) that could help answer the research topic you chose in question 1. If you can’t find anything, in 1-2 sentences, consider how you would go about composing your own spatial dataset to answer this topic. Where would you start? |
5. What were the reference (e.g., primary) sources for the 1953 and 1964 county- and province-level spatial data boundaries? |
6. What variable do the field names A53002 and A64002 describe? |
7. What other variable could you use to compare change between 1953 and 1964? |
8. In no more than 3 sentences, describe what you can learn from comparing the geographical changes over time as represented in this data; for example, you could describe provincial boundary changes, broader changes in the geopolitical landscape, or regional shifts in population. If you’re feeling ambitious, you could even read up on the Paracel Islands, Tibet/Xizang, or the hukou system – all of which experienced radical changes between the years represented in this data, 1953-1964 – and consider how they are (or are not) represented in this data and its attribute table. But, again: no more than 3 sentences. |
9. Take a screenshot of your ArcGIS Pro interface, showing the symbolized layers. |
Derrida, Jacques. 1996. Archive Fever: A Freudian Impression. Religion and Postmodernism. Chicago: University of Chicago Press.
Hodder, Jake, and David Beckingham. 2022. “Digital Archives and Recombinant Historical Geographies.” Progress in Human Geography 46 (6): 1298–1310.
Manalansan, Martin F. 2014. “The ‘Stuff’ of Archives.” Radical History Review 2014 (120): 94–107.
Rogers, Dallas. 2017. The Geopolitics of Real Estate: Reconfiguring Property, Capital and Rights. London; New York: Rowman & Littlefield Publishers.
Stoler, Ann Laura. 2002. “Colonial Archives and the Arts of Governance.” Archival Science 2 (1): 87–109.