
DIGITAL COLLECTIONS CATALYST PROJECT (2021)

last updated: 2nd December 2022

The Topography
of Searching

The Digital Collections Catalyst project (2021)
for the State Library of Queensland

What we search for reveals something of ourselves: our interests, our fears, our curiosity, or simply what we have forgotten. And it's not just what we search for, as how we search reveals something of ourselves as well. So when viewed in aggregate, what do the - searches (and counting) of the State Library of Queensland catalogue reveal about the events, topics, and concerns that have been on the minds of Queenslanders over the past decade or so? Do we ask more questions in winter? Do we make more typos in January? Did we swear more in 2020? Are we becoming more anxious? What aspects are constant, and what aspects are more prone to change?

This project allows you to generate topographic maps from the words and phrases that appeared in the searches made by people using the library catalogue since April 2012 (when the current tracking data began). And by looking at these maps - with their peaks, their valleys, their plateaus, and their plains - we get an imperfect glimpse of the collective interests and concerns of those using the catalogue, and the ways in which they've changed over the months, seasons, and years.

Read a more detailed explanation about the project below, or head straight to the map generator.

About the
project.

So in an age when there are more and more ways to access more and more information, what do people turn to the State Library for, and how do they go about finding it?

There are many ways to search the catalogue, and what follows is by no means an exhaustive list (though all of the examples are actual searches). Do we just use a single word ('cats', 'brisbane'), or the author ('Melissa Lucashenko', 'murakami'), or the title ('moby dick', 'The Yield'), wildcard characters ('monz mudflat*', 'nurs*'), or a Boolean search ('(preserves OR jam OR chutney OR pickles)'), or both ('(surg* OR operat*)'), or do we use an ISBN ('9780471732082'), or a catalogue number ('518774350002062'), or a Dewey Decimal number [2] ('616.075'), or do we phrase it as a question ('does mozart make babies smarter', 'should citrus be stored in fridge'), or put it in quotes ('"one of the soldiers"', '"expo 88"'). Is it specific ('sleep paralysis and alien abduction'), or general ('weird shit'). Are we looking for books ('book about growing roses', 'little red book'), ebooks ('kids ebooks', 'tiny house ebooks'), journals ('Neue Grafik', 'Australian Journal of Political Science'), photos ('Badu Island photos', 'corley photographs'), illustrations or drawings ('botanical illustration', 'Drawing of migrants disembarking from a ship, ca. 1885'), musical scores or sheet music ('Chopin Sheet music', 'psycho complete score hitchcock'), recipes ('paleo recipes', 'GOOD HOUSEKEEPER'S PICTURE RECIPE BOOK'), maps ('flood maps', 'SG56-06'), letters ('Ernest Henry letters', 'WW1 letters'), manuscripts ('Harriet Barlow Manuscript', 'illuminated manuscripts'), films ('storm boy', 'Wake In Fright'), streaming ('Kanopy catalogue', 'streaming movies'), audio ('Gunggari Language Audio Cassettes', 'pavarotti audio'), newspapers or gazettes ('kingaroy newspaper', 'Queensland Police Gazette'), people ('Elvis'), places ('Cribb Island'), things ('zines'), or even last year's excellent digital catalyst project ('mapping future brisbane'). Do we search in lowercase ('biofuels'), or in all caps ('PLANTS OF CENTRAL QUEENSLAND').
Do we make typos ('euthenasia', 'databae'), or hit enter too soon ('a','everything is f', or the many blank searches), or accidentally paste in a URL ('google.com used cars under $700', 'www.ato.gov.au'), or an email address ('*****@hotmail.com'), or another language ('해를 품은 달', 'حذاء رياضي'), or make a database hacking attempt ('"' '1'='1 OR": 1'). Are we uncertain ('is this working?'), frustrated ('lynda login F**K SAKE!'), or excited ('sloths!', 'minneeeeeeeeeeeecccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccrrrrrrrrrrrrrrrrraaaaaaaaaaafffftttttttttttttttt'), do we have questions about religion ('Orthodox Jews and IVF', 'Hindu belief on euthanasia', 'Christianity and sexting'), or are we just after something better ('better sleep', 'better public transport', 'BETTER BEEKEEPING'). Do we use emoticons (':)', '*-*'), or emojis (🐱), or is it gibberish ('sdfdfdfddfdfdfdfdffff', 'uuiji6ytdttttt') - possibly due to cats on keyboards - or is it meant for the library website, instead of the catalogue, ('easter opening hours', 'can you eat at the library?'), or, um, is it perhaps meant for another browser window ('you porn', 'stupid ass nae nae baby', 'stop driving by my sisters house the neighbours hate you').

However we search, our searches reveal more than just the topics we are interested in. In the words we use, and how we choose to phrase and/or formulate the request, it can also reveal our vocabulary, our familiarity with the library's systems, and at times, our emotions. Some searches are concise, some are verbose, some are flippant, some are polite. Some show people clearly wrestling with issues - health, dating, children, loneliness, anxiety, politics, religion, ethics - while others show boredom, frustration, and inattention. Regardless, the catalogue will dutifully try and provide a response to whatever was entered.

When looked at as a whole, the searches show the breadth of people's interests, hopes, fears, and moods, and as the data is updated at the end of each day, think of these maps as a glimpse into the forever-evolving, forever-shifting, collective preoccupations of Queenslanders over time.

2. Interestingly, the majority of the Dewey searches are between 610 and 620, which is the range for Medicine and Health.

How it works.

Enter a search term (or terms), or select one of a few predefined categories, using the fields in the 'Search' section below, and you'll be shown a topographic map generated from the relative monthly frequency with which those terms appeared in searches [3]. The map covers the period from early 2012, when the current dataset began, and is updated daily at around 3am. The map can be viewed in either 2D or 3D, and both the map and the data can be downloaded (the map and CSV data are aggregated by month, while the JSON data has a daily breakdown).

Each map also has a few stats, as well as the top searches. Depending on the number of matches there are for each particular query, there's also a section of 'top words', and 'sentiment'. The top words show what other words appeared when people were searching, and give you more of an idea of the aspects of a subject people were more interested in, and as the linguist John Firth said, "you shall know a word by the company it keeps". The sentiment section provides a (crude) look at the sentiment of the search words that were used, whether positive, negative, or neutral (more info in the sections below).
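As a rough illustration of the 'top words' idea, the sketch below counts which other words keep company with a given term. The function name, the simple whitespace tokenising, and the matching on whole words are all assumptions made for the example, not the project's actual code; the sample searches are borrowed from the examples on this page, plus one made-up one.

```python
from collections import Counter

def top_words(searches, term, n=5):
    """Count the other words that appear in searches containing `term`."""
    term = term.lower()
    counts = Counter()
    for search in searches:
        words = search.lower().split()
        if term in words:
            # Count each companion word once per search, skipping the term itself.
            counts.update(set(words) - {term})
    return counts.most_common(n)

# 'kids ebooks fiction' is a made-up search, the rest appear earlier on this page.
sample = ["tiny house ebooks", "kids ebooks", "kids ebooks fiction", "flood maps"]
print(top_words(sample, "ebooks"))
```

Here 'kids' comes out on top, since it accompanies 'ebooks' in two of the searches.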

But first a few caveats: maps are never completely accurate, and by necessity, are always an oversimplification of what they are trying to represent. Nor are they without the biases of their creator(s), as choices are made about what to include, what to exclude, what to highlight, and what to downplay. But their benefit is that they can provide a quick overview of some of the key features of an area at a glance, in this case the searches made by people using the library catalogue.

Also, the data that the library captures is completely anonymous. No personal details are recorded, and where email addresses have been accidentally used by people as a search term, the name has been removed (but the service provider kept). Apart from that, the data is unfiltered, and left exactly as it was entered, and as a result, depending on what is searched for, the results may contain words, phrases, or concepts that people may find offensive.

And a final note: although it can be tempting to draw conclusions from any patterns and potential relationships that seem to appear in the maps, without further study, we can't be sure that those relationships actually exist, or just appear to (the old 'correlation is not causation'). This is especially true for terms that occur less frequently, where the sample size is small, as well as any terms that appear to have peaks in April or May 2012, due to incomplete data for those months (see the notes at the end).

3. The counts are based on the number of searches where the term appeared, rather than the number of occurrences of the term, so 'Palm Island 1930' and 'palm island centenary historical palm island images' both just count as 1 occurrence, despite the fact that 'palm island' occurs twice in the latter.
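The counting rule in this footnote can be sketched as follows. This is a minimal example, assuming each search is available as a (date, text) pair, and assuming that matching is a case-insensitive substring test; the dates and the 'flood maps' row are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter

def monthly_counts(searches, term):
    """Count the searches per month that contain `term` at least once.

    `searches` is an iterable of (iso_date, text) pairs.
    """
    term = term.lower()
    counts = Counter()
    for iso_date, text in searches:
        if term in text.lower():
            # One per search, however many times the term repeats within it.
            counts[iso_date[:7]] += 1
    return dict(counts)

sample = [
    ("2021-03-01", "Palm Island 1930"),
    ("2021-03-02", "palm island centenary historical palm island images"),
    ("2021-04-10", "flood maps"),
]
print(monthly_counts(sample, "palm island"))  # {'2021-03': 2}
```

The second search mentions 'palm island' twice, but still only adds one to the March total.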

A quick overview
of searches.

On average there's around - searches per day. The most searches were in 2016 & 2017, followed closely by 2020 (note: tracking didn't start properly until June 2012, hence the comparatively low number that year). The average number of words per search is around three. Hover over the graphs for more detail.

By Year:


The graphs below show the relative percentage of searches by hour, day, and month in 2021.

By Hour:

Not surprisingly, searches were more common between 9am and 4pm, with a peak at around 2-3pm.

By Day:

Searches peaked on Wednesday and were lowest on Saturday, which had around half as many.

By Month:

The peak was in May, with over twice as many searches as the lowest month (December). The season with the most searches was Autumn, the least was Spring (note that the month data above is based on the average per day for each month).
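Using the average per day, rather than the monthly total, stops longer months looking busier just because they have more days. A small sketch of that normalisation, with a hypothetical function name and made-up totals:

```python
import calendar

def average_per_day(monthly_totals):
    """Convert {(year, month): total searches} into {(year, month): searches per day}."""
    averages = {}
    for (year, month), total in monthly_totals.items():
        days = calendar.monthrange(year, month)[1]  # number of days in that month
        averages[(year, month)] = total / days
    return averages

totals = {(2021, 2): 2800, (2021, 5): 6200}  # made-up totals, for illustration only
print(average_per_day(totals))  # {(2021, 2): 100.0, (2021, 5): 200.0}
```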

How to read
the maps.

A topographic map shows the features of a given area. In the physical world, this would include features such as mountains, rivers, lakes, and oceans, as well as potentially the types of vegetation or land cover such as forests or swamps, and structures such as buildings, bridges, or towns. To show elevation, or in this case, how common a search term was at that point in time, contour lines are used. They enclose areas of the same height, and each contour line indicates the same increase or decrease in height.

Contour lines that are close together indicate a rapid change in height; contour lines that are farther apart indicate a gentler slope. On most maps the direction of the slope can be determined from the numbers on the contour lines (as well as colour), but these maps use colour alone. If the colour is getting lighter, the height is increasing; if it is getting darker, it is decreasing. Peaks are indicated by black dots, and the highest point overall by a pink one.

The maps below show a few examples of some of the typical features. Note that the map can be viewed in either 2D or 3D. Each map has a scale, showing the minimum value - a dark green - and the maximum value, indicated by white. In order to best show the differences within each search term the scale varies.
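For the curious, here is one way evenly spaced contour levels and peaks could be worked out from a grid of values. It is only a sketch: the grid layout, the function names, and the rule that a peak must be strictly higher than all eight neighbours are assumptions for the example, and the maps on this page may well be generated differently.

```python
def contour_levels(grid, steps=5):
    """Return `steps` evenly spaced heights between the grid's lowest and highest value."""
    values = [v for row in grid for v in row]
    low, high = min(values), max(values)
    interval = (high - low) / (steps + 1)
    return [low + interval * i for i in range(1, steps + 1)]

def find_peaks(grid):
    """Return the (row, col) cells that are strictly higher than all of their neighbours."""
    peaks = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(len(grid), r + 2))
                for cc in range(max(0, c - 1), min(len(row), c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks

# A tiny made-up rectangular grid of search counts.
grid = [
    [1, 2, 1, 0],
    [2, 6, 2, 1],
    [1, 2, 1, 4],
]
peaks = find_peaks(grid)
highest = max(peaks, key=lambda rc: grid[rc[0]][rc[1]])
print(contour_levels(grid, steps=2))  # [2.0, 4.0]
print(peaks, highest)                 # [(1, 1), (2, 3)] (1, 1)
```

In map terms, both peaks would get a black dot, and the cell at (1, 1) would get the pink one.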

Note: clicking on the marker icon will show you the topographic map for that word or phrase, clicking on the magnifying glass icon will take you to the search results for that word or phrase in the library catalogue.


3D maps:

The map below shows the search results in 3D for six different searches. Note that in order to best show the different feature types, the elevation/vertical scale varies between maps (in reality, searches containing 'newspaper' are over 120 times more common than searches containing 'XXXX').


A Podcast
The first recorded search was in May 2012, but the term didn't become popular until October & December 2020.

B COVID
An example of a search term with ongoing interest. First search in the catalogue was 'Covid-19 economy' at around 7am on the 9th of March, 2020.

C Ekka
An example of a term with an annual peak, in this case around the date of the show in early August. Note the recent drop, which could be because the show has been cancelled for the past two years.

D Newspaper
An example of a term that is always popular, with multiple searches per day.

E XXXX
An example of a search that has no real pattern to its popularity, and so consists of many smaller peaks and gullies.

F Nursing
A search that was popular for a period (2015 & 2016 in this case), then became less so, forming the valley, then became popular again (from early 2020).

2D maps:

zine

Jan-Mar, 2017-2019

This shows a gentle increase from left to right from a low level. Note the darker colour, which shows this is lower.

domestic violence

Mar-May, 2013-2015

This shows a gully in April-May 2013 & April 2014 and the saddle between March & May 2015.

Bundaberg

Jul-Sep, 2014-2016

This shows a steep increase up to the highest point on the map (as indicated by the pink dot).


The questions.

While there are many ways to search - by title, author, keyword, catalogue number, Boolean, etc - a handful of people choose to politely phrase their search in the form of a question (currently -, or around -% of searches). This section looks at the searches that start with words such as 'who', 'what', 'when', 'where', 'why' etc (sometimes referred to as 'interrogative words'). As a proportion of all searches, people tended to ask the fewest questions in summer, and the most in winter. To see more about these types of searches and their topography, search using the 'starts with' option in the 'Search' section above.



Note: this only returns searches that had at least three words, and some of the results may be for books where the title of the book is a question, as there is not an easy way to exclude them. Also, given that 'Will' is also a name, to exclude searches for a person from the list, 'will' has the added criterion that the search has to end with a question mark.
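The rules in the note above could be sketched like this. The word list is deliberately partial (only the words named in this section, plus 'will'), the function name is hypothetical, and the searches in the usage lines are made up; it is an illustration of the rules as described, not the project's own filter.

```python
# Partial, illustrative list: the words named in this section, plus 'will'.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "will"}

def is_question(search):
    """Apply the rules from the note: a question word first, and at least three words."""
    text = search.strip()
    words = text.lower().split()
    if len(words) < 3:
        return False
    first = words[0]
    if first not in QUESTION_WORDS:
        return False
    if first == "will":
        # 'Will' is also a name, so only count it if the search ends with '?'.
        return text.endswith("?")
    return True

print(is_question("why is the sky blue"))     # True
print(is_question("Will Smith biography"))    # False
print(is_question("will it rain tomorrow?"))  # True
print(is_question("who am"))                  # False (fewer than three words)
```

As the note says, a check like this cannot tell a genuine question from a book title that happens to be phrased as one.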


The trees, not the forest.

When looking at data, especially large data sets, the focus often tends to be on what is at the top - the biggest, the latest, the trends - and sometimes the information contained in the less frequent elements can be missed. Given that almost 40% of searches only ever appeared once, and a further 6% or so only twice, the searches exhibit the classic long tail distribution:

[Chart: number of times each search term appeared, April 2012 - Dec 2021]


Therefore, in order to briefly have a look at a few of the individual trees, rather than the overall forest, the searches below are a selection of those that have only appeared a single time.
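A long tail like this can be measured by counting how many distinct searches appeared once, twice, and so on. The sketch below assumes the percentages refer to distinct search strings (the page doesn't say exactly how they were calculated), and the function name and sample list are made up for illustration.

```python
from collections import Counter

def frequency_of_frequencies(searches):
    """Map 'times a search appeared' to 'number of distinct searches with that count'."""
    per_search = Counter(searches)
    return Counter(per_search.values())

# A tiny made-up sample: three one-off searches form the 'tail'.
sample = ["cats", "cats", "cats", "brisbane", "brisbane", "zines", "sloths!", "flood maps"]
dist = frequency_of_frequencies(sample)
singles_share = dist[1] / sum(dist.values())
print(dist[1], dist[2], dist[3], singles_share)
```

In this sample, three of the five distinct searches appeared only once, so the share of one-off searches is 0.6.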


A few recent examples

(there were - searches that only appeared once)


Notes, & a few
final thoughts.

As in the physical world, certain events leave their marks on the landscape. The steep cliffs formed by searches related to COVID (the first search was for 'Covid-19' at around 7am, on Monday the 9th of March, 2020) are now part of the data, and only time will tell the final shape and form that those peaks will take. The gentle decline of DVDs (and the subsequent rise of streaming), the 2011 floods (and the catalogue going offline during the 2022 floods), the annual peaks for Anzac Day, the Ekka, and those (presumably) related to school projects on Ancient Egypt, Greece, and Rome, the dwindling interest in Lonely Planet travel guides, the increased popularity of the em dash, all are etched into the data, and are now permanent features of the landscape.

A librarian once said "we cannot see the role libraries play in fighting inequality, polarization and loneliness from a spreadsheet", to which I'd add, "or a data visualisation", but I hope this highlights the huge range of questions - both literal and implied - that people have turned to the library for, in order to seek some answers. There's obviously heaps more that could have been done with this dataset, but I hope people have found at least some aspects of this interesting, and so um, feel free to now go and search the catalogue (and thus contribute some more data points to the maps), get some tips on some of the different ways to search, or keep exploring the maps using some of the topics in the footer below.

Notes:

  • Search data is taken from Google Analytics and based on 'unique searches', where duplicate searches within a single session or visit are excluded, i.e. if someone searches for 'Townsville' several times during their visit, it will only count as one search for 'Townsville' (but it is case sensitive, so 'Townsville' and 'TOWNSVILLE' will count as two separate searches).
  • Search data is currently stored in Google Analytics with some of the search criteria as a prefix. So a general search for 'Townsville' is stored as 'any,contains,Townsville', a search for a title that begins with 'MOBY' is stored as 'title,begins_with,MOBY' etc. For this project, the prefixes have been stripped.
  • In order to speed up how quickly the data is returned, the handful of searches that were over 768 characters were truncated (0.002% as of the end of 2021). Truncated searches end with '[...]'.
  • Those who searched for the Euro symbol (€) might have noticed a bunch of results where 'â€™' was in the results instead of an apostrophe (e.g. 'Donâ€™t call me Ishmael'). This is an issue with the original data, and is due to a mismatch of character encoding, which can occur when the search text is copied and pasted from another program. Another common one that appears in the data is '~2F' for '/'.
  • For a while there was going to be another section looking at the ratio of unique words to total searches for a term, to see if more complex topics had more unique words used when people were searching for them, but I sort of ran out of time.
  • HTML tags were stripped from the searches when they appeared.
  • As noted earlier, although there is data for April, May, & June 2012, the first month with a full month of data is July, so data from earlier than that should be considered unreliable. Also, in February & March 2022 the power was off at the library at times due to the floods, during which the online catalogue was unavailable, so total searches for those months are lower (during the affected period - February 27th to March 8th - there were around 300 searches per day on average, which is about 10% of the daily totals of the weeks before).
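To make a few of the notes above more concrete, here is a rough sketch of the prefix stripping, tag stripping, and truncation steps, along with a way the encoding mismatch could be reversed. The function names and the regular expressions are assumptions, as is the exact way truncation was done, and the notes say the encoding issue was left as-is in the data, so the repair function is purely illustrative.

```python
import re

MAX_LENGTH = 768
# Assumed shape of the stored prefixes, e.g. 'any,contains,' or 'title,begins_with,'.
PREFIX = re.compile(r"^[a-z_]+,[a-z_]+,")

def clean_search(raw):
    """Strip the search-criteria prefix and HTML tags, then truncate long searches."""
    text = PREFIX.sub("", raw, count=1)
    text = re.sub(r"<[^>]+>", "", text)
    if len(text) > MAX_LENGTH:
        text = text[:MAX_LENGTH] + "[...]"
    return text

def repair_encoding(text):
    """Undo '~2F' and the UTF-8-read-as-Windows-1252 mix-up, where that is possible."""
    text = text.replace("~2F", "/")
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of mismatch, so leave the text alone

print(clean_search("any,contains,Townsville"))     # Townsville
print(clean_search("title,begins_with,MOBY"))      # MOBY
print(repair_encoding("Donâ€™t call me Ishmael"))  # Don’t call me Ishmael
```

The Euro symbol turns up because a curly apostrophe is stored as three bytes in UTF-8, and when those bytes are read as Windows-1252 the middle one is displayed as '€'. Note that a prefix pattern this loose would also strip the start of a genuine search that happened to look like 'a,b,c'.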