A while back I was asked this question, and it provoked an almost subterranean thought process in me. While I can’t promise a deep answer to this question, at least an answer has emerged.
For me, librarianship itself is all about guiding people to knowledge. I love the sherpa meme, and would be honored to call myself a data sherpa.
[The image above appears to be no longer actively used by Open Knowledge Foundation, possibly due to other companies using this tagline as well. I am just referring to it in an academ-y fair use-y way here.]
So, a data librarian is not necessarily someone who collects data and puts it on a shelf or on a server (although that certainly can be done by data librarians). For me, the central role of data librarians (as compared to data archivists, data curators, data analysts, data scientists, and other professions with the data-prefix) is that of data navigator or data guide. We help people find and use the data they need, using the librarian side of our skills to understand our user communities and craft solutions to their particular needs. That requires knowing the data landscape, having the hard skills to crunch the data itself, and having the soft skills to adapt our services to our environment.
As has always been the case in librarianship, the balance among the different things we offer changes over time. Data availability is certainly increasing in the general sense, so the “finding data” part of the equation has changed to one that requires more understanding of what kinds of deep analysis are made possible by combining disparate datasets and tools in possibly unexpected ways. Finding the population of a country over time has never been easier, but trying to understand how economic and environmental factors may have contributed to that population change is now a question that permits more sophisticated answers if we bring more and better data to bear.
The tools at our disposal have changed as well. Rather than being dependent on a preinstalled application (for example on a data CD), users expect to extract data into their own preferred analysis platforms and then be able to serve their results back to end users via interactive web interfaces of their own creation. It is amazing that this is possible and is getting easier all the time. But it also means that we cannot stand pat and continue to offer the same data resources of previous decades as if they were everyone’s sine qua non or ne plus ultra or [insert your alma mater’s Latin cliché here].
What else distinguishes a data librarian? Many data scientists may know the data landscape, know about data analysis, and apply those skills in service of a particular community. So I would argue that it is the values of librarianship that make the difference. Specifically, the commitment to open, shared resources and to educating the community on their use is critical. This is why I consider many of the things I do that others may not see as “librarian-like” (such as teaching data literacy, or sharing tutorial videos about statistical software) to be some of my most valuable and core work. What I value is this openness and sharing, which offers every person the promise that they can continue to learn and develop their skills, and themselves. I hope that my work as a data librarian helps enable that, and I am glad and privileged to be part of both a local work environment and an international community that support those goals.
Attending IASSIST 2013 was very therapeutic, and I have returned from Germany invigorated and with many new thoughts about improving data services.
One thing I now wonder about is what I would do if I could design a dream lineup for a Data Services team, assuming that I had 4 or 5 staff lines at my disposal to hire from scratch. What would the ideal configuration look like? My thoughts are primarily about an academic library setting similar to my own, but this would be an interesting exercise in other settings too.
Is it hierarchical (a head with subordinates)? A team of equals? Are responsibilities cleanly divided or shared?
Some technical skills are required: statistical, mathematical, and engineering software (R, SAS, SPSS, Matlab, Mathematica, etc.), GIS, qualitative analysis, data visualization, programming languages (Python, Java), and database skills. How are these divided among positions?
We also need knowledge of public data sources, plus outreach and instruction skills, both in person and via electronic media. Research data management also requires one-on-one people skills to negotiate data acquisition and provide advice across many disciplines. Is it better to split these along functional lines (RDM specialist vs. Public Services Data Librarian) or along subject lines (e.g., a science data librarian and a social sciences data librarian each handle both instruction and individual RDM work in their respective disciplines)? Does digital humanities fit in here, or is it a separate issue?
So here’s the thought exercise: List the five members of your dream data team…