New Brunswick Libraries Data Workshop Series
This fall, Ryan Womack, Data Librarian, will offer a series of workshops on statistical software, data visualization, and reproducible research as part of New Brunswick Libraries Data Management Services. A detailed calendar and descriptions of each workshop are below. This semester each workshop topic will be offered twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops in parts will divide the time in thirds. For example, an SPSS, Stata, and SAS workshop running from 12 to 3 pm would start with SPSS at 12 pm, Stata at 1 pm, and SAS at 2 pm. You are free to come only to those segments that interest you. There is no need to register; just come!
Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Wednesdays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Tuesdays from 1:10 to 4:10 pm.
For both locations, you are encouraged to bring your own laptop to work in your native environment. Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop. At LSM, we will have laptops available to borrow for the session if you don’t bring your own. Room capacity is 25 in both locations, first come, first served.
If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series. Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.
Calendar of workshops
|Alexander Library (Tuesdays, 1:10 pm to 4:10 pm)||Workshop||Library of Science and Medicine (Wednesdays, 12 noon to 3 pm)|
|September 12||Introduction to SPSS, Stata, and SAS||September 13|
|September 19||Introduction to R||September 20|
|September 26||Data Visualization in R||September 27|
|October 3||Reproducible Research||October 18|
Description of Workshops:
§ Introduction to SPSS, Stata, and SAS (September 12 or September 13) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset. If you are already using these packages with some degree of success, you may find these sessions too basic for you.
- SPSS is widely used statistical software with strengths in survey analysis and other social science disciplines. Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SPSS is also available in campus computer labs and via the Apps server (see below).
- Stata is flexible and allows relatively easy access to programming features. It is popular in economics among other areas. Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users. Find it at software.rutgers.edu.
- SAS is a powerful and long-standing system that handles large data sets well, and is popular in the pharmaceutical industry and health sciences, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).
Note: Accessing software via apps.rutgers.edu
§ Introduction to R (September 19 or September 20) – This session provides a three-part orientation to the R programming environment. R is freely available, open source statistical software that has been widely adopted in the research community. Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool. No prior knowledge is assumed. The three parts cover:
- Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
- Graphics: comparison of graphing techniques in base R, lattice, and ggplot2 packages
- Data Manipulation: data import and transformation, additional methods for working with large data sets, and dplyr and other tidyverse packages useful for data manipulation.
Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Data Visualization in R (September 26 or September 27) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R. Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background. The three parts are:
- Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages. Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
- Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
- 3-D, Interactive, and Big Data: presentation of 3-D data, interactive exploration of data, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.
Additional R resources can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Reproducible Research (October 3 or October 18) covers the growing movement to make the products of research accessible and usable by others in order to verify, replicate, and extend research findings. This session reviews how to plan research, how to create publications, code, and data in open, reusable formats, and how to maximize the impact of shared research findings. Examples in LaTeX and R Markdown are discussed, along with platforms for reusability such as the Open Science Framework.
Additional resources on reproducible research and data management, including presentation slides, can be found here: http://libguides.rutgers.edu/datamanagement
§ Special Topics
Note that the following special topics are no longer covered by in-person workshops, but are available via screencast.
- Time Series in R: review of commands and techniques for basic time series analysis in R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOA2q0sfDNKBH9WIlLxXkbn and scripts at http://libguides.rutgers.edu/data_R
- Survival Analysis in R: review of commands and techniques for basic survival analysis in R. Scripts at http://libguides.rutgers.edu/data_R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOON9isnuVYIL8dNwkvwqr9.
- Big Data in Brief: an introduction to some of the techniques and software environments used to work with big data, with pointers to resources for further learning at http://libguides.rutgers.edu/bigdata. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hMNhIdrvz1F5-JHIWi1qdX1
Distributing files via PirateBox
A PirateBox is a wireless router that has been reconfigured to serve as a local fileserver. The PirateBox project develops software to do this. PirateBox makes it easy to share files with anyone within range of the router, and also supports a local anonymous discussion board for those within range.
I did this for the TP-Link TL-MR3040, a commonly used piece of hardware for PirateBoxes. The MR3040 is small and battery powered, so you can easily carry it in your pocket to places that have no electricity or internet. The file system lives on a removable USB drive, so loading content is as easy as copying files from your computer to the drive.
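As a rough illustration of that copying step, here is a minimal Python sketch that copies a folder of workshop materials onto a mounted USB drive while preserving subfolders. The function name `copy_materials` and both folder paths are hypothetical, and the folder that a PirateBox actually serves from the USB drive is an assumption to check against the project's own instructions.

```python
import shutil
from pathlib import Path


def copy_materials(source_dir, usb_dir):
    """Copy every file under source_dir into usb_dir, preserving subfolders.

    Returns the sorted relative paths (POSIX style) of the copied files.
    Both arguments are ordinary folder paths; nothing here is PirateBox-specific.
    """
    source = Path(source_dir)
    target = Path(usb_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"source folder not found: {source}")

    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            relative = path.relative_to(source)
            destination = target / relative
            # Create any missing subfolders on the USB drive first.
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, destination)
            copied.append(relative.as_posix())
    return copied
```

A call might look like `copy_materials("workshop_materials", "/media/usb/Shared")`, where both paths are placeholders for your own materials folder and wherever your operating system mounts the drive.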
Configuration is not really that difficult if you follow the instructions here at the PirateBox project site.
To customize your SSID (the name of your wireless device) and Home Page you can follow these instructions.
Workshops hosted on PirateBox
I have used my PirateBox to share workshop slides, articles, and code. Up to now this has been a small supplement to my normal workshops, but it could have a larger role.
What I would like to do is to create a self-contained training environment that would not depend on the vagaries of local configuration and connectivity issues. Following a “train the trainer” model, one could build a PirateBox with an entire data literacy course running off of web pages on the PirateBox. The PirateBox could include all necessary software (a complete R installation, for example) and a collection of supporting documents, datasets, code, and any other information. This material could also be mirrored to/from a regular website, but the portable and self-contained aspect of the PirateBox opens up many possibilities.
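One small piece of such a course could be a landing page that links to everything in the materials folder. The sketch below is only one possible approach, not a feature of the PirateBox software: it builds a plain `index.html` listing every file under a folder. The names `build_index` and `write_index` and the default title are hypothetical.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(materials_dir, title="Data Literacy Course"):
    """Return an HTML page that links to every file under materials_dir."""
    root = Path(materials_dir)
    links = []
    for path in sorted(root.rglob("*")):
        # Skip folders and any index page left over from a previous run.
        if path.is_file() and path.name != "index.html":
            relative = path.relative_to(root).as_posix()
            href = html.escape(quote(relative))  # URL-encode, then HTML-escape
            links.append(f'<li><a href="{href}">{html.escape(relative)}</a></li>')
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body><h1>{html.escape(title)}</h1>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )


def write_index(materials_dir, title="Data Literacy Course"):
    """Write index.html into materials_dir and return its path."""
    index_path = Path(materials_dir) / "index.html"
    index_path.write_text(build_index(materials_dir, title), encoding="utf-8")
    return index_path
```

Because the links are relative, the same page should work whether the folder is served from the PirateBox or mirrored on a regular website.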
So the trainer could walk into a room anywhere in the world (for example, a small Mongolian town, or сум) with their PirateBox and lead a workshop based on materials that reside in their entirety on the PirateBox. They could then leave the PirateBox behind so that those in the community could continue to work with the materials and any additional modules. They could adapt, repurpose, and create their own materials too. So the PirateBox can support ongoing learning, far beyond the limits of one-shot workshops.
These are not especially new ideas, and even as I type people are surely hacking wireless routers and other devices to perform other advanced functions. Doubtless the technology will continue to develop. But for now, the PirateBox software allows one to do interesting work with less than $50 in hardware and a couple of hours in setup time. Who knows? One can dream of hordes of data literacy pirates emerging from this simple technology.