R Workshops, a tidyverse approach, Fall 2019
A bit late posting about this, but my R workshops start tomorrow. This year I am revising my materials to reflect a tidyverse-centric approach. I am not a tidyverse convert or even a particular fan, but I would like to teach this popular and coherent ecosystem as an entry point to R. I hope it does not discourage learning the entire diversity of the R space.
These workshops are open to all without registration.
Bring your own laptop to these sessions to get the most out of them!
Later in the semester, there are plans to repeat these as webinars (schedule to come in late September).
R for data analysis: a tidyverse approach
 Wednesday, September 25 – 12:00-1:20 pm, LSM Conference Room
 Thursday, October 3 – 2:50-4:10 pm, Alexander Library Room 415
The session introduces the R statistical software environment and basic methods of data analysis, and also introduces the “tidyverse”. While R is much more than the “tidyverse”, the development of the “tidyverse” set of packages, led by RStudio, has provided a powerful and connected toolkit to get started with using R. Note that graphics and data manipulation are covered in subsequent sessions.
R graphics with ggplot2
 Wednesday, October 2 – 12:00-1:20 pm, LSM Conference Room
 Thursday, October 10 – 2:50-4:10 pm, Alexander Library Room 415
The ggplot2 package from the tidyverse provides extensive and flexible graphical capabilities within a consistent framework. This session introduces the main features of ggplot2. Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background.
R data wrangling with dplyr, tidyr, readr and more
 Wednesday, October 9 – 12:00-1:20 pm, LSM Conference Room
 Thursday, October 24 – 2:50-4:10 pm, Alexander Library Room 415
Some of the most powerful features of the tidyverse relate to its abilities to import, filter, and otherwise manipulate data. This session reviews major packages within the tidyverse that relate to the essential data handling steps required before (and during) data analysis.
R for interactivity: an introduction to Shiny
 Wednesday, October 23 – 12:00-1:20 pm, LSM Conference Room
 Thursday, October 31 – 2:50-4:10 pm, Alexander Library Room 415
Shiny is an R package that enables the creation of interactive websites for data visualization. This session provides a brief overview of the Shiny framework, and how to edit and publish Shiny sites in RStudio (with shinyapps.io). Familiarity with R/RStudio is assumed.
R for reproducible scientific documents: knitr, rmarkdown, and beyond
 Wednesday, October 30 – 12:00-1:20 pm, LSM Conference Room
 Thursday, November 7 – 2:50-4:10 pm, Alexander Library Room 415
The RStudio environment enables the easy creation of documents in various formats (HTML, DOC, PDF) using Rmarkdown, while knitr allows the incorporation of executable R code to produce the tables and figures in those documents. This session introduces these concepts and other packages and practices supporting reproducibility with the R environment.
October Python Workshops
The New Brunswick Libraries’ Quantitative Data Analytics Graduate Specialist, Hang Miao, will be offering a three-part series of Python workshops in October, starting Friday, October 5.
Note: additional, more advanced workshops for November will be announced later in October.
☞ RSVP for the Python workshops.
Workshops are offered in either Alexander Library or LSM (with identical content). Participants in LSM-based workshops must bring their own laptops. At Alexander, you can either bring your own laptop, or use the desktops in the lab.
Python Basics and Data Exploration
October 5, 1-3 pm, Alexander Library Room 413
October 10, 3:30-5:30 pm, Library of Science and Medicine Electronic Classroom (3rd floor)
This workshop will be an accelerated introduction to fundamental concepts such as variable assignment, data types, basic calculations, working with strings and lists, control structures (e.g. for-loops), and functions. We will also start working with pandas, a popular data science library in Python, to explore a dataset on foodborne outbreaks reported to the CDC.
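For a rough sense of the fundamentals listed above, here is a minimal sketch using only the Python standard library. The numbers, the variable names, and the `total_above` helper are all hypothetical, invented for illustration; they are not taken from the workshop materials or the CDC dataset.

```python
# Hypothetical sample values, invented purely for illustration.
outbreak_counts = [12, 7, 30, 4]   # variable assignment: a list of integers
label = "illnesses"                # a string


def total_above(values, threshold):
    """Sum the values that are strictly greater than threshold."""
    total = 0
    for value in values:           # for-loop over a list
        if value > threshold:      # conditional control structure
            total += value         # basic calculation
    return total


# Combine a function call and a string in an f-string.
summary = f"{total_above(outbreak_counts, 5)} {label}"
print(summary)
```

The workshop itself may introduce these ideas in a different order or with different examples.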
Data Manipulation and Analysis
October 12, 1-3 pm, Alexander Library Room 413
October 17, 3:30-5:30 pm, Library of Science and Medicine Conference Room (1st floor)
In this workshop, we will dive into the world of arrays and data frames using the NumPy and pandas libraries. We’ll cover data cleaning and preprocessing, joining and merging, group operations, and more. If you work with tabular data, this workshop is for you!
Data Visualization and Machine Learning
October 19, 1-3 pm, Alexander Library Room 413
October 24, 3:30-5:30 pm, Library of Science and Medicine Conference Room (1st floor)
Interested in finding patterns and predicting unknown attribute values in your data? Join us for an overview of machine learning techniques implemented using the scikit-learn library. We’ll also learn how to do data visualization with matplotlib, a popular plotting library in Python.
Open Follow-up Session on Python
October 26, 1-3 pm, Alexander Library Room 413
October 31, 3:30-5:30 pm, Library of Science and Medicine Electronic Classroom (3rd floor)
This open session allows participants to bring their questions or issues from previous sessions to practice and further develop skills.
Statistical Software and Data Workshops, Fall 2018
New Brunswick Libraries Data Workshop Series
Fall 2018
This Fall, Ryan Womack, Data Librarian, will offer a series of workshops on statistical software and data visualization as part of New Brunswick Libraries Data Management Services. A detailed calendar and descriptions of each workshop are below. The workshop on reproducible research is moving online to YouTube – stay tuned for an upcoming blog post and announcement on its availability. We also anticipate offering additional workshops through the Graduate Specialist program. That announcement will be coming in September.
This semester each workshop topic will be repeated twice in person, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops in parts will divide the time in thirds. For example, the first SPSS, Stata, and SAS workshop (running from 1:10-4:10 pm) would start with SPSS at 1:10 pm, Stata at 2:10 pm, and SAS at 3:10 pm. You are free to come only to those segments that interest you. There is no need to register, just come!
Logistics
Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Wednesdays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Thursdays from 1:10 to 4:10 pm.
For both locations, you are encouraged to bring your own laptop to work in your native environment. Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop. At LSM, we will have laptops available to borrow for the session if you don’t bring your own. Room capacity is 25 in both locations, first come, first served.
If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at https://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series. Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.
Calendar of workshops
Wednesday (LSM), 12 noon – 3 pm | Workshop | Thursday (Alexander), 1:10 pm – 4:10 pm
October 3 | Introduction to SPSS, Stata, and SAS | September 13
October 10 | Introduction to R | September 20
October 17 | Data Visualization in R | September 27
Description of Workshops:
§ Introduction to SPSS, Stata, and SAS (September 13 or October 3) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset. If you are already using these packages with some degree of success, you may find these sessions too basic for you.
 SPSS is widely used statistical software with strengths in survey analysis and other social science disciplines. Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SPSS is also available in campus computer labs and via the Apps server (see below).
 Stata is flexible and allows relatively easy access to programming features. It is popular in economics among other areas. Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users. Find it at software.rutgers.edu.
 SAS is a powerful and longstanding system that handles large data sets well, and is popular in the pharmaceutical industry and health sciences, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).
Note: Accessing software via apps.rutgers.edu
SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. apps.rutgers.edu does not require any software installation, but you must activate the service first at netid.rutgers.edu.
§ Introduction to R (September 20 or October 10) – This session provides a three-part orientation to the R programming environment. R is freely available, open source statistical software that has been widely adopted in the research community. Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool. No prior knowledge is assumed. The three parts cover:
 Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
 Graphics: comparison of graphing techniques in base R, lattice, and ggplot2 packages
 Data Manipulation: data import and transformation, additional methods for working with large data sets, also dplyr and other packages from the tidyverse useful for manipulation.
Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Data Visualization in R (September 27 or October 17) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R. Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background. The three parts are:
 Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages. Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
 Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
 3D, Interactive, and Big Data: presentation of 3D data, interactive exploration of data, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.
Additional R resources can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
Note that the following special topics are no longer covered by in-person workshops, but are available via screencast.
 Reproducible Research – [Coming Soon] how to create data, code, and publications in open, reusable formats and maximize the impact and validity of your research.
 Time Series in R: review of commands and techniques for basic time series analysis in R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOA2q0sfDNKBH9WIlLxXkbn and scripts at http://libguides.rutgers.edu/data_R
 Survival Analysis in R: review of commands and techniques for basic survival analysis in R. Scripts at http://libguides.rutgers.edu/data_R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOON9isnuVYIL8dNwkvwqr9.
 Big Data in Brief: an introduction to some of the techniques and software environments used to work with big data, with pointers to resources for further learning at http://libguides.rutgers.edu/bigdata. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hMNhIdrvz1F5JHIWi1qdX1
Statistical Software and Data Workshops, Spring 2018
New Brunswick Libraries Data Workshop Series
Spring 2018
This Spring, Ryan Womack, Data Librarian, will repeat the series of workshops on statistical software and data visualization as part of New Brunswick Libraries Data Management Services. A detailed calendar and descriptions of each workshop are below. The workshop on reproducible research is moving online to YouTube – stay tuned for an upcoming blog post and announcement on its availability.
This semester each workshop topic will be repeated twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops in parts will divide the time in thirds. For example, the first SPSS, Stata, and SAS workshop (running from 12-3 pm) would start with SPSS at 12 pm, Stata at 1 pm, and SAS at 2 pm. You are free to come only to those segments that interest you. There is no need to register, just come!
Logistics
Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Mondays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Tuesdays from 1:10 to 4:10 pm.
For both locations, you are encouraged to bring your own laptop to work in your native environment. Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop. At LSM, we will have laptops available to borrow for the session if you don’t bring your own. Room capacity is 25 in both locations, first come, first served.
If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series. Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.
Calendar of workshops
Monday (LSM), 12 noon – 3 pm | Workshop | Tuesday (Alexander), 1:10 pm – 4:10 pm
January 29 | Introduction to SPSS, Stata, and SAS | January 30
February 5 | Introduction to R | February 6
February 12 | Data Visualization in R | February 13
Description of Workshops:
§ Introduction to SPSS, Stata, and SAS (January 29 or January 30) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset. If you are already using these packages with some degree of success, you may find these sessions too basic for you.
 SPSS is widely used statistical software with strengths in survey analysis and other social science disciplines. Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SPSS is also available in campus computer labs and via the Apps server (see below).
 Stata is flexible and allows relatively easy access to programming features. It is popular in economics among other areas. Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users. Find it at software.rutgers.edu.
 SAS is a powerful and longstanding system that handles large data sets well, and is popular in the pharmaceutical industry and health sciences, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).
Note: Accessing software via apps.rutgers.edu
SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. apps.rutgers.edu does not require any software installation, but you must activate the service first at netid.rutgers.edu.
§ Introduction to R (February 5 or February 6) – This session provides a three-part orientation to the R programming environment. R is freely available, open source statistical software that has been widely adopted in the research community. Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool. No prior knowledge is assumed. The three parts cover:
 Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
 Graphics: comparison of graphing techniques in base R, lattice, and ggplot2 packages
 Data Manipulation: data import and transformation, additional methods for working with large data sets, also dplyr and other packages from the tidyverse useful for manipulation.
Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Data Visualization in R (February 12 or February 13) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R. Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background. The three parts are:
 Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages. Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
 Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
 3D, Interactive, and Big Data: presentation of 3D data, interactive exploration of data, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.
Additional R resources can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
Note that the following special topics are no longer covered by in-person workshops, but are available via screencast.
 Reproducible Research – [Coming Soon] how to create data, code, and publications in open, reusable formats and maximize the impact and validity of your research.
 Time Series in R: review of commands and techniques for basic time series analysis in R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOA2q0sfDNKBH9WIlLxXkbn and scripts at http://libguides.rutgers.edu/data_R
 Survival Analysis in R: review of commands and techniques for basic survival analysis in R. Scripts at http://libguides.rutgers.edu/data_R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOON9isnuVYIL8dNwkvwqr9.
 Big Data in Brief: an introduction to some of the techniques and software environments used to work with big data, with pointers to resources for further learning at http://libguides.rutgers.edu/bigdata. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hMNhIdrvz1F5JHIWi1qdX1
Statistical Software and Data Workshops, Fall 2017
New Brunswick Libraries Data Workshop Series
Fall 2017
This Fall, Ryan Womack, Data Librarian, will offer a series of workshops on statistical software, data visualization, and reproducible research as part of New Brunswick Libraries Data Management Services. A detailed calendar and descriptions of each workshop are below. This semester each workshop topic will be repeated twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops in parts will divide the time in thirds. For example, the first SPSS, Stata, and SAS workshop (running from 12-3 pm) would start with SPSS at 12 pm, Stata at 1 pm, and SAS at 2 pm. You are free to come only to those segments that interest you. There is no need to register, just come!
Logistics
Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Wednesdays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Tuesdays from 1:10 to 4:10 pm.
For both locations, you are encouraged to bring your own laptop to work in your native environment. Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop. At LSM, we will have laptops available to borrow for the session if you don’t bring your own. Room capacity is 25 in both locations, first come, first served.
If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series. Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.
Calendar of workshops
Tuesday (Alexander), 1:10 pm – 4:10 pm | Workshop | Wednesday (LSM), 12 noon – 3 pm
September 12 | Introduction to SPSS, Stata, and SAS | September 13
September 19 | Introduction to R | September 20
September 26 | Data Visualization in R | September 27
October 3 | Reproducible Research | October 18
Description of Workshops:
§ Introduction to SPSS, Stata, and SAS (September 12 or September 13) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset. If you are already using these packages with some degree of success, you may find these sessions too basic for you.
 SPSS is widely used statistical software with strengths in survey analysis and other social science disciplines. Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SPSS is also available in campus computer labs and via the Apps server (see below).
 Stata is flexible and allows relatively easy access to programming features. It is popular in economics among other areas. Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users. Find it at software.rutgers.edu.
 SAS is a powerful and longstanding system that handles large data sets well, and is popular in the pharmaceutical industry and health sciences, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).
Note: Accessing software via apps.rutgers.edu
SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. apps.rutgers.edu does not require any software installation, but you must activate the service first at netid.rutgers.edu.
§ Introduction to R (September 19 or September 20) – This session provides a three-part orientation to the R programming environment. R is freely available, open source statistical software that has been widely adopted in the research community. Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool. No prior knowledge is assumed. The three parts cover:
 Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
 Graphics: comparison of graphing techniques in base R, lattice, and ggplot2 packages
 Data Manipulation: data import and transformation, additional methods for working with large data sets, also dplyr and other packages from the tidyverse useful for manipulation.
Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Data Visualization in R (September 26 or September 27) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R. Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background. The three parts are:
 Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages. Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
 Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
 3D, Interactive, and Big Data: presentation of 3D data, interactive exploration of data, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.
Additional R resources can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Reproducible Research (October 3 or October 18)
 Reproducible research describes the growing movement to make the products of research accessible and usable by others in order to verify, replicate, and extend research findings. This session reviews how to plan research, create publications, code, and data in open, reusable formats, and maximize the impact of shared research findings. Examples in LaTeX and Rmarkdown are discussed, along with platforms for reusability such as the Open Science Framework.
Additional resources on reproducible research and data management, including presentation slides, can be found here: http://libguides.rutgers.edu/datamanagement
§ Special Topics
Note that the following special topics are no longer covered by in-person workshops, but are available via screencast.
 Time Series in R: review of commands and techniques for basic time series analysis in R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOA2q0sfDNKBH9WIlLxXkbn and scripts at http://libguides.rutgers.edu/data_R
 Survival Analysis in R: review of commands and techniques for basic survival analysis in R. Scripts at http://libguides.rutgers.edu/data_R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOON9isnuVYIL8dNwkvwqr9.
 Big Data in Brief: an introduction to some of the techniques and software environments used to work with big data, with pointers to resources for further learning at http://libguides.rutgers.edu/bigdata. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hMNhIdrvz1F5JHIWi1qdX1
Statistical Software and Data Workshops, Fall 2016
Rutgers University Libraries Data Services Workshop Series (New Brunswick)
Fall 2016
This Fall, Ryan Womack, Data Librarian, will offer a series of workshops on statistical software, data visualization, and data management, as part of the Rutgers University Libraries Data Services. A detailed calendar and descriptions of each workshop are below. This semester each workshop topic will be repeated twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops in parts will divide the time in thirds. For example, the first SPSS, Stata, and SAS workshop (running from 12-3 pm) would start with SPSS at 12 pm, Stata at 1 pm, and SAS at 2 pm. You are free to come only to those segments that interest you. There is no need to register, just come!
Logistics
Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Wednesdays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Thursdays from 1:10 to 4:10 pm.
For both locations, you are encouraged to bring your own laptop to work in your native environment. Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop. At LSM, we will have laptops available to borrow for the session if you don’t bring your own. Room capacity is 25 in both locations, first come, first served.
If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series. Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.
Calendar of workshops
Wednesday (LSM), 12 noon – 3 pm | Workshop | Thursday (Alexander), 1:10 pm – 4:10 pm
September 21 | Introduction to SPSS, Stata, and SAS | September 22
September 28 | Introduction to R | September 29
October 5 | Data Visualization in R | October 6
October 19 | Introduction to Data Management | October 13
Description of Workshops:
§ Introduction to SPSS, Stata, and SAS (September 21 or September 22) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset. If you are already using these packages with some degree of success, you may find these sessions too basic for you.
 SPSS is widely used statistical software with strengths in survey analysis and other social science disciplines. Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SPSS is also available in campus computer labs and via the Apps server (see below).
 Stata is flexible and allows relatively easy access to programming features. It is popular in economics among other areas. Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users. Find it at software.rutgers.edu.
 SAS is a powerful and longstanding system that handles large data sets well, and is popular in the pharmaceutical industry, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).
Note: Accessing software via apps.rutgers.edu
SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. apps.rutgers.edu does not require any software installation, but you must activate the service first at netid.rutgers.edu.
§ Introduction to R (September 28 or September 29) – This session provides a three-part orientation to the R programming environment. R is freely available, open source statistical software that has been widely adopted in the research community. Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool. No prior knowledge is assumed. The three parts cover:
 Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
 Graphics: comparison of graphing techniques in base R, lattice, and ggplot2 packages
 Data Manipulation: data import and transformation, additional methods for working with large data sets, and plyr and other packages useful for data manipulation
Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Data Visualization in R (October 5 or October 6) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R. Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background. The three parts are:
 Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages. Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
 Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
 3D, Interactive, and Big Data: presentation of 3D data, interactive exploration of data, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.
Additional R resources can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Introduction to Data Management (October 13 or October 19) covers
 Best Practices for Managing Your Data – methods to organize, describe, backup, and archive your research data in order to ensure its future usability and accessibility. Developing good habits for handling your data from the start will save time and frustration later, and increase the ultimate impact of your research.
 Data Management Plans, Data Sharing and Archiving – targeted to researchers who need to write data management plans (DMPs) and share their data as part of their grant application, research and publication process. Reviews DMP guidelines, checklist, and general advice, along with options for sharing and permanently archiving research data.
 Reproducible Research – covers the growing movement to make the products of research accessible and usable by others in order to verify, replicate, and extend research findings. Reviews how to plan research, create publications, code, and data in open, reusable formats, and maximize the impact of shared research findings.
Additional data management resources, including presentation slides, can be found here: http://libguides.rutgers.edu/datamanagement
§ Special Topics
Note that the following special topics are no longer covered by in-person workshops, but are available via screencast.
 Time Series in R: review of commands and techniques for basic time series analysis in R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOA2q0sfDNKBH9WIlLxXkbn and scripts at http://libguides.rutgers.edu/data_R
 Survival Analysis in R: review of commands and techniques for basic survival analysis in R. Scripts at http://libguides.rutgers.edu/data_R. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hOON9isnuVYIL8dNwkvwqr9.
 Big Data in Brief: an introduction to some of the techniques and software environments used to work with big data, with pointers to resources for further learning at http://libguides.rutgers.edu/bigdata. Screencast at https://www.youtube.com/playlist?list=PLCj1LhGni3hMNhIdrvz1F5JHIWi1qdX1
Data Visualization and R
Well, it has been a long time in coming, but I have finally finished converting my Data Visualization workshop series to a screencast video version. See this YouTube playlist for the complete series, and the materials at GitHub. This is the long version of the in-person 3-hour workshop. The video series goes into even more detail, starting from a history of major developments in visualization, to various implementations of specific graphs, interactive visualizations, web viz, big data, and more.
I also have some ideas for some more up-to-date add-ins that I will probably record as lagniappe videos over the next few weeks. Those didn’t quite fit into the existing sequence of videos.
The energy to complete these videos came from several musical sources, of which I would credit Harmogu and Linton Kwesi Johnson as leading lights.
Statistical Software and Data Workshops Spring 2016
Rutgers University Libraries Data Services Workshop Series (New Brunswick)
January 2016
This Spring, Ryan Womack, Data Librarian, will repeat the series of workshops on statistical software, data visualization, and data management, as part of the Rutgers University Libraries Data Services. A detailed calendar and descriptions of each workshop are below. This semester each workshop topic will be repeated twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops in parts will divide the time in thirds. For example, the first SPSS, Stata, and SAS workshop would start with SPSS at 12, Stata at 1, and SAS at 2. You are free to come only to those segments that interest you. There is no need to register, just come!
Logistics
Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Mondays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Tuesdays from 1:10 to 4:10 pm.
For both locations, you are encouraged to bring your own laptop to work in your native environment. Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop. At LSM, we will have laptops available to borrow for the session if you don’t bring your own. Room capacity is 25 in both locations, first come, first served.
If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data. Additional screencasts are continually being added to this series.
Calendar of workshops
Monday (LSM), 12 noon – 3 pm | Workshop | Tuesday (Alexander), 1:10 pm – 4:10 pm
January 25 | Introduction to SPSS, Stata, and SAS | January 26
February 1 | Introduction to R | February 2
February 8 | Data Visualization in R | February 9
February 15 | Special Topics: Time Series in R, Survival Analysis in R, Big Data in Brief | February 16
Description of Workshops:
§ Introduction to SPSS, Stata, and SAS (January 25 or January 26) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset. If you are already using these packages with some degree of success, you may find these sessions too basic for you.
 SPSS is widely used statistical software with strengths in survey analysis and other social science disciplines. Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SPSS is also available in campus computer labs and via the Apps server (see below).
 Stata is flexible and allows relatively easy access to programming features. It is popular in economics among other areas. Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users. Find it at software.rutgers.edu.
 SAS is a powerful and longstanding system that handles large data sets well, and is popular in the pharmaceutical industry, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).
Note: Accessing software via apps.rutgers.edu
SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. apps.rutgers.edu does not require any software installation, but you must activate the service first at netid.rutgers.edu.
§ Introduction to R (February 1 or February 2) – This session provides a three-part orientation to the R programming environment. R is freely available, open source statistical software that has been widely adopted in the research community. Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool. No prior knowledge is assumed. The three parts cover:
 Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
 Graphics: comparison of graphing techniques in base R, lattice, and ggplot2 packages
 Data Manipulation: data import and transformation, additional methods for working with large data sets
Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Data Visualization in R (February 8 or February 9) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R. Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background. The three parts are:
 Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages. Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
 Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
 3D, Interactive and Big Data: presentation of 3D data, interactive exploration of data, and techniques for large datasets
Additional R resources can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Special Topics (February 15 or February 16) covers a few different specialized areas. The three parts presented during the afternoon workshop are not related.
 Time Series in R: review of commands and techniques for basic time series analysis in R. Scripts at http://libguides.rutgers.edu/data_R
 Survival Analysis in R: review of commands and techniques for basic survival analysis in R. Scripts at http://libguides.rutgers.edu/data_R
 Big Data in Brief: an introduction to some of the techniques and software environments used to work with big data, with pointers to resources for further learning at http://libguides.rutgers.edu/bigdata
Of related interest: There is also a Digital Humanities Workshop Series this spring, covering topics including text analysis, network analysis, and digital mapping. See https://dh.rutgers.edu/spring2016workshops/ for information on the topics and schedule.
Data Workshops Full, Registration Closed
Somehow the response this Fall was much higher than expected, so all data and statistical software workshop sessions are now full and registration is closed. Please consult the screencasts, scripts, and handouts at libguides.rutgers.edu/data for a self-guided version of the same material.
The same sessions will run again live in the Spring.
Statistical Software and Data Workshops – Fall 2015
Rutgers University Libraries Data Services Workshop Series (New Brunswick)
August 2015
This Fall, Ryan Womack, Data Librarian, will give a series of workshops on statistical software, data visualization, and data management, as part of the Rutgers University Libraries Data Services. To go directly to the registration page, click here. A detailed calendar and descriptions of each workshop are below. This semester each workshop topic will be repeated twice, once at Alexander Library on College Ave, and once at the Library of Science and Medicine on Busch. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops in parts will divide the time in thirds. For example, the SPSS, Stata, and SAS workshop would start with SPSS at 1:10, Stata at 2:10, and SAS at 3:10. You are free to come only to those segments that interest you.
Logistics
Location: The Alexander Library (College Ave) workshops will be held in room 415 of the Scholarly Communication Center (4th floor of Alexander Library) on Wednesdays from 1:10 to 4:10 pm. The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Thursdays from 12 to 3 pm. Pay attention to the different locations and times when signing up.
For both locations, you are encouraged to bring your own laptop to work in your native environment. Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop. At LSM, we will have laptops available to borrow for the session if you don’t bring your own. Room capacity is 25 in both locations.
If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data. Additional screencasts are continually being added to this series.
Calendar of workshops
Wednesday (Alexander), 1:10 – 4:10 pm | Workshop | Thursday (LSM), 12 – 3 pm
September 9 | Introduction to SPSS, Stata, and SAS | September 10
September 16 | Introduction to R | September 17
October 7 | Data Visualization in R | September 24
October 14 | Special Topics: Time Series in R, Survival Analysis in R, Big Data in Brief | October 8
Register for the workshops here
Description of Workshops:
§ Introduction to R (September 16 or September 17) – This session provides a three-part orientation to the R programming environment. R is freely available, open source statistical software that has been widely adopted in the research community. Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool. No prior knowledge is assumed. The three parts cover:
 Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
 Graphics: comparison of graphing techniques in base R, lattice, and ggplot2 packages
 Data Manipulation: data import and transformation, additional methods for working with large data sets
Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Introduction to SPSS, Stata, and SAS (September 9 or September 10) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset. If you are already using these packages with some degree of success, you may find these sessions too basic for you.
 SPSS is widely used statistical software with strengths in survey analysis and other social science disciplines. Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SPSS is also available in campus computer labs and via the Apps server (see below).
 Stata is flexible and allows relatively easy access to programming features. It is popular in economics among other areas. Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users. Find it at software.rutgers.edu.
 SAS is a powerful and longstanding system that handles large data sets well, and is popular in the pharmaceutical industry, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year. Find it at software.rutgers.edu. SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).
Note: Accessing software via apps.rutgers.edu
SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. apps.rutgers.edu does not require any software installation, but you must activate the service first at netid.rutgers.edu.
§ Data Visualization in R (October 7 or September 24) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R. Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background. The three parts are:
 Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages. Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
 Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
 3D, Interactive and Big Data: presentation of 3D data, interactive exploration of data, and techniques for large datasets
Additional R resources can be found here: http://libguides.rutgers.edu/data_R
R is freely downloadable from http://r-project.org
§ Special Topics (October 14 or October 8) covers a few different specialized areas. The three parts presented during the afternoon workshop are not related.
 Time Series in R: review of commands and techniques for basic time series analysis in R. Scripts at http://libguides.rutgers.edu/data_R
 Survival Analysis in R: review of commands and techniques for basic survival analysis in R. Scripts at http://libguides.rutgers.edu/data_R
 Big Data in Brief: an introduction to some of the techniques and software environments used to work with big data, with pointers to resources for further learning at http://libguides.rutgers.edu/bigdata