Tag Archives: R

Statistical Software and Data Workshops, Fall 2018

New Brunswick Libraries Data Workshop Series

Fall 2018

This Fall, Ryan Womack, Data Librarian, will offer a series of workshops on statistical software and data visualization as part of New Brunswick Libraries Data Management Services.   A detailed calendar and descriptions of each workshop are below.  The workshop on reproducible research is moving online to YouTube – stay tuned for an upcoming blog post and announcement on its availability.   We also anticipate offering additional workshops through the Graduate Specialist program.  That announcement will be coming in September.

This semester each workshop topic will be offered twice in person, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops with three parts will divide the time into thirds. For example, the first SPSS, Stata, and SAS workshop (running from 1:10-4:10 pm) would start with SPSS at 1:10 pm, Stata at 2:10 pm, and SAS at 3:10 pm. You are free to come only to those segments that interest you. There is no need to register, just come!

Logistics

Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Wednesdays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Thursdays from 1:10 to 4:10 pm.

For both locations, you are encouraged to bring your own laptop to work in your native environment.  Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop.  At LSM, we will have laptops available to borrow for the session if you don’t bring your own.  Room capacity is 25 in both locations, first come, first served.

If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at https://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series.  Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.

Calendar of workshops

Library of Science and Medicine (LSM): Wednesdays, 12 noon – 3 pm
Alexander Library: Thursdays, 1:10 – 4:10 pm

Introduction to SPSS, Stata, and SAS: October 3 (LSM) or September 13 (Alexander)
Introduction to R: October 10 (LSM) or September 20 (Alexander)
Data Visualization in R: October 17 (LSM) or September 27 (Alexander)

Description of Workshops:

§ Introduction to SPSS, Stata, and SAS (September 13 or October 3) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset.  If you are already using these packages with some degree of success, you may find these sessions too basic for you.

  • SPSS is widely used statistical software with strengths in survey analysis and in the social sciences more broadly.  Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SPSS is also available in campus computer labs and via the Apps server (see below).
  • Stata is flexible and allows relatively easy access to programming features.  It is popular in economics among other areas.  Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users.  Find it at software.rutgers.edu.
  • SAS is a powerful and long-standing system that handles large data sets well, and is popular in the pharmaceutical industry and health sciences, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).

Note: Accessing software via apps.rutgers.edu

SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. The apps.rutgers.edu service does not require any software installation, but you must activate it first at netid.rutgers.edu.

§ Introduction to R (September 20 or October 10) – This session provides a three-part orientation to the R programming environment.  R is freely available, open source statistical software that has been widely adopted in the research community.  Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool.  No prior knowledge is assumed. The three parts cover:

  • Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
  • Graphics:  comparison of graphing techniques in base R, lattice, and ggplot2 packages
  • Data Manipulation:  data import and transformation, additional methods for working with large data sets, also dplyr and other packages from the tidyverse useful for manipulation.

Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org

§ Data Visualization in R  (September 27 or October 17) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R.  Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background.  The three parts are:

  • Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages.  Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
  • Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
  • 3-D, Interactive, and Big Data: presentation of 3-D data, interactive data exploration, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.

Additional R resources can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org


 § Special Topics

Note that the special topics (Time Series, Survival Analysis, and Big Data) are no longer covered by in-person workshops, but are available via screencast.


Mongolian Multivariate Statistics (at the Mongolian University of Life Sciences)

I was delighted to be invited back to Mongolia for a more in-depth visit to the Mongolian University of Life Sciences (MULS) [Хөдөө Аж Ахуйн Их Сургууль]. I was honored by the invitation from B. Baasansukh, Dean of the School of Economics and Business [Эдийн засаг бизнесийн сургууль], and P. Munkhtuya, Chair of the Department of Economic Statistics and Mathematical Modeling, to come and teach a one-week Short Course on Applied Multivariate Statistical Methods with R [Олон хэмжээст статистикийн богино хэмжээний сургалт R программ дээр]. This post is a brief summary of my trip.


I got to Ulaanbaatar on my first ever flight on MIAT Mongolian Airlines, which was quite comfortable.

On Monday, February 26, we jumped right into work. The material for the short course was adapted from the book and associated R code by Brian Everitt and Torsten Hothorn, An Introduction to Applied Multivariate Analysis with R (Springer, 2011). The book's eight chapters were covered in four days. Topics covered were the following:

  • R environment, setup, basics
  • Multivariate Analysis – what is it?
  • Data Exploration and Visualization
  • Principal Components
  • Multidimensional Scaling
  • Exploratory Factor Analysis
  • Confirmatory Factor Analysis
    • Structural Equation Modeling
  • Cluster Analysis
  • Repeated Measures

The Everitt and Hothorn text was particularly useful for its compact treatment of complex topics, and the self-contained nature of the included R code demo modules.

A total of 31 participants attended, although course and meeting conflicts prevented some from coming every day. The MULS School of Economics and Business had 20 participants, with the remaining attendees coming from the Division of Science and Research, the School of Agroecology, the School of Veterinary Medicine, and the School of Engineering and Technology. Outside of MULS, participants also came from the National Statistics Office of Mongolia and the Cabinet Secretariat of the Government of Mongolia.

All participants followed along by executing sample code on their own laptops. All presentation materials and code are available from https://github.com/ryandata/multivariate. Supplemental texts and files were distributed over a local wireless network using a portable PirateBox. See this post on how to set up a PirateBox.

The interaction was lively, aided especially by translation assistance from G. Bilguun, O. Amartuvshin, and T. Suvdmaa. I hope the participants enjoyed it as much as I did!

Putting our knowledge of R to immediate use, we used R to select a random sample of winners of swag provided by Rutgers’ Master of Business and Science program.

During the week, I also had several meetings to discuss projects to improve the academic and data infrastructure of MULS, which I believe will be the subject of future collaborations. The week also included some delicious food, including a hot pot dinner, khorkhog [хорхог], and khuushuur [хуушуур].

On Thursday afternoon, I had the opportunity to address a group of undergraduate statistics majors on trends in data science and its intersection with statistics. I argued that the growing power and availability of open source software and data have made it possible for anyone, whether in Mongolia or even the United States (where there are arguably more distractions), to study and master the tools and skills that underpin the most dynamic growth sectors for future jobs.

Finally, on Friday, we wrapped up the course with some final discussion and the presentation of certificates to participants.

Mongolia is always a land of warm welcome and surprises. Thanks to my hosts at SEB-MULS for a fantastic trip filled with learning. Thanks to D Music for inspiration, as always. And thanks especially to P. Munkhtuya for leading all of the organizational efforts for my visit. I am looking forward to working together with MULS colleagues in the future! Маш их баярлалаа! (Thank you very much!)

Statistical Software and Data Workshops, Spring 2018

New Brunswick Libraries Data Workshop Series

Spring 2018

This Spring, Ryan Womack, Data Librarian, will repeat the series of workshops on statistical software and data visualization as part of New Brunswick Libraries Data Management Services.   A detailed calendar and descriptions of each workshop are below.  The workshop on reproducible research is moving online to YouTube – stay tuned for an upcoming blog post and announcement on its availability.

This semester each workshop topic will be offered twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops with three parts will divide the time into thirds. For example, the first SPSS, Stata, and SAS workshop (running from 12-3 pm) would start with SPSS at 12 pm, Stata at 1 pm, and SAS at 2 pm. You are free to come only to those segments that interest you. There is no need to register, just come!

Logistics

Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Mondays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Tuesdays from 1:10 to 4:10 pm.

For both locations, you are encouraged to bring your own laptop to work in your native environment.  Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop.  At LSM, we will have laptops available to borrow for the session if you don’t bring your own.  Room capacity is 25 in both locations, first come, first served.

If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series.  Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.

Calendar of workshops

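Library of Science and Medicine (LSM): Mondays, 12 noon – 3 pm
Alexander Library: Tuesdays, 1:10 – 4:10 pm

Introduction to SPSS, Stata, and SAS: January 29 (LSM) or January 30 (Alexander)
Introduction to R: February 5 (LSM) or February 6 (Alexander)
Data Visualization in R: February 12 (LSM) or February 13 (Alexander)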

Description of Workshops:

§ Introduction to SPSS, Stata, and SAS (January 29 or January 30) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset.  If you are already using these packages with some degree of success, you may find these sessions too basic for you.

  • SPSS is widely used statistical software with strengths in survey analysis and in the social sciences more broadly.  Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SPSS is also available in campus computer labs and via the Apps server (see below).
  • Stata is flexible and allows relatively easy access to programming features.  It is popular in economics among other areas.  Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users.  Find it at software.rutgers.edu.
  • SAS is a powerful and long-standing system that handles large data sets well, and is popular in the pharmaceutical industry and health sciences, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).

Note: Accessing software via apps.rutgers.edu

SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. The apps.rutgers.edu service does not require any software installation, but you must activate it first at netid.rutgers.edu.

§ Introduction to R (February 5 or February 6) – This session provides a three-part orientation to the R programming environment.  R is freely available, open source statistical software that has been widely adopted in the research community.  Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool.  No prior knowledge is assumed. The three parts cover:

  • Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
  • Graphics:  comparison of graphing techniques in base R, lattice, and ggplot2 packages
  • Data Manipulation:  data import and transformation, additional methods for working with large data sets, also dplyr and other packages from the tidyverse useful for manipulation.

Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org

§ Data Visualization in R  (February 12 or February 13) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R.  Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background.  The three parts are:

  • Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages.  Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
  • Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
  • 3-D, Interactive, and Big Data: presentation of 3-D data, interactive data exploration, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.

Additional R resources can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org


 § Special Topics

Note that the special topics (Time Series, Survival Analysis, and Big Data) are no longer covered by in-person workshops, but are available via screencast.

Statistical Software and Data Workshops, Fall 2017

New Brunswick Libraries Data Workshop Series

Fall 2017

This Fall, Ryan Womack, Data Librarian, will offer a series of workshops on statistical software, data visualization, and reproducible research as part of New Brunswick Libraries Data Management Services. A detailed calendar and descriptions of each workshop are below. This semester each workshop topic will be offered twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops with three parts will divide the time into thirds. For example, the SPSS, Stata, and SAS workshop at LSM (running from 12-3 pm) would start with SPSS at 12 pm, Stata at 1 pm, and SAS at 2 pm. You are free to come only to those segments that interest you. There is no need to register, just come!

Logistics

Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Wednesdays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Tuesdays from 1:10 to 4:10 pm.

For both locations, you are encouraged to bring your own laptop to work in your native environment.  Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop.  At LSM, we will have laptops available to borrow for the session if you don’t bring your own.  Room capacity is 25 in both locations, first come, first served.

If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series.  Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.

Calendar of workshops

Alexander Library: Tuesdays, 1:10 – 4:10 pm
Library of Science and Medicine (LSM): Wednesdays, 12 noon – 3 pm

Introduction to SPSS, Stata, and SAS: September 12 (Alexander) or September 13 (LSM)
Introduction to R: September 19 (Alexander) or September 20 (LSM)
Data Visualization in R: September 26 (Alexander) or September 27 (LSM)
Reproducible Research: October 3 (Alexander) or October 18 (LSM)

Description of Workshops:

§ Introduction to SPSS, Stata, and SAS (September 12 or September 13) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset.  If you are already using these packages with some degree of success, you may find these sessions too basic for you.

  • SPSS is widely used statistical software with strengths in survey analysis and in the social sciences more broadly.  Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SPSS is also available in campus computer labs and via the Apps server (see below).
  • Stata is flexible and allows relatively easy access to programming features.  It is popular in economics among other areas.  Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users.  Find it at software.rutgers.edu.
  • SAS is a powerful and long-standing system that handles large data sets well, and is popular in the pharmaceutical industry and health sciences, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).

Note: Accessing software via apps.rutgers.edu

SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu. The apps.rutgers.edu service does not require any software installation, but you must activate it first at netid.rutgers.edu.

§ Introduction to R (September 19 or September 20) – This session provides a three-part orientation to the R programming environment.  R is freely available, open source statistical software that has been widely adopted in the research community.  Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool.  No prior knowledge is assumed. The three parts cover:

  • Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
  • Graphics:  comparison of graphing techniques in base R, lattice, and ggplot2 packages
  • Data Manipulation:  data import and transformation, additional methods for working with large data sets, also dplyr and other packages from the tidyverse useful for manipulation.

Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org

§ Data Visualization in R  (September 26 or September 27) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R.  Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background.  The three parts are:

  • Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages.  Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
  • Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
  • 3-D, Interactive, and Big Data: presentation of 3-D data, interactive data exploration, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.

Additional R resources can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org

§ Reproducible Research (October 3 or October 18) – Reproducible research describes the growing movement to make the products of research accessible and usable by others in order to verify, replicate, and extend research findings. This session reviews how to plan research, how to create publications, code, and data in open, reusable formats, and how to maximize the impact of shared research findings. Examples in LaTeX and R Markdown are discussed, along with platforms for reusability such as the Open Science Framework.

Additional resources on reproducible research and data management, including presentation slides, can be found here: http://libguides.rutgers.edu/datamanagement

§ Special Topics

Note that the special topics (Time Series, Survival Analysis, and Big Data) are no longer covered by in-person workshops, but are available via screencast.

Mongolia GIS

While researching my upcoming Mongolia trip, I was amazed to discover a treasure trove of Mongolian data already at Rutgers.  Christopher Free, a quantitative ecologist, studies Mongolian fisheries, and is an R and GIS expert to boot.  He has compiled a one-stop archive for Mongolian GIS data and R courses that use Mongolian fish data as examples [much better than sports statistics!].

These are exactly the kinds of global connections I am delighted to make!

Survival Analysis in R video available

As promised earlier, the “special topic” material on Survival Analysis is now available on YouTube in lieu of in-person sessions.  Take a look at the Survival Analysis in R Playlist.

Survival analysis deals with data that may include incomplete observations, called censored data. A typical example is studying the time until failure of a part in engineering, or failure of a part of the human body in medicine (colloquially known as “disease”). We usually have accurate data on when failures occur, up until the end of the study. Some subjects, however, survive without failure until the study ends, and we do not know how much longer they would have lasted. The methods of survival analysis account for this partial uncertainty in the data. R can deal with almost all necessary aspects of survival analysis, but requires some mixing and matching of packages to get the best results, as shown in the videos.
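The classic way to use censored observations instead of discarding them is the Kaplan-Meier estimator: at each observed failure time, the running survival estimate is multiplied by the fraction of subjects still at risk who did not fail, while censored subjects simply leave the risk set without counting as failures. In R, the survival package's survfit() function applied to a Surv() object is the standard way to get this estimate. Purely to illustrate the arithmetic, here is a minimal Python sketch; the function name kaplan_meier and the seven-subject data set are hypothetical, and it assumes the common convention that a subject censored at the same time as a failure is still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate (illustrative sketch, not a library API).

    times:  follow-up time for each subject
    events: 1 if a failure was observed at that time, 0 if censored
    Returns a list of (time, estimated survival) pairs, one per failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)  # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Collect every subject whose time equals t (failures and censorings).
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            # Conditional probability of surviving past t, given survival up to t.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Censored subjects drop out here without pulling the curve down.
        at_risk -= leaving
    return curve


# Hypothetical data: 7 subjects; event 0 marks a subject still failure-free
# when observation ended (censored).
times = [2, 3, 3, 5, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
# prints:
# 2 0.857
# 3 0.714
# 5 0.536
# 8 0.357
```

The estimate drops only at observed failures. The subjects censored at times 3, 8, and 10 shrink the number at risk for later steps but never lower the curve directly, which is how their partial information is kept rather than thrown away.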

As always, my YouTube videos are fueled by music behind the scenes.  Giving a throwback shoutout to Public Image Limited, some holiday Twice, plus the usual Mongolian suspects.

Statistical Software and Data Workshops, Fall 2016

Rutgers University Libraries Data Services Workshop Series (New Brunswick)

Fall 2016

This Fall, Ryan Womack, Data Librarian, will offer a series of workshops on statistical software, data visualization, and data management, as part of the Rutgers University Libraries Data Services. A detailed calendar and descriptions of each workshop are below. This semester each workshop topic will be offered twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave. These sessions will be identical except for location. Sessions will run approximately 3 hours. Workshops with three parts will divide the time into thirds. For example, the first SPSS, Stata, and SAS workshop (running from 12-3 pm) would start with SPSS at 12 pm, Stata at 1 pm, and SAS at 2 pm. You are free to come only to those segments that interest you. There is no need to register, just come!

Logistics

Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Wednesdays from 12 to 3 pm. The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Thursdays from 1:10 to 4:10 pm.

For both locations, you are encouraged to bring your own laptop to work in your native environment.  Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop.  At LSM, we will have laptops available to borrow for the session if you don’t bring your own.  Room capacity is 25 in both locations, first come, first served.

If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data and https://youtube.com/librarianwomack. Additional screencasts are continually being added to this series.  Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.

Calendar of workshops

Library of Science and Medicine (LSM): Wednesdays, 12 noon – 3 pm
Alexander Library: Thursdays, 1:10 – 4:10 pm

Introduction to SPSS, Stata, and SAS: September 21 (LSM) or September 22 (Alexander)
Introduction to R: September 28 (LSM) or September 29 (Alexander)
Data Visualization in R: October 5 (LSM) or October 6 (Alexander)
Introduction to Data Management: October 19 (LSM) or October 13 (Alexander)

Description of Workshops:

§ Introduction to SPSS, Stata, and SAS (September 21 or September 22) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset.  If you are already using these packages with some degree of success, you may find these sessions too basic for you.

  • SPSS is widely used statistical software with strengths in survey analysis and in the social sciences more broadly.  Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SPSS is also available in campus computer labs and via the Apps server (see below).
  • Stata is flexible and allows relatively easy access to programming features.  It is popular in economics among other areas.  Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via campus license with no additional charge to install for Rutgers users.  Find it at software.rutgers.edu.
  • SAS is a powerful and long-standing system that handles large data sets well, and is popular in the pharmaceutical industry, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).

Note: Accessing software via apps.rutgers.edu

SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu.  The service does not require any software installation, but you must activate it first at netid.rutgers.edu.

 

§ Introduction to R (September 28 or September 29) – This session provides a three-part orientation to the R programming environment.  R is freely available, open source statistical software that has been widely adopted in the research community.  Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool.  No prior knowledge is assumed. The three parts cover:

  • Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
  • Graphics:  comparison of graphing techniques in base R, lattice, and ggplot2 packages
  • Data Manipulation:  data import and transformation, additional methods for working with large data sets, and plyr and other packages useful for data manipulation.

Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org

 

§ Data Visualization in R (October 5 or October 6) discusses principles for effective data visualization and demonstrates techniques for implementing them in R.  Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background.  The three parts are:

  • Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages.  Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
  • Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
  • 3-D, Interactive, and Big Data: presentation of 3-D data, interactive data exploration, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.

Additional R resources can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org

 

§ Introduction to Data Management (October 13 or October 19) covers

  • Best Practices for Managing Your Data – methods to organize, describe, backup, and archive your research data in order to ensure its future usability and accessibility.  Developing good habits for handling your data from the start will save time and frustration later, and increase the ultimate impact of your research.
  • Data Management Plans, Data Sharing and Archiving – targeted to researchers who need to write data management plans (DMPs) and share their data as part of their grant application, research, and publication process.  Reviews DMP guidelines, checklists, and general advice, along with options for sharing and permanently archiving research data.
  • Reproducible Research – covers the growing movement to make the products of research accessible and usable by others in order to verify, replicate, and extend research findings.  Reviews how to plan research, create publications, code, and data in open, reusable formats, and maximize the impact of shared research findings.

Additional data management resources, including presentation slides, can be found here: http://libguides.rutgers.edu/datamanagement

 

 

§ Special Topics

Note that the following special topics are no longer covered by in-person workshops, but are available via screencast.

 

Data Visualization and R

Well, it has been a long time coming, but I have finally finished converting my Data Visualization workshop series to a screencast video version.  See this YouTube playlist for the complete series, and the materials on GitHub.  This is the long version of the in-person 3-hour workshop.  The video series goes into even more detail, starting from a history of major developments in visualization and moving through implementations of specific graphs, interactive visualizations, web viz, big data, and more.

I also have ideas for some more up-to-date add-ins that I will probably record as lagniappe videos over the next few weeks, since they didn’t quite fit into the existing sequence of videos.

The energy to complete these videos came from several musical sources, of which I would credit Harmogu and Linton Kwesi Johnson as leading lights.

Statistical Software and Data Workshops Spring 2016

Rutgers University Libraries Data Services Workshop Series (New Brunswick)

January 2016

This Spring, Ryan Womack, Data Librarian, will repeat the series of workshops on statistical software, data visualization, and data management, as part of the Rutgers University Libraries Data Services.   A detailed calendar and descriptions of each workshop are below.  This semester each workshop topic will be repeated twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave.  These sessions will be identical except for location. Sessions will run approximately 3 hours.  Workshops in parts will divide the time in thirds.  For example, the first SPSS, Stata, and SAS workshop at LSM would start with SPSS at 12 noon, Stata at 1 pm, and SAS at 2 pm.  You are free to come only to those segments that interest you.  There is no need to register, just come!

Logistics

Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Mondays from 12 to 3 pm.  The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Tuesdays from 1:10 to 4:10 pm.

For both locations, you are encouraged to bring your own laptop to work in your native environment.  Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop.  At LSM, we will have laptops available to borrow for the session if you don’t bring your own.  Room capacity is 25 in both locations, first come, first served.

If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available at http://libguides.rutgers.edu/data.  Additional screencasts are continually being added to this series.

Calendar of workshops

Monday (LSM), 12 noon – 3 pm / Tuesday (Alexander), 1:10 pm – 4:10 pm

January 25 Introduction to SPSS, Stata, and SAS January 26
February 1 Introduction to R February 2
February 8 Data Visualization in R February 9
February 15 Special Topics: Time Series in R, Survival Analysis in R, Big Data in Brief February 16

 

Description of Workshops:

§ Introduction to SPSS, Stata, and SAS (January 25 or January 26) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset.  If you are already using these packages with some degree of success, you may find these sessions too basic.

  • SPSS is widely used statistical software with strengths in survey analysis and other social science applications.  Copies of the workshop materials, a screencast, and additional SPSS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208425. SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SPSS is also available in campus computer labs and via the Apps server (see below).
  • Stata is flexible and allows relatively easy access to programming features.  It is popular in economics, among other areas.  Copies of the workshop materials, a screencast, and additional Stata resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208427. Stata is made available by OIRT via a campus license at no additional charge to Rutgers users.  Find it at software.rutgers.edu.
  • SAS is a powerful and long-standing system that handles large data sets well, and is popular in the pharmaceutical industry, among other applications. Copies of the workshop materials, a screencast, and additional SAS resources can be found here: http://libguides.rutgers.edu/content.php?pid=115296&sid=1208423. SAS is made available by OIRT at a discounted academic rate, currently $100/academic year.  Find it at software.rutgers.edu.  SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).

Note: Accessing software via apps.rutgers.edu

SPSS, SAS, Stata, and R are available for remote access on apps.rutgers.edu.  The service does not require any software installation, but you must activate it first at netid.rutgers.edu.

 

§ Introduction to R (February 1 or February 2) – This session provides a three-part orientation to the R programming environment.  R is freely available, open source statistical software that has been widely adopted in the research community.  Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool.  No prior knowledge is assumed. The three parts cover:

  • Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
  • Graphics:  comparison of graphing techniques in base R, lattice, and ggplot2 packages
  • Data Manipulation:  data import and transformation, additional methods for working with large data sets

Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org

 

§ Data Visualization in R (February 8 or February 9) discusses principles for effective data visualization and demonstrates techniques for implementing them in R.  Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background.  The three parts are:

  • Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages.  Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
  • Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
  • 3-D, Interactive, and Big Data: presentation of 3-D data, interactive data exploration, and techniques for large datasets

Additional R resources can be found here: http://libguides.rutgers.edu/data_R

R is freely downloadable from http://r-project.org

 

§ Special Topics (February 15 or February 16) covers three specialized areas: Time Series in R, Survival Analysis in R, and Big Data in Brief.  The three parts presented during the afternoon workshop are independent of one another.

Of related interest:  There is also a Digital Humanities Workshop Series this spring, covering topics including text analysis, network analysis, and digital mapping. See https://dh.rutgers.edu/spring-2016-workshops/ for information on the topics and schedule.

Data Workshops Full, Registration Closed

Somehow the response this Fall was much higher than expected, so all data and statistical software workshop sessions are now full and registration is closed.  Please consult the screencasts, scripts, and handouts at libguides.rutgers.edu/data for a self-guided version of the same material.

The same sessions will run again live in the Spring.