Statistical Software and Data Workshops, Spring 2017

Rutgers University Libraries Data Services Workshop Series (New Brunswick)

Spring 2017

In Spring 2017, Ryan Womack, Data Librarian, will repeat the series of workshops on statistical software, data visualization, and reproducible research as part of the Rutgers University Libraries Data Services.   A detailed calendar and descriptions of each workshop are below.  This semester each workshop topic will be repeated twice, once at the Library of Science and Medicine on Busch Campus, and once at Alexander Library on College Ave.  These sessions will be identical except for location. Sessions will run approximately 3 hours.  Workshops in parts will divide the time in thirds.  For example, the first SPSS, Stata, and SAS workshop (running from 12-3 pm) would start with SPSS at 12 pm, Stata at 1 pm, and SAS at 2 pm.  You are free to come only to those segments that interest you.  There is no need to register, just come!


Location: The Library of Science and Medicine (LSM on Busch) workshops will be held in the Conference Room on the 1st floor of LSM on Mondays from 12 to 3 pm.  The Alexander Library (College Ave) workshops will be held in room 413 of the Scholarly Communication Center (4th floor of Alexander Library) on Tuesdays from 1:10 to 4:10 pm.

For both locations, you are encouraged to bring your own laptop to work in your native environment.  Alternatively, at Alexander Library, you can use a library desktop computer instead of your own laptop.  At LSM, we will have laptops available to borrow for the session if you don’t bring your own.  Room capacity is 25 in both locations, first come, first served.

If you can’t make the workshops, or would like a preview or refresher, screencast versions of many of the presentations are already available online.  Additional screencasts are continually being added to this series.  Note that the “special topics” [Time Series, Survival Analysis, and Big Data] are no longer offered in person, but are available via screencast.

Calendar of workshops

Monday (LSM)        Workshop                                  Tuesday (Alexander)
12 noon – 3 pm                                                1:10 pm – 4:10 pm

January 23          Introduction to SPSS, Stata, and SAS      January 24
January 30          Introduction to R                         January 31
February 6          Data Visualization in R                   February 7
February 13         Reproducible Research                     February 14

Description of Workshops:

§ Introduction to SPSS, Stata, and SAS (January 23 or January 24) provides overviews of these three popular commercial statistical software programs, covering the basics of navigation, loading data, graphics, and elementary descriptive statistics and regression using a sample dataset.  If you are already using these packages with some degree of success, you may find these sessions too basic for you.

  • SPSS is widely used statistical software with strengths in survey analysis and other social science applications.  Copies of the workshop materials, a screencast, and additional SPSS resources are available online.  SPSS is made available by OIRT at a discounted academic rate, currently $100/academic year.  SPSS is also available in campus computer labs and via the Apps server (see below).
  • Stata is flexible and allows relatively easy access to programming features.  It is popular in economics, among other areas.  Copies of the workshop materials, a screencast, and additional Stata resources are available online.  Stata is made available by OIRT via a campus license, at no additional charge to Rutgers users.
  • SAS is a powerful and long-standing system that handles large data sets well, and is popular in the pharmaceutical industry, among other applications.  Copies of the workshop materials, a screencast, and additional SAS resources are available online.  SAS is made available by OIRT at a discounted academic rate, currently $100/academic year.  SAS is also available in campus computer labs, online via the SAS University Edition cloud service, and via the Apps server (see below).

Note: Accessing software via the Apps server

SPSS, SAS, Stata, and R are available for remote access on the Apps server.  This does not require any software installation, but you must activate the service first.


§ Introduction to R (January 30 or January 31) – This session provides a three-part orientation to the R programming environment.  R is freely available, open source statistical software that has been widely adopted in the research community.  Due to its open nature, thousands of additional packages have been created by contributors to implement the latest statistical techniques, making R a very powerful tool.  No prior knowledge is assumed. The three parts cover:

  • Statistical Techniques: getting around in R, descriptive statistics, regression, significance tests, working with packages
  • Graphics:  comparison of graphing techniques in base R, lattice, and ggplot2 packages
  • Data Manipulation:  data import and transformation, additional methods for working with large data sets, and plyr and other packages useful for manipulation.

Additional R resources, including handouts, scripts, and screencast versions of the workshops, can be found here:

R is freely downloadable from


§ Data Visualization in R  (February 6 or February 7) discusses principles for effective data visualization, and demonstrates techniques for implementing these using R.  Some prior familiarity with R is assumed (packages, structure, syntax), but the presentation can be followed without this background.  The three parts are:

  • Principles & Use in lattice and ggplot2: discusses classic principles of data visualization (Tufte, Cleveland) and illustrates them with the use of the lattice and ggplot2 packages.  Some of the material here overlaps with Intro to R, pt 2, but at a higher level.
  • Miscellany of Methods: illustrates a wide range of specific graphics for different contexts
  • 3-D, Interactive, and Big Data: presentation of 3-D data, interactive exploration of data, and techniques for large datasets. Relevant packages such as shiny and tessera are explored.

Additional R resources can be found here:

R is freely downloadable from


§ Reproducible Research (February 13 or February 14) covers the growing movement to make the products of research accessible and usable by others in order to verify, replicate, and extend research findings.  The session reviews how to plan research, how to create publications, code, and data in open, reusable formats, and how to maximize the impact of shared research findings.  Examples in LaTeX and R Markdown are discussed, along with platforms for reusability such as the Open Science Framework.

Additional resources on reproducible research and data management, including presentation slides, can be found here:


§ Special Topics

Note that the following special topics are no longer covered by in-person workshops, but are available via screencast.

RyanData Relaunching with Statistical and Data Focus

As announced via Tweet, I am migrating my professional Twitter and Blogging activity to the RyanData identity.  The former tagline of rutgersdata will be used by the RUresearch Data Team to announce and discuss issues relating to the Rutgers Research Data Services, and I will be contributing there too.

Here, you will find my posts on working with statistical data, the role of data in libraries, data visualization, and other issues that cross the desk of a Data Librarian, along with announcements of statistical software workshops and data services offered by the Rutgers University Libraries in New Brunswick.

I have been doubleplus busy while serving as Faculty Coordinator for the Rutgers University Libraries during this academic year (2012/2013), but I will be fullwise returning to data issues by July, so you can expect to see more posts and tweets here in the near future.

As announced on the former blog, I am refocusing my blog to more topical discussion of data issues from the former emphasis on announcements of new data resources.  The visual style will be minimalist.  The posts, less so.

Change of Focus on the Blog

I am going to be changing the focus of the blog somewhat from this point forward.  I will continue to post announcements about Rutgers data news: workshops, new services, special acquisitions, and so on.  However, I no longer have time to track and select highlights of new data sources available (something that could be guessed from the paucity of postings in recent months).  Readers interested in this feature may want to subscribe directly to feeds from ICPSR, Roper, and other major data publishers, which provided the bulk of my highlights.

I will also post some thoughts about issues related to research data, injecting some more opinion into this blog than previously.

I hope readers continue to find this useful.

Assessing Happiness and Competitiveness of World Major Metropolises, 2006

Assessing Happiness and Competitiveness of World Major Metropolises, 2006 empirically examines happiness and community/city conditions assessed by residents living in ten major cities of the world: Beijing, Berlin, London, Milan, New York City, Paris, Seoul, Stockholm, Tokyo, and Toronto. Respondents were asked questions about themselves and their city of residence. Questions focused on a range of topics including the economy, culture and education, welfare, safety, environment, living conditions, city administration, community life, health, and happiness. Demographic questions included city of residence, gender, age, education level, income level, occupation, marital status, and religion.

Crime in Boomburb Cities

Crime in Boomburb Cities: 1970-2004 focused on the effect of economic resources and racial/ethnic composition on the change in crime rates from 1970-2004 in United States cities in metropolitan areas that experienced a large growth in population after World War II. A total of 352 cities in the following United States metropolitan areas were selected for this study: Atlanta, Dallas, Denver, Houston, Las Vegas, Miami, Orange County, Orlando, Phoenix, Riverside, San Bernardino, San Diego, Silicon Valley (Santa Clara), and Tampa/St. Petersburg. Selection was based on the fact that these areas developed during a similar time period and followed comparable development trajectories. In particular, these 14 areas, known as the “boomburbs” for their dramatic, post-World War II population growth, all faced issues relating to the rapid growth of tract-style housing and the subsequent development of low density, urban sprawls. The study combined place-level data obtained from the United States Census with crime data from the Uniform Crime Reports for five categories of Type I crimes: aggravated assaults, robberies, murders, burglaries, and motor vehicle thefts. The dataset contains a total of 247 variables pertaining to crime, economic resources, and race/ethnic composition.

National Survey on Drug Use and Health

The National Survey on Drug Use and Health (2009, 2008 and 2007) reports use of illegal drugs, nonmedical use of prescription drugs, and alcohol and tobacco. Demographic information and substance abuse treatment history are also reported.

This survey is part of the Substance Abuse and Mental Health Data Archive (SAMHDA).

The survey was formerly titled the National Household Survey on Drug Abuse.