## Causal Inference Stanford Homework Solutions

**General information**

**Lecture**: Tuesdays, 7:10-9:00pm, in Warren Weaver Hall 101

**Recitation/Laboratory (required for all students)**: Thursdays, 5:10-6:00pm in Silver Center 401

Instructor: David Sontag, dsontag {@ | at} cs.nyu.edu

Lab instructor: Yacine Jernite, jernite {@ | at} cs.nyu.edu

Grader: Prasoon Goyal, pg1338 {@ | at} nyu.edu

Office hours: Tuesdays, 10:30-11:30am. Location: Center for Data Science, 726 Broadway, 7th Floor

**Grading**: problem sets (55%) + midterm exam (20%) + final exam (20%) + participation (5%). Problem Set policy

**Book (required)**: Kevin Murphy, Machine Learning: a Probabilistic Perspective, MIT Press, 2012. I recommend the latest (4th) printing, as the earlier printings had many typos. You can tell which printing you have as follows: check the inside cover, below the "Library of Congress" information. If it says "10 9 8 ... 4" you've got the (correct) fourth printing.

**Mailing list:** To subscribe to the class list, follow instructions here.

**Statistics 209 / HRP 239 / Education 260A**

**Winter 2018**

Statistical Methods for Group Comparisons and Causal Inference

**David Rogosa**

Lecture: WF, 2:30 - 4 Sequoia 200

course web page at http://rogosateaching.com/stat209/

To see full course materials from Winter 2016 go here

Instructor: David Rogosa, Sequoia 224, rag {AT} stanford {DOT} edu.

Office hours W,F 4 - 4:45.

TA: Michael Sklar, sklarm {AT} stanford {DOT} edu

Office Hours TH 2 - 3:30 in Sequoia Hall 220

*Registrar's Information*

STATS 209: Statistical Methods for Group Comparisons and Causal Inference (EDUC 260A, HRP 239)

Description: Critical examination of statistical methods in social science and life sciences applications, especially for cause and effect determinations.

Topics: mediating and moderating variables, potential outcomes framework, encouragement designs, multilevel models, heterogeneous treatment effects, matching and propensity score methods, analysis of covariance, instrumental variables, compliance, path analysis and graphical models, group comparisons with longitudinal data.

Prerequisite: intermediate-level statistical methods.

Terms: Win | Units: 3 | Grading: Letter or Credit/No Credit

2017-2018 Winter: STATS 209 | 3 units | Class # 31306 | Section 01 | Grading: Letter or Credit/No Credit | LEC | 01/08/2018 - 03/16/2018 Wed, Fri 2:30 PM - 4:20 PM at Sequoia Hall 200 with Rogosa, D. (PI)

Instructors: Rogosa, D. (PI)

**Course Overview**

For students who have had intermediate-level instruction in statistical methods including multiple regression, logistic regression, log-linear models.

At the very least, the content of the course should provide some consolidation of previous instruction in statistical methods.

The goal is also to instill some introspection and critical analysis for the uses of statistical methods common in social science and medical applications, for experimental and observational studies.

The focus of the course is on understanding what useful information statistical modeling can provide in experimental and especially non-experimental social science settings.

*Quick Course Outline*

Week 1. Course introduction; properties of regression models.

Week 2. Experiments vs observational studies; Neyman-Rubin-Holland formulation; encouragement designs.

Week 3. Path analysis and causal modeling, multiple regression with pictures. Graphical models.

Week 4. Multilevel data. Contextual effects, aggregation bias, mixed effects models.

Week 5. The many uses and forms of analysis of covariance, including heterogeneous treatment effects and regression discontinuity designs.

Week 6. Instrumental variable methods, simultaneous equations, reciprocal effects.

Week 7. Compliance and experimental protocols; intent to treat and compliance adjustments.

Week 8. Matching and propensity score methods.

Week 9. Time-1, Time-2 group comparisons for experimental designs and observational studies.

Dead Week. Overflow and course summary.

**Texts (optional).**

*Statistical Models: Theory and Practice*, David Freedman (2005). Revised edition (2009).

This course was created in 2005 around David Freedman's text, and covers that material using auxiliary texts and online materials.

One intent of this course is for students to read some statistical literature and actual research reports to augment the texts. On that theme, Freedman's text includes reprints of four published empirical research papers, which are also available through JSTOR.

A primary resource for R and data analysis:

*Data analysis and graphics using R*, J. Maindonald and J. Braun, Cambridge, 2nd edition 2007; 3rd edition 2010. Short draft version in CRAN.

Text resource page; R-packages for text data sets etc.: R-Package DAAG, R-Package DAAGxtras

Additional resources:

*Design of observational studies*. Rosenbaum, Paul R. New York : Springer, c2010. Stanford access

Causal Inference in Statistics, Social and Biomedical Sciences: An Introduction, Guido Imbens and Don Rubin, 1st Edition (Cambridge University Press) Stanford access

*Data analysis and regression: A second course in statistics*. Mosteller, F. and Tukey, J. W. (1977) (the green book)

*Matched Sampling for Causal Effects*, Donald B. Rubin Cambridge University Press 2006

*Observational Studies*, Paul R. Rosenbaum. Springer; 2nd edition (January 8, 2002)

*Statistical Models and Causal Inference*, David Freedman. Cambridge 2010, ISBN 978-0-521-19500-3

*Regression Analysis: A Constructive Critique*, Richard A. Berk (2003). Table of contents

Jan de Leeuw, Preface to Berk's "Regression Analysis: A Constructive Critique"

**Grading, Homework and Exams.**

Weekly homework assignments following class content will be posted, along with solutions. Homework is not graded.

*Assessment*. Two take home problem sets will be scheduled:

TH1 covering content weeks 1-4.

TH2 covering content weeks 5-8.

In-class exam: *Exam 3*, scheduled by the Registrar during exam week. My best reading of the Registrar's chart indicates Monday, March 19, 2018 at 3:30 PM (in our classroom). If needed, Exam 3 can be taken remotely.

See also

Course Assignments Page

*Note to auditors*. We should have plenty of room in Sequoia 200 for auditors.

The Registrar does have a form (no-fee) for faculty, staff, post-docs: Application for Auditor or Permit to Attend (PTA) Status

**Statistical computing**

Class presentation will be in R, and students are encouraged to use it (with occasional reference to SAS, Mathematica, and Matlab).

1/7/09. NYTimes endorses R: Data Analysts Captivated by R's Power

We have a set of 4 computer labs to supplement lecture materials (weeks 2, 4, 6, 8).

**Lab 1.** Multiple regression basics. Lab1 posted 1/18/18

**Lab 2.** Multilevel analysis (mixed-effects models)

*High School and Beyond* example. Lab 2 posted 2/4/18

Lab 2 has evolved in three pieces.

a. Lab2, exposition and commands provides a full write up (annotated) of the analyses

b. Lab 2, Rogosa R-session (nlme legacy version)

c. Lab2 (priority Rogosa session) redone using lme4, lmer (with additional plots) Lecture slide, lme lmer for Bryk data

For those who are strapped for time or otherwise saturated, I provide a single full Bryk dataset, which lets you skip the data manipulation portion of the activity.

Additional materials for HSB analyses are posted in Week 4 Lecture topics, sec 3(iii)

**Lab 3.** Instrumental Variables.

Lab3, exposition and commands

Lab 3, Rogosa R-session; Mroz87 data description. Lab3 posted 2/15/18

Note: I triple-checked; the dataset is where the description indicates, and it reads in the 753 cases.
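As a rough illustration of the idea behind Lab 3, here is a minimal sketch of the simple instrumental variable (ratio-of-covariances) estimator on simulated data. It is not the lab's analysis and does not use the Mroz87 data. The lab itself works in R; Python is used here only to keep the sketch self-contained. The function names (`simulate`, `ols_slope`, `iv_slope`), the data-generating coefficients, and the seed are all assumptions made for this example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from regressing y on x: cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Simple IV estimator with one instrument z: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def simulate(n=5000, true_effect=2.0, seed=209):
    """Hypothetical data: u confounds x and y; z affects y only through x."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)    # unobserved confounder
        zi = rng.gauss(0, 1)   # instrument, independent of u
        xi = zi + u + rng.gauss(0, 1)
        yi = true_effect * xi + 2.0 * u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
ols_estimate = ols_slope(x, y)    # biased upward by the confounder u
iv_estimate = iv_slope(z, x, y)   # should land near true_effect
print(f"OLS slope: {ols_estimate:.3f}, IV slope: {iv_estimate:.3f}")
```

Under these assumed coefficients the population OLS slope is 2 + 2/3, because the confounder pushes x and y in the same direction, while the IV ratio targets the true effect of 2. The sample estimates should be close to those values but will not match them exactly.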

**Lab 4.** Matching and propensity scores. Lalonde job training data

This lab is arranged in pieces.

a. Lab4, exposition and commands posted

b. Lab 4, Rogosa R-session, Base (sections 1-3) posted

c. Lab 4, Rogosa R-session, additional matching exercises (incl secs 4-6) posted

d. Lab 4, Rogosa R-session: not done until ancova is run posted
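To give a feel for the matching step in Lab 4, the sketch below does 1:1 nearest-neighbor matching with replacement on a scalar score and averages the matched outcome differences (an estimate of the effect on the treated). It assumes the propensity scores have already been estimated, for example by a logistic regression, and it uses tiny made-up numbers rather than the Lalonde data. The lab itself works in R; the function name `att_nearest_neighbor` and the data layout are hypothetical.

```python
from typing import List, Tuple

Unit = Tuple[float, float]  # (estimated propensity score, outcome)


def att_nearest_neighbor(treated: List[Unit], controls: List[Unit]) -> float:
    """Average treated-minus-matched-control outcome difference.

    Each treated unit is matched, with replacement, to the control
    whose score is closest to its own.
    """
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    differences = []
    for score_t, outcome_t in treated:
        _, outcome_c = min(controls, key=lambda c: abs(c[0] - score_t))
        differences.append(outcome_t - outcome_c)
    return sum(differences) / len(differences)


# Hypothetical (score, outcome) pairs, for illustration only.
treated = [(0.8, 10.0), (0.6, 8.0)]
controls = [(0.75, 7.0), (0.5, 5.0), (0.2, 1.0)]

matched_estimate = att_nearest_neighbor(treated, controls)
naive_difference = (
    sum(o for _, o in treated) / len(treated)
    - sum(o for _, o in controls) / len(controls)
)
print(matched_estimate, naive_difference)
```

In this toy data the low-score control (0.2) is never used as a match, so the matched estimate is smaller than the naive difference in means, which is pulled up by that dissimilar control.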

The current version of R is 3.4.3 (Kite-Eating Tree), released on 2017-11-30 (I see I'm currently running 3.3.3). For references and software: The R Project for Statistical Computing. Closest download mirror is Berkeley.

The CRAN Task View: Statistics for the Social Sciences provides an overview of relevant R packages. Also of interest are CRAN Task View: Psychometric Models and Methods and CRAN Task View: Design of Experiments (DoE) and Analysis of Experimental Data

In prior fall quarters I taught a short 5-week intro R course intended for users of other statistical packages; see the Ed401 page.

Among the infinite number of introduction-to-R resources is John Verzani's page. A good R-primer on various applications (repeated measures and lots else). Notes on the use of R for psychology experiments and questionnaires, Jonathan Baron and Yuelin Li. Another version.

Even more stuff: According to Peter Diggle: "The best resource for R that I have found is Karl Broman's Introduction to R page." And a remarkably useful set of R-resources from Murray State

Wm. Revelle, who develops the psych package, also has a draft text that covers standard statistics plus specialized measurement topics (plus other R intros).

For those with a life sciences background a useful resource may be the book Analysis of epidemiological data using R and Epicalc and the Epicalc package.

For categorical data, especially if you've had a course using Agresti, the lengthy guide by Laura Thompson has more than you want to know.
