# APPLICATIONS OF ECONOMETRICS TO BIG DATA - 2024/5

Module code: ECOM076

## Module Overview

The main application of machine learning is out-of-sample prediction. Prediction accuracy is typically evaluated in terms of squared error, where the error is the difference between the prediction and the actual realization. In certain situations, such as the Bank of England's forecasts of inflation or output, an accurate prediction is enough. However, in other situations, such as policy evaluation, we care about causal effects. Suppose that the Secretary of Education introduces three additional hours of mathematics in primary school to increase students' GSE scores. Here the objective is to isolate the effect of the additional hours of mathematics on GSE scores. In general, we observe many individual characteristics for each pupil, which we need to control for. Data reduction techniques, such as LASSO, regression trees, and random forests, help to eliminate irrelevant information so that we can isolate the effect of the policy.

This module is structured in two parts. In the first part, we review econometric tools for policy evaluation, such as instrumental variables, panel data, difference-in-differences, synthetic control, and regression discontinuity. In the second part, we look at the same techniques when many instruments or many additional control variables are available.
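As a purely illustrative sketch (not taken from the module materials), the simulation below mimics the policy example in the overview with a single control variable. All names and numbers (`ability`, `hours`, `score`, the true effect of 2.0) are hypothetical assumptions. It compares a naive regression of scores on the policy indicator with an estimate that first partials out the control, which is the idea that data reduction methods extend to settings with many controls.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """OLS slope from a regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from a regression of y on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def simulate(n=5000, effect=2.0, seed=0):
    """Hypothetical data: pupils with higher ability are more likely
    to receive the extra hours, so ability confounds the comparison."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    hours = [1.0 if a + rng.gauss(0, 1) > 0 else 0.0 for a in ability]
    score = [effect * h + 3.0 * a + rng.gauss(0, 1)
             for h, a in zip(hours, ability)]
    return ability, hours, score


ability, hours, score = simulate()

# Naive estimate: ignores ability, so it mixes the policy effect
# with the effect of ability on scores.
naive = slope(hours, score)

# Adjusted estimate: remove the part of hours and score explained by
# ability, then regress residual on residual (partialling out).
adjusted = slope(residuals(ability, hours), residuals(ability, score))

print("naive estimate:   ", round(naive, 2))
print("adjusted estimate:", round(adjusted, 2))
```

In this simulated setting the naive estimate overstates the assumed effect, while the adjusted estimate lies close to it. With many candidate controls, this partialling-out step is where a selection method such as LASSO would enter.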

Economics

### Module cap (Maximum number of students): N/A

Independent Learning Hours: 65

Lecture Hours: 22

Seminar Hours: 11

Guided Learning: 30

Captured Content: 22

Semester 2

None

## Module content

• Endogeneity and Instrumental Variables. Natural Experiments
• Overview of Panel Data
• Treatment Evaluation. Average and Local Average Treatment Effect
• Difference-in-Differences
• Synthetic Control Methods
• Regression Discontinuity
• Policy Evaluation in the presence of many instruments
• Policy Evaluation in the presence of many control variables

## Assessment pattern

| Assessment type | Unit of assessment | Weighting (%) |
| --- | --- | --- |
| Coursework | Coursework Assignment | 30 |
| Examination | Final Examination (2 hrs) | 70 |

NA

## Assessment Strategy

The assessment strategy is designed to provide students with the opportunity to demonstrate that they have achieved the module's learning outcomes.

Thus, the summative assessment for this module consists of:

• A coursework assignment that allows students to undertake a policy evaluation exercise involving the use of big data by formalising a hypothesis of interest, selecting an appropriate data reduction and econometric estimation method, and writing their own code using a suitable software program.
• An examination that allows students to demonstrate a comprehensive understanding of, and an ability to evaluate critically, methods of causal inference and policy evaluation in the possible presence of high-dimensionality data.

Feedback: Individual feedback will be provided on students' work during the weekly seminars and when coursework marks are released.

## Module aims

• Equip students with the econometric tools for conducting causal inference and, hence, for evaluating policy effects
• Enable students to undertake policy evaluation in the presence of high-dimensionality data (e.g., many instruments or many control variables)

## Learning outcomes

| Ref | Learning outcome | Attributes Developed |
| --- | --- | --- |
| 001 | Display a systematic understanding of knowledge, which is at the forefront of the literature on big data techniques and policy evaluation | CK |
| 002 | Use competently econometric tools for policy evaluation in the presence of high-dimensionality data | CKPT |
| 003 | Formalize a hypothesis of interest and select appropriate data reduction and econometric estimation methods | CK |
| 004 | Write their own computer code using a suitable software program | CKPT |
| 005 | Use their newly acquired knowledge and skills to write an MSc Dissertation on a topic related to the econometrics of big data and policy evaluation | CKPT |

Attributes Developed

C - Cognitive/analytical

K - Subject knowledge

T - Transferable skills

P - Professional/Practical skills

## Methods of Teaching / Learning

The learning and teaching strategy is designed to enable students to achieve the module's learning outcomes.

There will be two hours of lectures and one hour of seminar every week.

Problem sets based on the methodological topics taught during lectures and computer-based exercises will be reviewed during seminars/tutorials.

Students are expected to work on an assignment and actively participate in the seminar hour.

Indicated Lecture Hours (which may also include seminars, tutorials, workshops and other contact time) are approximate and may include in-class tests where one or more of these form part of the module's assessment. In-class tests are scheduled and organised separately from taught content and will be published on students' personal timetables, where they apply to the modules taken, as soon as they are finalised by central administration. This will usually be after the initial publication of the teaching timetable for the relevant semester.