MATHEMATICS OF DATA SCIENCE - 2020/1

Module code: MAT3051

Module Overview

This module will introduce the subject of data science. It will start with the role of data in society as motivation, and then move to the core of the module: the mathematical methodology for data analysis, providing students with the underpinning mathematics that drives data algorithms.

The topics will be wide-ranging, with a focus on the Surrey brand of data, since research into data is part of the department's research agenda. The module then comes full circle and shows how this theory drives data algorithms at all levels of society.

Module provider

Mathematics

Module Leader

SANTITISSADEEKORN Naratip (Maths)

Number of Credits: 15

ECTS Credits: 7.5

Framework: FHEQ Level 6

Module cap (Maximum number of students): N/A

Overall student workload

Independent Learning Hours: 117

Lecture Hours: 33

Module Availability

Semester 2

Prerequisites / Co-requisites

None

Module content

Topics covered will include some or all of:

- Introduction: the role of data in society, data science, and big data. The concept of learning from data. Mathematical preliminaries such as background linear algebra and statistics, eigenvalues and eigenvectors, principal components, matrix calculus, the Moore-Penrose inverse, the Rayleigh-Ritz quotient, positive matrices and the Frobenius-Perron theorem. The most important introductory topic is the singular value decomposition (SVD).

The Introduction will also include some motivating examples that show what can be achieved by analysing data.

- Data assimilation: introduction and motivation; relevance to weather prediction. The theory will be developed in the context of the Lorenz 63 system of differential equations. Variational data assimilation methods (3DVar and 4DVar) will be discussed, including the adjoint method, the derivation of error covariance matrices, and operational drawbacks such as ill-conditioning. Practical investigations will include the effect of observational frequency and density.

- Networks and clustering: introduction with familiar examples such as mobile phone networks, social media networks, and the evolution of networks. Theoretical constructs such as the Katz centrality theorem, directed versus undirected networks, the network Laplacian, Cheeger's constant, network clustering (spectral clustering and k-means clustering), the connection to transport in dynamical systems, and bipartite networks (spectral co-clustering).

- Data-driven modelling (DDM): given large amounts of data but no clear knowledge of the model or governing differential equations, DDM is a methodology for deducing the model. Methods include dynamic mode decomposition (DMD) and attractor reconstruction.

- Deep learning and classification: introduction to classic machine learning and the concepts of classification and regression. Learning from data, the construction of deep neural networks, convolutional neural nets, machine learning, decision trees, and finding patterns in data using linear discriminant analysis, principal component analysis and multivariate regression.

Assessment pattern

Assessment type              Unit of assessment   Weighting (%)
School-timetabled exam/test  In-Semester Test     20
Examination                  Final Examination    80

Alternative Assessment

N/A

Assessment Strategy

The assessment strategy is designed to provide students with the opportunity to demonstrate:

1) Understanding of fundamental concepts and ability to develop and apply them to a new context.

2) Subject knowledge through recall of key definitions, formulae and derivations.

3) Analytical ability through the solution of unseen problems in the test and examination.

Thus, the summative assessment for this module consists of:

1) One two-hour examination at the end of the semester, worth 80% of the overall module mark.

2) One fifty-minute in-semester test, worth 20% of the overall module mark.

Formative assessment and feedback:

Students receive written feedback via the marked class test. The solutions to the in-semester test are also reviewed in lectures. Unassessed coursework is also assigned to students, and sketch solutions are provided. Verbal feedback is provided during lectures and office hours.

Module aims

  • The module aims to cover the range of data science and underpinning mathematics required to understand the key data
    algorithms in operation today.

Learning outcomes

Attributes Developed
001 Demonstrate understanding of data assimilation and its role in improving forecasting. K
002 Understand the concepts of networks and clustering, along with basic theory and examples. CKT
003 Understand how large data sets can be used to identify a model, and the theory of data driven modelling. CK
004 Understand classic machine learning, and the evolution into deep learning and the search for patterns in data. CK

Attributes Developed

C - Cognitive/analytical

K - Subject knowledge

T - Transferable skills

P - Professional/Practical skills

Methods of Teaching / Learning

Teaching is by lectures, 3 hours per week for 11 weeks.

Lecture notes are provided. Learning takes place through lectures, exercises and class tests. Blackboards and whiteboards are used for real-time presentation.

Supplementary notes are provided via SurreyLearn. Periodic special lectures are devoted to discussion of example sheets.

Indicated Lecture Hours (which may also include seminars, tutorials, workshops and other contact time) are approximate and may include in-class tests where one or more of these are an assessment on the module. In-class tests are scheduled and organised separately from taught content and will be published on students' personal timetables, where they apply to the modules taken, as soon as they are finalised by central administration. This will usually be after the initial publication of the teaching timetable for the relevant semester.

Reading list

https://readinglists.surrey.ac.uk
Upon accessing the reading list, please search for the module using the module code: MAT3051

Other information

N/A

Programmes this module appears in

Programme                               Semester  Classification  Qualifying conditions
Mathematics MSc                         2         Optional        A weighted aggregate mark of 40% is required to pass the module
Mathematics with Music BSc (Hons)       2         Optional        A weighted aggregate mark of 40% is required to pass the module
Mathematics MMath                       2         Optional        A weighted aggregate mark of 40% is required to pass the module
Mathematics with Statistics BSc (Hons)  2         Optional        A weighted aggregate mark of 40% is required to pass the module
Mathematics BSc (Hons)                  2         Optional        A weighted aggregate mark of 40% is required to pass the module
Financial Mathematics BSc (Hons)        2         Optional        A weighted aggregate mark of 40% is required to pass the module
Mathematics and Physics BSc (Hons)      2         Optional        A weighted aggregate mark of 40% is required to pass the module
Mathematics and Physics MPhys           2         Optional        A weighted aggregate mark of 40% is required to pass the module
Mathematics and Physics MMath           2         Optional        A weighted aggregate mark of 40% is required to pass the module

Please note that the information detailed within this record is accurate at the time of publishing and may be subject to change. This record contains information for the most up to date version of the programme / module for the 2020/1 academic year.