# MATHEMATICS OF DATA SCIENCE - 2024/5

Module code: MATM065

## Module Overview

Data science is the study of data to extract meaningful and actionable insights at all levels of society, in areas such as dynamical systems and social media networks. This module introduces the role of data in society and provides students with the underpinning mathematics that drives data methodology and algorithms. It then covers wide-ranging topics with a focus on the Surrey brand of data, as research into data is part of the department's research agenda.

### Module provider

Mathematics & Physics

SANTITISSADEEKORN Naratip (Maths & Phys)

## Overall student workload

Independent Learning Hours: 69

Lecture Hours: 33

Guided Learning: 15

Captured Content: 33

Semester 2


## Module content

Indicative content includes:

• Introduction to data science, including the role of data, big data and learning from data;

• Singular value decomposition, principal components;

• Moore-Penrose inverse and Rayleigh-Ritz quotient of matrices;

• Networks, including network Laplacian and Cheeger's constant;

• Clustering, including spectral clustering and K-means clustering;

• Supervised and unsupervised machine learning, including neural networks;

• Multivariate normal distributions and mixtures thereof;

• Classification using linear discriminant analysis, principal component analysis and Bayesian methods;

• Data assimilation, including Kalman filter theory and discussions on ill-conditioning;

• Advanced regression analysis, including regularisation under the energy norm and L1 regression.

## Assessment pattern

| Assessment type | Unit of assessment | Weighting (%) |
| --- | --- | --- |
| School-timetabled exam/test | In-semester test (50 minutes) | 20 |
| Examination | Examination (2 hours) | 80 |


## Assessment Strategy

The assessment strategy is designed to provide students with the opportunity to demonstrate:

• Understanding of fundamental concepts and ability to develop and apply them to a new context.

• Subject knowledge through the recall of key definitions, formulae and derivations.

• Analytical ability through the solution of unseen problems in the test and exam.

Thus, the summative assessment for this module consists of:

• One in-semester test, worth 20% of the module mark, corresponding to Learning Outcomes 1 and 2.

• A synoptic examination (2 hours), worth 80% of the module mark, corresponding to Learning Outcomes 1 to 6.

Formative assessment

There are two formative unassessed courseworks over an 11-week period, designed to consolidate student learning.

Feedback

Individual written feedback is provided to students on the formative unassessed courseworks. The feedback is timed so that feedback from the first coursework will assist students with preparation for the in-semester test. The feedback from both courseworks and the in-semester test will assist students with preparation for the synoptic examination. Students also receive verbal feedback during lectures and computer lab sessions.

## Module aims

• Extend students' understanding of matrices and enable them to calculate the singular value decomposition, Moore-Penrose inverse and Rayleigh-Ritz quotient of matrices;
• Introduce students to networks and clustering and enable students to use the theoretical constructs within;
• Facilitate students' understanding of machine learning by studying both supervised and unsupervised methodology;
• Enable students to classify data using both frequentist and Bayesian methodology;
• Give students an introduction and motivation to data assimilation in the context of the Kalman filter theory;
• Introduce students to more advanced regression analysis techniques such as regularisation under the energy norm and L1 regression.

## Learning outcomes

| Ref | Learning outcome | Attributes Developed |
| --- | --- | --- |
| 001 | Students will learn how to work with matrices to extract meaningful insights such as their principal components. | KC |
| 002 | Students will understand the concepts of networks and clustering, along with basic theory and examples. | KC |
| 003 | Students will be able to implement both supervised and unsupervised machine learning. | KCT |
| 004 | Students will be able to classify data using both frequentist and Bayesian methodology. | KCT |
| 005 | Students will be able to demonstrate an understanding of data assimilation and its role in improving forecasting. | KC |
| 006 | Students will be able to use more advanced regression analysis, such as regularisation under the energy norm and L1 regression. | KC |

Attributes Developed

C - Cognitive/analytical

K - Subject knowledge

T - Transferable skills

P - Professional/Practical skills

## Methods of Teaching / Learning

The learning and teaching strategy is designed to:

Cover the range of data science and underpinning mathematics required to understand the key data algorithms in operation today.

The learning and teaching methods include:

• Three one-hour lectures per week for eleven weeks, with typeset notes to complement the lectures. The lectures provide a structured learning environment with opportunities for students to ask questions and to practise the methods taught.

• One exercise sheet per week for eleven weeks to reinforce students' understanding and guide their learning. These sheets allow students to tackle questions at their own pace outside of scheduled teaching sessions. Model solutions are provided after students have attempted the questions.

• Two unassessed courseworks to provide students with further opportunity to consolidate learning. Students receive individual written feedback on these as guidance on their progress and understanding.

• Lectures may be recorded or equivalent recordings of lecture material provided. These recordings are intended to give students the opportunity to review parts of lectures that they may not fully have understood and should not be seen as an alternative to attending lectures.

Indicated Lecture Hours (which may also include seminars, tutorials, workshops and other contact time) are approximate and may include in-class tests where one or more of these are an assessment on the module. In-class tests are scheduled/organised separately to taught content and will be published on to student personal timetables, where they apply to taken modules, as soon as they are finalised by central administration. This will usually be after the initial publication of the teaching timetable for the relevant semester.

Upon accessing the reading list, please search for the module using the module code: MATM065

## Other information

The School of Mathematics and Physics is committed to developing graduates with strengths in Digital Capabilities, Employability, Global and Cultural Capabilities, Resourcefulness and Resilience, and Sustainability. This module is designed to allow students to develop knowledge, skills, and capabilities in the following areas:

• Digital Capabilities: The SurreyLearn page for MATM065 features a dynamic discussion forum where students can pose questions and engage with others using e.g. LaTeX and MathML tools. This enhances their digital competencies and facilitates collaborative learning and information sharing.

• Employability: MATM065 equips students with the skills needed for careers as data analysts and in related roles. Increasingly powerful hardware has allowed huge datasets to be collected in a wide variety of fields, from healthcare to social media. This means that demand for the skills highlighted in this module is only going to increase.

• Global and Cultural Capabilities: Students enrolled in MATM065 originate from a variety of countries and have a wide range of cultural backgrounds. Students are encouraged to work together during problem-solving teaching activities in tutorials and lectures, which naturally facilitates the sharing of different cultures.

• Resourcefulness and Resilience: MATM065 is a module which demands a rigorous data-driven approach. Students will gain skills in constructing models for complex problems, thus building their resourcefulness and resilience.

• Sustainability: Students enrolled in MATM065 learn to make data-driven decisions in a vast variety of areas, including management of scarce resources and climate science.

## Programmes this module appears in

| Programme | Semester | Classification | Qualifying conditions |
| --- | --- | --- | --- |
| Mathematics with Statistics MMath | 2 | Optional | A weighted aggregate mark of 50% is required to pass the module |
| Financial Data Science MSc | 2 | Compulsory | A weighted aggregate mark of 50% is required to pass the module |
| Computer Science MEng | 2 | Optional | A weighted aggregate mark of 50% is required to pass the module |
| Mathematics MMath | 2 | Optional | A weighted aggregate mark of 50% is required to pass the module |
| Mathematics MSc | 2 | Optional | A weighted aggregate mark of 50% is required to pass the module |

Please note that the information detailed within this record is accurate at the time of publishing and may be subject to change. This record contains information for the most up to date version of the programme / module for the 2024/5 academic year.