PRINCIPLES OF DATA SCIENCE - 2024/5

Module code: MATM063

Module Overview

This module introduces programming in Python for data science, with a focus on data pre-processing, data mining and analysis, machine learning and deep learning. Alongside practical hands-on experience of writing code, the module covers the theoretical background of different data analysis techniques and machine learning approaches. The goal is to develop an understanding of how information can be extracted from data and used to make predictions and, importantly, how this is done in practice by writing clear and transparent source code. Using real-world data sets and illustrative examples, the module helps students to develop a theoretical understanding of data science as well as practical experience of developing useful software tools. Many of the techniques acquired through this module are likely to be useful in the dissertation project.

Module provider

Mathematics & Physics

Module Leader

BAUER Werner (Maths & Phys)

Number of Credits: 15

ECTS Credits: 7.5

Framework: FHEQ Level 7

Module cap (Maximum number of students): N/A

Overall student workload

Independent Learning Hours: 77

Laboratory Hours: 22

Guided Learning: 45

Captured Content: 6

Module Availability

Semester 2

Prerequisites / Co-requisites

None.

Module content

Introduction to Python

Indicative contents include: insights into the structure of Python such as objects, instances, attributes, functions, classes, definitions, and so on; the use of packages useful for data science and machine learning; scientific code structuring; loading of Python packages; loading and storing of data files; plotting and debugging code; and working with the Titanic data set to carry out data pre-processing and data analysis.
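
As an illustration of the kind of pre-processing step listed above, the following is a minimal sketch rather than module material. It assumes a few made-up rows in the style of the Titanic data set and uses only the Python standard library; the column names and the helper functions load_rows and preprocess are hypothetical. It fills missing ages with the median and encodes the sex column as 0/1.

```python
import csv
import io
from statistics import median

# Hypothetical rows in the style of the Titanic data set (values are made up).
RAW_CSV = """name,sex,age,survived
A,female,29,1
B,male,,0
C,male,40,0
D,female,35,1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per passenger."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Fill missing ages with the median age and encode sex as 0/1."""
    known_ages = [float(r["age"]) for r in rows if r["age"]]
    fill_age = median(known_ages)
    cleaned = []
    for r in rows:
        cleaned.append({
            "name": r["name"],
            "is_female": 1 if r["sex"] == "female" else 0,
            "age": float(r["age"]) if r["age"] else fill_age,
            "survived": int(r["survived"]),
        })
    return cleaned


rows = preprocess(load_rows(RAW_CSV))
for row in rows:
    print(row)
```

In practice a package such as pandas would usually handle these steps; the standard-library version is used here only so that the sketch runs without additional installations.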

Machine learning, neural networks, and decision trees

Indicative contents include: machine learning methods, neural networks, decision trees, regression modelling, self-organizing maps, and deep learning; practical applications of machine learning such as using pandas for data analysis; applying PyTorch and TensorFlow to train neural networks; and studying how to predict outcomes for the Titanic data set and other data sets using machine learning methods.
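
To make the idea of a decision tree concrete, the following is a minimal sketch of a one-level tree (a "decision stump") in plain Python. It is an illustration under stated assumptions, not the module's own implementation: the toy rows are made up, the feature names is_female and age are hypothetical, and the split is chosen by minimising the weighted Gini impurity.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, features, target):
    """Return the (feature, threshold) pair with the lowest weighted Gini impurity."""
    best, best_score = None, float("inf")
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(rows, features, target):
    """Fit a one-level tree; assumes at least one valid split exists."""
    f, t = best_split(rows, features, target)
    left = [r[target] for r in rows if r[f] <= t]
    right = [r[target] for r in rows if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": majority(left), "right": majority(right)}


def predict(stump, row):
    """Predict the label of one row from the fitted stump."""
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


# Made-up toy data in the style of the Titanic data set.
DATA = [
    {"is_female": 1, "age": 29.0, "survived": 1},
    {"is_female": 0, "age": 35.0, "survived": 0},
    {"is_female": 0, "age": 40.0, "survived": 0},
    {"is_female": 1, "age": 35.0, "survived": 1},
    {"is_female": 0, "age": 22.0, "survived": 0},
    {"is_female": 1, "age": 50.0, "survived": 1},
]

stump = fit_stump(DATA, ["is_female", "age"], "survived")
print(stump)
print(predict(stump, {"is_female": 1, "age": 30.0}))
```

Established libraries provide full decision tree implementations with deeper trees, pruning, and validation; the stump above only shows how a single split is selected.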

Assessment pattern

Assessment type    Unit of assessment        Weighting (%)
Coursework         Mid-term coursework       40
Coursework         End-of-term coursework    60

Alternative Assessment

N/A

Assessment Strategy

The assessment strategy is designed to provide students with the opportunity to demonstrate: 


  • Their understanding of the use of Python as a scientific language to solve problems in data science.

  • Their ability to extract valuable information from the results of their data analysis.

  • Their skills in using machine learning methods to predict outcomes.

  • Their ability to write well-structured, reusable functional code.



Thus, the summative assessment for this module consists of:

One shorter coursework in the middle of the semester, weighted at 40% of the module mark and covering learning outcomes 1-4.

One substantial coursework to be submitted towards the end of the semester, weighted at 60% of the module mark and covering learning outcomes 1-4.

Formative assessment and feedback

Students receive written feedback via a number of marked unassessed coursework assignments over an 11-week period. Formative guidance is given on the coursework.

Module aims

  • To equip students with the skills to program in Python for data science, data analysis, machine learning, and other data-related applications.
  • To equip students with the skills to extract information from large data sets.
  • To equip students with the skills to make predictions of certain events by using machine learning and artificial intelligence.

Learning outcomes

Learning outcome (Attributes Developed)
001 Students will be able to demonstrate the ability to use Python for scientific computing, data analysis, machine learning, and data assimilation. (KPT)
002 Students will be able to demonstrate the capability to apply data analysis tools, to interpret the results, and to show a systematic understanding of key aspects of selected topics within data science and statistical learning theory. (KCPT)
003 Students will be able to demonstrate the ability to understand and assess related data science and machine learning methods, their applications, and their limitations. (KCPT)
004 Students will be able to demonstrate the capability to implement machine learning algorithms and to use established libraries. (PT)

Attributes Developed

C - Cognitive/analytical

K - Subject knowledge

T - Transferable skills

P - Professional/Practical skills

Methods of Teaching / Learning

The learning and teaching strategy is designed to provide:


  • A comprehensive introduction to Python, with a focus on giving students experience in implementing Python code for problem solving in mathematics.

  • Experience in programming in Python to tackle problems in data science, to enhance students’ digital capabilities and employability.

  • Practical experience in analysing data to extract valuable information, to enhance students' resourcefulness and resilience.



The learning and teaching methods include:


  • A flipped classroom approach, with videos prepared in advance that cover the theoretical background of the topics. This creates opportunities for lively discussion of how theoretical ideas can be implemented in practical situations.

  • Two 1-hour computer laboratories per week for 11 weeks. These include discussions of the content of the video lectures and its application through practical programming exercises.

  • Assessed coursework to give students practical experience of implementing techniques covered in lectures and lab sessions in an extended piece of work.

  • Several pieces of unassessed coursework to give students experience of using techniques introduced in the module and to receive formative feedback.

  • Laboratories may be recorded. Laboratory recordings are intended to give students the opportunity to review parts of the session that they might not have understood fully and should not be seen as an alternative to attendance at lab sessions.


Indicated Lecture Hours (which may also include seminars, tutorials, workshops and other contact time) are approximate and may include in-class tests where one or more of these are an assessment on the module. In-class tests are scheduled and organised separately from taught content and will be published on students' personal timetables, where they apply to the modules taken, as soon as they are finalised by central administration. This will usually be after the initial publication of the teaching timetable for the relevant semester.

Reading list

https://readinglists.surrey.ac.uk
Upon accessing the reading list, please search for the module using the module code: MATM063

Other information

The School of Mathematics and Physics is committed to developing graduates with strengths in Digital Capabilities, Employability, Global and Cultural Capabilities, Resourcefulness and Resilience and Sustainability. This module is designed to allow students to develop knowledge, skills, and capabilities in the following areas:

Digital Capabilities: This foundational maths for data science module teaches students to use computers to solve mathematical problems. This involves learning to program and learning to apply these skills to solve technical problems. Students also gain experience in programming with Python.

Employability: The ability to draw meaning from large data sets is currently in high demand in industry. This module teaches the mathematical foundations that allow students to work with large and complex real-world data sets. These skills are highly valuable to employers.

Global and Cultural Capabilities: Mathematics is a global language and the tools and languages used on this module can be used internationally. This module allows students to develop skills that will allow them to reason about and develop applications with global reach and collaborate with their peers around the world.

Resourcefulness and Resilience: This module involves practical problem-solving skills that teach students how to work with complex and unstructured data sets. The foundational maths taught in this module can be applied to a wide range of different scenarios, giving students new techniques for solving problems.

Sustainability: The data analysis skills learned in MATM063 enable students to analyse data on resource consumption, emissions, and environmental impact, facilitating the development of sustainable practices. Thus, this module plays a role in creating a more sustainable future.

Please note that the information detailed within this record is accurate at the time of publishing and may be subject to change. This record contains information for the most up-to-date version of the programme / module for the 2024/5 academic year.