PRINCIPLES OF DATA SCIENCE - 2022/3

Module code: MATM063

Module Overview

This module introduces programming in Python for data science, with a focus on data pre-processing, data mining and analysis, machine learning and deep learning. Alongside practical, hands-on experience of writing code, the module covers the theoretical background of a range of data analysis techniques and machine learning approaches. The goal is to develop an understanding of how information can be extracted from data and used to make predictions and, importantly, of how this is done in practice by writing clear and transparent source code. Using real-world data sets and illustrative examples, the module helps students to develop both a theoretical understanding of data science and practical experience of building useful software tools. Many of the techniques acquired in this module are likely to be useful in the dissertation project.

Module provider

Mathematics & Physics

Module Leader

BAUER Werner (Maths & Phys)

Number of Credits: 15

ECTS Credits: 7.5

Framework: FHEQ Level 7

Module cap (Maximum number of students): N/A

Overall student workload

Independent Learning Hours: 106

Laboratory Hours: 22

Guided Learning: 16.5

Captured Content: 5.5

Module Availability

Semester 1

Prerequisites / Co-requisites

None.

Module content

Introduction to Python

Indicative contents include: insights into the structure of Python, such as objects, instances, attributes, functions, classes and definitions; the use of packages useful for data science and machine learning; scientific code structuring; loading of Python packages; loading and storing of data files; plotting; debugging code; and working on the Titanic data set to carry out data pre-processing and data analysis.
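
By way of illustration only, the kind of exercise these topics lead to might look like the minimal sketch below. It is not taken from the module materials: the `Passenger` class, its field names, the helper functions and the four toy rows are all hypothetical, and only the Python standard library is assumed. It shows a class with attributes, instances of that class, and two functions that perform a simple pre-processing step (filling missing ages with the median) and a simple analysis step (a survival rate).

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Passenger:
    """One passenger record; the field names are illustrative assumptions."""
    name: str
    age: Optional[float]  # None marks a missing value
    survived: bool


def fill_missing_ages(passengers: List[Passenger]) -> List[Passenger]:
    """Return new records with missing ages replaced by the median known age."""
    known_ages = [p.age for p in passengers if p.age is not None]
    fallback = median(known_ages)  # raises StatisticsError if no age is known
    return [
        Passenger(p.name, p.age if p.age is not None else fallback, p.survived)
        for p in passengers
    ]


def survival_rate(passengers: List[Passenger]) -> float:
    """Fraction of passengers marked as survived."""
    return sum(p.survived for p in passengers) / len(passengers)


# Invented toy rows for illustration only; not taken from the real data set.
toy = [
    Passenger("A", 22.0, False),
    Passenger("B", None, True),
    Passenger("C", 38.0, True),
    Passenger("D", 30.0, False),
]

cleaned = fill_missing_ages(toy)
print(cleaned[1].age)          # the missing age, now the median of 22, 30 and 38
print(survival_rate(cleaned))  # two of the four toy passengers survived
```

The functions return new records rather than modifying their input, so the original rows remain available for comparison. In practice a package such as pandas would typically be used for the same steps.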

 

Machine learning, neural networks, and decision trees

Indicative contents include: machine learning methods, neural networks, decision trees, regression modelling, self-organizing maps and deep learning; practical applications of machine learning, such as using pandas for data analysis; applying PyTorch and TensorFlow to train neural networks; and studying how to predict outcomes for the Titanic data set and other data sets using machine learning methods.

Assessment pattern

Assessment type   Unit of assessment        Weighting (%)
Coursework        Mid-term coursework       40
Coursework        End-of-term coursework    60

Alternative Assessment

N/A

Assessment Strategy

  The assessment strategy is designed to provide students with the opportunity to demonstrate:


  • Their understanding of using Python as a scientific language to solve problems in data science. 

  • Their ability to extract valuable information out of the results of their data analysis.

  • Their skills in using machine learning methods to predict outcomes.

  • Their ability to write well-structured, reusable, functional code.



 

Thus, the assessment for this module consists of:


  • One shorter coursework in the middle of the term, weighted at 40% of the module mark.

  • One substantial coursework to be submitted towards the end of the semester; weighted at 60% of the module mark.



 

Formative assessment and feedback

 

Students receive written feedback on a number of marked but unassessed coursework assignments over an 11-week period. Formative guidance is also given on the assessed coursework.

Module aims

  • To equip students with the skills to program in Python for data science, data analysis, machine learning, and other data-related applications.
  • To equip students with the skills to extract information from large data sets.
  • To equip students with the skills to make predictions of certain events by using machine learning and artificial intelligence.

Learning outcomes

Attributes Developed
001 Demonstrate the ability to use Python for scientific computing, data analysis, machine learning, and data assimilation. KPT
002 Demonstrate the capability to apply data analysis tools and interpret the results, and show a systematic understanding of key aspects of selected topics within data science and statistical learning theory. KCPT
003 Demonstrate the ability to understand and assess related data science and machine learning methods and their applications and limitations. KCPT
004 Demonstrate the capability to implement machine learning algorithms and to use established libraries. PT

Attributes Developed

C - Cognitive/analytical

K - Subject knowledge

T - Transferable skills

P - Professional/Practical skills

Methods of Teaching / Learning

The learning and teaching strategy is designed to provide:


  • A comprehensive introduction to programming in Python with a view towards applications in data science, to enhance students' digital capabilities and employability.

  • Experience in implementing Python code for problem solving.

  • Practical experience in analysing data to extract valuable information, to enhance students' resourcefulness and resilience.



 

The learning and teaching methods include:


  • A flipped classroom approach, with videos prepared in advance that cover the theoretical background to the topics. This allows opportunities for lively discussion of how theoretical ideas can be implemented in practical situations.

  • 2 x 1-hour lectures per week for 11 weeks. These will include discussion of the content of the video lectures (about 30 minutes) and its application through practical programming exercises (about 1.5 hours).

  • Assessed coursework to give students practical experience of implementing techniques covered in lectures and lab sessions in an extended piece of work. 

  • Several pieces of unassessed coursework to give students experience of using techniques introduced in the module and to receive formative feedback.


Indicated Lecture Hours (which may also include seminars, tutorials, workshops and other contact time) are approximate and may include in-class tests where one or more of these form an assessment on the module. In-class tests are scheduled and organised separately from taught content and will be published on students' personal timetables, where they apply to the modules taken, as soon as they are finalised by central administration. This will usually be after the initial publication of the teaching timetable for the relevant semester.

Reading list

https://readinglists.surrey.ac.uk
Upon accessing the reading list, please search for the module using the module code: MATM063

Other information

The primary purpose of the course is to enhance the digital capabilities of students in the context of data science. Because applications of data science are becoming increasingly important in many areas, the course will significantly improve students' employability. Through extensive coding practice, including the design and debugging of code, the course will also enhance their resourcefulness and resilience.

Programmes this module appears in

Programme                      Semester   Classification   Qualifying conditions
Mathematics MSc                1          Optional         A weighted aggregate mark of 50% is required to pass the module
Mathematics MMath              1          Optional         A weighted aggregate mark of 50% is required to pass the module
Mathematical Data Science MSc  1          Compulsory       A weighted aggregate mark of 50% is required to pass the module

Please note that the information detailed within this record is accurate at the time of publishing and may be subject to change. This record contains information for the most up-to-date version of the programme / module for the 2022/3 academic year.