BIOINFORMATICS AND DATA SCIENCE - 2022/3

Module code: BMSM031

Module Overview

Bioinformatics and data science underlie many academic disciplines and are essential for epidemiologists, geneticists, biologists, and biomedical scientists. Computer programming (coding) and data science are key skills for securing academic and industry jobs.

Scientists in the life and health sciences translate hypotheses and research questions into coherent study designs, which are then implemented to generate new data. Scientists apply notions of causality, statistical inference, and artificial intelligence to analyse data and to generate interpretable, reproducible and valid results. This module provides an introduction to data analysis and study design with the aim of enabling graduate students to independently query nature in a coherent and reproducible way.

The module will contribute to the five pillars of graduate learning at the University of Surrey.
To enhance employability, students will be equipped with essential skills for the future, such as communication and writing skills, the logic of scientific inference, principles of statistical analysis, artificial intelligence, causal inference and experimental design. The module will develop communication skills through seminars and group work during workshops. The classroom will be a safe space where students learn by giving and receiving feedback. To extend digital competencies, students will acquire skills in computer programming and big data analysis using statistical and artificial intelligence methods. To expand cultural and global capabilities, students will be exposed to data analysis examples from global health and One Health issues across the world. To improve resourcefulness and resilience, students will work independently to apply the learned methodologies in a mini-project, which will require self-regulation as the work is distributed across the duration of the module. There will be opportunities to get feedback on this work during seminar presentations and drop-in sessions.

Module provider

School of Biosciences and Medicine

Module Leader

DA SILVA COUTO ALVES Alexessander (Biosc & Med)

Number of Credits: 15

ECTS Credits: 7.5

Framework: FHEQ Level 7

JACs code:

Module cap (Maximum number of students): 40

Overall student workload

Workshop Hours: 33

Independent Learning Hours: 114

Seminar Hours: 3

Module Availability

Semester 2

Prerequisites / Co-requisites

None

Module content

Indicative content includes:

1) Introduction to programming in R
- Introduction to R environment and RStudio
- Data types
- Input and Output of Data
- Exploring, formatting and manipulating data
- Plotting and Saving Plots
- Conditionals and Loops
- Functions

2) Introduction to statistics and data analysis
- Descriptive statistics
- Statistical Inference
- Multiple testing correction
- Linear regression

3) Computational genomics
- Sequence databases
- Sequence alignment
- BLAST search
- Sequence annotation

4) Gene expression analysis
- Pre-processing of RNA-Seq data
- Differential gene expression analysis of RNA-Seq data
- Pathway and Gene ontology enrichment

5) Introduction to Artificial Intelligence
- Clustering and unsupervised learning
- Classification and supervised learning
- Overfitting and Model Complexity

6) Introduction to causal inference
- Definition of causality as a concept
- Difficulties with causal inference
- The counterfactual framework
- Definitions of treatment effect

7) Design of Observational and Experimental Studies
- Measures of effect and measures of occurrence
- Experimental studies: Blocked designs, Randomized clinical trials, Adaptive clinical trials
- Observational studies: Cohort, Case-control and Cross-sectional

Assessment pattern

Assessment type | Unit of assessment | Weighting (%)
Oral exam or presentation | Seminar: Study design and analysis plan | 30
Coursework | Written report and source code | 70

Alternative Assessment

N/A

Assessment Strategy

The assessment strategy is designed to provide students with the opportunity to demonstrate they have achieved the learning outcomes by testing their ability to:
- Evaluate methodologies and study designs to address concrete research questions
- Design, develop and implement a data analysis plan
- Demonstrate effective communication and computer programming skills

Thus, the summative assessment for this module consists of:
- 30% Seminar. The student will give an oral (PowerPoint) presentation of a study design and analysis plan to answer a research question. During the seminar, the student will obtain feedback from their peers and will be assessed on i) communication skills and ii) the coherence with which the study design and the analysis plan address the research question.
- 70% Report. At the end of the module, the student will submit a mini-report with the outcome of the mini-project. The report will be assessed for i) the coherence of analysis and results, ii) the ability to display data (tables and graphs), iii) the ability to interpret results and iv) the understanding of limitations in statistical methodology and study design.
The student will also submit the source code of the analysis plan for verification and evaluation. The following will be assessed: good coding practices such as modularity and encapsulation, pertinent use of software packages, and good model-building skills such as controlling for batch effects and adequate exploration of the parameter space.

Formative assessment
- Practical sessions will provide students with opportunities to assess their progress

Feedback
- Seminars and practical sessions will provide opportunities for feedback from team members and from the teacher

Module aims

  • Introduce students to the field of bioinformatics and data science.
  • Develop an effective command of computer programming (coding) in R.
  • Understand the inter-relation between experimental design and statistical methodology when addressing a research question.
  • Understand the impact of sampling uncertainty and adequate statistical inference on the reproducibility of experimental and observational results.
  • Independently construct study designs and analysis plans to answer research questions. This involves independently selecting and conducting statistical and artificial intelligence analyses, and interpreting and presenting their results.
  • Evaluate statistical and bioinformatics analyses and critically interpret results presented in a scientific paper.

Learning outcomes

Ref | Learning outcome | Attributes Developed
001 | To develop, debug and test data analysis scripts and functions in the programming language R | KP
002 | To select and apply the appropriate descriptive statistics for categorical and numeric variables | CKP
003 | To apply the adequate statistical hypothesis test for questions involving measures of central tendency and 1-way association tables | CKP
004 | To display data graphically and interpret graphs | KP
005 | To conduct and interpret linear regression analysis | CKP
006 | To understand the basic elements of causal inference, and know their importance for the construction of designs and interpretation of data analyses | CKPT
007 | To construct good designs for observational and experimental studies | CKPT
008 | To apply and interpret the results of supervised and unsupervised learning and understand the fundamentals of machine learning, such as model complexity, model selection and overfitting | CKP
009 | To apply and interpret results of methodologies to align and annotate genomes | CKP
010 | To apply and interpret results of methodologies to analyse gene expression data | CKP

Attributes Developed

C - Cognitive/analytical

K - Subject knowledge

T - Transferable skills

P - Professional/Practical skills

Methods of Teaching / Learning

The learning and teaching strategy is designed to:
- Practise both programming and the application of data analysis methods to realistic problems
- Foster independent and critical thinking about data science and bioinformatics
- Develop clear and succinct communication, both verbal and written

The learning and teaching methods include:
1. Interactive active-learning sessions combining lecture exposition and computer practicals to provide opportunities for hands-on programming, practical application of theoretical concepts, and face-to-face feedback and guidance. (33h)
2. Seminars (3h)
3. Independent learning: mini-project using a problem-based learning framework (114h)

Indicated Lecture Hours (which may also include seminars, tutorials, workshops and other contact time) are approximate and may include in-class tests where one or more of these are an assessment on the module. In-class tests are scheduled and organised separately from taught content and will be published on students' personal timetables, where they apply to taken modules, as soon as they are finalised by central administration. This will usually be after the initial publication of the teaching timetable for the relevant semester.

Reading list

https://readinglists.surrey.ac.uk
Upon accessing the reading list, please search for the module using the module code: BMSM031

Other information

N/A

Programmes this module appears in

Programme | Semester | Classification | Qualifying conditions
Biomedical Science MSci (Hons) | 2 | Optional | A weighted aggregate mark of 50% is required to pass the module

Please note that the information detailed within this record is accurate at the time of publishing and may be subject to change. This record contains information for the most up to date version of the programme / module for the 2022/3 academic year.