BIOINFORMATICS AND DATA SCIENCE - 2023/4
Module code: BMSM031
Bioinformatics and data science underlie many academic disciplines and are essential for epidemiologists, geneticists, biologists, and biomedical scientists. Computer programming (coding) and data science are key skills for securing academic and industry jobs.
Scientists in the life and health sciences translate hypotheses and research questions into coherent study designs that are then implemented to generate new data. They apply notions of causality, statistical inference, and artificial intelligence to analyse data and to generate interpretable, reproducible and valid results. This module provides an introduction to data analysis and study design with the aim of enabling graduate students to independently query nature in a coherent and reproducible way.
The module will contribute to the five pillars of graduate learning at the University of Surrey.
To enhance employability, students will be equipped with essential skills for the future, such as communication and writing skills, the logic of scientific inference, principles of statistical analysis, artificial intelligence, causal inference and experimental design. The module will develop communication skills through seminars and group work during workshops. The classroom will be a safe space where students learn by giving and receiving feedback. To extend digital competencies, students will acquire skills in computer programming and big data analysis using statistical and artificial intelligence methods. To expand cultural and global capabilities, students will be exposed to data analysis examples from global health and One Health issues across the world. To improve resourcefulness and resilience, students will work independently to apply the learned methodologies in a mini-project, which will require self-regulation as the work is distributed across the duration of the module. There will be opportunities to get feedback on this work during seminar presentations and drop-in sessions.
School of Biosciences
COUTO ALVES Alex (Biosciences)
Number of Credits: 15
ECTS Credits: 7.5
Framework: FHEQ Level 7
Module cap (Maximum number of students): 40
Overall student workload
Workshop Hours: 33
Independent Learning Hours: 114
Seminar Hours: 3
Prerequisites / Co-requisites
Indicative content includes:
1) Introduction to programming in R
- Introduction to R environment and RStudio
- Data types
- Input and Output of Data
- Exploring, formatting and manipulating data
- Plotting and Saving Plots
- Conditionals and Loops
2) Introduction to statistics and data analysis
- Descriptive statistics
- Statistical Inference
- Multiple testing correction
- Linear regression
3) Computational genomics
- Sequence databases
- Sequence alignment
- BLAST search
- Sequence annotation
4) Gene expression analysis
- Pre-processing of RNA-Seq data
- Differential gene expression analysis of RNA-Seq data
- Pathway and Gene Ontology enrichment
5) Introduction to Artificial Intelligence
- Clustering and unsupervised learning
- Classification and supervised learning
- Overfitting and Model Complexity
6) Introduction to causal inference
- Definition of causality as a concept
- Difficulties with causal inference
- The counterfactual framework
- Definitions of treatment effect
7) Design of Observational and Experimental Studies
- Measures of effect and measures of occurrence
- Experimental studies: Blocked designs, Randomized clinical trials, Adaptive clinical trials
- Observational studies: Cohort, Case-control and Cross-sectional
|Assessment type||Unit of assessment||Weighting (%)|
|Oral exam or presentation||Seminar: Study design and analysis plan||30|
|Coursework||Written report and source code||70|
The assessment strategy is designed to provide students with the opportunity to demonstrate they have achieved the learning outcomes by testing their ability to:
- Evaluate methodologies and study designs to address concrete research questions
- Design, develop and implement a data analysis plan
- Demonstrate effective communication and computer programming skills
Thus, the summative assessment for this module consists of:
- 30% Seminar. The student will give an oral (PowerPoint) presentation of a study design and analysis plan to answer a research question. During the seminar, the student will obtain feedback from their peers and will be assessed on i) communication skills and ii) the coherence with which the study design and the analysis plan address the research question.
- 70% Report. At the end of the module, the student will submit a mini-report with the outcome of the mini-project. The report will be assessed for i) the coherence of analysis and results, ii) the ability to display data (tables and graphs), iii) the ability to interpret results and iv) the understanding of limitations in statistical methodology and study design.
The student will also submit the source code of the analysis plan for verification and evaluation. The assessment will cover good coding practices such as modularity and encapsulation, pertinent use of software packages, and good model-building skills such as controlling for batch effects and adequate exploration of parameter space.
- Practical sessions will provide students with opportunities to assess their progress
- Seminars and practical sessions will provide opportunities for feedback from team members and from the teacher
- Introduce students to the field of bioinformatics and data science.
- Develop an effective command of computer programming (coding) in R.
- Understand the inter-relation between experimental design and statistical methodology when addressing a research question.
- Understand the impact of sampling uncertainty and adequate statistical inference on the reproducibility of experimental and observational results.
- Independently construct study designs and analysis plans to answer research questions. This involves independently selecting and applying statistical and artificial intelligence methodologies, and interpreting and presenting the results obtained.
- Evaluate statistical and bioinformatics analyses and critically interpret results presented in a scientific paper.
|001||To develop, debug and test data analysis scripts and functions in the programming language R||KP|
|002||To select and apply the appropriate descriptive statistics for categorical and numeric variables||CKP|
|003||To apply an appropriate statistical hypothesis test for questions involving measures of central tendency and 1-way association tables||CKP|
|004||To display data graphically and interpret graphs||KP|
|005||To conduct and interpret linear regression analysis||CKP|
|006||To understand the basic elements of causal inference, and know their importance for the construction of designs and interpretation of data analyses||CKPT|
|007||To construct good designs for observational and experimental studies||CKPT|
|008||To apply and interpret the results of supervised and unsupervised learning and understand the fundamentals of machine learning such as model complexity, model selection and overfitting||CKP|
|009||To apply and interpret the results of methodologies to align and annotate genomes||CKP|
|010||To apply and interpret the results of methodologies to analyze gene expression data||CKP|
C - Cognitive/analytical
K - Subject knowledge
T - Transferable skills
P - Professional/Practical skills
Methods of Teaching / Learning
The learning and teaching strategy is designed to:
- Provide practice in both programming and the application of data analysis methods to realistic problems
- Foster independent and critical thinking about data science and bioinformatics
- Develop clear and succinct communication, both verbally and in writing
The learning and teaching methods include:
1. Interactive active-learning sessions combining lecture exposition and computer practicals to provide opportunities for hands-on programming, practical application of theoretical concepts, and face-to-face feedback and guidance. (33h)
2. Seminars (3h)
3. Independent learning - Mini-project using a problem-based learning framework (114h)
Indicated Lecture Hours (which may also include seminars, tutorials, workshops and other contact time) are approximate and may include in-class tests where one or more of these are an assessment on the module. In-class tests are scheduled and organised separately from taught content and will be published on students' personal timetables, where they apply to taken modules, as soon as they are finalised by central administration. This will usually be after the initial publication of the teaching timetable for the relevant semester.
Upon accessing the reading list, please search for the module using the module code: BMSM031
Programmes this module appears in
|Biomedical Science MSci (Hons)||2||Optional||A weighted aggregate mark of 50% is required to pass the module|
Please note that the information detailed within this record is accurate at the time of publishing and may be subject to change. This record contains information for the most up-to-date version of the programme / module for the 2023/4 academic year.