Table of contents
- Programming with Data
- Professor(s)
- Topics covered
- Assessment
- Mock exam
- Module overview
- Module specification
- Past exams
- Syllabus
- Resources
Programming with Data
This module will show you how to work with data: getting data from a variety of sources, visualising data in compelling, informative ways, processing data to make it useful and shareable, and reasoning with data to test hypotheses and make parameterised predictions. The module will also introduce you to a new language and programming environment that is well suited to these applications.
Professor(s)
- Dr. Sean McGrath
Topics covered
- Setting up the programming environment
- Control structures, functions and comprehensions
- Data-driven programming
- Visualising data
- Descriptive statistics
- Getting data
- Processing data: cleaning, normalising, and scaling
- Classification with K-nearest neighbours
- Bayes’ theorem and naïve Bayes classification
- Clustering
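
As a rough illustration of two of the topics above (scaling data and K-nearest neighbours classification), here is a minimal sketch using only the Python standard library. It is not taken from the module materials: the function names `min_max_scale` and `knn_predict`, the toy data, and the labels are all hypothetical, and scaling the query together with the training rows is a simplification made for brevity.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def min_max_scale(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant column has no spread, so map it to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (height in cm, weight in kg) with made-up labels.
raw = [[150, 50], [160, 55], [155, 52], [180, 85], [185, 90], [175, 80]]
labels = ["small", "small", "small", "large", "large", "large"]

# Simplification: the query row is scaled together with the training rows.
scaled = min_max_scale(raw + [[178, 82]])
train, query = scaled[:-1], scaled[-1]

prediction = knn_predict(train, labels, query, k=3)
print(prediction)
```

Scaling first matters here because K-nearest neighbours relies on distances, so a feature measured in larger units would otherwise dominate the vote. In practice a library such as scikit-learn would usually be used instead of hand-written helpers like these.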
Assessment
One two-hour unseen written examination and coursework (Type I)
Mock exam
See the binary-assets repository.
Module overview
See the binary-assets repository.
Module specification
Past exams
Syllabus
Primary programming language
Python
Resources
Complementary learning
Data Science
- Applied Data Science with Python Specialization - “University of Michigan, Coursera.”
- CS 88: Computational Structures in Data Science - “Spring 2021. Instructors: Gerald Friedland, Michael Ball”
- Data 8: The Foundations of Data Science - “The UC Berkeley Foundations of Data Science course combines three perspectives: inferential thinking, computational thinking, and real-world relevance.”
- Data Science playlist - YouTube, by Keith Galli: web scraping, NumPy, pandas, plotting, NLP, sklearn.
- Data Science: University of Cambridge - “Department of Computer Science and Technology.”
- Foundations of Data Science: K-Means Clustering in Python - Coursera, by Dr Matthew Yee-King +3 more instructors.
- Machine Learning & Data Science playlist - YouTube, by Derek Banas: probability, statistics, NumPy, pandas, plotting, time series.
- Statistics with Python Specialization - “University of Michigan, Coursera.”
- The Data Science Design Manual - “Steven Skiena - The Data Science Design Manual serves as an introduction to data science, focusing on the skills and principles needed to build systems for collecting, analyzing, and interpreting data.”
Python
- Courses (free) - REPL
- Learn Python, Data Viz, Pandas & More on Kaggle - Kaggle
- Official Python documentation
- Python Data Science Handbook
- Python Design Patterns - “[…] evolving guide to design patterns in the Python programming language.”
- Videos - REPL/YouTube
- Websites: references - working with data - REPL
Sentiment analysis
Libraries
Matplotlib
- Matplotlib - “Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.”
NumPy
- NumPy - “The fundamental package for scientific computing with Python.”
Pandas
- Pandas Tutorial Playlist - Corey Schafer - YouTube
- 10 minutes to pandas - pydata.org
- Brandon Rhodes - Pandas From The Ground Up - PyCon 2015
- Learn Pandas - Kaggle
- Vincent D. Warmerdam - PyData Eindhoven 2019 - YouTube