Data Science
Data Science specialism modules
Data Science
This module will develop your data science skillset so that you’ll be able to write programs that can read, process and analyse textual and numerical data. You will be able to generate plots and interactive visualisations of data and understand how to apply statistical methods to the interpretation of results. You’ll be able to use data analysis in the decision-making process. You’ll also learn about application domains for data science.
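As a rough illustration of what "read, process and analyse" can mean in practice, the sketch below parses a small block of tabular text and computes summary statistics for one numeric column. It uses only Python's standard library; the sample data, the column name `temperature_c` and the helper `summarise` are hypothetical and are not taken from the module materials.

```python
import csv
import io
import statistics

# Hypothetical sample data, inlined so the sketch is self-contained.
RAW = """city,temperature_c
London,14.0
Leeds,12.0
Cardiff,13.0
"""


def summarise(raw_csv: str, column: str) -> dict:
    """Parse CSV text and return basic statistics for one numeric column."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarise(RAW, "temperature_c")
print(summary)
```

In a real project the same steps would more likely be done with a library such as pandas, which the resources listed later on this page cover.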
Databases and Advanced Data Techniques
This module aims to show you how to work with data in your computer programs. You will learn how to use SQL and NoSQL databases to store tabular data and documents. You will study the ethics of gathering and processing data, and why it is important to consider data security. You will also explore open data resources and how to access them from your programs, as well as audio and video data and the challenges of working with them.
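To give a flavour of storing and querying tabular data from a program, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. The table name `modules`, the `topic` labels and both helper functions are hypothetical choices made for this example; the module itself may use a different database system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE modules (name TEXT PRIMARY KEY, topic TEXT)")
    conn.executemany(
        "INSERT INTO modules (name, topic) VALUES (?, ?)",
        [
            ("Data Science", "data"),
            ("Databases and Advanced Data Techniques", "data"),
            ("Advanced Web Development", "web"),
        ],
    )
    conn.commit()
    return conn


def modules_for_topic(conn: sqlite3.Connection, topic: str) -> list:
    """Return module names for a topic, using a parameterised query."""
    rows = conn.execute(
        "SELECT name FROM modules WHERE topic = ? ORDER BY name", (topic,)
    )
    return [name for (name,) in rows]


conn = build_demo_db()
print(modules_for_topic(conn, "data"))
```

Passing `topic` as a query parameter rather than building the SQL string by hand is one small example of the data security concerns the module raises.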
Machine Learning and Neural Networks
This module provides a broad view of machine learning and neural networks. You’ll learn how to solve common machine learning problems such as regression, classification, clustering, matrix completion and pattern recognition. You’ll explore how neural networks can be trained and optimised. You’ll learn how to develop machine learning systems rapidly, and you will learn how to verify and evaluate the results.
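Regression is the simplest of the problems listed above to sketch. The example below fits a straight line by ordinary least squares and then checks it against a held-out point, which is one basic way to evaluate a result. The data and the function names `fit_line` and `mean_squared_error` are assumptions made for illustration; in practice a library such as scikit-learn would usually be used.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and true values."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Hypothetical training data that lies exactly on y = 2x + 1.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
slope, intercept = fit_line(train_x, train_y)

# Evaluate on a point the model has not seen during fitting.
test_error = mean_squared_error([5], [11], slope, intercept)
print(slope, intercept, test_error)
```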
Advanced Web Development
Advanced Web Development teaches you how to build dynamic, data-driven websites using databases, front-end frameworks and server-side programming. You’ll develop the skills needed for full stack web development, enabling you to build and deploy complete, data-driven websites. You’ll consider different technologies for client-side web development, such as HTML, CSS, JavaScript and templates. You’ll explore methods for developing server-side web applications by building web-accessible wrappers around databases, consider scalability issues, and learn about web app configuration and deployment.
Natural Language Processing
Natural Language Processing (NLP) provides a grounding in both rule-based and statistical approaches to NLP, combining theoretical study with hands-on work employing widely used software packages. The module focuses on text processing, and you’ll learn how to work with text-based natural language in your programs. You’ll explore grammars and how they can be used to analyse text. You’ll learn how to use statistical analysis to extract information from text and to classify it. You’ll use appropriate programming libraries to implement NLP workflows.
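A very small sketch of the two approaches mentioned above: a rule (a regular expression) splits text into tokens, and a simple statistic (word frequency) is then computed over them. The pattern and the helper names `tokenise` and `top_words` are assumptions chosen for this example; dedicated NLP libraries offer far more robust tokenisers.

```python
import re
from collections import Counter

# Rule-based step: a token is a run of lowercase letters or apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z']+")


def tokenise(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def top_words(text: str, n: int) -> list:
    """Statistical step: return the n most frequent tokens with counts."""
    return Counter(tokenise(text)).most_common(n)


sample = "The cat sat. The cat ran!"
print(tokenise(sample))
print(top_words(sample, 2))
```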
Resources
Jupyter
- Jupyter Notebook for Beginners: A Tutorial
- Six easy ways to run your Jupyter Notebook in the cloud
- Tutorial: Advanced Jupyter Notebooks
Pandas
- Modern Pandas - “This series is about how to make effective use of pandas, a data analysis library for the Python programming language. It’s targeted at an intermediate level: people who have some experience with pandas, but are looking to improve.”
- Official website - “pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.”
Working with data (Pandas, NumPy, Matplotlib, IPython, Scikit-Learn…)
- Kaggle: Micro-Courses - “Practical data skills you can apply immediately: that’s what you’ll learn in these free micro-courses. They’re the fastest (and most fun) way to become a data scientist or improve your current skills.”
- Python Data Science Handbook: full text in Jupyter Notebooks - “This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.”