NICO 101 - Introduction to Programming for Big Data

Profs. Adam Pah and Luis Amaral

View the Project on GitHub amarallab/Introduction-to-Python-Programming-and-Data-Science

NICO 101 is designed for students who have little to no previous experience with programming. Through interactive instruction and project-based work we plan to teach you the basics of programming in Python and computational data analysis.

Northwestern undergraduate students can register for the class immediately on Caesar. Graduate or professional students need a permission number to register on Caesar; contact Adam Pah to request one.

Requirements

You must have a laptop with a current version of Windows or OS X.

For Windows, you must be using at least Windows 7.

For Macs, you must be using OS X 10.9 or later.

This class uses the Anaconda Python 3.5 distribution. (Important: install the Python 3.5 distribution, which is the right-hand option for each operating system.)

There are videos to help you understand the installation process. The installer is a simple package that works like the installer for any other program, so don't be afraid!

You must have Anaconda Python 3.5 installed before the first day of class.
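One way to confirm the installation worked is to ask Python for its own version. The snippet below is a minimal sketch, not part of the official course materials: the helper name `is_course_python` is hypothetical, and treating any Python 3 release from 3.5 onward as acceptable is an assumption (the course itself asks for 3.5).

```python
import sys


def is_course_python(version_info=sys.version_info):
    """Return True for a Python 3 interpreter at version 3.5 or later.

    Accepting versions later than 3.5 is an assumption made for this
    sketch; the course requirement names Python 3.5 specifically.
    """
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor >= 5


# Show the version of the interpreter that is running this code.
print("Python version:", sys.version.split()[0])
print("Meets the assumed course requirement:", is_course_python())
```

If the second line prints `False`, the interpreter being run is probably not the one installed by Anaconda.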

Downloading the course materials

The course materials can be downloaded from the icon at the top of this web page (click the .zip folder icon). Just download the zip file, unzip it onto your Desktop, and rename the directory NICO-101.

Usage of Course Materials

This text and the majority of the course will be conducted with Jupyter Notebook. Jupyter Notebook is a 'web-based interactive computational environment', meaning that it allows you to write and execute Python code in a web page on your own computer. Jupyter Notebook is a relatively new tool, and we believe that it is an excellent way to teach the basics of Python programming and computational data analysis.

Jupyter Notebook is installed by default with the Anaconda Python distribution and can be launched from the Anaconda Navigator program. We have an introductory video that details how to launch and use Jupyter Notebook.
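To give a sense of what working in a notebook looks like, here is a small example of code that could be typed into a single notebook cell and run. It is only an illustrative sketch: the numbers and variable names are made up and do not come from the course materials.

```python
# A made-up list of numbers to experiment with.
values = [3, 1, 4, 1, 5]

# Add the numbers up, then divide by how many there are.
total = sum(values)
mean = total / len(values)

print("total:", total)
print("mean:", mean)
```

Running the cell prints the results directly beneath it, so you can change a value, run the cell again, and see the effect right away.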

Expectations

We expect that students will attend all class sessions, complete the assignments with original work of their own, and participate in both the class overall and in group assignments.

At the end of this class you will be able to:

This course is intended as an introduction and does not explicitly use any 'big' datasets during the quarter. Instead, it teaches you the fundamentals of programming and analysis, which can then be scaled to data of any size. As part of this, we will discuss the basics of statistical analysis and how they can be applied to datasets.

Syllabus

What follows is the daily schedule for the course. We reserve the right to change the timing of the topics and the overall coverage to best suit the class.

Day 1. Course Overview, Introduction to Jupyter Notebook, Basic Data Types, Flow Control, Errors

Day 2. Lists, Tuples, and Sets; File I/O; Review

Day 3. The Python Standard Library, Data Visualization, Functions

Day 4. Dictionaries, Review, Mini-Project

Day 5. Text Analysis, Regular Expressions, Sentiment Analysis

Day 6. Introduction to APIs, Reading and Posting with APIs, Web Scraping

Day 7. Statistical Analysis with Python, Bootstrapping Monte Carlo Chains, Model Fitting, Structured data analysis

Day 8. Image Manipulation, Image Analysis, Mini-Project

Homework will be assigned every day and will mirror the concepts covered that day. The final will be a group project of your own choosing that uses all of the skills learned up to that point.