labs & assignments
Data Analytics Introduction and Practicum Assignments & Projects
Assignment1
data manipulation and aggregation
{The goals of this assignment are: to practice data manipulation with Pandas; to develop intuition about the interplay of precision, accuracy, and bias when making predictions; and to better understand how election forecasts are constructed.}
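A minimal sketch of the kind of Pandas aggregation this assignment practices. It assumes a hypothetical table of polls (the column names `state` and `poll_pct` and every number are invented for illustration). The spread across polls stands in for precision, and the signed error of the averaged forecast stands in for bias.

```python
import pandas as pd

# Hypothetical toy polls: every value here is invented for illustration only.
polls = pd.DataFrame({
    "state":    ["A", "A", "A", "B", "B", "B"],
    "poll_pct": [52.0, 54.0, 53.0, 47.0, 45.0, 49.0],  # predicted vote share
})
# Hypothetical actual outcome per state.
outcome = pd.Series({"A": 51.0, "B": 47.0}, name="actual_pct")

# Aggregate per state: mean prediction (the forecast), spread across polls, poll count.
summary = polls.groupby("state")["poll_pct"].agg(["mean", "std", "count"])
summary = summary.join(outcome)

# Bias: signed error of the averaged forecast. A large |bias| means poor accuracy,
# while a large "std" means poor precision, even when the bias is zero.
summary["bias"] = summary["mean"] - summary["actual_pct"]
print(summary)
```

In this toy table, state A's polls agree closely but are systematically high (precise, biased). State B's polls scatter more widely but average out to the true value (imprecise, unbiased).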
visualization
{Applying different visualization techniques to Part 1} {Project Proposal also attached below}
Results | Research Proposal |
---|---|
Assignment1 | Research Proposal |
Assignment2
scientific computing
{The goal of this assignment is to introduce Scikit-Learn and its functions, regression, and PCA, followed by further practice with regression. All objects within scikit-learn share a uniform basic API consisting of three complementary interfaces: an estimator interface for building and fitting models, a predictor interface for making predictions, and a transformer interface for converting data. The estimator interface is at the core of the library. It defines how objects are instantiated and exposes a fit method for learning a model from training data. All supervised and unsupervised learning algorithms (e.g., for classification, regression, or clustering) are offered as objects implementing this interface. Machine learning tasks such as feature extraction, feature selection, and dimensionality reduction are also provided as estimators.}
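The three interfaces can be seen side by side in a short sketch. The data are synthetic and the coefficients are arbitrary choices for illustration. `LinearRegression` shows the estimator and predictor interfaces (`fit`, `predict`), and `PCA` shows the transformer interface (`fit`, `transform`).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic data: y is a linear function of three features plus small noise.
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

# Estimator interface: fit() learns parameters from training data (and returns self).
reg = LinearRegression().fit(X, y)

# Predictor interface: predict() applies the learned model to new inputs.
y_hat = reg.predict(X[:5])

# Transformer interface: PCA is also an estimator; transform() maps the data
# into the learned 2-D principal-component space.
pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)

print(reg.coef_, reg.intercept_)
print(y_hat.shape, X_2d.shape)
```

Because every object follows the same `fit`/`predict`/`transform` conventions, one model can be swapped for another with little change to the surrounding code.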
statistical analysis
{In this lab, and in homework 2, we alluded to cross-validation with only a brief explanation about finding the right hyperparameters, some of which were regularization parameters. We will have more to say about regularization soon, but let's tackle the reasons we do cross-validation. The bottom line is: finding the model that has an appropriate mix of bias and variance. We usually want to sit at the balance point of the tradeoff between the two: be simple, but no simpler than necessary. We do not want a model with too much variance: it would not generalize well. This phenomenon is also called overfitting. There is no point doing prediction if we can't generalize well. At the same time, if we have too much bias in our model, we will systematically underpredict or overpredict values and miss most predictions. This is also known as underfitting. Cross-validation provides a way to find the “hyperparameters” of our model such that we achieve that balance point.}
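A minimal sketch of choosing a regularization hyperparameter by cross-validation. It assumes synthetic data and uses ridge regression as the model; both are illustrative choices, not the lab's exact setup. Each candidate `alpha` is scored by 5-fold CV, and the one with the lowest average held-out error sits closest to the bias-variance balance point.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
# Synthetic data: only the first two of ten features actually matter.
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]

mean_mse = {}
for alpha in alphas:
    # Larger alpha = stronger regularization = more bias, less variance.
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    mean_mse[alpha] = -scores.mean()  # flip the sign back to a plain MSE

best_alpha = min(mean_mse, key=mean_mse.get)
print(mean_mse, best_alpha)
```

Very large `alpha` values underfit: the coefficients shrink toward zero, and the held-out error rises.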
Results | Repository |
---|---|
Assignment2 | [Assignment2 repository] |
Assignment3
machine learning part1
{Classification
Identifying which category an object belongs to. Applications: Spam detection, Image recognition. Algorithms: SVM, nearest neighbors, random forest, …
Regression
Predicting a continuous-valued attribute associated with an object. Applications: Drug response, Stock prices. Algorithms: SVR, ridge regression, Lasso, …
Clustering
Automatic grouping of similar objects into sets. Applications: Customer segmentation, Grouping experiment outcomes. Algorithms: k-Means, spectral clustering, mean-shift, … }
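Two of the task types above can be contrasted on the same data in a short sketch. It assumes scikit-learn's bundled iris dataset, a random forest for classification, and k-Means for clustering. The classifier learns from labels and is scored on a held-out split. The clustering step groups the same samples without ever seeing the labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Classification (supervised): learn labels from the training split,
# then measure accuracy on held-out data.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Clustering (unsupervised): group all samples into 3 sets without using y.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_

print(f"test accuracy: {accuracy:.2f}")
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```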
machine learning part2
{Dimensionality reduction
Reducing the number of random variables to consider. Applications: Visualization, Increased efficiency. Algorithms: PCA, feature selection, non-negative matrix factorization
Model selection
Comparing, validating and choosing parameters and models. Goal: Improved accuracy via parameter tuning Modules: grid search, cross validation, metrics.
Preprocessing
Feature extraction and normalization. Application: Transforming input data such as text for use with machine learning algorithms. Modules: preprocessing, feature extraction. }
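Preprocessing, dimensionality reduction, and model selection are often combined in a single pipeline. This sketch assumes scikit-learn's bundled digits dataset and logistic regression as the final classifier; both are illustrative choices. The grid values are also arbitrary. The grid search uses cross-validation to tune the number of PCA components and the regularization strength together.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),                  # preprocessing: zero mean, unit variance
    ("pca", PCA()),                               # dimensionality reduction
    ("clf", LogisticRegression(max_iter=2000)),   # final classifier
])

# Model selection: "<step>__<param>" names address parameters inside the pipeline.
param_grid = {"pca__n_components": [10, 20, 40], "clf__C": [0.1, 1.0]}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Wrapping the scaler inside the pipeline means it is refit on each training fold. This keeps held-out folds from leaking into the preprocessing step.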
Results | Repository |
---|---|
Assignment3 | Assignment3 Repository |
Extra Lab
network analysis
{In this lab we will do the following:
1. Get a LinkedIn API key
2. Use OAuth 2.0 to get an access token
3. Download our own LinkedIn data using the LinkedIn API
4. Export this data as a CSV file so that it can be imported into Gephi
5. Before starting Gephi, do some network analysis directly in Python
6. Analyze our data with the external tool Gephi }
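Steps 4 and 5 can be sketched with the standard library alone. This sketch assumes the connections have already been downloaded and reduced to a list of (person, person) pairs. The names are placeholders, and no LinkedIn API call is shown. It computes node degree and graph density in Python, then writes an edge list with the `Source`/`Target` column headers that Gephi's spreadsheet import recognizes for edge tables.

```python
import csv
from collections import Counter

# Hypothetical connection pairs standing in for downloaded network data.
edges = [
    ("me", "alice"), ("me", "bob"), ("me", "carol"),
    ("alice", "bob"), ("bob", "dave"),
]

# Degree: number of edges touching each node (treating the graph as undirected).
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Density: actual edges divided by the maximum possible n*(n-1)/2.
n = len(degree)
density = len(edges) / (n * (n - 1) / 2)

# Export an edge list with Source/Target headers for import into Gephi.
with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)

print(dict(degree), round(density, 2))
```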
big data analytics
{In this week’s lab, we will mostly set statistics aside and instead focus on some practical issues that you will encounter on Homework 4. Section 4 of that homework includes new Python techniques (classes, inheritance), an unfamiliar approach to breaking up large computing problems (MapReduce), code that has to be run outside the friendly confines of an IPython notebook, and a final step in which you put it all to use on Amazon’s Elastic Compute Cloud (EC2). This sounds very complicated, but the end result is a simpler algorithm for the problem of calculating similarity scores, as well as the ability to scale to arbitrarily large data sets.}
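The MapReduce idea can be made concrete without a cluster or any framework. This is a pure-Python sketch that simulates the map, shuffle, and reduce phases for a word count. Word count is a stand-in example: it is not the homework's similarity computation, and it does not use any MapReduce library. The function names `mapper`, `reducer`, and `map_reduce` are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Map phase: emit (key, value) pairs, here (word, 1) for each word."""
    for word in line.lower().split():
        yield word, 1

def reducer(key, values):
    """Reduce phase: combine all values that share a key."""
    return key, sum(values)

def map_reduce(lines):
    # Shuffle phase: group every mapped value by its key. On a real cluster
    # this grouping happens across machines; here it is a single dict.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(line) for line in lines):
        groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in groups.items())

docs = ["the cat sat", "the dog sat", "The end"]
counts = map_reduce(docs)
print(counts)
```

Each mapper call sees only one line, and each reducer call sees only one key. That independence is what lets a framework spread both phases over many machines.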
Extra Lab2
web scraping
- Lab: Web Scraping Part 1
- Lab: Web Scraping Part 2
{In this example, we will fetch data about countries and their populations from Wikipedia.}
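Fetching the live Wikipedia page requires network access (for example with `urllib.request.urlopen`), so this sketch parses a small inline HTML table instead. It uses the standard library's `html.parser`. The markup, country names, and numbers are placeholders, and real Wikipedia tables carry extra markup (links, footnotes) that needs more cleanup. `TableParser` is a hypothetical helper, not part of any library.

```python
from html.parser import HTMLParser

# Placeholder markup imitating a simple Wikipedia-style table.
HTML = """
<table class="wikitable">
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Country A</td><td>1,234,567</td></tr>
  <tr><td>Country B</td><td>89,000</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect the text of every <td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # start a new row
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")   # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()

parser = TableParser()
parser.feed(HTML)

# The header row has no <td> cells, so it is empty and skipped;
# "1,234,567" becomes the integer 1234567.
population = {row[0]: int(row[1].replace(",", "")) for row in parser.rows if row}
print(population)
```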
sampling and text processing
{In this example we will see how to sample data and do text processing}
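A minimal sketch of both ideas using only the standard library. It assumes a tiny hypothetical corpus in place of the lab's data. The code draws a reproducible simple random sample without replacement, then tokenizes text with a regular expression and counts the tokens. The lowercase-letters-only tokenizer is a deliberately simple assumption; real text processing usually needs more care.

```python
import random
import re
from collections import Counter

# Hypothetical corpus standing in for the lab's data.
corpus = [
    "Data science is fun.",
    "Sampling makes big data small.",
    "Text processing turns text into tokens.",
    "Tokens can be counted, filtered, and compared.",
]

# Simple random sample without replacement; the fixed seed makes it reproducible.
rng = random.Random(0)
sample = rng.sample(corpus, k=2)

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens (punctuation dropped)."""
    return re.findall(r"[a-z]+", text.lower())

token_counts = Counter(tok for doc in corpus for tok in tokenize(doc))
print(sample)
print(token_counts.most_common(3))
```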
EMSE 6992 Labs
lab assignments
- Web Scraping
- Exploratory Data Analysis for Classification using Pandas and Matplotlib
- Scikit-Learn, Regression, PCA
- Bias, Variance, Cross-Validation
- Bayes, Linear Regression, and Metropolis Sampling
- Support Vector Machines
- Networks
- MapReduce
Programming For Analytics Assignments & Projects
- List, List Comprehensions, Dictionary, Function, Class
- Numpy Array
- DataFrame, NaN Handling, Graph Plotting, Data Processing
- Midterm Take-home Exam
- Final Project: General/Logistic Regression/Decision Tree Analysis based on Adult Census Income Data
Data Analysis for Eng & Sci Assignments & Projects
Design and Analysis of Algorithms Assignments & Projects
- Coding Project 1 - Divide and Conquer
- Coding Project 2 - Dynamic Programming
- Coding Project 3 - Graph Algorithms
- Final Project: Code Implementation for SGD & FSGD for Recommender Systems
DBMS For Data Analytics Assignments & Projects
- MongoDB Querying and Analysis
- DynamoDB Querying and Analysis
- AWS MySQL Querying and Analysis
- Final Project: User Analysis of Kindle Review Dataset
Applied Machine Learning For Analytics Assignments & Projects
- Pickling/Serialization & Data Importation & Exploration from Different Types of Data Files
- Linear Regression & Logistic Regression & AIC/BIC For Model Evaluation
- Naive Bayes & Support Vector Machine (SVM) & Kernel Tricks
- Crowdsourcing with AMT
- Principal Components Analysis & k-means Clustering & Singular Value Decomposition & Latent Semantic Analysis
- Latent Dirichlet Allocation
- Final Project: Feature Engineering & Target Engineering