Introduction to Data Analytics

CRISP-DM (Cross-Industry Standard Process for Data Mining)

  • Business Understanding

  • Data Understanding

  • Data Preparation

  • Modeling

  • Evaluation

  • Deployment

Business Understanding

  • Objective: To understand the project's objectives and requirements from a business perspective

  • Task:

    • Define business goal

    • Convert the business goal into a data goal

Data Understanding

  • Objective: To collect and explore the data to understand its structure and quality

  • Task:

    • Gather initial data

    • Describe the data

    • Explore the data

    • Verify data quality

Data Preparation

  • Objective: To prepare the data for modeling by cleaning, transforming, and organizing it

  • Task:

    • Select relevant data

    • Clean the data

    • Construct new features

    • Integrate data

    • Format and structure the data
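
The preparation tasks above can be illustrated with a minimal sketch in plain Python. The records, the field names (`age`, `income`, `spend`, `notes`), the choice to impute a missing age with the mean, and the derived `spend_ratio` feature are all hypothetical and chosen only for illustration; a real project would pick its cleaning rules from what was learned during Data Understanding.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 34, "income": 52000.0, "spend": 1300.0, "notes": "a"},
    {"id": 2, "age": None, "income": 61000.0, "spend": 2440.0, "notes": "b"},
    {"id": 3, "age": 45, "income": None, "spend": 900.0, "notes": "c"},
    {"id": 4, "age": 29, "income": 40000.0, "spend": 800.0, "notes": "d"},
]


def prepare(records):
    # Select relevant data: keep only the fields the model will use.
    fields = ("id", "age", "income", "spend")
    selected = [{k: r[k] for k in fields} for r in records]

    # Clean the data: drop rows with a missing income (assumed rule),
    # then fill a missing age with the mean of the known ages.
    cleaned = [dict(r) for r in selected if r["income"] is not None]
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    mean_age = sum(known_ages) / len(known_ages)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = mean_age

    # Construct new features: share of income that was spent.
    for r in cleaned:
        r["spend_ratio"] = r["spend"] / r["income"]
    return cleaned


prepared = prepare(raw)
for row in prepared:
    print(row)
```

Here the record with no income is dropped, the missing age is replaced by the mean of the remaining known ages, and each surviving record gains one constructed feature.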

Modeling

  • Objective: To build and evaluate predictive models based on the prepared data

  • Task:

    • Select appropriate modeling techniques

    • Train models using the prepared data

    • Evaluate model performance (e.g., precision and recall)

    • Tune model parameters
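
Precision and recall, mentioned in the tasks above, can be computed directly from counts of true positives, false positives, and false negatives. The following is a minimal sketch for a binary classifier; the function name `precision_recall` and the sample labels are illustrative assumptions, not part of the original notes.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

With these sample labels there are 3 true positives, 1 false positive, and 1 false negative, so both precision and recall come out to 0.75.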

Evaluation

  • Objective: To assess the model's performance and ensure it meets business objectives

  • Task:

    • Evaluate the model's result against business objectives

    • Validate the model's effectiveness and reliability

    • Review the process and results

Deployment

  • Objective: To implement the model in a real-world setting and monitor its performance

  • Task:

    • Deploy the model into production

    • Monitor and maintain the model's performance

    • Update and refine the model

What is predictive analytics?

  • Data mining

  • Statistical inference

  • Machine Learning

  • Business Sense

Machine Learning

  • Supervised Learning: The model is trained on a labeled dataset, meaning that for each input in the training set, the corresponding output (or label) is known. The goal is to learn a mapping from inputs to outputs so that the model can accurately predict the output for new, unseen data. It always requires a labeled training dataset. Examples: predictive modeling, uplift modeling, recommender systems, sentiment analysis.

  • Unsupervised Learning: The model is trained on a dataset that does not contain labeled outputs. Instead, the model tries to find hidden patterns, structures, or relationships within the data without any explicit instructions on what to predict. Examples: association rule mining, clustering.
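
The difference between the two settings can be sketched on the same one-dimensional data. In the supervised half, known labels are used to fit a nearest-centroid classifier; in the unsupervised half, a two-cluster k-means finds groups without any labels. Both algorithms, the function names, and the sample values are assumptions picked for brevity, not methods prescribed by these notes.

```python
def fit_centroids(xs, labels):
    """Supervised: use the known labels to compute one mean per class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def predict(centroids, x):
    """Assign x to the class whose mean is nearest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2; no labels are used."""
    centers = [min(xs), max(xs)]  # simple deterministic initialisation
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Recompute each center; keep the old one if a cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Hypothetical data: the same inputs, with and without labels.
xs = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(xs, labels)  # learns from labeled examples
new_label = predict(centroids, 7.0)    # predicts a label for unseen input
centers = two_means(xs)                # discovers structure without labels
print(new_label, centers)
```

The supervised model can name the class of a new point because it was given names during training; the unsupervised model only reports that two groups exist and where their centers lie.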

Forms of Predictive Analytics

  • Predictive Modeling

    • Regression: Estimates relationships between variables to predict a continuous numerical outcome.

    • Classification: Predicts discrete categories or classes, such as spam, cancer cells, or speech. The output is typically a label or a class from a set of predefined options.

  • Clustering

    • This technique groups similar data points together based on their inherent characteristics without predefined labels.

    • K-means, hierarchical clustering, and density-based clustering are prominent algorithms.

    • Used for: Customer segmentation, market basket analysis, identifying anomalies

  • Association Rule Mining

    • Identifies relationships between variables in large datasets

    • For example, market basket analysis predicts customer purchasing behavior by finding associations between products.

  • Recommender Systems

    • Recommender systems are a type of predictive modeling and data filtering technology that aims to suggest items or content to users based on their preferences, behavior, or similarities with other users.

    • These systems predict the relevance of items (such as products, movies, or articles) to a particular user, helping to personalize their experience by recommending things they are likely to be interested in.

  • Sentiment Analysis

    • Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique used to determine the emotional tone or sentiment expressed in a piece of text.

    • It involves classifying text into categories such as positive, negative, or neutral, based on the underlying emotions or opinions conveyed by the words and phrases.

  • Uplift Modeling

    • Uplift modeling, also known as incremental modeling, is a predictive modeling technique used to estimate the causal impact of a specific action or treatment on an individual's behavior.

    • Uplift models predict the difference in outcomes caused by an intervention (e.g., how likely a customer is to buy a product as a result of receiving a targeted marketing campaign).
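
The "difference in outcomes caused by an intervention" can be sketched as the gap in conversion rate between a treated group and a control group, computed per customer segment. This is a deliberately simple two-group estimate rather than a full individual-level uplift model, and it has a causal reading only if treatment was assigned at random. The segments, records, and function names below are hypothetical.

```python
def make(segment, treated, outcomes):
    """Build hypothetical records; converted is 1 (bought) or 0 (did not)."""
    return [
        {"segment": segment, "treated": treated, "converted": o}
        for o in outcomes
    ]


def conversion_rate(outcomes):
    return sum(outcomes) / len(outcomes)


def average_uplift(records):
    """Treated conversion rate minus control conversion rate."""
    treated = [r["converted"] for r in records if r["treated"]]
    control = [r["converted"] for r in records if not r["treated"]]
    return conversion_rate(treated) - conversion_rate(control)


def uplift_by_segment(records):
    segments = sorted({r["segment"] for r in records})
    return {
        s: average_uplift([r for r in records if r["segment"] == s])
        for s in segments
    }


# Assumed experiment: campaign sent (treated=True) or withheld (False).
records = (
    make("new", True, [1, 1, 1, 0]) + make("new", False, [1, 0, 0, 0])
    + make("loyal", True, [1, 1, 1, 0]) + make("loyal", False, [1, 1, 1, 0])
)

uplift = uplift_by_segment(records)
print(uplift)
```

In this made-up data both segments convert at the same rate when treated, but only the "new" segment shows a positive uplift (0.5 versus 0.0), which is the distinction uplift modeling is meant to expose: high response is not the same as high incremental response.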