DAG (Directed Acyclic Graph)
A graph with directed edges and no cycles, commonly used in Bayesian networks and computational graphs.
Types of DAG Applications
- Bayesian Networks - Used for probabilistic modeling.
- Computational Graphs - Used in deep learning frameworks like TensorFlow.
Example
Used in machine learning pipelines to manage dependencies.
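As a rough sketch of how a pipeline might use a DAG to order its steps, the following relies on Python's standard-library `graphlib` module. The step names are hypothetical and only illustrate the idea; a real pipeline tool will have its own way of declaring dependencies.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "build_features": {"clean_data"},
    "train_model": {"build_features"},
    "evaluate": {"train_model"},
}


def execution_order(graph):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph):
    """Return True if the graph contains a cycle and so is not a DAG."""
    try:
        execution_order(graph)
    except CycleError:
        return True
    return False


order = execution_order(pipeline)
```

Because the graph has no cycles, a valid execution order always exists; a cyclic dependency would make the ordering impossible, which is why `has_cycle` treats `CycleError` as the signal.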
Data Augmentation
A technique to artificially expand a dataset by applying transformations such as rotation, flipping, or scaling.
Types of Data Augmentation
- Geometric Transformations - Rotation, flipping, scaling.
- Color Space Augmentations - Changes in brightness or contrast.
Example
Used in image classification to improve model generalization.
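The transformations listed above can be sketched on a tiny grayscale image stored as nested lists. This is a minimal illustration, assuming pixel values in the range 0 to 255; the function names are made up for this example, and real projects would normally use an image library instead.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def adjust_brightness(image, delta, max_value=255):
    """Add delta to every pixel, clipping to the range [0, max_value]."""
    return [[min(max_value, max(0, pixel + delta)) for pixel in row]
            for row in image]


image = [[1, 2, 3],
         [4, 5, 6]]

# Each augmented copy keeps the original label but looks different to the model.
augmented = [
    flip_horizontal(image),
    rotate_90_clockwise(image),
    adjust_brightness(image, 10),
]
```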
Data Cleaning
The process of identifying and correcting errors or inconsistencies in datasets.
Types of Data Cleaning Methods
- Missing Value Handling - Imputation or deletion.
- Outlier Removal - Statistical methods to filter anomalies.
Example
Used in pre-processing steps of machine learning pipelines.
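A minimal sketch of the two methods above, using only the standard library. The interquartile-range (IQR) rule with a multiplier of 1.5 is one common statistical filter, chosen here as an assumption; the sample readings are invented.

```python
from statistics import quantiles


def drop_missing(rows):
    """Deletion strategy: keep only rows that contain no None values."""
    return [row for row in rows if all(value is not None for value in row)]


def remove_outliers_iqr(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if low <= value <= high]


readings = [10, 12, 11, 13, 12, 11, 300]  # 300 is an obvious anomaly
cleaned = remove_outliers_iqr(readings)
```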
Data Drift
A phenomenon where the statistical properties of input data change over time, affecting model performance.
Types of Data Drift
- Covariate Shift - Changes in feature distribution.
- Concept Drift - Changes in the relationship between input features and the target variable.
Example
Detected in fraud detection systems when user behavior changes.
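One simple way to check a single feature for covariate shift is to compare its distribution in a reference window against a recent window. The sketch below computes the two-sample Kolmogorov-Smirnov statistic by hand; the threshold of 0.3 is a hypothetical cut-off, not a recommended value.

```python
from bisect import bisect_right

DRIFT_THRESHOLD = 0.3  # hypothetical cut-off; tune for the application


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    points = sorted(set(ref) | set(cur))
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in points
    )


def has_drifted(reference, current, threshold=DRIFT_THRESHOLD):
    """Flag drift when the distributions differ by more than the threshold."""
    return ks_statistic(reference, current) > threshold
```

A check like this only looks at input features, so it can flag covariate shift but says nothing about concept drift, which requires labels to detect.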
Data Engineering
The discipline of preparing and processing data for analytical and machine learning tasks.
Types of Data Engineering Tasks
- ETL (Extract, Transform, Load) - Data pipeline management.
- Feature Engineering - Creating meaningful features for models.
Example
Used in large-scale data processing with Apache Spark.
Data Imputation
The process of replacing missing values in a dataset with estimated values.
Types of Data Imputation
- Mean/Median Imputation - Filling with statistical averages.
- K-Nearest Neighbors Imputation - Using nearest neighbors to estimate values.
Example
Used in healthcare datasets where patient records have missing values.
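Mean and median imputation can be sketched in a few lines. This example assumes missing values are represented as `None`; the function name and the `strategy` parameter are illustrative only.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [value for value in values if value is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if value is None else value for value in values]
```

The median is usually the safer choice when the column contains extreme values, since a single outlier can pull the mean far from typical observations.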
Data Labeling
The process of annotating data with meaningful tags for supervised learning models.
Types of Data Labeling
- Manual Labeling - Human annotation.
- Automated Labeling - AI-based annotation.
Example
Used in image recognition to label objects in images.
Data Leakage
A scenario where training data contains information that would not be available at prediction time, leading to overly optimistic performance estimates.
Types of Data Leakage
- Target Leakage - Using information in training that only becomes known after the prediction would be made.
- Feature Leakage - Using features derived from the target variable.
Example
Detected in fraud detection models where transaction approvals are included as features.
Data Normalization
A technique used to scale numeric features to a standard range to improve model convergence.
Types of Normalization
- Min-Max Scaling - Scales data to a fixed range.
- Z-Score Normalization - Centers data around zero with unit variance.
Example
Used in neural networks to stabilize training.
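Both scaling methods above reduce to short formulas: min-max scaling maps x to (x - min) / (max - min), and z-score normalization maps x to (x - mean) / standard deviation. A minimal sketch, assuming the population standard deviation is wanted:

```python
from statistics import mean, pstdev


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values into the range [low, high]."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]  # constant column: no spread to scale
    span = largest - smallest
    return [low + (value - smallest) * (high - low) / span for value in values]


def z_score(values):
    """Center values at zero and scale them to unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column
    return [(value - mu) / sigma for value in values]
```

In a training pipeline the minimum, maximum, mean, and standard deviation would be computed on the training set only and then reused on validation and test data.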
Data Preprocessing
The process of transforming raw data into a structured format suitable for machine learning.
Types of Data Preprocessing
- Feature Scaling - Normalization and standardization.
- Feature Selection - Selecting relevant attributes.
Example
Used in NLP for tokenization and text vectorization.
Data Reduction
The process of reducing the volume of data while maintaining its integrity for analysis.
Types of Data Reduction
- Dimensionality Reduction - PCA, t-SNE.
- Sampling - Selecting representative data points.
Example
Used in big data analytics to handle large datasets.
Data Sampling
Selecting a subset of data from a larger dataset to train models efficiently.
Types of Data Sampling
- Random Sampling - Selecting random data points.
- Stratified Sampling - Ensuring proportional representation.
Example
Used in surveys to analyze population trends.
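Stratified sampling can be sketched by grouping records on a key and drawing the same fraction from each group. The function and parameter names here are hypothetical, and the rounding rule (at least one record per group) is an assumption made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every group so proportions are preserved."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for members in groups.values():
        # Assumption: keep at least one record from every group.
        count = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, count))
    return sample


# Hypothetical survey population: 80 urban and 20 rural respondents.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(population, key=lambda record: record[0], fraction=0.1)
```

Plain random sampling of 10 records could easily miss the rural group entirely; stratifying guarantees it contributes its proportional share.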
Data Science
An interdisciplinary field that combines statistics, computer science, and machine learning to extract insights from data.
Types of Data Science Applications
- Predictive Analytics - Forecasting future trends.
- Natural Language Processing - Text analysis and sentiment detection.
Example
Used in e-commerce for personalized recommendations.
Data Scrubbing
A process of cleaning and correcting data inconsistencies to improve quality.
Types of Data Scrubbing
- Duplicate Removal - Eliminating redundant records.
- Validation Checks - Ensuring data accuracy.
Example
Used in customer databases to merge duplicate records.
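Merging duplicate customer records might look like the sketch below. It assumes each record is a dictionary with an `email` field that identifies the customer; the field names and the merge rule (first record wins, blanks filled from later duplicates) are illustrative choices, not a standard.

```python
def normalize_email(email):
    """Validation step: trim whitespace and lowercase before comparing."""
    return email.strip().lower()


def merge_duplicates(customers):
    """Keep one record per normalized email, filling blanks from duplicates."""
    merged = {}
    for record in customers:
        key = normalize_email(record["email"])
        if key not in merged:
            merged[key] = dict(record, email=key)
            continue
        for field, value in record.items():
            if field != "email" and not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values())


customers = [
    {"email": "Ann@Example.com ", "phone": ""},
    {"email": "ann@example.com", "phone": "555-0100"},
    {"email": "bob@example.com", "phone": "555-0101"},
]
unique_customers = merge_duplicates(customers)
```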
Data Segmentation
Dividing data into meaningful groups for analysis and training.
Types of Data Segmentation
- Demographic Segmentation - Age, income, location.
- Behavioral Segmentation - Usage patterns and preferences.
Example
Used in marketing for targeted advertising.
Data Smoothing
A technique used to reduce noise in data for better trend detection.
Types of Data Smoothing
- Moving Average - Rolling window-based smoothing.
- Exponential Smoothing - Weighted averaging method.
Example
Used in stock market analysis for trend forecasting.
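The two smoothing methods above can be sketched as follows. The moving average here returns one value per full window, and exponential smoothing starts from the first observation; both are common conventions but not the only ones.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def exponential_smoothing(values, alpha):
    """Blend each new value with the previous smoothed value.

    alpha close to 1 follows the data closely; alpha close to 0 smooths heavily.
    """
    if not values:
        return []
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```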
Data Standardization
A preprocessing technique that scales data to have a mean of zero and a standard deviation of one.
Types of Standardization
- Z-Score Normalization - Subtracting the mean and dividing by the standard deviation.
- Feature Scaling - Bringing all variables to a comparable scale.
Example
Used in deep learning for stable model training.
Data Transformation
The process of converting data values into a form better suited to modeling.
Types of Data Transformation
- Log Transformation - Reducing skewness in data.
- Encoding - Converting categorical values into numeric form.
Example
Used in machine learning pipelines to preprocess input features.
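The log transformation and encoding listed above can be sketched with the standard library. This example uses `log1p` (the logarithm of 1 + x) so that zero values are handled, and one-hot encoding as one possible encoding scheme; both are assumptions made for illustration.

```python
import math


def log_transform(values):
    """Compress large values with log(1 + x); requires x > -1."""
    return [math.log1p(value) for value in values]


def one_hot_encode(categories):
    """Return the sorted vocabulary and one 0/1 vector per input category."""
    vocabulary = sorted(set(categories))
    index = {category: i for i, category in enumerate(vocabulary)}
    rows = [
        [1 if index[category] == i else 0 for i in range(len(vocabulary))]
        for category in categories
    ]
    return vocabulary, rows


vocabulary, encoded = one_hot_encode(["red", "blue", "red"])
```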
Data Warehousing
A system for storing and managing large-scale structured data for analytics.
Types of Data Warehousing
- Enterprise Data Warehouse - Centralized data repository.
- Cloud Data Warehouse - Cloud-based data storage.
Example
Used in business intelligence for decision-making.
Decision Boundary
A surface in feature space that separates different classes in a classification model; for linear models it is a hyperplane.
Types of Decision Boundaries
- Linear Decision Boundary - Used in logistic regression.
- Non-Linear Decision Boundary - Used in neural networks.
Example
Used in support vector machines for classification tasks.
Decision Stump
A simple decision tree with only one split, often used in boosting algorithms.
Types of Decision Stumps
- Univariate Stump - Splits on a single feature.
- Multivariate Stump - Uses multiple features for splitting.
Example
Used in AdaBoost as a weak classifier.
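A univariate stump can be fitted by trying every midpoint between adjacent feature values and keeping the split with the fewest errors. The sketch below assumes binary labels 0 and 1 and unweighted samples; AdaBoost itself would use weighted errors, which this example leaves out.

```python
def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label) minimizing training errors."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best = None
    for lower, upper in zip(values, values[1:]):
        threshold = (lower + upper) / 2
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    """Apply the single split to one feature value."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


stump = fit_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```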
Decision Tree
A tree-based model used for classification and regression by splitting data at decision nodes.
Types of Decision Trees
- Classification Trees - Used for categorical outcomes.
- Regression Trees - Used for continuous outcomes.
Example
Used in customer segmentation models.
Deep Belief Network (DBN)
A type of deep neural network that consists of multiple layers of restricted Boltzmann machines.
Types of DBN Components
- Visible Layer - Input features.
- Hidden Layers - Learn feature representations.
Example
Used in unsupervised pre-training for deep learning.
Deep Learning
A subset of machine learning using deep neural networks to model complex patterns in data.
Types of Deep Learning Models
- Convolutional Neural Networks (CNNs) - Used for image processing.
- Recurrent Neural Networks (RNNs) - Used for sequential data.
Example
Used in self-driving car perception systems.
Deep Reinforcement Learning
A combination of deep learning and reinforcement learning for decision-making tasks.
Types of Deep RL Algorithms
- Deep Q-Networks (DQN) - Value-based learning.
- Actor-Critic Methods - Policy and value-based learning.
Example
Used in AlphaGo for playing Go.
Deployment in Machine Learning
The process of integrating a trained model into a production environment.
Types of Deployment Methods
- Batch Deployment - Running predictions at intervals.
- Real-Time Deployment - Serving predictions instantly.
Example
Used in recommendation systems on e-commerce websites.
Descriptive Analytics
A type of data analysis that focuses on summarizing historical data.
Types of Descriptive Analytics
- Data Aggregation - Summarizing datasets.
- Data Visualization - Graphical representation of insights.
Example
Used in business reports for past sales trends.
Dimensionality Reduction
A technique to reduce the number of input variables while preserving meaningful information.
Types of Dimensionality Reduction
- Principal Component Analysis (PCA) - Projects data into a lower-dimensional space.
- t-SNE - Used for visualization in 2D or 3D.
Example
Used in image compression techniques.
Discriminative Model
A type of machine learning model that learns the boundary between classes, or the probability of a label given the input, rather than modeling how the data itself is generated.
Types of Discriminative Models
- Logistic Regression - Predicts class probabilities.
- Support Vector Machines - Finds decision boundaries.
Example
Used in spam email classification.
Distance Metrics
Mathematical measures used to calculate the similarity or dissimilarity between data points.
Types of Distance Metrics
- Euclidean Distance - Measures straight-line distance.
- Cosine Similarity - Measures the angle between vectors; the corresponding distance is 1 minus the similarity.
Example
Used in k-nearest neighbors (KNN) classification.
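Both measures above can be written directly from their definitions. A minimal sketch, assuming the vectors have equal length and, for cosine similarity, are non-zero:

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Euclidean distance is sensitive to vector magnitude, while cosine similarity ignores it and compares direction only, which is why the latter is often preferred for text vectors.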
Distributed Machine Learning
A technique where ML models are trained across multiple machines to handle large-scale data.
Types of Distributed Learning
- Data Parallelism - Splitting data across nodes.
- Model Parallelism - Splitting model computations across nodes.
Example
Used in Google's TensorFlow for large-scale training.
Dropout Regularization
A technique used to prevent overfitting by randomly disabling neurons during training.
Types of Dropout
- Standard Dropout - Randomly disables neurons.
- Spatial Dropout - Drops entire feature maps in CNNs.
Example
Used in deep neural networks to improve generalization.
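Standard dropout can be sketched on a plain list of activations. This version uses the "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at inference time; that choice, and the function signature, are assumptions for illustration rather than any framework's API.

```python
import random


def dropout(activations, rate, training=True, seed=None):
    """Zero each activation with probability `rate` during training.

    Survivors are divided by (1 - rate) so the expected value is unchanged.
    """
    if not 0 <= rate < 1:
        raise ValueError("rate must be in the range [0, 1)")
    if not training or rate == 0:
        return list(activations)  # dropout is disabled at inference time
    rng = random.Random(seed)
    keep_probability = 1.0 - rate
    return [
        value / keep_probability if rng.random() < keep_probability else 0.0
        for value in activations
    ]
```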
Dynamic Time Warping (DTW)
An algorithm used to measure the similarity between two time-series sequences that may vary in speed or length.
Types of DTW Applications
- Speech Recognition - Aligning spoken words.
- Gesture Recognition - Matching movement patterns.
Example
Used in time-series analysis for pattern matching.
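The classic dynamic-programming form of DTW can be sketched as below. It assumes one-dimensional sequences and uses the absolute difference as the local cost, with no window constraint; practical implementations often add one for speed.

```python
def dtw_distance(a, b):
    """Minimum cumulative cost of aligning sequence a with sequence b."""
    n, m = len(a), len(b)
    infinity = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[infinity] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because one element may align with several elements of the other sequence, a signal and a slowed-down copy of it can have a distance of zero, which a point-by-point comparison would not allow.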
Data Drift
A phenomenon where the statistical properties of input data change over time, affecting model accuracy.
Types of Data Drift
- Covariate Shift - Changes in feature distribution.
- Concept Drift - Changes in relationships between input and output.
Example
Observed in fraud detection models when fraud patterns evolve.
Data Fusion
The process of integrating multiple data sources to improve model performance.
Types of Data Fusion
- Low-Level Fusion - Combining raw data sources.
- High-Level Fusion - Combining predictions from different models.
Example
Used in autonomous vehicles combining LiDAR and camera data.
Domain Adaptation
A technique to transfer knowledge from a source domain to a target domain with different data distributions.
Types of Domain Adaptation
- Supervised Adaptation - Labeled data in both domains.
- Unsupervised Adaptation - No labeled data in the target domain.
Example
Used in NLP models trained on formal text but applied to social media.
Decision Forest
A collection of decision trees used to improve prediction accuracy.
Types of Decision Forests
- Random Forest - Uses bagging for variance reduction.
- Extra Trees - Uses random splits to improve generalization.
Example
Used in medical diagnosis models for robust predictions.
Dual Learning
A machine learning framework where two models reinforce each other through mutual feedback.
Types of Dual Learning
- Bidirectional Learning - Models learn from each other.
- Self-Learning - Models refine their predictions iteratively.
Example
Used in machine translation to improve accuracy in both directions.
Data Leakage
When training data includes information that will not be available at prediction time, leading to overly optimistic models.
Types of Data Leakage
- Target Leakage - Features encode information about the label that is unavailable at prediction time.
- Train-Test Contamination - Overlapping data between train and test sets.
Example
Occurs in fraud detection when the same transactions appear in both the training and test sets.
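Train-test contamination of the kind listed above can be checked by looking for rows that occur in both splits. The sketch assumes each row is a sequence of hashable values and that exact matches are what matter; near-duplicates would need a fuzzier comparison.

```python
def find_contamination(train_rows, test_rows):
    """Return the set of rows that appear in both splits."""
    return set(map(tuple, train_rows)) & set(map(tuple, test_rows))


def remove_contamination(train_rows, test_rows):
    """Drop from the test split any row that also appears in training."""
    seen = set(map(tuple, train_rows))
    return [row for row in test_rows if tuple(row) not in seen]


# Hypothetical (amount, merchant) rows; the second test row leaks from training.
train = [(120.0, "grocer"), (15.5, "cafe")]
test = [(99.0, "garage"), (15.5, "cafe")]
leaked = find_contamination(train, test)
clean_test = remove_contamination(train, test)
```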
Data Monetization
The process of leveraging data assets to generate economic value.
Types of Data Monetization
- Direct Monetization - Selling data to third parties.
- Indirect Monetization - Using data insights to improve services.
Example
Used by social media platforms for targeted advertising.
Machine Learning (ML)
ML is a subset of AI that enables machines to learn patterns from data and make predictions or decisions without being explicitly programmed for each task.
Types of ML
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Example
Spam detection in emails using classification models.
Deep Learning (DL)
DL is a subset of ML that uses artificial neural networks with many layers to learn representations from complex data.
Example
Image recognition in self-driving cars.
Generative AI (Gen AI)
Gen AI refers to AI models that generate new content, including text, images, and code, using patterns learned from training data.
Example
AI models like ChatGPT and Stable Diffusion that generate text and images.