Glossary - AI Terminology

DAG (Directed Acyclic Graph)

A graph with directed edges and no cycles, commonly used in Bayesian networks and computational graphs.

Types of DAG Applications

Bayesian Networks - Used for probabilistic modeling.
Computational Graphs - Used in deep learning frameworks like TensorFlow.

Example

Used in machine learning pipelines to manage dependencies.
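
As a rough illustration of dependency management in a pipeline, the sketch below orders tasks with a topological sort. It assumes Python 3.9 or later (for the standard-library graphlib module); the task names and the helper functions execution_order and is_dag are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return one valid run order, given {task: set of prerequisite tasks}."""
    return list(TopologicalSorter(dependencies).static_order())


def is_dag(dependencies):
    """True if the dependency graph has no cycles."""
    try:
        execution_order(dependencies)
        return True
    except CycleError:
        return False


# Hypothetical pipeline: each task lists the tasks it depends on.
pipeline = {
    "load": set(),
    "clean": {"load"},
    "features": {"clean"},
    "train": {"features"},
    "evaluate": {"train", "features"},
}

order = execution_order(pipeline)
print(order)
```

A cycle (for example, two tasks that each depend on the other) makes the graph invalid as a DAG, which is why is_dag catches CycleError.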

Data Augmentation

A technique to artificially expand a dataset by applying transformations such as rotation, flipping, or scaling.

Types of Data Augmentation

Geometric Transformations - Rotation, flipping, scaling.
Color Space Augmentations - Changes in brightness or contrast.

Example

Used in image classification to improve model generalization.
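
The transformations listed above can be sketched on a tiny grayscale image stored as a list of rows. This is a minimal illustration, not a production augmentation pipeline; the function names and the 0-255 pixel range are assumptions.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def adjust_brightness(image, delta, max_value=255):
    """Add delta to every pixel, clipping to the range [0, max_value]."""
    return [[min(max_value, max(0, p + delta)) for p in row] for row in image]


image = [[1, 2],
         [3, 4]]
augmented = [flip_horizontal(image), rotate_90_clockwise(image)]
```

Each augmented copy keeps the original label, so one labeled image yields several training examples.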

Data Cleaning

The process of identifying and correcting errors or inconsistencies in datasets.

Types of Data Cleaning Methods

Missing Value Handling - Imputation or deletion.
Outlier Removal - Statistical methods to filter anomalies.

Example

Used in pre-processing steps of machine learning pipelines.
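
One common statistical filter for outliers is the interquartile range (IQR) rule, sketched below. The 1.5 multiplier is a widely used convention rather than something this entry prescribes, and remove_outliers_iqr is a hypothetical helper name.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Drop values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


readings = [10, 12, 11, 13, 12, 11, 10, 100]
cleaned = remove_outliers_iqr(readings)
```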

Data Drift

A phenomenon where the statistical properties of input data change over time, affecting model performance.

Types of Data Drift

Covariate Shift - Changes in feature distribution.
Concept Drift - Changes in the relationship between input features and the target variable.

Example

Detected in fraud detection systems when user behavior changes.
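
One way to check a single feature for covariate shift is to compare its distribution in a reference window against a recent window. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). The function name is hypothetical, and the alert threshold to apply to the result is application-specific and not given here.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest absolute gap between the empirical CDFs of two samples (0 to 1)."""
    ref, cur = sorted(reference), sorted(current)
    max_gap = 0.0
    for point in sorted(set(ref + cur)):
        cdf_ref = bisect_right(ref, point) / len(ref)
        cdf_cur = bisect_right(cur, point) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap
```

A value near 0 suggests the feature looks the same in both windows; a value near 1 means the two samples barely overlap.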

Data Engineering

The discipline of preparing and processing data for analytical and machine learning tasks.

Types of Data Engineering Tasks

ETL (Extract, Transform, Load) - Data pipeline management.
Feature Engineering - Creating meaningful features for models.

Example

Used in large-scale data processing with Apache Spark.

Data Imputation

The process of replacing missing values in a dataset with estimated values.

Types of Data Imputation

Mean/Median Imputation - Filling with statistical averages.
K-Nearest Neighbors Imputation - Using nearest neighbors to estimate values.

Example

Used in healthcare datasets where missing patient records exist.
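
Mean and median imputation can be sketched as follows, assuming missing entries are represented as None. The impute function and its strategy parameter are illustrative names, not a specific library's API.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]
```

Median imputation is usually the safer choice when the column contains extreme values, since the mean is pulled toward them.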

Data Labeling

The process of annotating data with meaningful tags for supervised learning models.

Types of Data Labeling

Manual Labeling - Human annotation.
Automated Labeling - AI-based annotation.

Example

Used in image recognition to label objects in images.

Data Leakage

A scenario where training data contains information that would not be available at prediction time, leading to over-optimistic models.

Types of Data Leakage

Target Leakage - Using information in training that only becomes known after the outcome being predicted.
Feature Leakage - Using features derived from the target variable.

Example

Detected in fraud detection models where transaction approvals are included as features.

Data Normalization

A technique used to scale numeric features to a standard range to improve model convergence.

Types of Normalization

Min-Max Scaling - Scales data to a fixed range.
Z-Score Normalization - Centers data around zero with unit variance.

Example

Used in neural networks to stabilize training.
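
Both scaling methods above reduce to short formulas. The sketch below uses the population standard deviation for the z-score; that choice, the function names, and the handling of constant columns are assumptions made for illustration.

```python
from statistics import mean, pstdev


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values so the minimum maps to low and the maximum to high."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]  # constant column: no spread to rescale
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column
    return [(v - mu) / sigma for v in values]
```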

Data Preprocessing

The process of transforming raw data into a structured format suitable for machine learning.

Types of Data Preprocessing

Feature Scaling - Normalization and standardization.
Feature Selection - Selecting relevant attributes.

Example

Used in NLP for tokenization and text vectorization.

Data Reduction

The process of reducing the volume of data while maintaining its integrity for analysis.

Types of Data Reduction

Dimensionality Reduction - PCA, t-SNE.
Sampling - Selecting representative data points.

Example

Used in big data analytics to handle large datasets.

Data Sampling

Selecting a subset of data from a larger dataset to train models efficiently.

Types of Data Sampling

Random Sampling - Selecting random data points.
Stratified Sampling - Ensuring proportional representation.

Example

Used in surveys to analyze population trends.
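
Stratified sampling can be sketched by grouping records by a key and sampling the same fraction from each group. The function name, the key callback, and the rule of keeping at least one record per group are assumptions of this sketch.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from every group defined by key(record)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))  # keep at least one per group
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical dataset: 80 records of class "a" and 20 of class "b".
records = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
```

With plain random sampling, a 10-record sample could easily contain no class "b" records at all; stratifying preserves the 80/20 proportion.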

Data Science

An interdisciplinary field that combines statistics, computer science, and machine learning to extract insights from data.

Types of Data Science Applications

Predictive Analytics - Forecasting future trends.
Natural Language Processing - Text analysis and sentiment detection.

Example

Used in e-commerce for personalized recommendations.

Data Scrubbing

A process of cleaning and correcting data inconsistencies to improve quality.

Types of Data Scrubbing

Duplicate Removal - Eliminating redundant records.
Validation Checks - Ensuring data accuracy.

Example

Used in customer databases to merge duplicate records.
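
A minimal sketch of merging duplicate customer records, assuming records are dictionaries and that a normalized email address identifies a customer. The field names and the merge_duplicates helper are hypothetical.

```python
def merge_duplicates(records, key_fields=("email",)):
    """Keep the first record per normalized key; later duplicates only fill empty fields."""
    merged = {}
    for record in records:
        key = tuple(str(record.get(f, "")).strip().lower() for f in key_fields)
        if key not in merged:
            merged[key] = dict(record)
            continue
        for field, value in record.items():
            if merged[key].get(field) in (None, ""):
                merged[key][field] = value
    return list(merged.values())


customers = [
    {"email": "A@x.com", "name": "Ann", "phone": ""},
    {"email": " a@x.com", "name": "Ann", "phone": "555"},
    {"email": "b@x.com", "name": "Bob", "phone": "1"},
]
deduplicated = merge_duplicates(customers)
```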

Data Segmentation

Dividing data into meaningful groups for analysis and training.

Types of Data Segmentation

Demographic Segmentation - Age, income, location.
Behavioral Segmentation - Usage patterns and preferences.

Example

Used in marketing for targeted advertising.

Data Smoothing

A technique used to reduce noise in data for better trend detection.

Types of Data Smoothing

Moving Average - Rolling window-based smoothing.
Exponential Smoothing - Weighted averaging method.

Example

Used in stock market analysis for trend forecasting.
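
Both smoothing methods can be written in a few lines. In this sketch the window size and the smoothing factor alpha are illustrative parameters, and the exponential version is seeded with the first observation, which is one common convention among several.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def exponential_smoothing(values, alpha):
    """Blend each new value with the previous smoothed value; higher alpha reacts faster."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        return []
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```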

Data Standardization

A preprocessing technique that scales data to have a mean of zero and a standard deviation of one.

Types of Standardization

Z-Score Normalization - Subtracting the mean and dividing by the standard deviation.
Feature Scaling - Bringing all variables to a comparable scale.

Example

Used in deep learning for stable model training.

Data Transformation

Converting data from one format or structure into another to improve analysis.

Types of Data Transformation

Log Transformation - Reducing skewness in data.
Encoding - Converting categorical values into numeric form.

Example

Used in machine learning pipelines to preprocess input features.
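
The two transformations above can be sketched as follows. Using log(1 + x) so that zeros are handled, and one-hot encoding as the categorical scheme, are assumptions of this sketch; the function names are hypothetical.

```python
import math


def log_transform(values):
    """Apply log(1 + x) to compress large values; requires x > -1."""
    return [math.log1p(v) for v in values]


def one_hot(categories):
    """Return (sorted vocabulary, one 0/1 vector per input category)."""
    vocab = sorted(set(categories))
    index = {category: i for i, category in enumerate(vocab)}
    encoded = [[1 if index[c] == j else 0 for j in range(len(vocab))]
               for c in categories]
    return vocab, encoded


vocab, encoded = one_hot(["red", "green", "red"])
```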

Data Warehousing

A system for storing and managing large-scale structured data for analytics.

Types of Data Warehousing

Enterprise Data Warehouse - Centralized data repository.
Cloud Data Warehouse - Cloud-based data storage.

Example

Used in business intelligence for decision-making.

Decision Boundary

The surface that separates different classes in a classification model; it is a hyperplane when the model is linear and a curved surface otherwise.

Types of Decision Boundaries

Linear Decision Boundary - Used in logistic regression.
Non-Linear Decision Boundary - Used in neural networks.

Example

Used in support vector machines for classification tasks.

Decision Stump

A simple decision tree with only one split, often used in boosting algorithms.

Types of Decision Stumps

Univariate Stump - Splits on a single feature.
Multivariate Stump - Uses multiple features for splitting.

Example

Used in AdaBoost as a weak classifier.
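
A single-feature stump can be fitted by trying every midpoint between adjacent feature values and keeping the split with the fewest errors. The sketch below assumes binary labels 0 and 1 and counts unweighted errors; boosting algorithms such as AdaBoost would use weighted errors instead. The function names are hypothetical.

```python
def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label, errors) for the best single split."""
    distinct = sorted(set(xs))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best = None
    for a, b in zip(distinct, distinct[1:]):
        threshold = (a + b) / 2
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[3]:
                best = (threshold, left_label, right_label, errors)
    return best


def predict_stump(stump, x):
    """Classify x with a fitted stump."""
    threshold, left_label, right_label, _ = stump
    return left_label if x <= threshold else right_label


stump = fit_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```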

Decision Tree

A tree-based model used for classification and regression by splitting data at decision nodes.

Types of Decision Trees

Classification Trees - Used for categorical outcomes.
Regression Trees - Used for continuous outcomes.

Example

Used in customer segmentation models.

Deep Belief Network (DBN)

A type of deep neural network that consists of multiple layers of restricted Boltzmann machines.

Types of DBN Components

Visible Layer - Input features.
Hidden Layers - Learn feature representations.

Example

Used in unsupervised pre-training for deep learning.

Deep Learning

A subset of machine learning using deep neural networks to model complex patterns in data.

Types of Deep Learning Models

Convolutional Neural Networks (CNNs) - Used for image processing.
Recurrent Neural Networks (RNNs) - Used for sequential data.

Example

Used in self-driving car perception systems.

Deep Reinforcement Learning

A combination of deep learning and reinforcement learning for decision-making tasks.

Types of Deep RL Algorithms

Deep Q-Networks (DQN) - Value-based learning.
Actor-Critic Methods - Policy and value-based learning.

Example

Used in AlphaGo for playing Go.

Deployment in Machine Learning

The process of integrating a trained model into a production environment.

Types of Deployment Methods

Batch Deployment - Running predictions at intervals.
Real-Time Deployment - Serving predictions instantly.

Example

Used in recommendation systems on e-commerce websites.

Descriptive Analytics

A type of data analysis that focuses on summarizing historical data.

Types of Descriptive Analytics

Data Aggregation - Summarizing datasets.
Data Visualization - Graphical representation of insights.

Example

Used in business reports for past sales trends.

Dimensionality Reduction

A technique to reduce the number of input variables while preserving meaningful information.

Types of Dimensionality Reduction

Principal Component Analysis (PCA) - Projects data into a lower-dimensional space.
t-SNE - Used for visualization in 2D or 3D.

Example

Used in image compression techniques.

Discriminative Model

A type of machine learning model that focuses on differentiating between classes.

Types of Discriminative Models

Logistic Regression - Predicts class probabilities.
Support Vector Machines - Finds decision boundaries.

Example

Used in spam email classification.

Distance Metrics

Mathematical measures used to calculate the similarity or dissimilarity between data points.

Types of Distance Metrics

Euclidean Distance - Measures straight-line distance.
Cosine Similarity - Measures the cosine of the angle between vectors; cosine distance is one minus this value.

Example

Used in k-nearest neighbors (KNN) classification.
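
Both measures above follow directly from their definitions. This is a minimal sketch for vectors given as equal-length sequences of numbers; the function names are illustrative.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 same direction, 0 orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Euclidean distance is sensitive to vector magnitude, while cosine similarity depends only on direction, so the two can rank neighbors differently.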

Distributed Machine Learning

A technique where ML models are trained across multiple machines to handle large-scale data.

Types of Distributed Learning

Data Parallelism - Splitting data across nodes.
Model Parallelism - Splitting model computations across nodes.

Example

Used in Google's TensorFlow for large-scale training.

Dropout Regularization

A technique used to prevent overfitting by randomly disabling neurons during training.

Types of Dropout

Standard Dropout - Randomly disables neurons.
Spatial Dropout - Drops entire feature maps in CNNs.

Example

Used in deep neural networks to improve generalization.
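
Standard dropout can be sketched on a plain list of activations. The version below is the "inverted dropout" variant, which rescales the surviving activations by 1 / (1 - rate) during training so that nothing needs to change at inference time; that variant and the function signature are assumptions of this sketch.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Zero each activation with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0:
        return list(activations)  # inference: pass activations through unchanged
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because a different random subset of units is dropped on every training step, the network cannot rely on any single unit, which is the intuition behind the regularizing effect.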

Dynamic Time Warping (DTW)

An algorithm used to measure the similarity between two time-series sequences that may differ in speed or length, by finding the alignment with the lowest cumulative cost.

Types of DTW Applications

Speech Recognition - Aligning spoken words.
Gesture Recognition - Matching movement patterns.

Example

Used in time-series analysis for pattern matching.
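
The classic dynamic-programming formulation of DTW can be sketched as follows, assuming one-dimensional sequences and absolute difference as the local cost. No window constraint is applied, so the running time is proportional to the product of the two lengths.

```python
def dtw_distance(a, b):
    """Minimum cumulative |a[i] - b[j]| cost over all monotonic alignments of a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its current item
                cost[i][j - 1],      # b advances, a repeats its current item
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A sequence and a stretched copy of it, such as [1, 2, 3] and [1, 2, 2, 3], have a DTW distance of zero even though their lengths differ.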

Data Fusion

The process of integrating multiple data sources to improve model performance.

Types of Data Fusion

Low-Level Fusion - Combining raw data sources.
High-Level Fusion - Combining predictions from different models.

Example

Used in autonomous vehicles combining LiDAR and camera data.

Domain Adaptation

A technique to transfer knowledge from a source domain to a target domain with different data distributions.

Types of Domain Adaptation

Supervised Adaptation - Labeled data in both domains.
Unsupervised Adaptation - No labeled data in the target domain.

Example

Used in NLP models trained on formal text but applied to social media.

Decision Forest

A collection of decision trees used to improve prediction accuracy.

Types of Decision Forests

Random Forest - Uses bagging for variance reduction.
Extra Trees - Uses random splits to improve generalization.

Example

Used in medical diagnosis models for robust predictions.

Dual Learning

A machine learning framework where two models reinforce each other through mutual feedback.

Types of Dual Learning

Bidirectional Learning - Models learn from each other.
Self-Learning - Models refine their predictions iteratively.

Example

Used in machine translation to improve accuracy in both directions.

Data Leakage

When training data includes information that will not be available at prediction time, leading to overly optimistic models.

Types of Data Leakage

Target Leakage - Labels influence feature selection.
Train-Test Contamination - Overlapping data between train and test sets.

Example

Occurs in fraud detection when a field recorded only after the outcome is known, such as a chargeback flag, is used as a feature.
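
Train-test contamination in its simplest form, identical rows appearing in both splits, can be checked directly. The sketch below only catches exact matches; near-duplicates and leakage through preprocessing need other checks. The function names are hypothetical.

```python
def find_contamination(train_rows, test_rows):
    """Return test rows that also appear, value for value, in the training set."""
    train_set = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in train_set]


def drop_contaminated(train_rows, test_rows):
    """Return the test set with rows that overlap the training set removed."""
    train_set = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) not in train_set]


train = [[1, 2], [3, 4]]
test = [[3, 4], [5, 6]]
overlap = find_contamination(train, test)
```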

Data Monetization

The process of leveraging data assets to generate economic value.

Types of Data Monetization

Direct Monetization - Selling data to third parties.
Indirect Monetization - Using data insights to improve services.

Example

Used by social media platforms for targeted advertising.

Machine Learning (ML)

ML is a subset of AI that enables machines to learn patterns from data and make predictions or decisions without being explicitly programmed for each task.

Types of ML

Supervised Learning
Unsupervised Learning
Reinforcement Learning

Example

Spam detection in emails using classification models.

Deep Learning (DL)

DL is a subset of ML that uses multi-layer artificial neural networks to learn representations from complex data such as images, audio, and text.

Example

Image recognition in self-driving cars.

Generative AI (Gen AI)

Gen AI refers to AI models that generate new content, including text, images, and code, based on patterns learned from their training data.

Example

AI models like ChatGPT and Stable Diffusion that generate text and images.