Data Distribution Analysis

Last updated September 6, 2024

Understanding the distribution of your data is essential for building accurate and reliable machine learning models. Data distribution analysis helps you identify problems such as outliers, skewness, and inconsistencies that can degrade model performance and prediction accuracy. Evidently AI provides visual tools for inspecting these distributions and for comparing them between your reference and current data.

  • Histograms: Histograms show how often the values of a single feature fall within each range (bin). They reveal the shape of the distribution, any skewness, and potential outliers.
  • Box Plots: Box plots offer a concise representation of the distribution of a feature, encompassing the median, quartiles, and potential outliers. They highlight the spread of the data and help you identify any unusual values or changes in the data distribution compared to your reference data.
  • Scatter Plots: Scatter plots are useful for visualizing the relationship between two features. They allow you to identify potential patterns, trends, and correlations between the features.
  • Cumulative Distribution Function (CDF): CDFs show the probability that a feature's value is less than or equal to a given value. They are helpful for comparing the distributions of features in the reference and current data and identifying significant shifts.
  • KDE Plots: Kernel Density Estimation (KDE) plots provide a smoothed estimate of the probability density function of the data. Unlike a histogram, the resulting curve is continuous and does not depend on where bin edges fall, although its shape does depend on the chosen bandwidth.
  • Quantile-Quantile (Q-Q) Plots: Q-Q plots compare the distribution of a feature in the reference and current data by plotting the quantiles of one against the quantiles of the other. Points that lie close to the diagonal indicate similar distributions, while systematic departures from it indicate a difference in location, spread, or shape.
  • Data Drift Detection: Analyze the distributions of different features over time to detect potential data drift. This allows you to identify changes in the data patterns that could impact your model's accuracy.
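
The numbers behind several of the plots above can be sketched with the Python standard library alone. This is a minimal illustration, not Evidently AI's API: the function names (`histogram_counts`, `box_plot_stats`, `ecdf`, `qq_points`, `max_cdf_gap`) are hypothetical, the reference and current samples are synthetic, and the 1.5 × IQR outlier rule is the common box-plot convention rather than something this article specifies.

```python
import bisect
import random
import statistics


def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # max value goes in last bin
        counts[idx] += 1
    return counts


def box_plot_stats(values):
    """Quartiles plus the points outside the 1.5 * IQR whiskers (assumed rule)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def qq_points(reference, current, n=100):
    """Pairs of matching quantiles; points near the diagonal mean similar data."""
    ref_q = statistics.quantiles(reference, n=n, method="inclusive")
    cur_q = statistics.quantiles(current, n=n, method="inclusive")
    return list(zip(ref_q, cur_q))


def max_cdf_gap(reference, current):
    """Largest vertical gap between the two empirical CDFs."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    return max(
        abs(ecdf(ref_sorted, x) - ecdf(cur_sorted, x))
        for x in ref_sorted + cur_sorted
    )


# Synthetic data: the current sample is shifted relative to the reference.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
current = [rng.gauss(0.5, 1.0) for _ in range(2000)]

counts = histogram_counts(reference, bins=20)
stats = box_plot_stats(reference)
pairs = qq_points(reference, current)
gap = max_cdf_gap(reference, current)

print("histogram counts:", counts)
print("median:", round(stats["median"], 3), "outliers:", len(stats["outliers"]))
print("max CDF gap between reference and current:", round(gap, 3))
```

A gap near zero suggests that the two samples follow similar distributions, while a larger gap, or Q-Q points that sit consistently off the diagonal, is the kind of shift that the drift detection described above is meant to surface.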