Customizing Data Quality Rules
Last updated September 6, 2024
Data quality is crucial for building reliable and accurate machine learning models. Evidently AI provides a range of built-in data quality checks, but some validation needs are specific to your domain and data. For those cases, Evidently AI lets you define and apply your own custom data quality rules.
Defining Custom Data Quality Rules
- Understand Your Data: Before defining custom rules, thoroughly understand your data, domain knowledge, and the specific requirements for data quality. What are the unique aspects of your data that need to be validated? What are the acceptable ranges or values for your features?
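- Start with a Preset: Run a built-in preset in your Evidently AI report to get a baseline before writing custom rules. The `DataDriftPreset` reports distribution shift between reference and current data; for quality issues such as missing values or out-of-range features, the `DataQualityPreset` is usually the closer fit. Either way, the baseline shows which inconsistencies the built-in checks already catch and where custom rules are needed.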
- Define Custom Metrics: Evidently AI lets you define custom metrics using the `Metric` class. Tailor each metric to one specific data quality check you want to perform.
- Create Custom Checks: You can create custom checks based on your specific requirements using the `Check` class. This allows you to define custom rules based on conditional statements or calculations.
- Implement Logic: Within your custom checks, implement the logic that evaluates your data against the rules you defined. This might involve comparing values against thresholds, computing summary statistics, or applying calculations specific to your domain.
- Example:
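The rule logic described in the steps above can be sketched in plain Python, independent of any particular Evidently AI version. This is an illustrative sketch, not Evidently's API: `Rule`, `RuleResult`, `evaluate`, the column names, and the thresholds are all hypothetical. How you wrap this logic in Evidently's `Metric` class depends on the version you have installed, so check the documentation for your release.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Mapping

Row = Mapping[str, Any]


@dataclass(frozen=True)
class Rule:
    """Hypothetical custom data quality rule (not an Evidently class)."""
    name: str
    predicate: Callable[[Row], bool]   # returns True when a row is valid
    max_failure_share: float = 0.0     # tolerated share of failing rows


@dataclass(frozen=True)
class RuleResult:
    name: str
    failed_rows: int
    total_rows: int
    passed: bool


def evaluate(rules: Iterable[Rule], rows: Iterable[Row]) -> List[RuleResult]:
    """Apply every rule to every row and report pass/fail per rule."""
    rows = list(rows)
    results = []
    for rule in rules:
        failed = sum(1 for row in rows if not rule.predicate(row))
        share = failed / len(rows) if rows else 0.0
        results.append(
            RuleResult(rule.name, failed, len(rows), share <= rule.max_failure_share)
        )
    return results


# Assumed domain rules: age must be present and within 0..120;
# country must be one of a known set, with up to 25% of rows allowed to fail.
rules = [
    Rule("age_in_range",
         lambda r: r.get("age") is not None and 0 <= r["age"] <= 120),
    Rule("country_known",
         lambda r: r.get("country") in {"US", "DE", "FR"},
         max_failure_share=0.25),
]

# Made-up sample rows for illustration only.
rows = [
    {"age": 34, "country": "US"},
    {"age": -5, "country": "DE"},
    {"age": 51, "country": "XX"},
    {"age": None, "country": "FR"},
]

results = evaluate(rules, rows)
for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(f"{result.name}: {status} "
          f"({result.failed_rows}/{result.total_rows} rows failed)")
```

Keeping each rule as a named predicate with its own tolerance separates the domain knowledge (what counts as valid) from the evaluation loop, so the same rules can be applied to both reference and current data.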