Define basic data integrity rules for a given dataset and outline how to validate them.
Task: Propose basic data integrity rules for a dataset based on its description and suggest methods to validate them.
Context: You are working with a dataset that needs to adhere to certain quality standards. Provide details about your dataset's columns and their expected values, for example: 'Column [column_name] should be a [data_type] and values must be between [min_value] and [max_value].'
Rules to define:
- Data type consistency
- Range constraints
- Uniqueness constraints
- Referential integrity (if applicable, e.g., foreign keys)
Validation methods to suggest:
- Simple checks (e.g., value.is_numeric(), value in allowed_list)
- Aggregation checks (e.g., sum(column) == expected_sum)
- Cross-column checks
Output Format: List the proposed rules and corresponding validation methods.
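The rule categories and validation methods above can be sketched as concrete checks. This is a minimal illustration using pandas; the column names ("order_id", "quantity", "price") and the expected sum are hypothetical examples, not part of any real dataset.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "quantity": [5, 2, 7],
    "price": [9.99, 4.50, 12.00],
})

# Data type consistency: quantity should be an integer column.
assert pd.api.types.is_integer_dtype(df["quantity"])

# Range constraint: quantity must lie between 1 and 100.
assert df["quantity"].between(1, 100).all()

# Uniqueness constraint: order_id must not repeat.
assert df["order_id"].is_unique

# Aggregation check: total quantity matches an expected value.
assert df["quantity"].sum() == 14

# Cross-column check: a positive quantity implies a positive price.
assert ((df["price"] > 0) | (df["quantity"] == 0)).all()
```

Each assertion maps to one rule category from the list; in practice these would be collected into a validation report rather than raised as hard failures.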
Generate a basic cleaning process for numerical datasets, including handling missing values and outliers.
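A cleaning process of the kind this prompt asks for might look like the following sketch: impute missing values with the median, then clip outliers to the 1.5 × IQR fences. The function name `clean_numeric` and the sample data are assumptions for illustration, not a prescribed implementation.

```python
import pandas as pd

def clean_numeric(series: pd.Series) -> pd.Series:
    # Handle missing values: impute with the median.
    filled = series.fillna(series.median())
    # Handle outliers: clip to 1.5 * IQR beyond the quartiles.
    q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
    iqr = q3 - q1
    return filled.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Hypothetical example: one missing value, one extreme outlier.
raw = pd.Series([1.0, 2.0, None, 3.0, 100.0])
cleaned = clean_numeric(raw)
```

Clipping preserves row count, which matters when the series must stay aligned with other columns; dropping outlier rows is the usual alternative when alignment is not a concern.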
Formulate fundamental guidelines for consistent and accurate data entry during field research, suitable for basic studies.
Generate a clear, simple explanation of cryptographic hashing and its purpose for data integrity, suitable for a non-technical audience.
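The explanation this prompt asks for can be backed by a small demonstration using Python's standard hashlib: the same input always produces the same digest, and even a one-character change produces a completely different one, which is exactly what makes hashing useful for integrity checks. The sample strings are arbitrary.

```python
import hashlib

original = b"The quick brown fox"
tampered = b"The quick brown fax"  # one character changed

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(original).hexdigest()
digest_c = hashlib.sha256(tampered).hexdigest()

# Identical input -> identical fingerprint.
print(digest_a == digest_b)  # True
# A tiny change -> an entirely different fingerprint.
print(digest_a == digest_c)  # False
```

For a non-technical audience, the digest can be described as a fixed-length fingerprint: comparing fingerprints before and after transmission reveals whether the data was altered.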