Prompts for Cleaning and Processing Data with AI

  • A blog post on using LLMs to clean, process, and enrich data. It includes prompts and code snippets. The post draws on my own experience and two really interesting papers:

    - Can Foundation Models Wrangle Your Data? (https://arxiv.org/abs/2205.09911)

    - Large Language Models as Data Preprocessors (https://arxiv.org/abs/2308.16361)

    I cover:

    - Error and Anomaly Detection

    - Enriching Data with LLMs

    - Matching Data Labels

    - Identifying Matching Records

    Thank you, and I'd appreciate your feedback.