Clean your generative AI data

Last updated: 24 September 2024

When you upload grounding data (such as Word documents or PDFs) to your generative AI chatbot, make sure the data is clean and well-organised. Clean data helps the AI provide more accurate and useful responses, improving the overall quality of interactions.

This document outlines the key steps to follow when preparing your data before uploading it to the platform.

Why clean data matters

Generative AI relies on clear, well-structured information to understand and generate accurate responses. Excessive formatting, inconsistent text, or irrelevant information can confuse the AI, leading to lower-quality outputs.

By cleaning your data beforehand, you help the AI focus on the essential content and ensure the responses it generates are relevant and accurate.

Data cleaning best practices

  1. If possible, avoid PDFs. Although our platform supports them, PDFs tend to be poorly formatted.

    1. Particularly avoid PDFs that contain images or scanned text.

    2. Whenever possible, use Word documents.

  2. Remove unnecessary formatting. Complex formatting can interfere with the AI’s ability to process the text properly.

    1. Remove headers and footers, page numbers, footnotes, and any special fonts, colours, or backgrounds.

    2. Keep paragraphs and basic headings.

  3. Simplify and standardise text. Ensure the text is easy to read and consistent throughout the document.

    1. Avoid long paragraphs; break them into smaller, readable chunks.

    2. Use consistent capitalisation and punctuation.

    3. Remove unnecessary symbols, special characters, and emojis.

    4. Replace any shorthand, abbreviations, or jargon with clear, plain language to avoid confusion.

  4. Remove unnecessary content. Ensure that only relevant information is included in the data.

    1. Remove irrelevant sections such as personal notes, comments, or outdated content.

    2. Remove images, tables, and charts (they will not be interpreted).

    3. Remove links.
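
If you have many documents to prepare, some of the steps above can be partly automated. The following is a minimal Python sketch, not a feature of the platform. It assumes the document text has already been exported as plain text, and the names clean_text and split_long_paragraph, along with every pattern used, are hypothetical choices for illustration. It removes links, common emojis and symbols, and lines that contain only a page number, then tidies spacing. Always review the result by hand: the page-number rule also drops any line made up only of digits, and the link rule removes punctuation attached to the end of a link.

```python
import re

# Assumed patterns for illustration; adjust them to match your own documents.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
PAGE_NUMBER_PATTERN = re.compile(
    r"^\s*(?:page\s+)?\d+(?:\s+of\s+\d+)?\s*$", re.IGNORECASE
)
# Common emoji, pictograph, and dingbat ranges; this is not an exhaustive list.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def clean_text(raw_text: str) -> str:
    """Return a copy of the text without links, emojis, or page-number lines."""
    cleaned_lines = []
    for line in raw_text.splitlines():
        if PAGE_NUMBER_PATTERN.match(line):
            continue  # drop lines such as "12" or "Page 3 of 10"
        line = URL_PATTERN.sub("", line)
        line = EMOJI_PATTERN.sub("", line)
        # Collapse repeated spaces and tabs left behind by the removals.
        line = re.sub(r"[ \t]+", " ", line).strip()
        cleaned_lines.append(line)
    text = "\n".join(cleaned_lines)
    # Keep exactly one blank line between paragraphs.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def split_long_paragraph(paragraph: str, max_sentences: int = 3) -> list[str]:
    """Break a paragraph into chunks of at most max_sentences sentences."""
    # Naive sentence split: whitespace that follows ".", "!" or "?".
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


if __name__ == "__main__":
    sample = "Welcome to the guide 😀\nPage 3 of 10\nVisit https://example.com for details."
    print(clean_text(sample))
```

The sketch does not handle images, tables, charts, headers, or footers. Remove those in the original document before exporting the text.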
