Ingesting Data Source Data in Generative AI Agents

A data ingestion job extracts data from data source documents, converts it into a structured format suitable for analysis, and then stores it in a knowledge base.

  1. On the Knowledge Bases list page, select the knowledge base that contains the data source whose data you want to ingest.
    If you need help finding the list page, see Listing Knowledge Bases.
  2. Click the data source whose data you want to ingest.
  3. Click Create Ingestion job.
  4. Enter the following values:
    • Name: A name that starts with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be from 1 to 255 characters.
    • Description: An optional description.
    • Tags: Click Show advanced options and add one or more tags to the ingestion job. If you have permission to create a resource, then you also have permission to update its tags. If you need help, see Tags and Tag Namespace Concepts.
  5. Click Create.
  6. Wait for the Lifecycle state of the job to display as Succeeded.
    Note

    After Creating an Ingestion Job
    1. Review the status logs to confirm that all updated files were successfully ingested.
    2. If the ingestion job fails (for example, because a file is too large), address the issue and restart the job.
    How the Ingestion Pipeline Handles Previously Run Jobs

    When you restart a previously run ingestion job, the pipeline:

    1. Detects files that were successfully ingested earlier and skips them.
    2. Ingests only the files that failed previously and have since been updated.
    Example Scenario

    Suppose you have 20 files to ingest, and the initial job run results in 2 failed files. When you restart the job, the pipeline:

    1. Recognizes that 18 files have already been successfully ingested and ignores them.
    2. Ingests only the 2 files that failed earlier and have since been updated.
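
The restart behavior described above can be illustrated with a minimal Python sketch. This is not the service's implementation or its API. The FileRecord type, its fields, the file names, and the select_files_for_restart function are all hypothetical and exist only to model the two stated rules: skip files that were ingested successfully, and ingest only files that failed and have since been updated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record of one file's state after the previous job run."""
    name: str
    previously_succeeded: bool    # True if the earlier run ingested the file
    updated_since_last_run: bool  # True if the file changed after that run


def select_files_for_restart(files):
    """Return the files a restarted job would ingest under the stated rules.

    Assumption: a file is picked up only if it failed earlier AND has been
    updated since. Files that succeeded earlier are always skipped.
    """
    return [
        f for f in files
        if not f.previously_succeeded and f.updated_since_last_run
    ]


def example_scenario():
    """Build the 20-file scenario: 18 succeeded, 2 failed and were then fixed."""
    files = [FileRecord(f"doc_{i:02d}.pdf", True, False) for i in range(1, 19)]
    files.append(FileRecord("doc_19.pdf", False, True))
    files.append(FileRecord("doc_20.pdf", False, True))
    return files


if __name__ == "__main__":
    selected = select_files_for_restart(example_scenario())
    print([f.name for f in selected])
```

Under these assumptions, running the sketch on the 20-file scenario selects only the two files that failed and were later updated, and leaves the other 18 alone.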