Importing a Dataset
Follow these steps to import a dataset into Data Labeling.
- Open the navigation menu and click Analytics and AI. Under Machine Learning, click Data Labeling.
- Click Datasets.
- Click Import dataset.
- On the Import folder page, specify whether to upload a local folder to Object Storage to create a dataset (go to step 5) or to retrieve metadata and records that are already in Object Storage (skip to step 6).
-
To upload files from a local folder to Object Storage, click Upload local folder and follow these steps:
- Click Select a folder to select the folder in the file manager that contains the dataset files.
Note
In some formats, datasets contain a metadata file and record files. The record files might be in a subfolder or in the same folder as the metadata file.
- Select the folder and click Upload. The folder name and the path to the metadata file are detected and displayed under Selected Folder. Click Edit to change the choice of folder or Delete to delete the choice.
- Under Object Storage location, specify the Object Storage bucket into which you want to upload the local files:
- Object Storage URL: A read-only field, already populated.
- Compartment: Select the compartment that contains the bucket.
- Namespace: Automatically populated based on the compartment selected.
- Bucket: Select the bucket from the list. If the list is long, you can choose to view all buckets; a panel opens listing all the available buckets. If you need to create a bucket, click the link in the tooltip next to the Bucket label, which takes you to the Buckets list page in the Object Storage service. See Creating a Bucket.
- (Optional) Prefix: Enter a prefix string to add to the start of the files' names or paths.
- Click Next and skip to step 7.
-
To retrieve metadata and records that are already in Object Storage, click Select from
Object Storage and follow these steps:
- In Object Storage location, enter the URL to the
metadata file in Object Storage that you
want to load, in the format:
https://objectstorage.<region-identifier>.oraclecloud.com/n/<namespace>/b/<bucket>/o/<object>
. You can find this URL on the bucket details page as follows:
- Go to the bucket that contains the file.
- Under Objects, click the folder that contains the dataset metadata file.
- Find the metadata file.
- From the Action menu for the metadata file, select View Object Details.
- Copy the value for URL Path (URI).
- Paste it into Object Storage URL.
- If the record files are located in a different directory than the metadata file, under File Location, clear the A record is present in the same metadata path check box and provide the following information:
- Object Storage URL: A read-only field, already populated.
- Compartment: Select the compartment that contains the bucket.
- Namespace: Automatically populated based on the compartment selected.
- Bucket: Select the bucket from the list. If the list is long, you can choose to view all buckets; a panel opens listing all the available buckets.
- (Optional) Prefix: Enter a prefix string to add to the start of the files' names or paths.
- Click Next.
-
On the Add dataset details page, the fields are populated from the metadata file, but you can edit any field as necessary:
- Name: Give the dataset a suitable name.
- Description: (Optional) Give the dataset a relevant description that you can use to help search for it.
- Labeling instructions: (Optional) Enter instructions and directions for the team labeling the data.
- Dataset format: Click Images, Text, or Documents, depending on whether you want to label images, pieces of text, or documents.
- Import format: Select the dataset format. For example, COCO or YOLO v5.
- Annotation class: Select how to annotate the
images, text, or documents.
- Single labels: Categorizes images, text, or documents into one class.
- Multiple Labels: Categorizes images, text, or documents into one or more classes.
- Object Detection: For images only. Draws bounding boxes around objects in the images.
- Entity Extraction: For text only. Highlights and labels text into one or more classes.
- Key Value: For documents only. Uses Document Understanding's Optical Character Recognition (OCR) to identify and extract information from documents.
- Labels: Enter the labels to use with the dataset. After entering each label, press Return.
- Click Next.
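The Import format selected above must match the structure of the metadata file being imported. As an illustration only, a minimal metadata file in the widely documented COCO object-detection layout looks roughly like this (the file name, IDs, and label are hypothetical values, not ones the service requires):

```json
{
  "images": [
    {"id": 1, "file_name": "cat.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1,
     "bbox": [100, 100, 50, 80], "area": 4000, "iscrowd": 0}
  ],
  "categories": [
    {"id": 1, "name": "cat"}
  ]
}
```

Here each annotation links a record file (images) to a label (categories) and, for object detection, a bounding box.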
- On the Review page, verify the information that you entered. If you need to go back and change any values, click Edit.
-
Click Import.
The records are generated when the dataset is imported. The dataset state changes to Updating while the records are generated. Only after the records have been created do the files appear on the dataset details page.
Note
If you specified the wrong format, the import fails with an error message. Click Retry import in the error message to display the Retry Import dialog box. Here you can update the Metadata URL or the Import format as appropriate, and click Retry to try the import again.
-
To import the dataset later using Resource Manager and Terraform, click
Save as stack to save the resource definition as a
Terraform configuration.
For information about saving stacks from resource definitions, see Creating a Stack from a Resource Creation Page.
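The metadata-file URL entered during the Object Storage import step follows a fixed pattern. A minimal sketch that assembles it from its parts (the region identifier, namespace, bucket, and object name below are hypothetical placeholders, not real resources):

```shell
#!/bin/sh
# Hypothetical values -- substitute your own region identifier,
# namespace, bucket, and metadata object name.
REGION="us-phoenix-1"
NAMESPACE="mynamespace"
BUCKET="mybucket"
OBJECT="dataset/metadata.jsonl"

# Assemble the URL in the documented pattern:
# https://objectstorage.<region-identifier>.oraclecloud.com/n/<namespace>/b/<bucket>/o/<object>
URL="https://objectstorage.${REGION}.oraclecloud.com/n/${NAMESPACE}/b/${BUCKET}/o/${OBJECT}"
echo "$URL"
```

This is the same value you can copy from URL Path (URI) in the object's details, so assembling it by hand is only needed when you already know the bucket and object names.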
Use the import-pre-annotated-data command and required parameters to import records and annotations from dataset files into a dataset:
oci data-labeling-service dataset import-pre-annotated-data [OPTIONS]
Use the import-pre-annotated-data-object-storage-import-metadata-path command and required parameters to import records and annotations from dataset files in Object Storage into a dataset:
oci data-labeling-service dataset import-pre-annotated-data-object-storage-import-metadata-path [OPTIONS]
For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
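As a rough sketch of what an invocation might look like, the dataset OCID below is a hypothetical placeholder and the flag name is an illustrative assumption; run the command with --help or see the CLI Command Reference for the actual required parameters:

```shell
# Hypothetical OCID -- replace with a real dataset OCID.
# The flag name is an illustrative assumption; verify with --help.
oci data-labeling-service dataset import-pre-annotated-data \
  --dataset-id "ocid1.datalabelingdataset.oc1..exampleuniqueid"
```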
This task is not available in the API.