Importing Datasets
Importing datasets lets you reuse datasets within the same tenancy, or merge and replace content, without the need to create a dataset from scratch.
Note
From a local directory, you can import a maximum of 201 files in a dataset, and the dataset can be no more than 4.9 GB in size. If the number of files or the dataset size exceeds these values, upload the folder to Object Storage. The following formats are supported:
Format | Dataset Type | Annotation Type | File Structure | Maximum File Count and File Size |
---|---|---|---|---|
JSONL Metadata | Image | | | |
JSONL Metadata | Text | | | |
JSONL Metadata | Document | | | |
COCO Metadata | Image | Object detection | | |
YOLO v5 Metadata | Image | Object detection | | |
PASCAL VOC Metadata | Image | Object detection | | |
spaCy | Text | NER | | |
CoNLL 2003 | Text | NER | | |
For more information on supported file types and sizes, see Supported File Formats.
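The local-directory limits quoted in the note above (201 files, 4.9 GB) can be checked before starting an import. The following is a minimal sketch, not part of the service: the function name `fits_local_import` is hypothetical, and it assumes that "GB" means 1024^3 bytes and that files in subfolders count toward both limits.

```python
from pathlib import Path

# Limits quoted in the note above for imports from a local directory.
MAX_FILES = 201
# Assumption: "GB" is interpreted as 1024**3 bytes; adjust if the service
# counts decimal gigabytes instead.
MAX_BYTES = int(4.9 * 1024**3)


def fits_local_import(folder, max_files=MAX_FILES, max_bytes=MAX_BYTES):
    """Return (ok, file_count, total_bytes) for the files under `folder`.

    Hypothetical helper: walks the folder recursively and compares the
    file count and total size against the documented limits.
    """
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    ok = len(files) <= max_files and total_bytes <= max_bytes
    return ok, len(files), total_bytes
```

If the first element of the returned tuple is `False`, the note above applies: upload the folder to Object Storage instead of importing it from the local directory.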
Sample Metadata Files Contents
Sample file contents for each of the metadata file options.
- Data Labeling JSONL Consolidated
-
{"id":"<Dataset OCID>", "compartmentId":"<Compartment OCID>", "displayName":"<Dataset Name>", "description":"<Dataset Description>", "labelsSet":[{"name":"<Label Name>"},{"name":"<Label Name>"},...], "annotationFormat":"<SINGLE_LABEL/MULTI_LABEL/BOUNDING_BOX/ENTITY_EXTRACTION>", "datasetSourceDetails":{"namespace":"<Namespace>","bucket":"<Bucket>"}, "datasetFormatDetails":{"formatType":"<IMAGE/TEXT/DOCUMENT>"} }
{"id":"<Record OCID>", "timeCreated":"<Created datetime>", "sourceDetails":{"sourceType":"OBJECT_STORAGE","path":"<Path of record file>"}, "annotations":[{"id":"<Annotation OCID>", "timeCreated":"<Created datetime>", "createdBy":"<User OCID>", "entities":[{"entityType":"<GENERIC/IMAGEOBJECTSELECTION...>", "labels":[{"label_name":"<Label Name>"},{"label_name":"<Label Name>"},...], "boundingPolygon<IN CASE OF BOUNDING_BOX>":{"normalizedVertices":[{"x":"0.1752872","y":"0.18566811"},...]}}]}] }
...other record objects
- Compact JSONL
-
{"labelsSet":[{"name":"<Label Name>"}, {"name":"<Label Name>"},...], "annotationFormat":"SINGLE_LABEL/MULTI_LABEL/ENTITY_EXTRACTION", "datasetFormatDetails":{"formatType":"TEXT"} }
{"sourceDetails":{"path":"<Path of text record file>"}, "annotations":[{"entities":[{"entityType":"GENERIC","labels":[{"label_name":"<Label Name>"},...]}]}] }
...other record objects
- COCO
-
{
  "info": { "year": "<Year>", "version": "1", "description": "<Dataset description>", "contributor": "", "url": "<URL>", "date_created": "<Created datetime>" },
  "licenses": [ { "id": 1, "url": "", "name": "Unknown" } ],
  "categories": [
    { "id": 0, "name": "animals", "supercategory": "none" },
    { "id": 1, "name": "cat", "supercategory": "animals" },
    { "id": 2, "name": "dog", "supercategory": "animals" }
  ],
  "images": [
    { "id": 1, "license": 1, "file_name": "<Record file path>", "height": 500, "width": 400, "date_captured": "<Captured datetime>" },
    ...
  ],
  "annotations": [
    { "id": 1, "image_id": 1, "category_id": 1, "bbox": [84, 44, 282.5, 143], "area": 40397.5, "segmentation": [], "iscrowd": 0 },
    ...
  ]
}
- YOLO v5
-
train: ../train/images
nc: 4
names: ["Label1", "Label2", "Label3", "Label4", "..."]
- PASCAL VOC
-
<annotation>
  <folder/>
  <filename>recordFile.jpg</filename>
  <path>/n/Namespace/b/Bucket/o/recordFile.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>3800</width>
    <height>2534</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>LabelName</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occluded>0</occluded>
    <bndbox>
      <xmin>186.94249</xmin>
      <xmax>1878.6903</xmax>
      <ymin>330.67606</ymin>
      <ymax>1396.7037</ymax>
    </bndbox>
  </object>
  <object>....</object>
  ...
</annotation>
- spaCy
- Example 1:
[
  {
    "content": "<Text Content>",
    "entities": [
      { "start": 0, "end": 29, "labelName": "<Label Name>" },
      { "start": 65, "end": 86, "labelName": "<Label Name>" },
      { "start": 80, "end": 104, "labelName": "<Label Name>" },
      ...
    ]
  },
  ...
]
- CoNLL 2003
-
-DOCSTART- -X-O
This -X- _ B-Label1
is -X- _ I-Label1
sample -X- _ I-Label1
data, -X- _ I-Label1
and -X- _ O
new -X- _ O
data -X- _ O
information -X- _ O
new -X- _ B-Label1
sample -X- _ I-Label1
Data -X- _ O
...
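Returning to the Compact JSONL sample above: its layout is one dataset-level JSON object on the first line, followed by one JSON object per record. The sketch below shows one way such a file could be generated for a text dataset. It is an illustration only: the function name `build_compact_jsonl` and its parameters are hypothetical, the field names are copied from the sample, and nothing here confirms that the service accepts the output without further fields.

```python
import json


def build_compact_jsonl(label_names, annotation_format, records):
    """Build Compact JSONL text shaped like the sample in this section.

    Hypothetical helper. `records` is an iterable of
    (record_path, [label_name, ...]) pairs.
    """
    # First line: the dataset-level object.
    header = {
        "labelsSet": [{"name": name} for name in label_names],
        "annotationFormat": annotation_format,
        "datasetFormatDetails": {"formatType": "TEXT"},
    }
    lines = [json.dumps(header)]

    # Following lines: one object per record.
    for path, labels in records:
        record = {
            "sourceDetails": {"path": path},
            "annotations": [
                {
                    "entities": [
                        {
                            "entityType": "GENERIC",
                            "labels": [{"label_name": label} for label in labels],
                        }
                    ]
                }
            ],
        }
        lines.append(json.dumps(record))

    return "\n".join(lines) + "\n"
```

For example, `build_compact_jsonl(["positive", "negative"], "SINGLE_LABEL", [("review1.txt", ["positive"])])` returns two lines of JSON: the dataset object, then a single record. The label names and file name in that call are placeholders.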