Exporting the Dataset
You can export Data Labeling datasets in several text and image formats, and as snapshot JSONL files.
You can export a dataset to any Object Storage location in the tenancy. This lets you maintain versions of the dataset, or use it elsewhere, for example, as input to machine learning model development. The output file location is shown in the export panel. After the export, the destination is available in the associated work request. The destination is also displayed on the Dataset Details page, but only while the work request exists.
For documents, you can export to JSONL files.
For images, you can export to the following formats:
- JSONL
- YOLO V5
- COCO
- PASCAL VOC
For text, you can export to the following formats:
- JSONL
- JSONL Compact Plus Content
- spaCy
- CoNLL V2003
Note
If you export text in the CoNLL format, recursive and overlapping entities are ignored.
For CSV, the only option is to export to JSONL.
This task is not available in the CLI or the API.
Examples of Exported Document, Image, and Text Datasets
Examples of the JSON files created when a dataset is exported in Data Labeling.
An example of an exported consolidated JSON file.
{
"id": "ocid1.datalabelingdatasetdev.oc1.iad.amaaaaaazaehrjyag7jcbu3xnpw4dcn3tmniarzorpxbtegnipsw5oleeauq",
"compartmentId": "ocid1.compartment.oc1..aaaaaaaaihdqc5z4zq4sqt7t4c7vbwc6lbf5dr6mky2phcpvdlh7c3p5mtuq",
"displayName": "test-check",
"description": "test check",
"labelsSet": [{
"name": "location"
}, {
"name": "university"
}],
"annotationFormat": "ENTITY_EXTRACTION",
"datasetSourceDetails": {
"namespace": "idrcdhfxwqwa",
"bucket": "test-sachin-cucket"
},
"datasetFormatDetails": {
"formatType": "TEXT"
}
} {
"id": "ocid1.datalabelingrecord.oc1.iad.amaaaaaazaehrjyahykmu6hvdksayw64a3wmur7mk2366hgitlypk6u2soea",
"timeCreated": "2021-10-12 12:09:37",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "sample-text.txt"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyat64zcfbjviu3pttykthabv5jiuicva3dkv6oikstzd7q",
"timeCreated": "2021-10-12 12:16:51",
"createdBy": "ocid1.user.oc1..aaaaaaaaktqgvx2skco6bfyziwjzfjaxensoewscqbk7p44sjqyrxmz4qozq",
"entities": [{
"entityType": "TEXTSELECTION",
"labels": [{
"label_name": "university"
}],
"textSpan": {
"offset": 60,
"length": 11
}
}]
}]
}
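A consolidated export can be read back with a few lines of Python. The following is a minimal sketch, not an official client: it assumes the file holds one dataset object followed by the record objects, as in the example above, and it uses `json.JSONDecoder.raw_decode` so that it works whether the objects are written one per line or pretty-printed. The function name `read_consolidated` and the sample string are hypothetical.

```python
import json

def read_consolidated(text):
    """Split a string of concatenated JSON objects into (dataset, records).

    Assumption: the first object is the dataset metadata and every
    following object is a record, as in the consolidated example.
    """
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    if not objects:
        raise ValueError("no JSON objects found")
    return objects[0], objects[1:]

# Hypothetical, shortened sample modelled on the example above.
sample = '''
{"displayName": "test-check", "annotationFormat": "ENTITY_EXTRACTION"}
{"sourceDetails": {"path": "sample-text.txt"},
 "annotations": [{"entities": [{"labels": [{"label_name": "university"}],
                                "textSpan": {"offset": 60, "length": 11}}]}]}
'''
dataset, records = read_consolidated(sample)
print(dataset["displayName"], len(records))
```

Decoding object by object, instead of calling `json.loads` on the whole file, avoids the "Extra data" error that a file with more than one top-level object would otherwise raise.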
An example of an exported document dataset JSON file.
{
"id":"ocid1.datalabelingdatasetint.oc1.iad.amaaaaaaniob46iafkiyw6a4uwgrnpy4lfxjoslocap7elaj257mxh4fzuwq",
"compartmentId":"ocid1.compartment.oc1..aaaaaaaajqiw27knoagxurhzjlihw7ijnoshsu4zi2uawdn5gfexdqwvu4vq",
"displayName":"Sep6_PDF",
"labelsSet":[
{
"name":"L1"
},
{
"name":"L"
},
{
"name":"23423"
}
],
"annotationFormat":"MULTI_LABEL",
"datasetSourceDetails":{
"namespace":"idgszs0xipmn",
"bucket":"Demo-bucket"
},
"datasetFormatDetails":{"formatType":"DOCUMENT"},
"recordFiles":[
{
"namespace":"idgszs0xipmn",
"bucket":"COVID_Dataset",
"path":"Snapshotsrecords_1632479104889.jsonl"
}
]
}
An example of an exported image dataset JSON file.
{
"id": "ocid1...",
"compartmentId": "",
"timeCreated":2020-12-15...,
"displayName":...,
"description":...,
"labelsSet": [
{"name":"germanshepherd"},
{"name":"americanshepherd"},
{"name":"australianshepherd"},
{"name":"irishwolfhound"}
],
"annotationFormat": "IMAGE_OBJECT_SELECTION",
"datasetSourceDetails": {
"sourceType": "OBJECT_STORAGE",
"namespace": "i235o3idk",
"bucket": "mytrainingdata",
"prefix": "puppyproject/"
},
"datasetFormatDetails": {
"formatType": "IMAGE" # image requires less metadata than delimited for example
},
"recordFiles": [
{
"namespace": "i235o3idk",
"bucket": "mylabels",
"path": "puppyproject/records1.json"
}
],
"definedTags": {},
"freeformTags": {}
}
An example of an exported text dataset JSON file.
{
"id":"ocid1.datalabelingdatasetdev.oc1.iad.amaaaaaazaehrjyamqjx733dhxd25zxcro2nftrewq7ltj34ua2cfapzsmjq",
"compartmentId":"ocid1.compartment.oc1..aaaaaaaagzh2kii2frktoc7bcvfydpzkxr7dbn6nf6jcyrxwgzen4pi5y4zq",
"displayName":"NER DEMO DATASET UNLABELLED",
"description":"NER DEMO DATASET UNLABELLED",
"labelsSet":[
{
"name":"Person"
},
{
"name":"Organization"
},
{
"name":"Event"
},
{
"name":"Place"
}
],
"annotationFormat":"ENTITY_EXTRACTION",
"datasetSourceDetails":{
"namespace":"idrcdhfxwqwa",
"bucket":"news-articles"
},
"datasetFormatDetails":{
},
"recordFiles":[
{
"namespace":"idrcdhfxwqwa",
"bucket":"snapshots",
"path":"forReview/records_1621847577526.jsonl"
}
]
}
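The `recordFiles` array tells a consumer where the record files for the dataset were written. The following sketch assumes only the `namespace`, `bucket`, and `path` fields shown above and collects those locations from a parsed dataset file. The helper name `record_file_locations` is hypothetical, and downloading the objects from Object Storage is deliberately left out.

```python
import json

def record_file_locations(dataset):
    """Return a (namespace, bucket, path) tuple for each recordFiles entry."""
    return [
        (entry["namespace"], entry["bucket"], entry["path"])
        for entry in dataset.get("recordFiles", [])
    ]

# Shortened copy of the text dataset example above.
dataset_json = '''{
  "displayName": "NER DEMO DATASET UNLABELLED",
  "recordFiles": [
    {"namespace": "idrcdhfxwqwa", "bucket": "snapshots",
     "path": "forReview/records_1621847577526.jsonl"}
  ]
}'''

for namespace, bucket, path in record_file_locations(json.loads(dataset_json)):
    print(namespace, bucket, path)
```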
An example of an exported document record JSON file.
{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaqgpzhscdpdcgohg5ocp3obwmjjgju6m73bmyrt4aovhq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 98.pdf"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaaniob46iatjg3p6hlszxrgmsj4y76b5tndddaedm6ardkoxbtt6mq",
"timeCreated":"2021-09-06 03:42:43",
"createdBy":"ocid1.user.oc1..aaaaaaaa6ynps4htdea6fqoerfhkedp3lih2ktureqhw3hmfojde6ukf3mpa",
"entities":[
{
"entityType":"GENERIC","labels":[
{
"label_name":"23423"
}
]
}
]
}
]
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iasb5klulgaj4djn3acsgsd3cekx3ix46ftxjdip4tu23a",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 99.pdf"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaaniob46iav45mlpcleqjt7cnmhyogopszi2rfnilwjhd4xyxa7irq",
"timeCreated":"2021-09-06 03:42:47",
"createdBy":"ocid1.user.oc1..aaaaaaaa6ynps4htdea6fqoerfhkedp3lih2ktureqhw3hmfojde6ukf3mpa",
"entities":[
{
"entityType":"GENERIC","labels":[
{
"label_name":"L1"
}
]
}
]
}
]
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaxhixolkqryomyu6i4jrrmzwcckw2tmgva47suylu5rzq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 97.pdf"
}
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iagymrjuem42kvzilxjd5hdrr3djznrl7aajvvcr6zc6sq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 96.pdf"
}
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaclpccpxn5hgmplesv3mt3g6hxkfaepzv6fuy7b6he3ca",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 2.pdf"
}
}
An example of an exported image record JSON file.
{
"id": "ocid1...",
"timeCreated": 2020-12-15...,
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "filename2.jpg"
},
"annotations": [
{
"id": "ocid1....",
"timeCreated": ...,
"createdBy": ...,
"entities": [
{
"entityType": "IMAGEOBJECTSELECTION",
"labels": [
{"name": "germanshepherd"}
],
"boundingPolygon": {
"normalizedVertices": [
{"x":0.2, "y":0.2},
{"x":0.3, "y":0.2},
{"x":0.3, "y":0.3},
{"x":0.2, "y":0.3}
]
}
},
{
"entityType": "BOUNDING_BOX",
"labels": [
{"name": "irishwolfhound"}
],
"boundingPolygon": {
"normalizedVertices": [
{"x":0.4, "y":0.4},
{"x":0.5, "y":0.4},
{"x":0.5, "y":0.5},
{"x":0.4, "y":0.5}
]
}
}
]
}
],
"freeformTags": {
"set": "validation" # optional, user defined convention used for reproducibility
}
}
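Many training pipelines expect pixel coordinates, not normalized ones. The sketch below converts a `normalizedVertices` list into an axis-aligned pixel box. It assumes that `x` and `y` are fractions of the image width and height in the range 0 to 1. The function name `polygon_to_pixel_box` and the image size are hypothetical.

```python
def polygon_to_pixel_box(vertices, image_width, image_height):
    """Return (x_min, y_min, x_max, y_max) in pixels for a normalized polygon.

    Assumption: each vertex holds x and y as fractions of the image
    width and height.
    """
    xs = [v["x"] * image_width for v in vertices]
    ys = [v["y"] * image_height for v in vertices]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))

# Vertices of the first entity in the example above.
vertices = [
    {"x": 0.2, "y": 0.2},
    {"x": 0.3, "y": 0.2},
    {"x": 0.3, "y": 0.3},
    {"x": 0.2, "y": 0.3},
]
print(polygon_to_pixel_box(vertices, 1000, 500))  # (200, 100, 300, 150)
```

Taking the minimum and maximum of the vertices, not just the first and third points, keeps the helper correct if the vertices are listed in a different order.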
An example of an exported text record JSON file.
{
"id":"ocid1.record.oc1.iad.UxxfPBMZVYfwZHZnjCPUGkhMwpWoTPMOnxDnrgXbBxwLKkrdeGwewdViOoUJ",
"timeCreated":"2021-06-21 09:06:01",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"article_3.txt"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyadghacojq3nmo2mtcbcmlo4rgslmpzxeboujhduft5nta",
"timeCreated":"2021-46-21 09:46:45",
"createdBy":"ocid1.user.oc1..aaaaaaaazjupiis2cu54smlzemiujpqxriz6i4wp3euuqrzffdugib73epbq",
"entities":[
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Event"
}
],
"textSpan":{
"offset":141,
"length":12
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":204,
"length":20
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":254,
"length":15
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":402,
"length":3
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Place"
}
],
"textSpan":{
"offset":638,
"length":11
}
}
]
}
]
}{
"id":"ocid1.record.oc1.iad.AakCoDHvJpnZofzIYfRCfpZnFUqNmfiWNIuNysbXCSRZeTVqdwKGvYjJpMvh",
"timeCreated":"2021-06-21 09:06:01",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"article_1.txt"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyafoed6oimxqxeyey6osjo3jp52vsyd75i5zspfvcfdz3q",
"timeCreated":"2021-30-21 03:30:10",
"createdBy":"ocid1.user.oc1..aaaaaaaazjupiis2cu54smlzemiujpqxriz6i4wp3euuqrzffdugib73epbq",
"entities":[
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":36,
"length":8
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":147,
"length":23
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":196,
"length":3
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Event"
}
],
"textSpan":{
"offset":311,
"length":22
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Place"
}
],
"textSpan":{
"offset":512,
"length":49
}
}
]
}
]
}
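Each `textSpan` locates an entity in the source text by `offset` and `length`. The following sketch recovers the labelled substrings from a record. It assumes that `offset` counts characters from the start of the source file and that the file content is already loaded into a string. The function name `labelled_spans`, the sample sentence, and the sample record are hypothetical.

```python
def labelled_spans(record, text):
    """Return (label_name, substring) pairs for every text entity in a record."""
    spans = []
    for annotation in record.get("annotations", []):
        for entity in annotation.get("entities", []):
            span = entity.get("textSpan")
            if span is None:
                # Entities without a span (for example GENERIC) are skipped.
                continue
            start = span["offset"]
            end = start + span["length"]
            for label in entity.get("labels", []):
                spans.append((label["label_name"], text[start:end]))
    return spans

# Hypothetical source text and record, shaped like the example above.
text = "Ada Lovelace worked with Charles Babbage in London."
record = {
    "annotations": [{
        "entities": [
            {"entityType": "TEXTSELECTION",
             "labels": [{"label_name": "Person"}],
             "textSpan": {"offset": 0, "length": 12}},
            {"entityType": "TEXTSELECTION",
             "labels": [{"label_name": "Place"}],
             "textSpan": {"offset": 44, "length": 6}},
        ]
    }]
}
print(labelled_spans(record, text))
```

Records that have no `annotations` key, such as unlabeled records, yield an empty list.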
An example of an exported CSV (text) dataset JSON file.
{
"id": "ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46iaxarhafiu42tbdm2d2nkxlkxwhnc76ohnwvpsdfccqw5q",
"compartmentId": "ocid1.compartment.oc1..aaaaaaaaundh4v2w4spnyt4hgy367qf54jonakpz6gh573bspmgzfoj2auga",
"displayName": "Text Classification CSV dataset",
"labelsSet": [{
"name": "positive"
}, {
"name": "neutral"
}, {
"name": "negative"
}],
"annotationFormat": "SINGLE_LABEL",
"datasetSourceDetails": {
"namespace": "idgszs0xipmn",
"bucket": "TEST",
"prefix": "languageteam/Text_Classification_Context_Oracle_advt.csv"
},
"datasetFormatDetails": {
"formatType": "TEXT",
"textFileTypeMetadata": {
"formatType": "DELIMITED",
"delimitedFileTypeMetaData": {
"columnIndex": 5,
"columnName": "CONTENT",
"columnDelimiter": ","
}
}
}
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iajx42mojwkktind744i3t2q3di6tdhwysw2wy4d42tseq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/546"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iadsu6zpch4lvozx7ci3as5st23jqxjpjdcryp4jworala",
"timeCreated": "2022-06-05 05:40:48",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia7otgs2rb3kuh464sisfbjxxbbkb65sbg2icst3gquw3q",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/303"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iatfuceqzjb5nnh7quk5wupvwe74bfpn5oka57cz6gqv4a",
"timeCreated": "2022-06-05 05:41:30",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iab55fqcxlfb3xszlpp7qnpsthjdhzzb7nki65xqdvgceq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/547"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iamosgunt72lci3g3mzyyx2sskjdje4e5zspts7mbnsl5q",
"timeCreated": "2022-06-05 05:41:36",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia45ave4zhtisvu2k7d6tbciskcge4ecm2imb6bvdqe4da",
"timeCreated": "2022-06-05 04:39:21",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/564"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iauqo6tlqil7vijetsayt6vsmpohxum5vmj6cde3wbfxua",
"timeCreated": "2022-06-05 05:40:44",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iasymkpbstgjwmae7ar5ikgp5mtth2izcaaaruatpl45ma",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/545"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iatu6k7afdwirdtvv6bofrquc65m4ruet4hlfmhgzhqjxa",
"timeCreated": "2022-06-05 05:41:02",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia6n4whohdhn257pmot7zlncawockthadosdhrp5so2nna",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/304"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iaslgb6s6h5ffce5mcgeidndp3vydcxzjya7yrbaj6pw5a",
"timeCreated": "2022-06-05 05:40:57",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "negative"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iamgsncrjarzujr6duaedmsjyrp67yi7dpe2uoi6h54c5a",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/548"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iabt3hwyc7mkaanez7q24k7vlfds3lisa6hdu53hntq2qq",
"timeCreated": "2022-06-05 05:42:55",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iactsl4j7v633d2y2t67lkxawv2nyemz7wwarppjpxeofq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/305"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46ia7xxg4ukky3ur56zzwaodvwrks4vqgvoug2z2moif274a",
"timeCreated": "2022-06-05 05:41:44",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "negative"
}]
}]