Load Data from Files in the Cloud

The PL/SQL package DBMS_CLOUD provides support for loading data from files in the cloud to tables created in your Autonomous Database on Dedicated Exadata Infrastructure.

You can load data from different file formats using the following PL/SQL procedures provided by DBMS_CLOUD:
  • Text files in the cloud, using the DBMS_CLOUD.COPY_DATA procedure
  • JSON files in the cloud, using the DBMS_CLOUD.COPY_COLLECTION procedure
Before loading the data from the files, ensure that:
  • The source file has been uploaded to a cloud-based object store, such as Oracle Cloud Infrastructure Object Storage, and is accessible to the database user attempting to load data.
  • Your Cloud Object Storage credentials are stored using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Create Credentials for more information.

The package DBMS_CLOUD supports loading from files in the following cloud services: Oracle Cloud Infrastructure Object Storage, Oracle Cloud Infrastructure Object Storage Classic, Azure Blob Storage, and Amazon S3.

Create Credentials

Learn how to store your Cloud Object Storage credential using the DBMS_CLOUD.CREATE_CREDENTIAL procedure.

Execute the DBMS_CLOUD.CREATE_CREDENTIAL procedure using any database tool, such as SQL*Plus, SQL Developer, or Database Actions (the web-based SQL Developer tool). For example:
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'DEF_CRED_NAME',
    username => 'adb_user@oracle.com',
    password => 'password'
  );
END;
/

The values you provide for username and password depend on the Cloud Object Storage service you are using:

  • Oracle Cloud Infrastructure Object Storage: username is your Oracle Cloud Infrastructure user name and password is your Oracle Cloud Infrastructure auth token. See Working with Auth Tokens.

  • Oracle Cloud Infrastructure Object Storage Classic: username is your Oracle Cloud Infrastructure Classic user name and password is your Oracle Cloud Infrastructure Classic password.

This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for all data loads.
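
As an optional check, you can confirm that the credential exists by querying a credentials data dictionary view, for example USER_CREDENTIALS (passwords are never shown):

SELECT credential_name, username
  FROM user_credentials
 WHERE credential_name = 'DEF_CRED_NAME';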

Load Data from Text Files

Learn how to load data from text files in the cloud to your Autonomous Database using the DBMS_CLOUD.COPY_DATA procedure.

The source file in this example, channels.txt, has the following data:

S,Direct Sales,Direct
T,Tele Sales,Direct
C,Catalog,Indirect
I,Internet,Indirect
P,Partners,Others
  1. Store your Cloud Object Storage credential using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Create Credentials for more details.
  2. Create the table that will contain the data. For example:
    CREATE TABLE CHANNELS
       (channel_id CHAR(1),
        channel_desc VARCHAR2(20),
        channel_class VARCHAR2(20)
       );
  3. Load data into the table using the procedure DBMS_CLOUD.COPY_DATA. For example:
    BEGIN
     DBMS_CLOUD.COPY_DATA(
        table_name =>'CHANNELS',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/idthydc0kinr/mybucket/channels.txt',
        format => json_object('delimiter' value ',')
     );
    END;
    /
    

    The parameters are:

    • table_name: is the target table’s name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma delimited list of the source files you want to load.

      In this example, file_uri_list is an Oracle Cloud Infrastructure Swift URI that specifies the channels.txt file in the mybucket bucket in the us-phoenix-1 region. (idthydc0kinr is the object storage namespace in which the bucket resides.) For information about the supported URI formats, see Cloud Object Storage URI Formats.

    • format: defines the options you specify to describe the format of the source file. For information about the format options you can specify, see Format Parameter.
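
    As an optional check after the load completes, you can query the target table. With the sample channels.txt shown above, a query such as the following should return five rows:

    SELECT channel_id, channel_desc, channel_class
      FROM channels
     ORDER BY channel_id;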

    For more detailed information, see COPY_DATA Procedure.

Load a JSON File of Delimited Documents into a Collection

Learn how to load a JSON file of delimited documents into a collection in your Autonomous Database using the DBMS_CLOUD.COPY_COLLECTION procedure.

This example loads JSON values from the line-delimited file myCollection.json. Each value, that is, each line, is loaded into a collection in your Autonomous Database as a single document.

Here is an example of such a file. It has three lines, with one object per line. Each of those objects gets loaded as a separate JSON document.

{ "name" : "apple", "count": 20 }
{ "name" : "orange", "count": 42 }
{ "name" : "pear", "count": 10 }
Procedure:
  1. Store your Cloud Object Storage credential using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Create Credentials for more details.
  2. Load data into a collection using the procedure DBMS_CLOUD.COPY_COLLECTION. For example:
    BEGIN 
      DBMS_CLOUD.COPY_COLLECTION(
        collection_name =>'fruit',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/fruit_bucket/o/myCollection.json',
        format => json_object('recorddelimiter' value '''\n''')
     );
    END;
    /
    

    The parameters are:

    • collection_name: is the target collection’s name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma delimited list of the source files you want to load.

      In this example, file_uri_list is an Oracle Cloud Infrastructure native URI that specifies the myCollection.json file in the fruit_bucket bucket in the us-ashburn-1 region, where namespace-string is the object storage namespace in which the bucket resides. For information about the supported URI formats, see Cloud Object Storage URI Formats.

    • format: defines the options you specify to describe the format of the source file. The format options characterset, compression, ignoreblanklines, jsonpath, maxdocsize, recorddelimiter, rejectlimit, and unpackarrays are supported for loading JSON data; specifying any other format option results in an error. For information about the format options you can specify, see Format Parameter.
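
    As an optional check, you can query the documents that were loaded. A collection created this way is backed by a table with the collection's case-sensitive name; the name of the document column varies by database release (for example JSON_DOCUMENT or DATA), so the following is only a sketch and assumes a column named DATA:

    -- Assumes the backing table is "fruit" and the document column is DATA;
    -- adjust the column name to match your database release.
    SELECT json_serialize(data) FROM "fruit";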

    For more detailed information, see COPY_COLLECTION Procedure.

Load an Array of JSON Documents into a Collection

Learn how to load an array of JSON documents into a collection in your Autonomous Database using the DBMS_CLOUD.COPY_COLLECTION procedure.

This example uses the JSON file fruit_array.json. The following shows the contents of the file fruit_array.json:

[{"name" : "apple", "count": 20 },
 {"name" : "orange", "count": 42 },
 {"name" : "pear", "count": 10 }]
Procedure:
  1. Store your Cloud Object Storage credential using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Create Credentials for more details.
  2. Load data into a collection using the procedure DBMS_CLOUD.COPY_COLLECTION. For example:
    BEGIN 
      DBMS_CLOUD.COPY_COLLECTION(    
        collection_name => 'fruits',    
        credential_name => 'DEF_CRED_NAME',    
        file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/json/o/fruit_array.json',
        format => '{"recorddelimiter" : "0x''01''", "unpackarrays" : "TRUE", "maxdocsize" : "10240000"}'
      );
    END;
    /

    In this example, you load a single JSON value that occupies the whole file, so there is no need for a record delimiter. To indicate that there is no record delimiter, you can use a character that does not occur in the input file; for example, the value "0x''01''", because this character does not occur directly in JSON text.

    When the format option unpackarrays is set to TRUE, the array of documents is loaded as individual documents rather than as a single document containing the entire array. The unpacking of array elements is, however, limited to a single level; if the documents contain nested arrays, those arrays are not unpacked.

    The parameters are:

    • collection_name: is the target collection’s name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma delimited list of the source files you want to load.

      In this example, file_uri_list is an Oracle Cloud Infrastructure native URI that specifies the fruit_array.json file in the json bucket in the us-ashburn-1 region, where namespace-string is the object storage namespace in which the bucket resides. For information about the supported URI formats, see Cloud Object Storage URI Formats.

    • format: defines the options you specify to describe the format of the source file. The format options characterset, compression, ignoreblanklines, jsonpath, maxdocsize, recorddelimiter, rejectlimit, and unpackarrays are supported for loading JSON data; specifying any other format option results in an error. For information about the format options you can specify, see Format Parameter.

    Loading fruit_array.json with DBMS_CLOUD.COPY_COLLECTION using the format option unpackarrays makes the procedure recognize array values in the source. Therefore, instead of loading the data as a single document, as it would by default, each value in the array is loaded into the collection fruits as a separate document.
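
    As a quick check that the array was unpacked, you can count the documents in the collection's backing table (which is created with the collection's case-sensitive name). With the sample fruit_array.json shown above, the count should be 3 rather than 1:

    SELECT COUNT(*) FROM "fruits";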

    For more detailed information, see COPY_COLLECTION Procedure.

Copy JSON Data into an Existing Table

Use DBMS_CLOUD.COPY_DATA to load JSON data in the cloud into a table.

The source file in this example is a JSON data file.

  1. Store your object store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL. For example:
    SET DEFINE OFF
    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',
        username => 'adb_user@example.com',
        password => 'password'
      );
    END;
    /

    This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for all data loads.

    For detailed information about the parameters, see CREATE_CREDENTIAL Procedure.

  2. Load JSON data into an existing table using the procedure DBMS_CLOUD.COPY_DATA.

    For example:

    CREATE TABLE WEATHER2
        (WEATHER_STATION_ID VARCHAR2(20),
         WEATHER_STATION_NAME VARCHAR2(50));
    
    BEGIN 
      DBMS_CLOUD.COPY_DATA(
          table_name      => 'WEATHER2',
          credential_name => 'DEF_CRED_NAME',
          file_uri_list   => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/jsonfiles*',
          format          =>  JSON_OBJECT('type' value 'json', 'columnpath' value '["$.WEATHER_STATION_ID",
              "$.WEATHER_STATION_NAME"]')
        );
    END;
    / 
    

    The parameters are:

    • table_name: is the target table’s name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma delimited list of the source files you want to load. You can use wildcards in the file names in your URIs. The character "*" can be used as the wildcard for multiple characters, the character "?" can be used as the wildcard for a single character.

    • format: for DBMS_CLOUD.COPY_DATA with JSON data, the type is json. Specify other format values to define the options to describe the format of the JSON source file. See DBMS_CLOUD Package Format Options for more information.

    In this example, namespace-string is the Oracle Cloud Infrastructure object storage namespace and bucketname is the bucket name. See Understanding Object Storage Namespaces for more information.
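
    For illustration, a record in one of the matching source files might look like the following (the field values here are hypothetical); the columnpath expressions $.WEATHER_STATION_ID and $.WEATHER_STATION_NAME map each JSON document's fields to the corresponding table columns:

    { "WEATHER_STATION_ID": "S100", "WEATHER_STATION_NAME": "Example Station" }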

    For detailed information about the parameters, see COPY_DATA Procedure.

Textual JSON Objects That Represent Extended Scalar Values

Native binary JSON data (OSON format) extends the JSON language by adding scalar types, such as date, that correspond to SQL types and are not part of the JSON standard. Oracle Database also supports the use of textual JSON objects that represent JSON scalar values, including such nonstandard values.

When you create native binary JSON data from textual JSON data that contains such extended objects, they can optionally be replaced with corresponding (native binary) JSON scalar values.

An example of an extended object is {"$numberDecimal":31}. It represents a JSON scalar value of the nonstandard type decimal number, and when interpreted as such it is replaced by a decimal number in native binary format.

For example, when you use the JSON data type constructor, JSON, if you use keyword EXTENDED then recognized extended objects in the textual input are replaced with corresponding scalar values in the native binary JSON result. If you do not include keyword EXTENDED then no such replacement occurs; the textual extended JSON objects are simply converted as-is to JSON objects in the native binary format.

In the opposite direction, when you use Oracle SQL function json_serialize to serialize binary JSON data as textual JSON data (VARCHAR2, CLOB, or BLOB), you can use keyword EXTENDED to replace (native binary) JSON scalar values with corresponding textual extended JSON objects.
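
The following is a minimal sketch of that round trip. It assumes a release that supports the JSON data type constructor and the EXTENDED keyword in both places; the exact syntax accepted can vary by release:

-- EXTENDED in the JSON constructor replaces the recognized extended object
-- with a native binary scalar of type date; EXTENDED in json_serialize
-- reconstructs a textual extended object from that scalar.
SELECT json_serialize(
         JSON('{"dob" : {"$oracleDate" : "2000-01-02"}}' EXTENDED)
         EXTENDED) AS dob_extended
  FROM dual;

-- Expected output (semantically equivalent to the input):
-- {"dob":{"$oracleDate":"2000-01-02"}}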

Note

If the database you use is an Oracle Autonomous Database then you can use PL/SQL procedure DBMS_CLOUD.copy_collection to create a JSON document collection from a file of JSON data such as that produced by common NoSQL databases, including Oracle NoSQL Database.

If you use ejson as the value of the type parameter of the procedure, then recognized extended JSON objects in the input file are replaced with corresponding scalar values in the resulting native binary JSON collection. In the other direction, you can use function json_serialize with keyword EXTENDED to replace scalar values with extended JSON objects in the resulting textual JSON data.
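
A minimal sketch of such a load, reusing the credential from Create Credentials and a hypothetical object storage URI:

BEGIN
  DBMS_CLOUD.COPY_COLLECTION(
    collection_name => 'mycollection',
    credential_name => 'DEF_CRED_NAME',
    -- Hypothetical source file; replace with your own URI.
    file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/json/o/export.json',
    -- type ejson: recognized extended JSON objects in the input are
    -- replaced with corresponding native binary scalar values.
    format          => json_object('type' value 'ejson')
  );
END;
/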

These are the two main use cases for extended objects:

  • Exchange (import/export):

    • Ingest existing JSON data (from somewhere) that contains extended objects.

    • Serialize native binary JSON data as textual JSON data with extended objects, for some use outside the database.

  • Inspection of native binary JSON data: see what you have by looking at corresponding extended objects.

For exchange purposes, you can ingest JSON data from a file produced by common NoSQL databases, including Oracle NoSQL Database, converting extended objects to native binary JSON scalars. In the other direction, you can export native binary JSON data as textual data, replacing Oracle-specific scalar JSON values with corresponding textual extended JSON objects.

Tip:

As an example of inspection, consider an object such as {"dob" : "2000-01-02T00:00:00"} as the result of serializing native JSON data. Is "2000-01-02T00:00:00" the result of serializing a native binary value of type date, or is the native binary value just a string? Using json_serialize with keyword EXTENDED lets you know.

The mapping of extended object fields to scalar JSON types is, in general, many-to-one: more than one kind of extended JSON object can be mapped to a given scalar value. For example, the extended JSON objects {"$numberDecimal":"31"} and {"$numberLong":"31"} are both translated as the value 31 of JSON-language scalar type number, and item method type() returns "number" for each of those JSON scalars.

Item method type() reports the JSON-language scalar type of its targeted value (as a JSON string). Some scalar values are distinguishable internally, even when they have the same scalar type. This generally allows function json_serialize (with keyword EXTENDED) to reconstruct the original extended JSON object. Such scalar values are distinguished internally either by using different SQL types to implement them or by tagging them with the kind of extended JSON object from which they were derived.

When json_serialize reconstructs the original extended JSON object the result is not always textually identical to the original, but it is always semantically equivalent. For example, {"$numberDecimal":"31"} and {"$numberDecimal":31} are semantically equivalent, even though the field values differ in type (string and number). They are translated to the same internal value, and each is tagged as being derived from a $numberDecimal extended object (same tag). But when serialized, the result for both is {"$numberDecimal":31}. Oracle always uses the most directly relevant type for the field value, which in this case is the JSON-language value 31, of scalar type number.

Table 4-1 presents correspondences among the various types used. It maps across (1) types of extended objects used as input, (2) types reported by item method type(), (3) SQL types used internally, (4) standard JSON-language types used as output by function json_serialize, and (5) types of extended objects output by json_serialize when keyword EXTENDED is specified.

Table 4-1 Extended JSON Object Type Relations

Each entry below gives the extended object type accepted on input, the Oracle JSON scalar type reported by item method type(), the SQL scalar type used internally, the standard JSON scalar type output by json_serialize, and the extended object type output by json_serialize with keyword EXTENDED.

  • Input: $numberDouble with value a JSON number, a string representing the number, or one of these strings: "Infinity", "-Infinity", "Inf", "-Inf", "Nan" (Footnote 1)
    Reported by type(): double. SQL type: BINARY_DOUBLE. Standard output: number.
    Extended output: $numberDouble with value a JSON number or one of these strings: "Inf", "-Inf", "Nan" (Footnote 2)

  • Input: $numberFloat with value the same as for $numberDouble
    Reported by type(): float. SQL type: BINARY_FLOAT. Standard output: number.
    Extended output: $numberFloat with value the same as for $numberDouble

  • Input: $numberDecimal with value the same as for $numberDouble
    Reported by type(): number. SQL type: NUMBER. Standard output: number.
    Extended output: $numberDecimal with value the same as for $numberDouble

  • Input: $numberInt with value a signed 32-bit integer or a string representing the number
    Reported by type(): number. SQL type: NUMBER. Standard output: number.
    Extended output: $numberInt with value the same as for $numberDouble

  • Input: $numberLong with value a JSON number or a string representing the number
    Reported by type(): number. SQL type: NUMBER. Standard output: number.
    Extended output: $numberLong with value the same as for $numberDouble

  • Input: $binary with value one of the following:
      • a string of base-64 characters
      • an object with fields base64 and subType, whose values are a string of base-64 characters and the number 0 (arbitrary binary) or 4 (UUID), respectively
    When the value is a string of base-64 characters, the extended object can also have field $subtype with value 0 or 4, expressed as a one-byte integer (0-255) or a two-character hexadecimal string representing such an integer.
    Reported by type(): binary. SQL type: BLOB or RAW. Standard output: string (conversion is equivalent to the use of SQL function rawtohex).
    Extended output, one of the following:
      • $binary with value a string of base-64 characters
      • $rawid with value a string of 32 hexadecimal characters, if the input had a subType value of 4 (UUID)

  • Input: $oid with value a string of 24 hexadecimal characters
    Reported by type(): binary. SQL type: RAW(12). Standard output: string (conversion is equivalent to the use of SQL function rawtohex).
    Extended output: $rawid with value a string of 24 hexadecimal characters

  • Input: $rawhex with value a string with an even number of hexadecimal characters
    Reported by type(): binary. SQL type: RAW. Standard output: string (conversion is equivalent to the use of SQL function rawtohex).
    Extended output: $binary with value a string of base-64 characters, right-padded with = characters

  • Input: $rawid with value a string of 24 or 32 hexadecimal characters
    Reported by type(): binary. SQL type: RAW. Standard output: string (conversion is equivalent to the use of SQL function rawtohex).
    Extended output: $rawid

  • Input: $oracleDate with value an ISO 8601 date string
    Reported by type(): date. SQL type: DATE. Standard output: string.
    Extended output: $oracleDate with value an ISO 8601 date string

  • Input: $oracleTimestamp with value an ISO 8601 timestamp string
    Reported by type(): timestamp. SQL type: TIMESTAMP. Standard output: string.
    Extended output: $oracleTimestamp with value an ISO 8601 timestamp string

  • Input: $oracleTimestampTZ with value an ISO 8601 timestamp string with a numeric time zone offset or with Z
    Reported by type(): timestamp with time zone. SQL type: TIMESTAMP WITH TIME ZONE. Standard output: string.
    Extended output: $oracleTimestampTZ with value an ISO 8601 timestamp string with a numeric time zone offset or with Z

  • Input: $date with value one of the following:
      • an integer millisecond count since January 1, 1970
      • an ISO 8601 timestamp string
      • an object with field numberLong whose value is an integer millisecond count since January 1, 1970
    Reported by type(): timestamp with time zone. SQL type: TIMESTAMP WITH TIME ZONE. Standard output: string.
    Extended output: $oracleTimestampTZ with value an ISO 8601 timestamp string with a numeric time zone offset or with Z

  • Input: $intervalDaySecond with value an ISO 8601 interval string as specified for SQL function to_dsinterval
    Reported by type(): daysecondInterval. SQL type: INTERVAL DAY TO SECOND. Standard output: string.
    Extended output: $intervalDaySecond with value an ISO 8601 interval string as specified for SQL function to_dsinterval

  • Input: $intervalYearMonth with value an ISO 8601 interval string as specified for SQL function to_yminterval
    Reported by type(): yearmonthInterval. SQL type: INTERVAL YEAR TO MONTH. Standard output: string.
    Extended output: $intervalYearMonth with value an ISO 8601 interval string as specified for SQL function to_yminterval

  • Input: two fields: $vector, with value an array whose elements are numbers or the strings "Nan", "Inf", and "-Inf" (representing not-a-number and infinite values), and $vectorElementType, with string value either "float32" or "float64" (corresponding to IEEE 32-bit and IEEE 64-bit numbers, respectively)
    Reported by type(): vector. SQL type: VECTOR. Standard output: array of numbers.
    Extended output: the same two fields, $vector and $vectorElementType

Footnote 1 The string values are interpreted case-insensitively. For example, "NAN", "nan", and "nAn" are accepted and equivalent, and similarly "INF", "inFinity", and "iNf". Infinitely large ("Infinity" or "Inf") and infinitely small ("-Infinity" or "-Inf") numbers are accepted with either the full word or the abbreviation.

Footnote 2 On output, only these string values are used — no full-word Infinity or letter-case variants.

Monitor and Troubleshoot Data Loading

All data load operations done using the PL/SQL package DBMS_CLOUD are logged in the tables dba_load_operations and user_load_operations:

  • dba_load_operations: shows all load operations.

  • user_load_operations: shows the load operations in your schema.

Query these tables to see information about ongoing and completed data loads. For example, the following SELECT statement with a WHERE clause predicate on the TYPE column shows load operations of type COPY:


SELECT table_name, owner_name, type, status, start_time, update_time, logfile_table, badfile_table 
   FROM user_load_operations WHERE type = 'COPY';

TABLE_NAME OWNER_NAME TYPE STATUS    START_TIME                          UPDATE_TIME                         LOGFILE_TABLE BADFILE_TABLE
---------- ---------- ---- --------- ----------------------------------- ----------------------------------- ------------- -------------
CHANNELS   SH         COPY COMPLETED 04-MAR-21 07.38.30.522711000 AM GMT 04-MAR-21 07.38.30.522711000 AM GMT COPY$1_LOG    COPY$1_BAD

The LOGFILE_TABLE column shows the name of the table you can query to look at the log of a load operation. For example, the following query shows the log of the load operation:

select * from COPY$1_LOG;

The BADFILE_TABLE column shows the name of the table you can query to look at the rows that got errors during loading. For example, the following query shows the rejected records for the load operation:

select * from COPY$1_BAD;

Depending on the errors shown in the log and the rows shown in the specified BADFILE_TABLE table, you can correct the error by specifying the correct format options in DBMS_CLOUD.COPY_DATA.
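
For example, if the log shows that rows were rejected because the source file begins with a header line, one possible correction is to re-run the load with the skipheaders format option (and, if desired, a higher rejectlimit). The following sketch reuses the CHANNELS example from above:

BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'CHANNELS',
    credential_name => 'DEF_CRED_NAME',
    file_uri_list   => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/idthydc0kinr/mybucket/channels.txt',
    -- skipheaders skips the given number of header lines; rejectlimit sets
    -- how many rejected rows are tolerated before the load fails.
    format          => json_object('delimiter'   value ',',
                                   'skipheaders' value '1',
                                   'rejectlimit' value '10')
  );
END;
/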

Note

The LOGFILE_TABLE and BADFILE_TABLE tables are stored for two days for each load operation and then removed automatically.

See DELETE_ALL_OPERATIONS Procedure for information on clearing the user_load_operations table.
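
For example, a sketch of clearing the logged COPY operations for your schema (the type parameter is optional; omit it to clear operations of all types):

BEGIN
  -- Removes the entries for COPY operations, along with their associated
  -- log and bad tables, from user_load_operations.
  DBMS_CLOUD.DELETE_ALL_OPERATIONS(type => 'COPY');
END;
/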