Export Data as Parquet to Cloud Object Storage

Shows the steps to export table data from your Autonomous Database to Cloud Object Storage as Parquet data by specifying a query.

This export method supports all the Cloud Object Stores supported by Autonomous Database, and you can use an Oracle Cloud Infrastructure resource principal to access your Oracle Cloud Infrastructure Object Store, Amazon Resource Names (ARNs) to access AWS Simple Storage Service (S3), an Azure service principal to access Azure BLOB storage or Azure Data Lake Storage, or a Google service account to access Google Cloud Platform (GCP) resources.

  1. Connect to your Autonomous Database instance.

    See Connect to Autonomous Database for more information.

  2. Store your Cloud Object Storage credential using DBMS_CLOUD.CREATE_CREDENTIAL.

    For example:

    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',
        username => 'user1@example.com',
        password => 'password'
      );
    END;
    /

    The values you provide for username and password depend on the Cloud Object Storage service you are using.

    Creating a credential to access Oracle Cloud Infrastructure Object Store is not required if you enable resource principal credentials. See Use Resource Principal to Access Oracle Cloud Infrastructure Resources for more information.

  3. Run DBMS_CLOUD.EXPORT_DATA and specify the format parameter type with the value parquet to export the results as parquet files on Cloud Object Storage.
    To generate the parquet output files there are two options for the file_uri_list parameter:
    • Set the file_uri_list value to the URL for an existing bucket on your Cloud Object Storage.

    • Set the file_uri_list value to the URL for an existing bucket on your Cloud Object Storage and include a file name prefix to use when generating the file names for the exported parquet files.

    If you do not include the file name prefix in the file_uri_list, DBMS_CLOUD.EXPORT_DATA supplies a file name prefix. See File Naming for Text Output (CSV, JSON, Parquet, or XML) for details.

    For example, the following shows DBMS_CLOUD.EXPORT_DATA with a file name prefix specified in file_uri_list:

    BEGIN
      DBMS_CLOUD.EXPORT_DATA(
        credential_name => 'DEF_CRED_NAME',
        file_uri_list   => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/dept_export',
        query           => 'SELECT * FROM DEPT',
        format          => JSON_OBJECT('type' value 'parquet', 'compression' value 'snappy'));
    END;
    /

    In this example, namespace-string is the Oracle Cloud Infrastructure object storage namespace and bucketname is the bucket name. See Understanding Object Storage Namespaces for more information.

    For detailed information about the parameters, see EXPORT_DATA Procedure.

    For detailed information about the available format parameters you can use with DBMS_CLOUD.EXPORT_DATA, see DBMS_CLOUD Package Format Options for EXPORT_DATA.

Notes for exporting with DBMS_CLOUD.EXPORT_DATA:

  • The query parameter that you supply can be an advanced query, if required, such as a query that includes joins or subqueries.

  • Specify the format parameter with the compression option to compress the output files. The default compression for type parquet is snappy.

  • When you no longer need the files that you export, use the procedure DBMS_CLOUD.DELETE_OBJECT or use native Cloud Object Storage commands to delete the files.

  • See DBMS_CLOUD Package Oracle Data Type to Parquet Mapping for details on Oracle Type to Parquet Type mapping.

    The following types are not supported or have limitations on their support for exporting Parquet with DBMS_CLOUD.EXPORT_DATA:

    Oracle Type Notes

    BFILE

    Not supported

    BLOB

    Not supported

    DATE

    Supported with the following limitation: DATE format supports only date, month and year. Hour, minute and seconds are not supported.

    See DBMS_CLOUD Package Oracle Data Type to Parquet Mapping for details on NLS format limitations for exporting DATE to Parquet.

    INTERVAL DAY TO SECOND

    Supported and is treated as string internally

    INTERVAL YEAR TO MONTH

    Supported and is treated as string internally

    LONG

    Not supported

    LONG RAW

    Not supported

    NUMBER

    Supported with the following limitations:

    • Can have maximum precision of 38 and scale equal to less than precision.
    • If no precision and scale is provided for the column NUMBER type, by default precision of 38 and scale of 20 is used.
    • Negative scale is not supported for NUMBER types.

    Object Types

    Not supported

    TIMESTAMP

    Supported with the following limitations:

    • If there are multiple columns with different precision, highest precision will be taken.
    • TIMESTAMP WITH TIME ZONE Oracle datatype will use the timestamp only.

    See DBMS_CLOUD Package Oracle Data Type to Parquet Mapping for details on NLS format limitations for exporting TIMESTAMP to Parquet.