Query External Tables with Implicit Partitioning

On Autonomous Database you can create implicit partitioned external tables from Hive style partitioned data or from simple folder partitioned data stored on your Cloud Object Store.

About External Tables with Implicit Partitioning

On Autonomous Database use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create implicit partitioned external tables from Hive style partitioned data or from simple folder partitioned data stored on your Cloud Object Store.

Use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create implicit partitioned external tables. When you pass the appropriate format options to this procedure, the partitions are derived from the source data. Implicit partitioned external tables support runtime discovery of partition columns and their values. Changes in the underlying object store structure, such as added or removed objects, are also discovered at runtime, which eliminates the additional synchronization procedures required by DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE, simplifies maintenance, and ensures that the data is up to date at query time.

With implicit partitioning, Autonomous Database automatically determines the columns a table is "partitioned on" based on the hierarchical file structure of the object storage source. There is no need to explicitly declare a partitioning scheme. Implicit partitioning provides the performance benefits of a partitioned table without the need to explicitly define a partitioned external table using the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE procedure.

Implicit partitioned external tables support the following naming styles for partitioned objects in object store:

  • Hive style, where each folder level encodes a column name and a value, for example: OBJBUCKET/sales/country=USA/year=2024/month=01/sales-2024-01.parquet

  • Simple folder (non-Hive) style, where each folder level encodes only a value, for example: OBJBUCKET/sales/USA/2024/01/sales-2024-01.parquet

For more information on naming formats, see About External Tables with Source File Partitioning.

Using the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure, implicit partitioned external tables can be created in the following ways:
  • Set the partition type to hive with a given list of partition columns

    In this case, implicit_partition_type is set to hive and implicit_partition_columns provides a list of partition columns.

    For example:

    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
         table_name        => 'mysales',
         credential_name   => 'mycredential',
         file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
         column_list       => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
         format            => '{"type":"parquet",
                                "implicit_partition_type":"hive",
                                "implicit_partition_columns":["country","year","month"]}');
    END;
    /
  • Set the partition type to hive without providing a list of partition columns

    In this case, implicit_partition_type is set to hive and implicit_partition_columns is not provided. The partition columns are automatically detected by searching the path specified by file_uri_list for '='. The column name is the text to the left of the '=' and the value is the text to the right. If no column name can be found in the path, an error is raised.

    For example:

    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
         table_name        => 'mysales',
         credential_name   => 'mycredential',
         file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
         column_list       => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
         format            => '{"type":"parquet", "implicit_partition_type":"hive"}');
    END;
    /
  • Provide a list of partition columns without specifying the type

    In this case, implicit_partition_type is not set and implicit_partition_columns provides a list of columns.

    For example:

    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
         table_name        => 'mysales',
         credential_name   => 'mycredential',
         file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
         column_list       => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
         format            => '{"type":"parquet",
                                "implicit_partition_columns":["country","year","month"]}');
    END;
    /

See CREATE_EXTERNAL_TABLE Procedure for further information.
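
Once created, the implicit partition columns behave like ordinary columns in queries. The following is a minimal illustration, assuming the mysales table from the examples above; it aggregates units by the derived partition columns:

    SELECT country, year, month, SUM(units) AS total_units
      FROM mysales
     GROUP BY country, year, month
     ORDER BY country, year, month;

Filters on these columns are used for partition pruning, as shown in the query examples later in this topic.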

Query External Implicit Partitioned Data with Hive Format Source File Organization

Use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create an implicit partitioned external table from data in Object Store that was generated from Hive data.

The sample source files in the following example use this naming format:

OBJBUCKET/<table>/<partcol1>=<value1>/<partcol2>=<value2>/file.parquet

Consider the following sample source files:

OBJBUCKET/sales/country=USA/year=2024/month=01/sales-2024-01.parquet

OBJBUCKET/sales/country=USA/year=2024/month=02/sales-2024-02.parquet

OBJBUCKET/sales/country=USA/year=2024/month=02/sales-2024-03.parquet

To create an implicit partitioned external table with data stored in this sample Hive format, do the following:

  1. Store Object Store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL.

    For example:

    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL (
        credential_name => 'DEF_CRED_NAME',
        username => 'adb_user@example.com',
        password => 'password' );
    END;
    /

    Creating a credential to access Oracle Cloud Infrastructure Object Store is not required if you enable resource principal credentials. See Use Resource Principal to Access Oracle Cloud Infrastructure Resources for more information.

    This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for creating external tables.

    See CREATE_CREDENTIAL Procedure for information about the username and password parameters for different object storage services.

  2. Create an implicit partitioned external table on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
    In this example, implicit partitioning is enabled by setting the format option implicit_partition_type to hive. Because the format option implicit_partition_columns is not provided, the partition columns are automatically detected from the object names.

    However, if a list of columns is given, then those columns are used as partition columns and Autonomous Database doesn't try to discover the columns.

    To detect the partition columns, Autonomous Database searches the path specified by file_uri_list from the beginning for '='. When an '=' is found, the text between the preceding '/' and the '=' is taken as the column name (for example, "country") and the text between the '=' and the next '/' is taken as the value (for example, "USA"). The search then continues after that '/' for the next '=', which yields the second partition column, and so on. An illustrative sketch of this parsing appears after the parameter descriptions below.

    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
         table_name        => 'mysales',
         credential_name   => 'DEF_CRED_NAME',
         file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
         column_list       => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
         format            => '{"type":"parquet", "implicit_partition_type":"hive"}');
    END;
    /

    The parameters are:

    • table_name: is the external table name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma-delimited list of source file URIs. There are two options for this list:

      • Specify a comma-delimited list of individual file URIs without wildcarding.

      • Specify a single file URI with wildcards, where the wildcards can only be after the last slash "/". The character "*" can be used as the wildcard for multiple characters, the character "?" can be used as the wildcard for a single character.

    • column_list: is a comma delimited list of column names and data types for the external table. The list includes the columns inside the data file and those derived from the object name (from names in the file path specified by file_uri_list).

      The column_list is not required when the data files are structured files (Parquet, Avro, or ORC).

    • format: defines the options you can specify to describe the format of the source file. Setting the implicit_partition_type option to hive enables implicit partitioning and indicates that the source objects use Hive style naming.

      If the data in your source file is encrypted, decrypt the data by specifying the encryption format option. See Decrypt Data While Importing from Object Storage for more information on decrypting data.

      See DBMS_CLOUD Package Format Options for more information.

    In this example, namespace-string is the Oracle Cloud Infrastructure object storage namespace and bucketname is the bucket name. See Understanding Object Storage Namespaces for more information.

    See CREATE_EXTERNAL_TABLE Procedure for detailed information about the parameters. See DBMS_CLOUD URI Formats for more information on the supported cloud object storage services.
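
    The following anonymous PL/SQL block is an illustration only; it is not part of DBMS_CLOUD and the sample path is hypothetical. It mirrors the parsing rule described earlier in this step so you can see which partition column names and values would be derived from a Hive style object name:

    SET SERVEROUTPUT ON
    DECLARE
      -- Hypothetical sample object name; illustration only.
      l_path  VARCHAR2(400) :=
        'OBJBUCKET/sales/country=USA/year=2024/month=01/sales-2024-01.parquet';
      l_eq    PLS_INTEGER;
      l_slash PLS_INTEGER;
      l_next  PLS_INTEGER;
      l_col   VARCHAR2(100);
      l_val   VARCHAR2(100);
    BEGIN
      l_eq := INSTR(l_path, '=');
      WHILE l_eq > 0 LOOP
        -- Column name: text between the last '/' before the '=' and the '='
        l_slash := INSTR(SUBSTR(l_path, 1, l_eq - 1), '/', -1);
        l_col   := SUBSTR(l_path, l_slash + 1, l_eq - l_slash - 1);
        -- Value: text between the '=' and the next '/'
        l_next  := INSTR(l_path, '/', l_eq + 1);
        IF l_next = 0 THEN
          l_val  := SUBSTR(l_path, l_eq + 1);
          l_next := LENGTH(l_path);
        ELSE
          l_val  := SUBSTR(l_path, l_eq + 1, l_next - l_eq - 1);
        END IF;
        DBMS_OUTPUT.PUT_LINE(l_col || ' = ' || l_val);
        -- Continue searching for '=' after the '/' that follows the value
        l_eq := INSTR(l_path, '=', l_next + 1);
      END LOOP;
    END;
    /

    Running the block prints country = USA, year = 2024, and month = 01, which is what the documented rule derives from that object name.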

  3. You can now run queries on the external partitioned table you created in the previous step.

    Your Autonomous Database takes advantage of the partitioning information of your external partitioned table, ensuring that the query only accesses the relevant data files in Object Store.

    For example:

    SELECT product, units FROM mysales WHERE year='2024' AND month='02';

    This SQL statement only queries data in the partition for month 02 of year 2024.
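
    Because partitions are discovered at query time, newly added objects are picked up automatically. For example, assuming you later upload a hypothetical object such as sales/country=USA/year=2024/month=03/sales-2024-03.parquet to the same bucket, the next query sees the new data without re-creating the table or running any synchronization procedure:

    -- No table re-creation or synchronization step is needed; the new
    -- month=03 objects are discovered when the query runs.
    SELECT product, units FROM mysales WHERE year='2024' AND month='03';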

Query External Implicit Partitioned Non-Hive Style Data

Use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create an implicit partitioned external table from data in object storage that has been generated from non-Hive data.

The sample source files in the following example use this naming format:

OBJBUCKET/<table>/<value1>/<value2>/file.parquet

Consider the following sample source files:

OBJBUCKET/sales/USA/2024/01/sales-2024-01.parquet

OBJBUCKET/sales/USA/2024/01/sales-2024-02.parquet

OBJBUCKET/sales/USA/2024/01/sales-2024-03.parquet

To create an implicit partitioned external table with data stored in this sample folder (non-Hive) format, do the following:

  1. Store Object Store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL.

    For example:

    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL (
        credential_name => 'DEF_CRED_NAME',
        username => 'adb_user@example.com',
        password => 'password' );
    END;
    /

    Creating a credential to access Oracle Cloud Infrastructure Object Store is not required if you enable resource principal credentials. See Use Resource Principal to Access Oracle Cloud Infrastructure Resources for more information.

    This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for creating external tables.

    See CREATE_CREDENTIAL Procedure for information about the username and password parameters for different object storage services.

  2. Create an implicit partitioned external table on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
    In this example, implicit partitioning is enabled by providing the partition columns with the format option implicit_partition_columns. Because the folder names do not include column names, the partition values in the path specified by file_uri_list can only be mapped to columns when an explicit list is given with implicit_partition_columns, and the order of the listed columns must match the order of the values in the path. For example, for the object sales/USA/2024/01/sales-2024-01.parquet, country maps to USA, year to 2024, and month to 01. A quick way to verify this mapping is shown after the parameter descriptions below.

    The partition column values are then derived from the object paths at query time.
    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
         table_name        => 'mysales',
         credential_name   => 'DEF_CRED_NAME',
         file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
         column_list       => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
         format            => '{"type":"parquet", "implicit_partition_columns":["country","year","month"]}');
    END;
    /

    The parameters are:

    • table_name: is the external table name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma-delimited list of source file URIs. There are two options for this list:

      • Specify a comma-delimited list of individual file URIs without wildcarding.

      • Specify a single file URI with wildcards, where the wildcards can only be after the last slash "/". The character "*" can be used as the wildcard for multiple characters, the character "?" can be used as the wildcard for a single character.

    • column_list: is a comma delimited list of column names and data types for the external table. The list includes the columns inside the data file and those derived from the object name (from names in the file path specified by file_uri_list).

      The column_list is not required when the data files are structured files (Parquet, Avro, or ORC).

    • format: defines the options you can specify to describe the format of the source file. The implicit_partition_type option is not set; because implicit_partition_columns is provided, the partitioning style is automatically detected as non-Hive.

      If the data in your source file is encrypted, decrypt the data by specifying the encryption format option. See Decrypt Data While Importing from Object Storage for more information on decrypting data.

      See DBMS_CLOUD Package Format Options for more information.

    In this example, namespace-string is the Oracle Cloud Infrastructure object storage namespace and bucketname is the bucket name. See Understanding Object Storage Namespaces for more information.

    See CREATE_EXTERNAL_TABLE Procedure for detailed information about the parameters.

    See DBMS_CLOUD URI Formats for more information on the supported cloud object storage services.
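
    As a quick sanity check of the column-to-folder mapping described in this step, you can list the distinct partition values derived from the object paths. This is an illustrative query, assuming the mysales table created above:

    SELECT DISTINCT country, year, month
      FROM mysales
     ORDER BY country, year, month;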

  3. You can now run queries on the external partitioned table you created in the previous step.

    Your Autonomous Database takes advantage of the partitioning information of your external partitioned table, ensuring that the query only accesses the relevant data files in Object Store.

    For example:

    SELECT product, units FROM mysales WHERE year='2024';

    This SQL statement only queries data in the partition for year 2024.
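
    Predicates on any combination of the partition columns prune in the same way. For example, this illustrative query restricts the scan to objects under the sales/USA/2024/01/ path:

    SELECT product, units FROM mysales WHERE country='USA' AND year='2024' AND month='01';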