About External Tables with Source File Partitioning
On Autonomous AI Database you can create partitioned external tables from Hive style partitioned data or from simple folder partitioned data stored on your Cloud Object Store.
Using source file partitioning, instead of supplying a complete partition specification the procedure derives partitioning information from the file path for certain file patterns. For example, consider the following data file specifications:
- 
Hive style: for example: sales/country=USA/year=2020/month=01/file1.csv
- 
Simple folder partitioning style: for example: sales/USA/2020/01/file1.parquet
Using one of these common partitioning formats greatly simplifies both the creation and management of partitioned external tables. In addition, even though partition columns may not appear in the data file, they can still be queried using SQL. Partitioning data also improves query performance by dramatically reducing the amount of data scanned. In this example, when you query ‘USA’ data, the query can skip scanning the files for other countries.
Hive Format Partitioned Data in Cloud Object Store
Hive offers a standard metadata format for big data processing engines.
                Partitioned data in Cloud Object Store that is generated in Hive format is
                represented in a folder/subfolder format. For example, on Cloud
                Object Store a Hive format data file is stored as follows:
                  
table/partition1=partition1_value/partition2=partition2_value/data_file.csvFiles saved in Hive partitioned format provide partition information in the data file path name. The data file path name includes information about the object contents, including partition column names and partition column values (the data file does not include the partition columns and their associated values).
For example, consider an external partitioned SALES
                table created from Hive format data on Cloud Object Store:
                  
.../sales/country=USA/year=2020/month=01/file1.csv
.../sales/country=USA/year=2020/month=01/file2.csv
.../sales/country=USA/year=2020/month=02/file3.csv
.../sales/country=USA/year=2020/month=03/file1.csv
.../sales/country=FRA/year=2020/month=03/file1.csv
The Hive format partition information shows the data files in Cloud
                Object Store are partitioned by country,
                    year, and month and the values for
                these partition columns are also specified  within the Hive format  path name for
                each data file (the path name includes values for the partitioned columns:
                    country, year, and
                    month).
                  
The column names in the path will be used by the API to simplify the table definition.
Simple Folder Format Partitioned Data in Cloud Object Store
Partitioned data in Cloud Object Store that is generated in folder format
                is represented in a folder/subfolder format, similar to Hive
                format partitioned data, but the information in the path shows the column values and
                does not include the column names. Also, with folder format partitioned data the
                partition order specified in the object name is significant, and must match the
                order in the table columns.
                  
For example, on Cloud Object Store a folder format data file is stored as follows:
table/partition1_value/partition2_value/*.parquetThe path includes both partition column values, in partition column order, and the data files. Autonomous AI Database allows you to create an external partitioned table from folder format data and you can perform a query using the specified partitions.
Files saved in folder partitioned format provide the data partition column values in
                the file name. Unlike Hive, the paths do not include the column name, therefore the
                column names must be provided. The order of partition columns is important and the
                order in the file name for column partition names must match the order in the
                    partition_columns parameter.
                  
About Querying Partitioned Data in Cloud Object Store
When you query external partitioned data in Hive format, the query
                engine understands and utilizes the partitioning information from the file path
                name. For example, consider an external partitioned SALES table
                where the source file,
                    sales/country=USA/year=2020/month=02/file3.csv on Object
                Store includes the following sales data:
                  
tents, 291
canoes, 22
backpacks, 378The country values in the path name, and the time period
                values for month and year are not specified as
                columns within the data file. The partition column values are specified only in the
                path name with values shown: USA, 2020, and 02. After you create an external
                partitioned table with this data file you can use the partition columns and their
                values when you run a query on the external partitioned table. 
                  
For example:
SELECT year, month, product, units 
     FROM SALES WHERE year='2020' AND month='02' AND country='USA'The benefit of creating an external partitioned table with data generated
                as Hive format partitioned data is that the query engine is optimized to partition
                prune the data to select the correct partition and the query only selects data from
                one partition and only needs to search a single data file. Thus, the query would
                only require a scan of the file3.csv file
                    (/sales/country=USA/year=2020/month=02/file3.csv). For
                large amounts of data, such partition pruning can provide significant performance
                improvements.
                  
Using standard Oracle Database external tables, the partition column must be available as a column within the data file to use it for queries or partition definitions. Without the special handling that is available with external partitioned tables on Autonomous AI Database, this would be a problem if you want to use data stored in Hive format on Cloud Object Store, as you would need to regenerate the data files to include the partition as a column in the data file.
About Creating Partitioned External Tables
When you use unstructured data stored in Hive format on Cloud Object
                Store and you create an external partitioned table, the columns and their types
                cannot be derived from the source file. Thus, the columns and their data types must
                be specified with the column_list parameter. To create the
                partitioned external tables, use the procedure DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE to specify the
                partition columns and their types as follows:
                  
- 
The root for the file list is specified in the path name with the file_uri_listparameter. For example,http://.../sales/*
- 
The column names and data types are specified with the column_listparameter.
- 
The option partition_columnsin theformatparameter specifies the partition columns.
- 
The generated DLL includes the columns specified in the path name. 
For this example, when the external table is created the
                    country, year, and month
                columns are added in the column_list parameter. The external table
                is created with the country, year, and
                    month columns, which are not in the data files, and list
                partitions are created enabling partition pruning.
                  
When you use structured data, such as Parquet, Avro, or ORC files stored
                in folder format on Cloud Object Store, the columns and their data types are known.
                and you do not need to specify the column list as is required with unstructured
                data. To create the partitioned external tables, use the procedure DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE
                to specify the partition columns and their types as follows:
                  
- The root for the file list is specified in the path name with the
                        file_uri_listparameter. For example,http://.../sales/*
- The column_listparameter is not required for structured files. If you do not specify the column list, you must define the partition columns and their data types when you create the external partitioned table. Use the optionpartition_columnsin theformatparameter to specify the partition columns and their data types.
- The generated DLL includes the columns specified in the path name.
See Query External Partitioned Data with Hive Format Source File Organization and Query External Partitioned Data with Folder Format Source File Organization for complete examples.
- External Partitioning: CSV Source Files with Hive-style Folders
 Shows how to create external partitioned tables with CSV source files stored on Cloud Object Store in Hive-style folders.
- External Partitioning: CSV Source Files with Simple Folders
 Shows how to create external partitioned tables with CSV source files stored on Cloud Object Store in simple folder format.
- External Partitioning: Parquet Source Files with Hive-style Folders
 Shows how to create external partitioned tables with Parquet source files stored on Cloud Object Store in Hive-style folders.
- External Partitioning: Parquet with Simple Folders
 Shows how to create external partitioned tables with Parquet source files stored on Cloud Object Store in simple folder format.
External Partitioning: CSV Source Files with Hive-style Folders
Shows how to create external partitioned tables with CSV source files stored on Cloud Object Store in Hive-style folders.
Source file list:
.../sales/country=USA/year=2020/month=01/file1.csv
.../sales/country=USA/year=2020/month=01/file2.csv
.../sales/country=USA/year=2020/month=02/file3.csv
.../sales/country=USA/year=2020/month=03/file1.csv
.../sales/country=FRA/year=2020/month=03/file1.csvAPI:
DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE (
   table_name        => 'mysales',
   credential_name   => 'mycredential', 
   file_uri_list     => 'https://objectstorage.us-phoenix-1.oraclecloud.com/.../sales/*.csv', 
   column_list       => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)', 
   field_list        => 'product, units', --[Because country, year and month are not in the file, they are not listed in the field list]
   format            => '{"type":"csv","partition_columns":["country", "year", "month"]}');The
partition_columns
            in the format parameter must match the column names found in the path
            (for example, the country column matches
            “country=…”)
                  Parent topic: About External Tables with Source File Partitioning
External Partitioning: CSV Source Files with Simple Folders
Shows how to create external partitioned tables with CSV source files stored on Cloud Object Store in simple folder format.
Source file list:
.../sales/USA/2020/01/file1.csv
.../sales/USA/2020/01/file2.csv
.../sales/USA/2020/02/file3.csv
.../sales/USA/2020/03/file1.csv
.../sales/FRA/2020/03/file1.csvAPI:
DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
   table_name        => 'mysales',
   credential_name   => 'mycredential', 
   file_uri_list     =>  'https://objectstorage.us-phoenix-1.oraclecloud.com/.../sales/*.csv', 
   column_list       => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)', 
   field_list        => 'product, units', --[Because country, year and month are not in the file, they are not listed in the field list]
   format            => '{"type":"csv","partition_columns":["country", "year", "month"]}');The API call is the same as in the previous example, but the order of the
partition_columns in the
                format parameter is significant because the column name is not in
            the file path.
                  Parent topic: About External Tables with Source File Partitioning
External Partitioning: Parquet Source Files with Hive-style Folders
Shows how to create external partitioned tables with Parquet source files stored on Cloud Object Store in Hive-style folders.
Source file list:
.../sales/USA/2020/01/file1.parquet
.../sales/USA/2020/01/file2.parquet
.../sales/USA/2020/02/file3.parquet
.../sales/USA/2020/03/file1.parquet
.../sales/FRA/2020/03/file1.parquet
API:
DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE (
   table_name            => 'mysales',
     credential_name     => 'mycredential', 
     file_uri_list       => 'https://objectstorage.us-phoenix-1.oraclecloud.com/.../sales/*.parquet',
     format              => 
         json_object(    'type' value 'parquet', 
               'schema' value 'first',
               'partition_columns' value 
                  json_array(
                      json_object('name' value 'country', 'type' value 'varchar2(100)'),
                      json_object('name' value 'year', 'type' value 'number'),
                      json_object('name' value 'month', 'type' value 'varchar2(2)')
            )
        )
);The
column_list
            parameter is not specified. As shown, for each partition column specify both the name
            and data type in the format parameter
                partition_columns.
                  Parent topic: About External Tables with Source File Partitioning
External Partitioning: Parquet with Simple Folders
Shows how to create external partitioned tables with Parquet source files stored on Cloud Object Store in simple folder format.
Source file list:
.../sales/USA/2020/01/file1.parquet
.../sales/USA/2020/01/file2.parquet
.../sales/USA/2020/02/file3.parquet
.../sales/USA/2020/03/file1.parquet
.../sales/FRA/2020/03/file1.parquetAPI:
DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE (
   table_name        => 'mysales',
   credential_name   => 'mycredential', 
   file_uri_list     => 'https://objectstorage.us-phoenix-1.oraclecloud.com/.../sales/*.parquet', 
   format            => 
     json_object(    'type' value 'parquet', 
                'schema' value 'first',
         'partition_columns' value 
           json_array(
              json_object('name' value 'country', 'type' value 'varchar2(100)'),
              json_object('name' value 'year', 'type' value 'number'),
              json_object('name' value 'month', 'type' value 'varchar2(2)')
            )
        )
);The
column_list
            parameter is not specified. You must include both the name and data type for the
            partition columns. In addition, the order of the partition_columns in
            the format clause matters because the column name is not in the file path.
                  Parent topic: About External Tables with Source File Partitioning