Using Big Data Connectors
You can use Oracle Big Data Connectors and Copy to Hadoop (a feature of Oracle Big Data SQL) to load data from a Big Data Service cluster into an Oracle Cloud database instance, and to copy data from an Oracle Cloud database instance to a Big Data Service cluster. The database can be an Oracle Autonomous Database or a co-managed Oracle Database service, as shown in the following table:
Type of Database | Features Supported for Copying Data |
---|---|
Oracle Autonomous Database | You can use Oracle Loader for Hadoop and the Copy to Hadoop feature of Oracle Big Data SQL, together with Oracle Shell for Hadoop Loaders (OHSH), to copy data between a Big Data Service cluster and an Autonomous Database instance. |
Co-managed Oracle Database | You can use Oracle Loader for Hadoop, Oracle SQL Connector for HDFS, and the Copy to Hadoop feature of Oracle Big Data SQL, together with Oracle Shell for Hadoop Loaders (OHSH), to copy data between a Big Data Service cluster and a co-managed Oracle Database instance. |
Features
Big Data connectors, including the Copy to Hadoop feature of Oracle Big Data SQL, are pre-installed on your Big Data Service clusters. The following features are pre-installed on every node of your cluster:
- Oracle Shell for Hadoop Loaders: Oracle Shell for Hadoop Loaders (OHSH) is a helper shell that provides a simple-to-use command line interface to Oracle Loader for Hadoop, Oracle SQL Connector for HDFS, and Copy to Hadoop.
- Copy to Hadoop: Copy to Hadoop (CP2HADOOP) is a feature of Oracle Big Data SQL for copying data from an Oracle database to HDFS.
- Oracle Loader for Hadoop: Oracle Loader for Hadoop (OLH) is a high-performance loader for loading data from a Hadoop cluster into a table in an Oracle database.
- Oracle SQL Connector for Hadoop Distributed File System (HDFS): Oracle SQL Connector for HDFS (OSCH) enables an Oracle external table to access data stored in HDFS files or in a table in Apache Hive. Use this connector only for loading data into a co-managed Oracle Database service; it is not supported for connecting to Oracle Autonomous Database.
- Oracle Instant Client for Linux: Oracle Instant Client enables development and deployment of applications that connect to Oracle Database.
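The connector homes referenced throughout this topic are installed under /opt/oracle on the cluster nodes. As a quick sanity check you can list them; this is a minimal sketch, and the exact version suffixes depend on your cluster image:

    # List the pre-installed connector homes on a cluster node
    ls -d /opt/oracle/ohsh-* /opt/oracle/oraloader-* /opt/oracle/orahdfs-*
    # The OHSH examples bundle used later in this topic
    ls /opt/oracle/ohsh-*/examples.zip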
Set TNS Settings for Connecting to a Database
For Oracle Autonomous Database:
- Download the client credentials (wallet) from the Autonomous Database console and unzip the file into the /opt/oracle/bdc-test/dbwallet/client directory.
- Change to the directory where you unzipped the file:
      cd /opt/oracle/bdc-test/dbwallet/client
- Edit sqlnet.ora and change the WALLET_LOCATION parameter to the path /opt/oracle/bdc-test/dbwallet/client. For example:
      WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/opt/oracle/bdc-test/dbwallet/client")))
- Create a file called connection.properties in this directory and include the following properties:
      javax.net.ssl.trustStore=/opt/oracle/bdc-test/dbwallet/client/cwallet.sso
      javax.net.ssl.trustStoreType=SSO
      javax.net.ssl.keyStore=/opt/oracle/bdc-test/dbwallet/client/cwallet.sso
      javax.net.ssl.keyStoreType=SSO
- Test to verify database connectivity using your Autonomous Database wallet configuration, as follows:
  - Get the TNS entry names from tnsnames.ora. For example:
        myuseradw_high = ( -- configuration )
  - Run the following commands and enter the admin password when prompted:
        export TNS_ADMIN=/opt/oracle/bdc-test/dbwallet/client/
        sqlplus admin@<tnsentry_name>
    For example:
        sqlplus admin@myuseradw_high
For a co-managed Oracle Database service:
- Download the tnsnames.ora file for the co-managed Oracle Database service and copy it to the /opt/oracle/bdc-test/dbwallet/client directory.
- If SSL is enabled on the co-managed database, copy the sqlnet.ora and cwallet.sso files to the /opt/oracle/bdc-test/dbwallet/client directory.
- Edit sqlnet.ora and change the WALLET_LOCATION parameter to the path /opt/oracle/bdc-test/dbwallet/client.
- Create a file called connection.properties in this directory and include the following properties:
      javax.net.ssl.trustStore=/opt/oracle/bdc-test/dbwallet/client/cwallet.sso
      javax.net.ssl.trustStoreType=SSO
      javax.net.ssl.keyStore=/opt/oracle/bdc-test/dbwallet/client/cwallet.sso
      javax.net.ssl.keyStoreType=SSO
- Test to verify database connectivity as follows:
  - Get the TNS entry names from tnsnames.ora. For example:
        myuseradw_high = ( -- configuration )
  - Run the following commands and enter the admin password when prompted:
        export TNS_ADMIN=/opt/oracle/bdc-test/dbwallet/client/
        sqlplus admin@<tnsentry_name>
    For example:
        sqlplus admin@myuseradw_high
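For either database type, you can script the same connectivity check. The following is a minimal sketch: it assumes the TNS files were placed under /opt/oracle/bdc-test/dbwallet/client as described above, and the admin user and myuseradw_high entry are examples to replace with your own values.

    # Minimal connectivity check for the TNS setup above
    # ("admin" and "myuseradw_high" are examples; use your own user and TNS entry)
    export TNS_ADMIN=/opt/oracle/bdc-test/dbwallet/client/
    ls "${TNS_ADMIN}tnsnames.ora" || exit 1
    read -r -s -p "Database password: " DBPW; echo
    echo "select 'connection ok' as status from dual;" | sqlplus -L "admin/${DBPW}@myuseradw_high"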
Configuring Big Data Service to Run Connector Examples
Before running the examples included with your Oracle Shell for Hadoop Loaders installation, perform the following configuration steps as the oracle operating system user.
The examples are included in the Oracle Shell for Hadoop Loaders installation on your cluster. Unzip the examples.zip file into the /opt/oracle/bdc-test directory. For example:
    cd /opt/oracle/bdc-test
    unzip /opt/oracle/ohsh-<version>/examples.zip -d /opt/oracle/bdc-test
- Change to the /opt/oracle/bdc-test/examples directory and perform the following steps. See examples/README.txt for additional information.
- The environment file bdc_env.sh is pre-configured with the environment variables required to run OHSH. Run the following command to configure the environment on your cluster node:
      source /opt/oracle/bdc-test/bdc_env.sh
- Log on to SQL*Plus as sysdba, create a database user that will be used to run the Oracle Loader for Hadoop examples, and grant the required privileges, as shown in the following examples.
  Example for a co-managed Oracle Database service, where <ServiceName> is the co-managed Database service TNS entry name you configured in Set TNS Settings for Connecting to a Database (for example, PDB1):
      sqlplus sys@<ServiceName> as sysdba
      create user ohsh_examples identified by <example-password> quota unlimited on users;
      alter user ohsh_examples default tablespace users;
      grant create session, alter session, create table, create view, create procedure, create type, create sequence to ohsh_examples;
  Example for Autonomous Database, where <ServiceName> is your Autonomous Database TNS entry name configured in Set TNS Settings for Connecting to a Database (for example, myuseradw_high):
      sqlplus admin@<ServiceName>
      create user ohsh_examples identified by <example-password> quota unlimited on data;
      alter user ohsh_examples default tablespace data;
      grant create session, alter session, create table, create view, create procedure, create type, create sequence to ohsh_examples;
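Before continuing, you can confirm the new user can connect through the TNS entry you configured earlier. This is a minimal sketch; ohsh_examples and the myuseradw_high entry follow the examples above and should be replaced with your own values.

    # Verify the example user can connect and check its default tablespace
    export TNS_ADMIN=/opt/oracle/bdc-test/dbwallet/client/
    read -r -s -p "Password for ohsh_examples: " PW; echo
    echo "select default_tablespace from user_users;" | sqlplus -L "ohsh_examples/${PW}@myuseradw_high"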
Follow these steps only if you are using a co-managed Oracle Database service.
- Connect to the co-managed Database service node using SSH and copy the /opt/oracle/bdc-test/examples/sysdba_configure.sql script from the cluster to that node.
- Sign in as the oracle operating system user.
- Connect as the sysdba user for the pluggable database (PDB) and run sysdba_configure.sql <USER_NAME>, where <USER_NAME> is the database user you created in Step 2: Create the OHSH_EXAMPLES User, above. Note that the user name must be all uppercase. Example:
      @sysdba_configure.sql OHSH_EXAMPLES
  The sysdba_configure.sql script creates two database directory objects required to run some examples:
  - OLHP_DEFAULT_DIR points to /tmp/olhp_default
  - OLHP_STAGE_DIR points to /tmp/olhp_stage
- Before running the examples, verify that the above directories exist at the OS level. Both directories must be owned by the oracle OS user, with read and write permissions. If these directories don't exist, you can create them using the following commands:
      sudo su oracle
      mkdir /tmp/olhp_default
      mkdir /tmp/olhp_stage
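The check can be scripted. A minimal sketch, assuming the oracle OS user and the /tmp paths described above:

    # Create the directories if missing and confirm the oracle user can read and write them
    sudo su - oracle -c '
      mkdir -p /tmp/olhp_default /tmp/olhp_stage
      ls -ld /tmp/olhp_default /tmp/olhp_stage
      touch /tmp/olhp_default/.rw_test /tmp/olhp_stage/.rw_test && echo "read/write OK"
      rm -f /tmp/olhp_default/.rw_test /tmp/olhp_stage/.rw_test
    '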
You must configure the co-managed Database service node in order to run the examples, as shown below. See Oracle Big Data Connectors User's Guide, section Installing and Configuring a Hadoop Client on the Oracle Database System for more details.
- Generate the Oracle SQL Connector for HDFS zip file on the cluster node and copy it to the database node. Example:
      cd /opt/oracle
      zip -r /tmp/orahdfs-<version>.zip orahdfs-<version>/*
- Unzip the Oracle SQL Connector for HDFS zip file on the database node. Example:
      mkdir -p /u01/misc_products/bdc
      unzip orahdfs-<version>.zip -d /u01/misc_products/bdc
- Install the Hadoop client on the database node in the /u01/misc_products/ directory.
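The guide does not prescribe how to stage the Hadoop client; the sketch below shows one assumed approach: extracting a Hadoop client tarball that matches your cluster's version so that /u01/misc_products/hadoop-<version>/bin exists, as the hdfs_stream configuration below expects. The tarball path and name are placeholders.

    # On the database node: extract an assumed hadoop-<version>.tar.gz client tarball
    mkdir -p /u01/misc_products
    tar -xzf /tmp/hadoop-<version>.tar.gz -C /u01/misc_products
    # The client launcher should now exist here:
    ls /u01/misc_products/hadoop-<version>/bin/hadoop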
- Connect as the sysdba user for the PDB and verify that both the OSCH_BIN_PATH and OSCH_DEF_DIR database directories exist and point to valid operating system directories. For example:
      create or replace directory OSCH_BIN_PATH as '/u01/misc_products/bdc/orahdfs-<version>/bin';
      grant read,execute on directory OSCH_BIN_PATH to OHSH_EXAMPLES;
  where OHSH_EXAMPLES is the user created in Step 2: Create the OHSH_EXAMPLES User, above.
      create or replace directory OSCH_DEF_DIR as '/u01/misc_products/bdc/xtab_dirs';
      grant read,write on directory OSCH_DEF_DIR to OHSH_EXAMPLES;
  Note: Create the xtab_dirs operating system directory if it doesn't exist.
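A minimal sketch for creating that directory on the database node; the oinstall group is an assumption, so substitute the OS group of your oracle user:

    # Create the OSCH default directory backing OSCH_DEF_DIR and make it writable by oracle
    sudo mkdir -p /u01/misc_products/bdc/xtab_dirs
    sudo chown oracle:oinstall /u01/misc_products/bdc/xtab_dirs   # group is an assumption
    sudo chmod 750 /u01/misc_products/bdc/xtab_dirs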
- Change to your OSCH (Oracle SQL Connector for HDFS) installation directory, and edit the configuration file hdfs_stream. For example:
      sudo su -l oracle
      cd /u01/misc_products/bdc/orahdfs-<version>
      vi bin/hdfs_stream
  Check that the following variables are configured correctly. Read the instructions included in the hdfs_stream file for more details.
      # Include the Hadoop client bin directory in the PATH variable
      export PATH=/u01/misc_products/hadoop-<version>/bin:/usr/bin:/bin
      export JAVA_HOME=/usr/java/jdk<version>
      # See the explanation of HADOOP_CONF_DIR below
      export HADOOP_CONF_DIR=/u01/misc_products/hadoop-conf
      # Activate the Kerberos configuration for secure clusters
      export HADOOP_CLIENT_OPTS="-Djava.security.krb5.conf=/u01/misc_products/krb5.conf"
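A quick, optional sanity check that the locations referenced by hdfs_stream resolve on the database node (a sketch, using the example paths above):

    # Confirm the paths referenced in bin/hdfs_stream exist on this node
    ls -d /u01/misc_products/hadoop-*/bin          # Hadoop client bin added to PATH
    ls -d /usr/java/jdk*                           # JAVA_HOME
    ls /u01/misc_products/hadoop-conf 2>/dev/null || echo "HADOOP_CONF_DIR is set up in the next step"
    ls /u01/misc_products/krb5.conf 2>/dev/null || echo "krb5.conf is only needed for secure clusters"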
- Configure the Hadoop configuration directory (HADOOP_CONF_DIR). If it's not already configured, use Apache Ambari to download the Hadoop client configuration archive file, as follows:
  - Log in to Apache Ambari:
        https://<ambari-host>:7183/#/main/services/BIGDATASQL/summary
  - Click the HDFS service, and select the action Download Client Configuration.
  - Extract the files under the HADOOP_CONF_DIR (/u01/misc_products/hadoop-conf) directory. Ensure that the hostnames and ports configured in HADOOP_CONF_DIR/core-site.xml are accessible from your co-managed Database service node (see the steps below). For example:
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com:8020</value>
        </property>
    In this example, the host bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com and port 8020 must be accessible from your co-managed Database service node.
  - For secure clusters:
    - Copy the Kerberos configuration file from the cluster node to the database node. Example:
          cp krb5.conf /u01/misc_products/
    - Copy the Kerberos keytab file from the cluster node to the database node. Example:
          cp <kerberos-user-keytab> /u01/misc_products/
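On secure clusters, the hadoop fs check in the next step needs a valid Kerberos ticket on the database node. A minimal sketch; the keytab file name and principal are placeholders for your own values:

    # Obtain and verify a Kerberos ticket using the copied krb5.conf and keytab (placeholders)
    export KRB5_CONFIG=/u01/misc_products/krb5.conf
    kinit -kt /u01/misc_products/<kerberos-user-keytab> <kerberos-principal>
    klist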
- Run the following commands to verify that HDFS access is working:
      # Change to the Hadoop client bin directory
      cd /u01/misc_products/hadoop-<version>/bin
      # --config points to your HADOOP_CONF_DIR directory
      ./hadoop --config /u01/misc_products/hadoop-conf fs -ls
  This command should list the HDFS contents. If you get a timeout, "no route to host", or "unknown host" error, update your /etc/hosts file and verify your Big Data Service console network configuration, as follows:
  - Sign in to the Cloud Console, click Big Data, then Clusters, then <your_cluster>, then Cluster Details.
  - Under the List of cluster nodes section, get the fully qualified names and IP addresses of all your cluster nodes.
  - Edit your co-managed Database service configuration file /etc/hosts, for example:
        # BDS hostnames
        xxx.xxx.xxx.xxx bdsmynodemn0.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodemn0
        xxx.xxx.xxx.xxx bdsmynodewn0.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodewn0
        xxx.xxx.xxx.xxx bdsmynodewn2.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodewn2
        xxx.xxx.xxx.xxx bdsmynodewn1.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodewn1
        xxx.xxx.xxx.xxx bdsmynodeun0.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodeun0
  - Verify that your network ingress rules are configured correctly. Click Big Data, then Clusters, then <your_cluster>, then Network Information, then <your_subnet>, then <your_security_list>. Verify that the following ingress rules are configured:
    - An ingress rule to allow TCP network traffic on the HDFS port that was configured in the fs.defaultFS property (see step 3, above).
    - An ingress rule to allow TCP network traffic for the DataNode, using the port number of the dfs.datanode.address property. To confirm the port number, in Apache Ambari click Services, then HDFS, and then Configs, and search for dfs.datanode.address in the Properties filter box. For example:
          Source IP        Protocol  Source Port Range  Destination Port Range
          -----------------------------------------------------------------------
          111.111.0.0/16   TCP       All                8020
          111.111.0.0/24   TCP       All                50010
    If no ingress rules are configured, click Add Ingress Rule to create a new rule.
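Before rerunning the hadoop fs -ls check, you can confirm name resolution and port reachability directly from the database node. A minimal sketch using the example NameNode host and ports shown above; substitute your own host names:

    # Name resolution (via /etc/hosts or DNS) and TCP reachability from the database node
    getent hosts bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com
    nc -zv bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com 8020   # NameNode port from fs.defaultFS
    nc -zv <worker-node-fqdn> 50010                       # DataNode port (dfs.datanode.address)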
- On the Big Data Service utility node, edit /opt/oracle/bdc-test/examples/setresources.ohsh and complete the following configuration.
  - Configure local directories.
    - For a co-managed Database service:
          set defaultdirectory OLHP_DEFAULT_DIR
          set locationdirectory OLHP_STAGE_DIR
    - For Autonomous Database, comment out the previous lines (that is, prefix them with #). Local directories are not available:
          #set defaultdirectory OLHP_DEFAULT_DIR
          #set locationdirectory OLHP_STAGE_DIR
  - Configure OHSH resources. Configure the connect_id and connectiondir parameters according to your database service (co-managed Database service or Autonomous Database), which you configured as described in Set TNS Settings for Connecting to a Database. <TNS_ADMIN> points to the wallet location path.
    Co-managed Database service example:
        # SQL*Plus resource
        create sqlplus resource sql0 connectid="PDB1"
        # JDBC resource
        create oracle jdbc resource jdbc0 connectid="PDB1"
    Autonomous Database example:
        # SQL*Plus resource
        create sqlplus resource sql0 connectid="myuseradw_high"
        # JDBC resource
        create oracle jdbc resource jdbc0 connectiondir=<TNS_ADMIN> \
          connectid="myuseradw_high"
  - Configure Oracle Shell for Hadoop Loaders resources to use a Hadoop credential store. The previous examples prompt for the database user password before running the examples. You can create a Hadoop credential store to protect your database password and avoid being prompted each time. To enable this feature, run the following commands:
    - Create a Hadoop credential store:
          hadoop credential create pw_alias -provider \
            "jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks"
      Enter the database user password when prompted. This command stores the alias pw_alias in a jceks file under the OHSH examples directory.
    - Configure OHSH resources using the credential store.
      Autonomous Database:
          # SQL*Plus resource
          create sqlplus resource sql0 \
            user=OHSH_EXAMPLES passwordalias="pw_alias" \
            provider="jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" \
            connectid="myuseradw_high"
          # JDBC resource
          create oracle jdbc resource jdbc0 \
            user=OHSH_EXAMPLES passwordalias="pw_alias" \
            provider="jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" \
            connectiondir=<TNS_ADMIN> connectid="myuseradw_high"
      Co-managed Database service:
          # SQL*Plus resource
          create sqlplus resource sql0 \
            user=OHSH_EXAMPLES passwordalias="pw_alias" \
            provider="jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" \
            connectid="PDB1"
          # JDBC resource
          create oracle jdbc resource jdbc0 \
            user=OHSH_EXAMPLES passwordalias="pw_alias" \
            provider="jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" \
            connectid="PDB1"
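You can confirm the alias was stored correctly by listing the credential store contents (same provider path as above):

    # pw_alias should appear in the output
    hadoop credential list -provider \
      "jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks"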
- Run the ohsh -h command to make sure everything was configured correctly. You should get output like the following example:
      $ cd /opt/oracle/bdc-test/
      $ source bdc_env.sh
      $ cd examples/
      $ ../../ohsh-<version>/bin/ohsh -h
      Oracle Shell for Hadoop Loaders Release 5.1.2 - Production   (Build:20211008105544)
      Copyright (c) 2016, 2021, Oracle and/or its affiliates. All rights reserved.
      Oracle Loader for Hadoop (5.1.2), Oracle SQL Connector for HDFS (5.1.2),
      The Copy to Hadoop feature of Oracle Big Data SQL (5.1.2) enabled.
      usage: ohsh [-i <FILE>] [-f <FILE>] [-c] [-h]
       -i <FILE>   initialization script
       -f <FILE>   main script
       -c          hadoop configuration
       -h          help
- Run the setresources script:
      ohsh>@setresources
- Run the following command to confirm the available resources:
      ohsh>show resources
      jdbc0 : Oracle JDBC Resource
        user=OHSH_EXAMPLES
        connectId=PDB1
      sql0 : Oracle SQL*Plus Command Line Resource
        user=OHSH_EXAMPLES
        connectId=PDB1
        sysdba=false
      hadoop0 : Hadoop Command Line Resource
        HADOOP_CONF_DIR = /etc/hadoop/conf
      hive0 : Hive Command Line Resource
        HIVE_CONF_DIR = /etc/hive/conf
        Connect to jdbc:hive2://<hiveserver2-host>:<port>/default;ssl=true;
          sslTrustStore=<trust-store-path>;
          trustStorePassword=<trust-store-password>
        hadoop resource id = hadoop0
      bash0 : Bash Command Line Resource
        path = /bin/bash
- Before running the examples, you must set up the required environment for OHSH. The bdc_env.sh file is pre-configured with the environment variables required by OHSH.
      $ cd /opt/oracle/bdc-test/; source bdc_env.sh
- Launch OHSH from the examples directory for ease of running the example scripts:
      $ cd /opt/oracle/bdc-test/examples/
      $ ../../ohsh-<version>/bin/ohsh
- Run OHSH to create the Oracle tables that will be loaded from content in HDFS:
      ohsh>@create_oracle_tables
- Run the following command to show the Oracle tables:
      ohsh>%sql0 select table_name from tabs;
      TABLE_NAME
      --------------------------
      OHSH_FIVDTI
      OHSH_FIVDTI_REVERSE
      OHSH_FIVDTI_PART
      OHSH_CP2HADOOP_FIVDTI
- Run OHSH to populate HDFS with delimited files that will serve as content to load into Oracle:
      ohsh>@create_hdfs_data_files
- If you run this example several times, you need to do some cleanup to avoid duplicate HDFS directory errors. Edit create_hdfs_data_files and add the following lines:
      # Environment variable HADOOP_USER is set from OHSH
      # Add this command to remove the ohshdata directory
      %hadoop0 fs -rm -r -f /user/oracle/ohshdata
- If you get permission errors when writing files on HDFS, add the following lines:
      # Add this command before the %hadoop0 fs -put command
      %hadoop0 fs -chmod 777 /user/${HADOOP_USER}/ohshdata/hive_fivdti
      %hadoop0 fs -put data/fivdti_part0.dat /user/${HADOOP_USER}/ohshdata/fivdti
Running the Connector Examples
Run these examples to see different ways to load data.
For a co-managed Oracle Database service, you can run the following examples:
ohsh>@load_jdbc
ohsh>@load_directpath
ohsh>@load_exttab
For Oracle Autonomous Database, you can run the following examples:
ohsh>@load_jdbc
ohsh>@load_directpath
The following error may indicate a problem with how the environment variable HADOOP_CLASSPATH was set:
    Error: oracle.hadoop.smartloader.api.SmartLoaderException: OLH error occured:
    oracle.hadoop.loader.OraLoaderException: Class oracle.hadoop.loader.OraLoader was expected
    to come from /opt/oracle/oraloader-<version>/jlib/oraloader.jar,
    not /opt/oracle/orahdfs-<version>/jlib/oraloader.jar.
To disable this check, set the configuration property oracle.hadoop.loader.enforceClasspath to false. Add the following property in OHSH_HOME/conf/smartloader-conf.xml:
    <property>
      <name>oracle.hadoop.loader.enforceClasspath</name>
      <value>false</value>
    </property>
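Before disabling the check, it can help to see which copies of oraloader.jar are on the classpath. A small sketch, run in the shell where you launch OHSH after sourcing bdc_env.sh:

    # Show oraloader.jar entries on HADOOP_CLASSPATH; the oraloader-<version> copy
    # should typically appear before the orahdfs-<version> copy
    echo "$HADOOP_CLASSPATH" | tr ':' '\n' | grep -n 'oraloader\|orahdfs'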
For a co-managed Oracle Database service, you can run the following examples:
ohsh>@load_jdbcimapreverse.ohsh
ohsh>@load_exttabimapreverse.ohsh
For Oracle Autonomous Database, you can run the following example:
ohsh>@load_jdbcimapreverse.ohsh
For a co-managed Oracle Database service, you can run the following examples:
ohsh>@load_hivejdbc.ohsh --partitioned tables
ohsh>@load_directpath_hive2parts.ohsh
ohsh>@load_directpath_hiveallparts.ohsh
ohsh>@load_hiveexttab.ohsh
ohsh>@load_exttab_hive2parts.ohsh
ohsh>@load_exttab_hiveallparts.ohsh
For Oracle Autonomous Database, you can run the following examples:
ohsh>@load_hivejdbc.ohsh --partitioned tables
ohsh>@load_directpath_hive2parts.ohsh
ohsh>@load_directpath_hiveallparts.ohsh
For a co-managed Oracle Database service, you can run the following examples:
ohsh>@create_hivetable_empdpserde.ohsh
ohsh>@createreplace_directcopy.ohsh
ohsh>@load_directcopy.ohsh
For Oracle Autonomous Database, you can run the following example:
ohsh> @create_hivetable_empdpserde.ohsh
If you run the create_hivetable_empdpserde.ohsh example several times, you need to do some cleanup to avoid duplicate file errors, as follows: edit create_hivetable_empdpserde.ohsh and add the following line:
    # Add this command before copying the emp.dmp file
    %hadoop0 fs -rm -r -f /user/${HADOOP_USER}/ohshdata/hive_ohsh_emp_dpserde/emp.dmp
    %hadoop0 fs -put data/emp.dmp /user/${HADOOP_USER}/ohshdata/hive_ohsh_emp_dpserde