Create Your Own Oracle TPC-H Playground on Linux
I believe all of us occasionally run into the limits of Oracle's SH, SCOTT, and similar sample schemas when we need a sufficiently large playground for our tests. In this post you will learn how to create your own TPC-H playground database on Linux.
Download TPC-H Data Generator (dbgen)
The TPC, the council behind the TPC-H benchmark, provides a standardized data generation tool for each of its benchmarks. You can download this tool from http://www.tpc.org/tpch/default.asp (the version I will be using can be downloaded from http://www.tpc.org/tpch/spec/tpch_2_12_0_b5.zip). The bundle contains a set of C source files that are compiled into dbgen. Copy the zip file into a working folder and make sure your Linux environment has the tools needed to compile C code (gcc, make, etc.).
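If you are not sure the build toolchain is present, a quick check before unpacking saves a failed make later (package names may vary by distribution):

[oracle@localhost ~]$ which gcc make unzip   # all three should resolve to a path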
[oracle@localhost ~]$ mkdir tpch
[oracle@localhost ~]$ mv tpch_2_12_0_b5.zip ./tpch
[oracle@localhost ~]$ cd tpch/
[oracle@localhost tpch]$ unzip tpch_2_12_0_b5.zip
Archive:  tpch_2_12_0_b5.zip
  inflating: build.c
  inflating: driver.c
  inflating: bm_utils.c
  inflating: rnd.c
...
[oracle@localhost tpch]$
Build DBGEN
The next thing to do is copy the makefile.suite file in the tpch directory to makefile and change a few parameters in the copy:
- CC
- DATABASE
- MACHINE
- WORKLOAD
[oracle@localhost tpch]$ cp makefile.suite makefile
[oracle@localhost tpch]$ vi makefile
...
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
...
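If you prefer not to edit the copied makefile by hand, the same four assignments can be applied with sed. This is only a sketch; verify the resulting lines against your version of makefile.suite before running make:

sed -i -e 's/^CC *=.*/CC      = gcc/' \
       -e 's/^DATABASE *=.*/DATABASE= ORACLE/' \
       -e 's/^MACHINE *=.*/MACHINE = LINUX/' \
       -e 's/^WORKLOAD *=.*/WORKLOAD = TPCH/' makefile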
Now run the make command:
[oracle@localhost tpch]$ make
chmod 755 update_release.sh
./update_release.sh 2 12 0
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm
After a successful compilation you should have the dbgen executable in your current working directory:
[oracle@localhost tpch]$ ./dbgen -h
TPC-H Population Generator (Version 2.12.0 build 5)
Copyright Transaction Processing Performance Council 1994 - 2010
USAGE:
dbgen [-{vf}][-T {pcsoPSOL}]
      [-s <scale>][-C <procs>][-S <step>]
dbgen [-v] [-O m] [-s <scale>] [-U <updates>]
Basic Options
===========================
-C <n> -- separate data set into <n> chunks (requires -S, default: 1)
-f     -- force. Overwrite existing files
-h     -- display this message
-q     -- enable QUIET mode
-s <n> -- set Scale Factor (SF) to <n> (default: 1)
-S <n> -- build the <n>th step of the data/update set (used with -C or -U)
-U <n> -- generate <n> update sets
-v     -- enable VERBOSE mode
Advanced Options
===========================
-b <s> -- load distributions for <s> (default: dists.dss)
-d <n> -- split deletes between <n> files (requires -U)
-i <n> -- split inserts between <n> files (requires -U)
-T c   -- generate cutomers ONLY
-T l   -- generate nation/region ONLY
-T L   -- generate lineitem ONLY
-T n   -- generate nation ONLY
-T o   -- generate orders/lineitem ONLY
-T O   -- generate orders ONLY
-T p   -- generate parts/partsupp ONLY
-T P   -- generate parts ONLY
-T r   -- generate region ONLY
-T s   -- generate suppliers ONLY
-T S   -- generate partsupp ONLY
To generate the SF=1 (1GB), validation database population, use:
	dbgen -vf -s 1
To generate updates for a SF=1 (1GB), use:
	dbgen -v -U 1 -s 1
Generate Your Playground Data
Now you are ready to create your playground database. If you are not familiar with the TPC-H model, refer to the relational schema diagram in the TPC-H specification to get an idea of what it looks like.
Use dbgen to generate 4 GB of TPC-H benchmark data. To be able to load the generated files in parallel with Oracle external tables, we will use dbgen's file-split feature. (Remember that this step may take some time and can be parallelized depending on your CPU and I/O capacity.)
[oracle@localhost tpch]$ ./dbgen -s 4 -S 1 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Starting to load stage 1 of 8 for suppliers table.../
Preloading text ... 100%
done.
Starting to load stage 1 of 8 for customers table...done.
Starting to load stage 1 of 8 for orders/lineitem tables...done.
Starting to load stage 1 of 8 for part/partsupplier tables...done.
Generating data for nation tabledone.
Generating data for region tabledone.

[oracle@localhost tpch]$ ./dbgen -s 4 -S 2 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Starting to load stage 2 of 8 for suppliers table.../
Preloading text ... 100%
done.
Starting to load stage 2 of 8 for customers table...done.
Starting to load stage 2 of 8 for orders/lineitem tables...done.
Starting to load stage 2 of 8 for part/partsupplier tables...done.
Generating data for nation tableDo you want to overwrite ./nation.tbl ? [Y/N]: y
done.
Generating data for region tableDo you want to overwrite ./region.tbl ? [Y/N]: y
done.

...

[oracle@localhost tpch]$ ./dbgen -s 4 -S 8 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Starting to load stage 8 of 8 for suppliers table.../
Preloading text ... 100%
done.
Starting to load stage 8 of 8 for customers table...done.
Starting to load stage 8 of 8 for orders/lineitem tables...done.
Starting to load stage 8 of 8 for part/partsupplier tables...done.
Generating data for nation tableDo you want to overwrite ./nation.tbl ? [Y/N]: y
done.
Generating data for region tableDo you want to overwrite ./region.tbl ? [Y/N]: y
done.
When you are done with all 8 executions you will have *.tbl* files in your current working directory. These are pipe-delimited files which you will load into your database.
DBGEN Options
- -s 4 sets the scale factor (SF) to 4, meaning we generate approximately 4 GB of benchmark data.
- -S 1 instructs dbgen to generate the first of the 8 chunks.
- -C 8 is the total number of files for each large data set (the nation and region tables are excluded).
- -v enables verbose output.
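Since running eight separate dbgen commands by hand is tedious, the same chunked generation can be scripted in a simple shell loop; the -f option (listed in the usage output above) forces overwriting of nation.tbl and region.tbl so the interactive prompts go away. A sketch:

[oracle@localhost tpch]$ for i in $(seq 1 8); do ./dbgen -s 4 -C 8 -S $i -f -v; done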
DBGEN Output
In total, the *.tbl* files add up to approximately 4 GB:
[oracle@localhost tpch]$ du -ch *.tbl* | tail -1
4.2G	total
It is a good idea to compress these files with gzip: they take up far less disk space, and if you have CPU cycles to spare, reading the compressed files can also reduce I/O during the load.
[oracle@localhost tpch]$ gzip -4 -v *.tbl*
customer.tbl.1:	 61.4% -- replaced with customer.tbl.1.gz
customer.tbl.2:	 61.6% -- replaced with customer.tbl.2.gz
customer.tbl.3:	 61.6% -- replaced with customer.tbl.3.gz
...
[oracle@localhost tpch]$ du -hc *.tbl.* | tail -1
1.3G	total
As you can see above, I achieved better than a 3:1 compression ratio across the external files on average.
Create Database Objects & Load External Files
Follow these guidelines for creating the permanent and auxiliary database objects:
- Create a TPCH user and grant it the necessary privileges.
- Create an Oracle DIRECTORY object (TPCH_DIR) pointing to the directory that contains your *.tbl*.gz files.
- Create the external tables that will serve as the read source for your load process.
- Note that, for simplicity, I used VARCHAR2 columns for the DATE fields; they are cast to DATE during the load.
- Use inline gzip decompression with the PREPROCESSOR option.
- Create your external tables with appropriate parallelism so the external files are read concurrently.
- Remember to use the APPEND hint in the INSERT … SELECT statements.
- Don't create the PK and FK constraints yet, so that you can load in parallel.
- Load your data from the auxiliary external tables into your permanent heap tables.
- Create all necessary constraints on the permanent tables.
- Gather schema statistics to guide the CBO correctly.
Start with Auxiliary Loading Objects
We will start by creating the TPCH user and the TPCH_DIR directory:
create tablespace tpch_ts datafile size 100m autoextend on next 100m nologging;

create user tpch identified by tpch
  temporary tablespace temp
  default tablespace tpch_ts;

grant connect, resource to tpch;
grant create any directory to tpch;

conn tpch/tpch

create directory tpch_dir as '/home/oracle/tpch';
create directory zcat_dir as '/bin';
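One thing worth double-checking: the ZCAT_DIR directory above assumes zcat lives in /bin. On some distributions it resides in /usr/bin (or /bin is merely a symlink to /usr/bin), so verify the location and adjust the directory object if needed:

[oracle@localhost tpch]$ which zcat   # adjust ZCAT_DIR if this is not /bin/zcat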
Now create your external tables and check that everything works:
set timing on
spool load_object.log

drop table region_ext;
drop table nation_ext;
drop table supplier_ext;
drop table customer_ext;
drop table order_ext;
drop table part_ext;
drop table partsupp_ext;
drop table lineitem_ext;

CREATE TABLE region_ext
 (r_regionkey NUMBER(10), r_name varchar2(25), r_comment varchar(152))
ORGANIZATION EXTERNAL
 (TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS
   (RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR zcat_dir:'zcat'
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL)
  LOCATION ('region.tbl.gz'))
NOPARALLEL REJECT LIMIT 0 NOMONITORING;

CREATE TABLE nation_ext
 (n_nationkey NUMBER(10), n_name varchar2(25), n_regionkey number(10), n_comment varchar(152))
ORGANIZATION EXTERNAL
 (TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS
   (RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR zcat_dir:'zcat'
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL)
  LOCATION ('nation.tbl.gz'))
NOPARALLEL REJECT LIMIT 0 NOMONITORING;

CREATE TABLE supplier_ext
 (S_SUPPKEY NUMBER(10), S_NAME VARCHAR2(25), S_ADDRESS VARCHAR2(40), S_NATIONKEY NUMBER(10),
  S_PHONE VARCHAR2(15), S_ACCTBAL NUMBER, S_COMMENT VARCHAR2(101))
ORGANIZATION EXTERNAL
 (TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS
   (RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR zcat_dir:'zcat'
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL)
  LOCATION ('supplier.tbl.1.gz','supplier.tbl.2.gz','supplier.tbl.3.gz','supplier.tbl.4.gz',
            'supplier.tbl.5.gz','supplier.tbl.6.gz','supplier.tbl.7.gz','supplier.tbl.8.gz'))
PARALLEL 2 REJECT LIMIT 0 NOMONITORING;

CREATE TABLE customer_ext
 (C_CUSTKEY NUMBER(10), C_NAME VARCHAR2(25), C_ADDRESS VARCHAR2(40), C_NATIONKEY NUMBER(10),
  C_PHONE VARCHAR2(15), C_ACCTBAL NUMBER, C_MKTSEGMENT VARCHAR2(10), C_COMMENT VARCHAR2(117))
ORGANIZATION EXTERNAL
 (TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS
   (RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR zcat_dir:'zcat'
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL)
  LOCATION ('customer.tbl.1.gz','customer.tbl.2.gz','customer.tbl.3.gz','customer.tbl.4.gz',
            'customer.tbl.5.gz','customer.tbl.6.gz','customer.tbl.7.gz','customer.tbl.8.gz'))
PARALLEL 2 REJECT LIMIT 0 NOMONITORING;

CREATE TABLE order_ext
 (O_ORDERKEY NUMBER(10), O_CUSTKEY NUMBER(10), O_ORDERSTATUS CHAR(1), O_TOTALPRICE NUMBER,
  O_ORDERDATE VARCHAR2(10), O_ORDERPRIORITY VARCHAR2(15), O_CLERK VARCHAR2(15),
  O_SHIPPRIORITY NUMBER(38), O_COMMENT VARCHAR2(79))
ORGANIZATION EXTERNAL
 (TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS
   (RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR zcat_dir:'zcat'
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL)
  LOCATION ('orders.tbl.1.gz','orders.tbl.2.gz','orders.tbl.3.gz','orders.tbl.4.gz',
            'orders.tbl.5.gz','orders.tbl.6.gz','orders.tbl.7.gz','orders.tbl.8.gz'))
PARALLEL 2 REJECT LIMIT 0 NOMONITORING;

CREATE TABLE part_ext
 (P_PARTKEY NUMBER(10), P_NAME VARCHAR2(55), P_MFGR VARCHAR2(25), P_BRAND VARCHAR2(10),
  P_TYPE VARCHAR2(25), P_SIZE NUMBER(38), P_CONTAINER VARCHAR2(10), P_RETAILPRICE NUMBER,
  P_COMMENT VARCHAR2(23))
ORGANIZATION EXTERNAL
 (TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS
   (RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR zcat_dir:'zcat'
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL)
  LOCATION ('part.tbl.1.gz','part.tbl.2.gz','part.tbl.3.gz','part.tbl.4.gz',
            'part.tbl.5.gz','part.tbl.6.gz','part.tbl.7.gz','part.tbl.8.gz'))
PARALLEL 2 REJECT LIMIT 0 NOMONITORING;

CREATE TABLE partsupp_ext
 (PS_PARTKEY NUMBER(10), PS_SUPPKEY NUMBER(10), PS_AVAILQTY NUMBER(38), PS_SUPPLYCOST NUMBER,
  PS_COMMENT VARCHAR2(199))
ORGANIZATION EXTERNAL
 (TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS
   (RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR zcat_dir:'zcat'
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL)
  LOCATION ('partsupp.tbl.1.gz','partsupp.tbl.2.gz','partsupp.tbl.3.gz','partsupp.tbl.4.gz',
            'partsupp.tbl.5.gz','partsupp.tbl.6.gz','partsupp.tbl.7.gz','partsupp.tbl.8.gz'))
PARALLEL 2 REJECT LIMIT 0 NOMONITORING;

CREATE TABLE lineitem_ext
 (L_ORDERKEY NUMBER(10), L_PARTKEY NUMBER(10), L_SUPPKEY NUMBER(10), L_LINENUMBER NUMBER(38),
  L_QUANTITY NUMBER, L_EXTENDEDPRICE NUMBER, L_DISCOUNT NUMBER, L_TAX NUMBER,
  L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE VARCHAR2(10), L_COMMITDATE VARCHAR2(10),
  L_RECEIPTDATE VARCHAR2(10), L_SHIPINSTRUCT VARCHAR2(25), L_SHIPMODE VARCHAR2(10),
  L_COMMENT VARCHAR2(44))
ORGANIZATION EXTERNAL
 (TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS
   (RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR zcat_dir:'zcat'
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL)
  LOCATION ('lineitem.tbl.1.gz','lineitem.tbl.2.gz','lineitem.tbl.3.gz','lineitem.tbl.4.gz',
            'lineitem.tbl.5.gz','lineitem.tbl.6.gz','lineitem.tbl.7.gz','lineitem.tbl.8.gz'))
PARALLEL 2 REJECT LIMIT 0 NOMONITORING;

-- Count all
-- 5 rows
select count(*) region_count from region_ext;
-- 25 rows
select count(*) nation_count from nation_ext;
-- 40000 rows
select count(*) supplier_count from supplier_ext;
-- 600000 rows
select count(*) customer_count from customer_ext;
-- 6000000 rows
select count(*) order_count from order_ext;
-- 800000 rows
select count(*) part_count from part_ext;
-- 3200000 rows
select count(*) partsupp_count from partsupp_ext;
-- 23996604 rows
select count(*) lineitem_count from lineitem_ext;

spool off
Continue with Persistent Ones
The next step is to create your persistent objects (heap tables) without PK and FK constraints on them.
set timing on
spool heap_objects.log

DROP TABLE H_CUSTOMER CASCADE CONSTRAINTS;
DROP TABLE H_LINEITEM CASCADE CONSTRAINTS;
DROP TABLE H_NATION CASCADE CONSTRAINTS;
DROP TABLE H_ORDER CASCADE CONSTRAINTS;
DROP TABLE H_PART CASCADE CONSTRAINTS;
DROP TABLE H_PARTSUPP CASCADE CONSTRAINTS;
DROP TABLE H_REGION CASCADE CONSTRAINTS;
DROP TABLE H_SUPPLIER CASCADE CONSTRAINTS;

CREATE TABLE H_CUSTOMER
 (c_custkey    NUMBER(10)    NOT NULL,
  c_name       VARCHAR2(25)  NOT NULL,
  c_address    VARCHAR2(40)  NOT NULL,
  c_nationkey  NUMBER(10)    NOT NULL,
  c_phone      VARCHAR2(15)  NOT NULL,
  c_acctbal    NUMBER        NOT NULL,
  c_mktsegment VARCHAR2(10)  NOT NULL,
  c_comment    VARCHAR2(117) NOT NULL)
PARALLEL 2;

CREATE TABLE H_LINEITEM
 (l_orderkey      NUMBER(10)   NOT NULL,
  l_partkey       NUMBER(10)   NOT NULL,
  l_suppkey       NUMBER(10)   NOT NULL,
  l_linenumber    INTEGER      NOT NULL,
  l_quantity      NUMBER       NOT NULL,
  l_extendedprice NUMBER       NOT NULL,
  l_discount      NUMBER       NOT NULL,
  l_tax           NUMBER       NOT NULL,
  l_returnflag    CHAR(1)      NOT NULL,
  l_linestatus    CHAR(1)      NOT NULL,
  l_shipdate      DATE         NOT NULL,
  l_commitdate    DATE         NOT NULL,
  l_receiptdate   DATE         NOT NULL,
  l_shipinstruct  VARCHAR2(25) NOT NULL,
  l_shipmode      VARCHAR2(10) NOT NULL,
  l_comment       VARCHAR2(44) NOT NULL)
PARALLEL 2;

CREATE TABLE H_NATION
 (n_nationkey NUMBER(10)    NOT NULL,
  n_name      VARCHAR2(25)  NOT NULL,
  n_regionkey NUMBER(10)    NOT NULL,
  n_comment   VARCHAR2(152) NOT NULL)
NOPARALLEL;

CREATE TABLE H_ORDER
 (o_orderkey      NUMBER(10)   NOT NULL,
  o_custkey       NUMBER(10)   NOT NULL,
  o_orderstatus   CHAR(1)      NOT NULL,
  o_totalprice    NUMBER       NOT NULL,
  o_orderdate     DATE         NOT NULL,
  o_orderpriority VARCHAR2(15) NOT NULL,
  o_clerk         VARCHAR2(15) NOT NULL,
  o_shippriority  INTEGER      NOT NULL,
  o_comment       VARCHAR2(79) NOT NULL)
PARALLEL 2;

CREATE TABLE H_PART
 (p_partkey     NUMBER(10)   NOT NULL,
  p_name        VARCHAR2(55) NOT NULL,
  p_mfgr        VARCHAR2(25) NOT NULL,
  p_brand       VARCHAR2(10) NOT NULL,
  p_type        VARCHAR2(25) NOT NULL,
  p_size        INTEGER      NOT NULL,
  p_container   VARCHAR2(10) NOT NULL,
  p_retailprice NUMBER       NOT NULL,
  p_comment     VARCHAR2(23) NOT NULL)
PARALLEL 2;

CREATE TABLE H_PARTSUPP
 (ps_partkey    NUMBER(10)    NOT NULL,
  ps_suppkey    NUMBER(10)    NOT NULL,
  ps_availqty   INTEGER       NOT NULL,
  ps_supplycost NUMBER        NOT NULL,
  ps_comment    VARCHAR2(199) NOT NULL)
PARALLEL 2;

CREATE TABLE H_REGION
 (r_regionkey NUMBER(10)    NOT NULL,
  r_name      VARCHAR2(25)  NOT NULL,
  r_comment   VARCHAR2(152) NOT NULL)
NOPARALLEL;

CREATE TABLE H_SUPPLIER
 (s_suppkey   NUMBER(10)    NOT NULL,
  s_name      VARCHAR2(25)  NOT NULL,
  s_address   VARCHAR2(40)  NOT NULL,
  s_nationkey NUMBER(10)    NOT NULL,
  s_phone     VARCHAR2(15)  NOT NULL,
  s_acctbal   NUMBER        NOT NULL,
  s_comment   VARCHAR2(101) NOT NULL)
PARALLEL 2;

spool off
Load your Data using INSERT … SELECT
Now load your data in parallel from the external tables into the heap tables using INSERT … SELECT statements:
set timing on
spool load_data.log

truncate table h_lineitem;
truncate table h_order;
truncate table h_part;
truncate table h_customer;
truncate table h_nation;
truncate table h_region;
truncate table h_partsupp;
truncate table h_supplier;

alter session enable parallel dml;

insert /*+ append */ into h_lineitem
select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE,
       L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS,
       to_date(L_SHIPDATE, 'YYYY-MM-DD'),
       to_date(L_COMMITDATE, 'YYYY-MM-DD'),
       to_date(L_RECEIPTDATE, 'YYYY-MM-DD'),
       L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT
from lineitem_ext;

insert /*+ append */ into h_partsupp select * from partsupp_ext;
insert /*+ append */ into h_part select * from part_ext;

insert /*+ append */ into h_order
select o_orderkey, o_custkey, o_orderstatus, o_totalprice,
       to_date(o_orderdate, 'YYYY-MM-DD'),
       o_orderpriority, o_clerk, o_shippriority, o_comment
from order_ext;

insert /*+ append */ into h_customer select * from customer_ext;
insert /*+ append */ into h_supplier select * from supplier_ext;
insert /*+ append */ into h_nation select * from nation_ext;
insert /*+ append */ into h_region select * from region_ext;

commit;
spool off
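Once the loads have committed, a quick sanity check is to compare the heap-table row counts with the external-table counts observed earlier. A sketch, using the table names created above:

-- Row counts should match those returned by the external tables
select 'H_LINEITEM' tname, count(*) cnt from h_lineitem
union all select 'H_ORDER',    count(*) from h_order
union all select 'H_PARTSUPP', count(*) from h_partsupp
union all select 'H_PART',     count(*) from h_part
union all select 'H_CUSTOMER', count(*) from h_customer
union all select 'H_SUPPLIER', count(*) from h_supplier
union all select 'H_NATION',   count(*) from h_nation
union all select 'H_REGION',   count(*) from h_region;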
Build Your Constraints
The next step is to build your primary and foreign key constraints:
set timing on
spool constraints.log

-- PK Constraints
ALTER TABLE H_REGION ADD CONSTRAINT REGION_PK PRIMARY KEY (r_regionkey);
ALTER TABLE H_NATION ADD CONSTRAINT NATION_PK PRIMARY KEY (n_nationkey);
ALTER TABLE H_SUPPLIER ADD CONSTRAINT SUPPLIER_PK PRIMARY KEY (s_suppkey);

create unique index partsupp_pk on h_partsupp(ps_partkey, ps_suppkey) parallel 2;
ALTER TABLE H_PARTSUPP ADD CONSTRAINT PARTSUPP_PK PRIMARY KEY (ps_partkey, ps_suppkey) using index PARTSUPP_PK;

create unique index PART_PK on H_PART(p_partkey) parallel 2;
ALTER TABLE H_PART ADD CONSTRAINT PART_PK PRIMARY KEY (p_partkey) using index PART_PK;

create unique index ORDERS_PK on H_ORDER(o_orderkey) parallel 2;
ALTER TABLE H_ORDER ADD CONSTRAINT ORDERS_PK PRIMARY KEY (o_orderkey) using index ORDERS_PK;

create unique index LINEITEM_PK on H_LINEITEM(l_linenumber, l_orderkey) parallel 2;
ALTER TABLE H_LINEITEM ADD CONSTRAINT LINEITEM_PK PRIMARY KEY (l_linenumber, l_orderkey) using index LINEITEM_PK;

create unique index CUSTOMER_PK on H_CUSTOMER(c_custkey) parallel 2;
ALTER TABLE H_CUSTOMER ADD CONSTRAINT CUSTOMER_PK PRIMARY KEY (c_custkey) using index CUSTOMER_PK;

-- FK Constraints
ALTER TABLE H_LINEITEM ADD CONSTRAINT LINEITEM_PARTSUPP_FK FOREIGN KEY (l_partkey, l_suppkey)
  REFERENCES H_PARTSUPP (ps_partkey, ps_suppkey) NOT DEFERRABLE;
ALTER TABLE H_ORDER ADD CONSTRAINT ORDER_CUSTOMER_FK FOREIGN KEY (o_custkey)
  REFERENCES H_CUSTOMER (c_custkey) NOT DEFERRABLE;
ALTER TABLE H_PARTSUPP ADD CONSTRAINT PARTSUPP_PART_FK FOREIGN KEY (ps_partkey)
  REFERENCES H_PART (p_partkey) NOT DEFERRABLE;
ALTER TABLE H_PARTSUPP ADD CONSTRAINT PARTSUPP_SUPPLIER_FK FOREIGN KEY (ps_suppkey)
  REFERENCES H_SUPPLIER (s_suppkey) NOT DEFERRABLE;
ALTER TABLE H_SUPPLIER ADD CONSTRAINT SUPPLIER_NATION_FK FOREIGN KEY (s_nationkey)
  REFERENCES H_NATION (n_nationkey) NOT DEFERRABLE;
ALTER TABLE H_CUSTOMER ADD CONSTRAINT CUSTOMER_NATION_FK FOREIGN KEY (c_nationkey)
  REFERENCES H_NATION (n_nationkey) NOT DEFERRABLE;
ALTER TABLE H_NATION ADD CONSTRAINT NATION_REGION_FK FOREIGN KEY (n_regionkey)
  REFERENCES H_REGION (r_regionkey) NOT DEFERRABLE;
ALTER TABLE H_LINEITEM ADD CONSTRAINT LINEITEM_ORDER_FK FOREIGN KEY (l_orderkey)
  REFERENCES H_ORDER (o_orderkey) NOT DEFERRABLE;

spool off
Gather Statistics
The final step is to gather CBO statistics so that you will, hopefully, get better execution plans:
begin
  dbms_stats.gather_schema_stats(ownname => 'TPCH', degree => 2, cascade => true);
end;
/
Conclusion
And you are done. You can now run the well-known TPC-H queries (found in the TPC-H documentation) or your own test cases.
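As an example, here is TPC-H query Q1 (the pricing summary report) lightly adapted to the H_ table names used above, with the substitution parameter fixed at 90 days; treat it as a sketch and refer to the specification for the official query text:

-- TPC-H Q1 against the playground schema (delta = 90 days)
select l_returnflag,
       l_linestatus,
       sum(l_quantity)                                       as sum_qty,
       sum(l_extendedprice)                                  as sum_base_price,
       sum(l_extendedprice * (1 - l_discount))               as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity)                                       as avg_qty,
       avg(l_extendedprice)                                  as avg_price,
       avg(l_discount)                                       as avg_disc,
       count(*)                                              as count_order
from   h_lineitem
where  l_shipdate <= date '1998-12-01' - 90
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;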