ORA_HASH and CLOB problem

This article focus on ORA_HASH function in Oracle. This function computes a hash value for a given expression.

SELECT
  ORA_HASH('test') oh,
  ORA_HASH('test1') oh1
FROM dual;

        OH        OH1
---------- ----------
2662839991 1531096017

Unfortunately for CLOB expression it computes hash  value randomly. Each iteration can give different result

SELECT
  ORA_HASH('test') oh,
  ORA_HASH(to_clob('test')) ohc,
  ORA_HASH('test1') oh1,
  ORA_HASH(to_clob('test1')) ohc1
FROM dual;

        OH        OHC        OH1       OHC1
---------- ---------- ---------- ----------
2662839991 3191322237 1531096017 2109387560

SELECT
  ORA_HASH('test') oh,
  ORA_HASH(to_clob('test')) ohc,
  ORA_HASH('test1') oh1,
  ORA_HASH(to_clob('test1')) ohc1
FROM dual;

        OH        OHC        OH1       OHC1
---------- ---------- ---------- ----------
2662839991 1732521620 1531096017 2558490382

ORA_HASH can only work with 4k characters or CLOB but for CLOB as proved previously returns different values per SQL run.

--fails for VARCHAR2 > 4k
set serveroutput on
DECLARE
  l_varchar2_32k  VARCHAR2(32767) := LPAD(' ', 32767, ' ');
  l_ora_hash      NUMBER;
BEGIN
  SELECT ORA_HASH(l_varchar2_32k) 
    INTO l_ora_hash
    FROM dual;
  
  dbms_output.put_line(l_ora_hash);
END;
/

ORA-01460: unimplemented or unreasonable conversion requested

In PL/SQL all calls for the same expression returns the same value but for another PL/SQL call they are different

set serveroutput on
DECLARE
  l_clob          CLOB := LPAD(' ', 32767, ' ');
  l_clob1         CLOB := LPAD(' ', 32767, ' ');
  l_ora_hash      NUMBER;
BEGIN
  SELECT ORA_HASH(l_clob) 
    INTO l_ora_hash
    FROM dual;
  
  dbms_output.put_line(l_ora_hash);

  SELECT ORA_HASH(l_clob1) 
    INTO l_ora_hash
    FROM dual;
  
  dbms_output.put_line(l_ora_hash);
END;
/

2768182392
2768182392

set serveroutput on
DECLARE
  l_clob          CLOB := LPAD(' ', 32767, ' ');
  l_clob1         CLOB := LPAD(' ', 32767, ' ');
  l_ora_hash      NUMBER;
BEGIN
  SELECT ORA_HASH(l_clob) 
    INTO l_ora_hash
    FROM dual;
  
  dbms_output.put_line(l_ora_hash);

  SELECT ORA_HASH(l_clob1) 
    INTO l_ora_hash
    FROM dual;
  
  dbms_output.put_line(l_ora_hash);
END;
/

52202303
52202303

Solution is to write own PL/SQL function that can work with CLOB and can calculate hash value using original ORA_HASH function. Idea is simple chunk CLOB on many 4k parts calculate ORA_HASH for each part and concatenate it as CLOB.

CREATE OR REPLACE FUNCTION ora_hash_clob
(
  p_clob IN CLOB
)
RETURN CLOB
IS
  TYPE t_ora_hash_tab  IS TABLE OF NUMBER INDEX BY binary_integer;
  l_ora_hash_tab       t_ora_hash_tab;
  l_line               VARCHAR2(4000);
  l_ora_hash_key       NUMBER;
  l_ora_hash_clob_key  CLOB;
BEGIN
  FOR i in 1 .. ceil(length(p_clob)/4000)
  LOOP
    l_line := to_char(substr(p_clob, (i-1)*4000+1,4000));

    SELECT ora_hash(l_line) 
      INTO l_ora_hash_key 
      FROM dual;

    l_ora_hash_tab(i) := l_ora_hash_key;
  END LOOP;
  
  FOR i IN 1..l_ora_hash_tab.count
  LOOP
    l_ora_hash_clob_key := l_ora_hash_clob_key || 
                           to_clob(l_ora_hash_tab(i));
  END LOOP;
  
  RETURN l_ora_hash_clob_key;
END;

Now it returns the same hash value for CLOB and VARCHAR for small expressions <= 4k. Second columns for each iteration returns different value.

SELECT
ORA_HASH_CLOB(to_clob('small string')) my_ohc,
ORA_HASH(to_clob('small string')) ohc,
ORA_HASH('small string') oh
FROM dual;

    MY_OHC        OHC         OH
---------- ---------- ----------
1163999085 2204313295 1163999085

Additionally it works with expressions bigger than 4k and returns always the same hash value

SELECT
 ORA_HASH_CLOB(to_clob(lpad(' ', 4000, ' '))||to_clob(' ')) my_ohc
FROM dual;

MY_OHC 
-----------------------
23879004511715534958

SELECT
 ORA_HASH_CLOB(to_clob(lpad(' ', 4000, ' '))||to_clob(' ')) my_ohc
FROM dual;

MY_OHC 
-----------------------
23879004511715534958

DBMS_CRYPTO

Nice alternative for ORA_HASH is DBMS_CRYPTO.HASH. It works nicely with CLOB values.

Each iteration returns the same value

SELECT 
 dbms_crypto.HASH(to_clob('test'),1) ohc,
 dbms_crypto.HASH(to_clob('test1'),1) ohc1
FROM dual;

OHC                              OHC1
-------------------------------- --------------------------------
DB346D691D7ACC4DC2625DB19F9E3F52 F79E002AC163078C673FA2C321E5E66F

SELECT 
 dbms_crypto.HASH(to_clob('test'),1) ohc,
 dbms_crypto.HASH(to_clob('test1'),1) ohc1
FROM dual;

OHC                              OHC1
-------------------------------- --------------------------------
DB346D691D7ACC4DC2625DB19F9E3F52 F79E002AC163078C673FA2C321E5E66F

Works with big CLOB and returns unique value

SELECT 
    dbms_crypto.hash(to_clob(lpad(' ', 4000, ' '))
                     ||to_clob(' '), 1) my_ohc 
  FROM dual;

MY_OHC
--------------------------------
FA865287791B369C89AFDAFE25942E69

SELECT 
 dbms_crypto.hash(to_clob(lpad(' ', 4000, ' '))
 ||to_clob(' '), 1) my_ohc 
 FROM dual;

MY_OHC
--------------------------------
FA865287791B369C89AFDAFE25942E69

Have a fun 🙂

Tomasz

9 thoughts on “ORA_HASH and CLOB problem

  1. The code doesn’t compile as given. l_ora_hash_final_key is never declared. It returns close to what is described numerically when that variable is changed to l_ora_hash_clob_key. However this is a CLOB and not the NUMBER desired.

  2. My advices:
    Add DETERMINISTIC qualifier to any hash key calculation function.
    Add PRAGMA UDF; too.
    You could replace the first loop by a single hierarchical SQL query for far better performances.
    You could also only have one loop, so you won’t need any collection to save the memory used.
    You should never concatenate two CLOB using || to avoid creation/deletion of temporary CLOB. Use DBMS_LOB.APPEND instead.
    One big advantage of ORA_HASH is that it returns an integer. Why did you decide to return a CLOB? To change this you could take advantage of the third argument of ORA_HASH that will combine already calculated hash value with the next 4k chunk.
    You can also talk about the faster STANDARD_HASH() function.

  3. As my DB admin had not given me permission to use the dbms_crpyto package I tried your ORA_HASH_CLOB solution. But I encounter two issues.

    1. If the CLOB contains special symbols as then the substring was longer than VARCHAR(4000 Byte).
    -> Changing l_line to NVARCHAR2(4000) fixed that issue.
    2. In my Oracle version the functions for substr and length had issue with the CLOBs as well so I changed them to use the dbms_lob package functions.

    create or replace FUNCTION ora_hash_clob
    (
    p_clob IN CLOB
    )
    RETURN CLOB
    IS
    TYPE t_ora_hash_tab IS TABLE OF NUMBER INDEX BY binary_integer;
    l_ora_hash_tab t_ora_hash_tab;
    l_line NVARCHAR2(4000);
    l_ora_hash_key NUMBER;
    l_ora_hash_clob_key CLOB;
    BEGIN
    FOR i in 1 .. ceil(dbms_lob.getlength(p_clob)/4000)
    LOOP
    l_line := to_char(dbms_lob.substr(p_clob, 4000, (i-1)*4000+1));

    SELECT ora_hash(l_line)
    INTO l_ora_hash_key
    FROM dual;

    l_ora_hash_tab(i) := l_ora_hash_key;
    END LOOP;

    FOR i IN 1..l_ora_hash_tab.count
    LOOP
    l_ora_hash_clob_key := l_ora_hash_clob_key ||
    to_clob(l_ora_hash_tab(i));
    END LOOP;

    RETURN l_ora_hash_clob_key;
    END;

Leave a Reply to Jim H Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.