Deterministic Long to UUID Converter (Java)

The scenario

You have been tasked with migrating the primary key of an entity from Long to UUID. UUID identifiers are harder to guess or enumerate than sequential numeric values, which adds an extra layer of security to an application. They can also be generated in the code layer instead of at the database level, and they make it easier for SQL and NoSQL storage systems to interface.

It has been determined that, due to the vast number of foreign keys pointing to the entity’s primary key, a deterministic function is required so that the new UUID columns can be validated against the deprecated foreign key columns prior to dropping them.

(If there are only a couple of foreign keys to consider, and it is deemed acceptable to perform the update in one transaction, you might simply iterate over each entity, assign a random UUID, and update all foreign keys at the same time. That approach, however, leaves no way to validate the result after the transaction.)

The requirements

  • For any given Long input, it should always produce the same UUID output.
  • An optional “seed” should be available, so that an environment-dependent secret can be used, making it harder for attackers to guess the UUIDs that correspond to low-order Longs.
  • We will be using the standard java.lang.Long and java.util.UUID.
  • Our project will use Maven.

Implementation

To implement our converter, we are going to start by targeting the following java.util.UUID method:

public static UUID nameUUIDFromBytes(byte[] name)

This method returns a type 3 (name-based) UUID derived from the given byte array.

We will need a way to convert the Long input into a byte array. For this there are many options, but probably the simplest approach is to use a pre-built library utility. One popular choice is Google Guava’s Longs.toByteArray.

We should add this to our pom.xml:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>33.0.0-jre</version>
</dependency>

Another dependency that we will be using is JUnit 5, to test our converter. We should add this to our pom.xml as well:

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>5.8.2</version>
    <scope>test</scope>
</dependency>

Basic Long to UUID converter

We now have all that we need to implement our solution. Let’s create the converter:

package com.dharmacode.duc;

import com.google.common.primitives.Longs;

import java.util.UUID;

public class DeterministicUUIDConverter {

    public UUID convert(Long id) {
        byte[] idBytes = Longs.toByteArray(id);
        return UUID.nameUUIDFromBytes(idBytes);
    }

}

The method Longs.toByteArray returns a big-endian representation of the 64-bit long in an 8-element byte array.
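If adding Guava only for this one call feels heavy, the same big-endian bytes can be produced with the JDK alone. The following is a minimal sketch, not part of the original converter; the class and method names (LongBytesSketch, toBigEndianBytes) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class LongBytesSketch {

    // ByteBuffer uses big-endian byte order by default, so the most
    // significant byte of the long ends up at index 0.
    static byte[] toBigEndianBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toBigEndianBytes(1L)));
        // prints [0, 0, 0, 0, 0, 0, 0, 1]
    }
}
```

Either approach should feed identical bytes into nameUUIDFromBytes, so the choice mainly comes down to whether Guava is already on the classpath.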

Let’s create a test to check that it is deterministic (i.e. produces the same UUID each time for any Long):

package com.dharmacode.duc;

import org.junit.jupiter.api.Test;

import java.util.UUID;

import static org.junit.jupiter.api.Assertions.assertEquals;

class DeterministicUUIDConverterTest {

    private static final Long TEST_ID = 1L;

    @Test
    public void testIsDeterministicWithoutSeed() {
        DeterministicUUIDConverter converter = new DeterministicUUIDConverter();
        UUID uuid1 = converter.convert(TEST_ID);
        UUID uuid2 = converter.convert(TEST_ID);
        assertEquals(uuid1, uuid2);
        System.out.println(uuid1);
    }

}

In the console, we can see the string representation of the UUID:

fa5ad9a8-557e-3a84-8f23-e52d3d3adf77

This looks random, which is what we want for a primary key. How does the function produce such a value from the input 1? Here is the code from the UUID class:

public static UUID nameUUIDFromBytes(byte[] name) {
    MessageDigest md;
    try {
        md = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException nsae) {
        throw new InternalError("MD5 not supported", nsae);
    }
    byte[] md5Bytes = md.digest(name);
    md5Bytes[6]  &= 0x0f;  /* clear version        */
    md5Bytes[6]  |= 0x30;  /* set to version 3     */
    md5Bytes[8]  &= 0x3f;  /* clear variant        */
    md5Bytes[8]  |= 0x80;  /* set to IETF variant  */
    return new UUID(md5Bytes);
}

As we can see, it uses the MD5 hash algorithm to hash the bytes. MD5 is commonly used for checksums and generates a 128-bit hash. This is contained in an array of 16 bytes (md5Bytes). Six of those bits are then overwritten to mark the result as a version 3, IETF-variant UUID. The MD5 hash is suitable for UUID creation because a UUID internally consists of two 64-bit longs, as we can see in the private constructor:

private UUID(byte[] data) {
    long msb = 0;
    long lsb = 0;
    assert data.length == 16 : "data must be 16 bytes in length";
    for (int i=0; i<8; i++)
        msb = (msb << 8) | (data[i] & 0xff);
    for (int i=8; i<16; i++)
        lsb = (lsb << 8) | (data[i] & 0xff);
    this.mostSigBits = msb;
    this.leastSigBits = lsb;
}
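To check this reading of the JDK code, the same steps can be reproduced with public APIs only and compared against the built-in method. This is an illustrative sketch; NameUuidSketch and md5ToUuid are hypothetical names, and the public UUID(long, long) constructor stands in for the private one shown above.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

class NameUuidSketch {

    // Re-creates the steps of UUID.nameUUIDFromBytes using only public APIs.
    static UUID md5ToUuid(byte[] name) {
        byte[] md5Bytes;
        try {
            md5Bytes = MessageDigest.getInstance("MD5").digest(name);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not supported", e);
        }
        md5Bytes[6] &= 0x0f; // clear the four version bits
        md5Bytes[6] |= 0x30; // set version 3 (name-based, MD5)
        md5Bytes[8] &= 0x3f; // clear the two variant bits
        md5Bytes[8] |= 0x80; // set the IETF variant
        // First 8 bytes become the most significant long, last 8 the least.
        ByteBuffer buffer = ByteBuffer.wrap(md5Bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    public static void main(String[] args) {
        byte[] name = ByteBuffer.allocate(Long.BYTES).putLong(1L).array();
        UUID manual = md5ToUuid(name);
        System.out.println(manual.equals(UUID.nameUUIDFromBytes(name))); // prints true
        System.out.println(manual.version()); // prints 3
        System.out.println(manual.variant()); // prints 2
    }
}
```

Because the version and variant bits are fixed, only 122 of the 128 hash bits actually vary between UUIDs.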

Adding the seed

Although the conversion of the input value 1 looks random enough, it would be easy for someone with a little knowledge of the underlying system to guess the ID, because the value 1 will always produce the same MD5 hash, and therefore the same UUID. If we don’t add any additional randomisation, our migration will not have as much security benefit as it could have.

To address this issue, we will introduce a “seed”. The seed will be a secret string that is given to the converter at the time we perform the conversions. Of course, we would always need to use the same seed if we wanted to validate the conversion, or to reproduce it. The seed could be configured in a properties file that is read when the primary key migration is performed.
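As one possible way to supply the seed, the sketch below reads it from Java properties content. The property key uuid.migration.seed and the names SeedConfigSketch and readSeed are assumptions made for this example, not something the migration requires.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class SeedConfigSketch {

    // Hypothetical property key, chosen for this example only.
    static final String SEED_KEY = "uuid.migration.seed";

    // Reads the seed from properties content and fails fast if it is missing,
    // so a migration cannot silently run without its seed.
    static String readSeed(Reader source) {
        Properties properties = new Properties();
        try {
            properties.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String seed = properties.getProperty(SEED_KEY);
        if (seed == null || seed.isEmpty()) {
            throw new IllegalStateException("Missing property: " + SEED_KEY);
        }
        return seed;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a reader on the real properties file.
        String seed = readSeed(new StringReader("uuid.migration.seed=example-secret"));
        System.out.println(seed); // prints example-secret
    }
}
```

Failing fast on a missing seed matters here because an unseeded run would still produce valid-looking UUIDs, just not the ones a later validation pass expects.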

We will hash the seed using the MD5 hash algorithm, and then XOR the long bytes with the seed hash. Please note that the MD5 hash algorithm is unsuitable for security-related hashing, such as hashing passwords stored in a database. We are using it in this case because we are looking for a fast hash for our database migration that is just intended for “highly obfuscated” long-to-UUID conversion to prevent hackers from guessing IDs. Using a random seed that is kept secret will be sufficient for our purposes.

Here is our updated class:

package com.dharmacode.duc;

import com.google.common.primitives.Longs;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class DeterministicUUIDConverter {

    private final byte[] seedHash;

    public DeterministicUUIDConverter() {
        seedHash = null;
    }

    public DeterministicUUIDConverter(String seed) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException nsae) {
            throw new InternalError("MD5 not supported", nsae);
        }
        // Use a fixed charset so the hash does not depend on the platform default
        seedHash = md.digest(seed.getBytes(StandardCharsets.UTF_8));
    }

    public UUID convert(Long id) {

        byte[] idBytes = Longs.toByteArray(id);

        if (seedHash != null) {
            // XOR each byte with the seed hash
            for (int i = 0; i < 8; i++) {
                idBytes[i] ^= seedHash[i];
            }
            for (int i = 8; i < 16; i++) {
                idBytes[i - 8] ^= seedHash[i];
            }
        }

        return UUID.nameUUIDFromBytes(idBytes);
    }

}

We have added a couple of constructors, one with no arguments to support the “no-seed” configuration, and a second with a String argument for the seed. The seed constructor stores the MD5 hash of the seed as a 16-byte array.

The convert(Long id) function then behaves differently according to whether a seed is present. When it is, each of the eight long bytes is XORed with the corresponding byte of the seed hash. We perform two passes, one for each half of the 16-byte hash, so that every byte of the MD5 seed hash contributes to the result.
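It may help to see what the two passes amount to. Because XOR is associative, applying both halves of the hash in turn gives the same result as a single pass with an 8-byte key formed by XORing the two halves together. The sketch below illustrates this; SeedFoldSketch and its method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class SeedFoldSketch {

    static byte[] md5(String seed) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(seed.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not supported", e);
        }
    }

    // Two passes, as in convert(): first half of the hash, then the second half.
    static byte[] xorTwoPasses(byte[] idBytes, byte[] seedHash) {
        byte[] out = idBytes.clone();
        for (int i = 0; i < 8; i++) {
            out[i] ^= seedHash[i];
        }
        for (int i = 8; i < 16; i++) {
            out[i - 8] ^= seedHash[i];
        }
        return out;
    }

    // One pass with the folded 8-byte key seedHash[i] XOR seedHash[i + 8].
    static byte[] xorFolded(byte[] idBytes, byte[] seedHash) {
        byte[] out = idBytes.clone();
        for (int i = 0; i < 8; i++) {
            out[i] ^= (byte) (seedHash[i] ^ seedHash[i + 8]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] idBytes = {0, 0, 0, 0, 0, 0, 0, 1}; // big-endian bytes of 1L
        byte[] seedHash = md5("any-seed");
        System.out.println(Arrays.equals(
                xorTwoPasses(idBytes, seedHash), xorFolded(idBytes, seedHash)));
        // prints true
    }
}
```

One consequence is that the effective key is 64 bits wide, whatever the length of the seed string.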

Finally, we add a second test:

class DeterministicUUIDConverterTest {

    private static final Long TEST_ID = 1L;
    private static final String TEST_SEED = "lXpiPA3sZ3HMTo4o";

    // ...

    @Test
    public void testIsDeterministicWithSeed() {
        DeterministicUUIDConverter converter = new DeterministicUUIDConverter(TEST_SEED);
        UUID uuid1 = converter.convert(TEST_ID);
        UUID uuid2 = converter.convert(TEST_ID);
        assertEquals(uuid1, uuid2);
        System.out.println(uuid1);
    }

}

When we run this new test, we can see the string representation of the seeded UUID in the console:

46fb98b7-320d-35ab-a4c4-86c446025cf3
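With the converter in place, the validation step from the scenario could look roughly like the sketch below: for each row, recompute the UUID from the deprecated Long column and compare it with the stored value. This is a rough sketch under stated assumptions: rows are held in an in-memory map rather than read from a database, and MigrationValidationSketch and findMismatches are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

class MigrationValidationSketch {

    // Returns the old Long keys whose stored UUID does not match the UUID
    // that the converter derives from them.
    static List<Long> findMismatches(Map<Long, UUID> rows,
                                     Function<Long, UUID> converter) {
        List<Long> mismatches = new ArrayList<>();
        for (Map.Entry<Long, UUID> row : rows.entrySet()) {
            if (!converter.apply(row.getKey()).equals(row.getValue())) {
                mismatches.add(row.getKey());
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Unseeded stand-in for DeterministicUUIDConverter::convert.
        Function<Long, UUID> converter = id -> UUID.nameUUIDFromBytes(
                ByteBuffer.allocate(Long.BYTES).putLong(id).array());

        Map<Long, UUID> rows = new LinkedHashMap<>();
        rows.put(1L, converter.apply(1L)); // migrated correctly
        rows.put(2L, UUID.randomUUID());   // simulated bad row (version 4 UUID)
        System.out.println(findMismatches(rows, converter)); // prints [2]
    }
}
```

In a real migration the converter argument would presumably be the seeded DeterministicUUIDConverter, constructed with the same seed that was used for the original conversion.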

Conclusion

In this article, we created a deterministic long-to-UUID converter in Java. We can use this to migrate Long ID primary keys to UUID primary keys in a database, for example. Without a seed, an attacker who knows that the IDs were created using this method would be able to quickly compute the UUIDs corresponding to the original Long IDs. To address this concern, we added a secret seed, which makes it much harder for attackers to guess the IDs.
