encode_sort_key
Creates an order-preserving composite binary key from multiple heterogeneous input columns. This function is essential for creating efficient sort keys that maintain the original sort order when compared lexicographically.
Syntaxβ
encode_sort_key(column1, column2, ..., columnN)
Parametersβ
column1, column2, ..., columnN: One or more columns of any supported data type. The function accepts a variable number of arguments.
Return valueβ
Returns a value of the VARBINARY type representing the composite sort key.
Supported data typesβ
The following data types are supported:
| Data Type | Description |
|---|---|
TINYINT | 8-bit signed integer |
SMALLINT | 16-bit signed integer |
INT | 32-bit signed integer |
BIGINT | 64-bit signed integer |
LARGEINT | 128-bit signed integer |
FLOAT | 32-bit floating point |
DOUBLE | 64-bit floating point |
VARCHAR | Variable-length string |
CHAR | Fixed-length string |
DATE | Date value |
DATETIME | Date and time value |
TIMESTAMP | Timestamp value |
The following complex types are not supported and will return an error:
JSONARRAYMAPSTRUCTHLLBITMAPPERCENTILE
Encoding strategyβ
The function uses different encoding strategies for different data types to ensure that lexicographic comparison of the resulting binary keys preserves the original sort order:
- Integer types: Big-endian byte order with sign bit flipped for signed types
- Floating-point types: Custom encoding to ensure correct sort order
- String types: Special encoding with 0x00 byte escaping and 0x00 0x00 terminator for non-last fields
- Date/Time types: Internal integer representation encoded as integral value
NULL handlingβ
- Each column gets a NULL marker (0x00 for NULL, 0x01 for NOT NULL) for every row
- NULL markers are added even for non-nullable columns to ensure consistent encoding
- A separator byte (0x00) is used between columns (except after the last column)
Examplesβ
Generated sort key columnβ
CREATE TABLE user_analytics (
user_id INT,
region VARCHAR(50),
score DOUBLE,
created_date DATE,
sort_key VARBINARY(1024) AS (
encode_sort_key(region, score, created_date)
)
) ORDER BY (sort_key);
JSON field extractionβ
CREATE TABLE json_data (
id INT,
json_content JSON,
sort_key VARBINARY(1024) AS (
encode_sort_key(
get_json_int(json_content, '$.priority'),
get_json_string(json_content, '$.category'),
get_json_double(json_content, '$.score')
)
)
) ORDER BY (sort_key);
Limitationsβ
Data type restrictionsβ
Complex types like JSON, ARRAY, MAP, and STRUCT cannot be directly encoded. Use JSON extraction functions to get primitive values:
-- Instead of: encode_sort_key(json_col)
-- Use: encode_sort_key(get_json_int(json_col, '$.field1'), get_json_string(json_col, '$.field2'))
Performance considerationsβ
- Each call to
encode_sort_keyrequires encoding all input columns - Binary keys can be significantly larger than original data
- Use generated columns to avoid repeated encoding
Key size limitationsβ
- Keep the number of columns reasonable (typically < 10)
- Prefer shorter string columns when possible
- Consider using hash functions for very long strings