Data Types Reference
Overview
Complete reference for NeuronDB's data types, internal C structures, and type system.
PostgreSQL Compatibility: 16, 17, 18
Vector Types
vector
PostgreSQL Type: vector
C Structure: Vector
Storage: Extended (varlena)
Base Type: Float32 (4 bytes per dimension)
The primary vector type in NeuronDB. Each dimension is stored as a float32. Use it for embeddings and general vector operations.
Limits:
- Maximum Dimensions: 16,000
- Minimum Dimensions: 1
- Storage Overhead: 8 bytes (4-byte varlena header, 2-byte dimension count, 2 bytes of padding)
Example usage
```sql
-- Create a vector
SELECT '[1.0, 2.0, 3.0]'::vector;
-- Create a table with a dimension constraint
CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    embedding vector(3) -- exactly 3 dimensions; real tables use the model's size, e.g. vector(384)
);
-- Insert a vector (its dimension must match the column's)
INSERT INTO embeddings (embedding) VALUES ('[0.1, 0.2, 0.3]'::vector);
```
halfvec
PostgreSQL Type: halfvec
C Structure: VectorF16
Base Type: Float16 (2 bytes per dimension)
Half-precision vector type that stores each dimension in IEEE 754 binary16 (half-precision) format, using half the storage of vector.
Limits:
- Maximum Dimensions: 4,000
- Compression Ratio: 2x (compared to vector)
- Precision: ~3 decimal digits
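To show where the ~3-digit precision limit comes from, here is a minimal, self-contained C sketch that converts float32 to IEEE 754 binary16 and back, rounding to nearest with ties to even. The helpers float_to_half and half_to_float are hypothetical and are not NeuronDB functions. NeuronDB's own conversion may differ in details such as rounding or NaN handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode a float32 as an IEEE 754 binary16 bit
 * pattern, rounding to nearest with ties to even. */
static uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                    /* read the float's bits */

    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t bexp = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (bexp == 0xFFu)                           /* Inf stays Inf, NaN stays NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int32_t e = (int32_t)bexp - 127 + 15;        /* rebias 8-bit -> 5-bit exponent */
    if (e >= 31)                                 /* too large: overflow to Inf */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                                /* result is subnormal or zero */
        if (e < -10)
            return (uint16_t)sign;               /* below half the smallest subnormal */
        mant |= 0x800000u;                       /* make the implicit leading 1 explicit */
        uint32_t shift = (uint32_t)(14 - e);
        uint32_t h = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (h & 1u)))
            h++;                                 /* may round up to the smallest normal */
        return (uint16_t)(sign | h);
    }

    uint32_t h = sign | ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;               /* the 13 mantissa bits being dropped */
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                                     /* a carry into the exponent is correct */
    return (uint16_t)h;
}

/* Decode a binary16 bit pattern back to float32 (exact: every half fits). */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t bexp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (bexp == 0) {                             /* zero or subnormal: mant * 2^-24 */
        f = (float)mant / 16777216.0f;
        return sign ? -f : f;
    }
    if (bexp == 31)                              /* Inf or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    else                                         /* normal: rebias the exponent */
        bits = sign | ((bexp - 15u + 127u) << 23) | (mant << 13);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Round-tripping 0.1f gives 0.0999755859375, and 65504 is the largest finite value; anything larger overflows to infinity. binary16 keeps 10 explicit mantissa bits (11 with the implicit bit), which works out to about 3.3 decimal digits.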
Example usage
```sql
-- Convert vector to halfvec
SELECT vector_to_halfvec('[1.0, 2.0, 3.0]'::vector);
-- Cast between types
SELECT '[1.0, 2.0, 3.0]'::vector::halfvec;
-- Create table with halfvec
CREATE TABLE embeddings_fp16 (
    id SERIAL PRIMARY KEY,
    embedding halfvec(384)
);
```
sparsevec
PostgreSQL Type: sparsevec
C Structure: SparseVector
Base Type: Sparse representation
Sparse vector type that stores only the non-zero entries. It is suited to high-dimensional vectors that are mostly zeros, such as BM25 or SPLADE term weights.
Limits:
- Maximum Non-Zero Entries: 1,000
- Maximum Dimensions: 1,000,000
- Model Types (numeric code in parentheses): BM25 (0), SPLADE (1), ColBERTv2 (2)
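To illustrate why storing only non-zero entries pays off, here is a minimal C sketch of a sparse vector as parallel index/value arrays, plus an inner product that merges the two sorted index lists. SparseSketch, sparse_from_dense, and sparse_dot are hypothetical names that do not reflect NeuronDB's SparseVector layout. Indices are 0-based here, and the 1,000-entry cap mirrors the documented limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPARSE_MAX_NNZ 1000   /* mirrors the documented non-zero limit */

/* Hypothetical in-memory sparse vector: ascending indices of the
 * non-zero entries and their values, stored in parallel arrays. */
typedef struct {
    int32_t dim;                      /* logical number of dimensions */
    int32_t nnz;                      /* number of stored (non-zero) entries */
    int32_t index[SPARSE_MAX_NNZ];    /* ascending positions of non-zeros */
    float   value[SPARSE_MAX_NNZ];    /* values at those positions */
} SparseSketch;

/* Build a sparse vector from a dense array. Returns 0 on success or -1
 * if the input has more non-zeros than the limit allows. */
static int sparse_from_dense(const float *dense, int32_t dim, SparseSketch *out)
{
    assert(dense != NULL && out != NULL && dim > 0);
    out->dim = dim;
    out->nnz = 0;
    for (int32_t i = 0; i < dim; i++) {
        if (dense[i] != 0.0f) {
            if (out->nnz == SPARSE_MAX_NNZ)
                return -1;
            out->index[out->nnz] = i;
            out->value[out->nnz] = dense[i];
            out->nnz++;
        }
    }
    return 0;
}

/* Inner product by merging the sorted index lists: the cost is
 * O(nnz_a + nnz_b), independent of dim. */
static float sparse_dot(const SparseSketch *a, const SparseSketch *b)
{
    float sum = 0.0f;
    int32_t i = 0, j = 0;
    while (i < a->nnz && j < b->nnz) {
        if (a->index[i] == b->index[j])
            sum += a->value[i++] * b->value[j++];
        else if (a->index[i] < b->index[j])
            i++;
        else
            j++;
    }
    return sum;
}
```

Converting [0, 0, 1.5, 0] stores a single entry (index 2, value 1.5). The dot product only visits stored entries, so its cost tracks the number of non-zeros rather than the dimension.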
binaryvec
PostgreSQL Type: binaryvec
Base Type: Binary (1 bit per dimension)
Binary vector type that stores 1 bit per dimension, giving 32x compression compared to vector. Vectors are compared with Hamming distance.
Features:
- Compression Ratio: 32x (compared to vector)
- Distance Metric: Hamming distance only
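The 32x ratio comes from replacing each 32-bit float with a single bit. The sketch below packs a float vector into bits and computes Hamming distance by XOR followed by a bit count. The rule "bit is 1 when the value is greater than 0" is an assumption made for illustration, and binary_quantize and hamming_distance are hypothetical helpers, not NeuronDB functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a float vector into bits: bit i is set when x[i] > 0.
 * The "> 0" threshold is an assumption; NeuronDB's vector_to_binary()
 * may use a different rule. `out` must hold (dim + 7) / 8 bytes. */
static void binary_quantize(const float *x, size_t dim, uint8_t *out)
{
    assert(x != NULL && out != NULL);
    for (size_t i = 0; i < (dim + 7) / 8; i++)
        out[i] = 0;
    for (size_t i = 0; i < dim; i++)
        if (x[i] > 0.0f)
            out[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Hamming distance: XOR the packed bytes, then count the set bits. */
static unsigned hamming_distance(const uint8_t *a, const uint8_t *b, size_t nbytes)
{
    unsigned dist = 0;
    for (size_t i = 0; i < nbytes; i++) {
        uint8_t v = (uint8_t)(a[i] ^ b[i]);
        while (v) {                      /* clear the lowest set bit each pass */
            v &= (uint8_t)(v - 1);
            dist++;
        }
    }
    return dist;
}
```

Under this rule [1.0, -1.0, 0.5] packs to the bit pattern 101 (0x05). Flipping the sign of the first element changes one bit, giving a Hamming distance of 1.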
Internal C Structures
Vector Structure
```c
typedef struct Vector {
    int32 vl_len_;                        /* varlena header (required) */
    int16 dim;                            /* number of dimensions */
    int16 unused;                         /* padding for alignment */
    float4 data[FLEXIBLE_ARRAY_MEMBER];   /* vector data */
} Vector;
```
Memory Layout
| Offset | Size | Field |
|---|---|---|
| 0 | 4 | vl_len_ (varlena header) |
| 4 | 2 | dim (dimension count) |
| 6 | 2 | unused (padding) |
| 8 | 4*dim | data[] (float32 array) |
Total Size: offsetof(Vector, data) + sizeof(float4) * dim, that is, 8 + 4 × dim bytes
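The layout above can be checked with a small standalone C sketch. Because PostgreSQL's headers are not assumed to be available, VectorSketch is a hypothetical mirror of Vector that uses <stdint.h> types in place of int32, int16, and float4. Inside an extension you would use the real Vector type with palloc() and SET_VARSIZE() instead of calloc() and a plain assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical standalone mirror of Vector, for illustration only. */
typedef struct VectorSketch {
    int32_t vl_len_;   /* varlena header; plain byte count in this sketch */
    int16_t dim;       /* number of dimensions */
    int16_t unused;    /* padding for alignment */
    float   data[];    /* float32 payload (C99 flexible array member) */
} VectorSketch;

/* Total size = offsetof(VectorSketch, data) + sizeof(float) * dim */
static size_t vector_total_size(int dim)
{
    assert(dim >= 1 && dim <= 16000);   /* documented dimension limits */
    return offsetof(VectorSketch, data) + sizeof(float) * (size_t)dim;
}

/* Allocate a zeroed vector and record its size and dimension. */
static VectorSketch *vector_alloc(int dim)
{
    size_t size = vector_total_size(dim);
    VectorSketch *v = calloc(1, size);
    if (v != NULL) {
        v->vl_len_ = (int32_t)size;   /* a real varlena sets this with SET_VARSIZE(), which encodes the length */
        v->dim = (int16_t)dim;
    }
    return v;
}
```

For example, vector_total_size(384) returns 1,544 bytes (8 + 4 × 384), and the offsets of dim, unused, and data come out as 4, 6, and 8, matching the table.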
Type Storage Formats
Storage Size Calculation
| Type | Bytes per Dimension | Overhead |
|---|---|---|
| vector | 4 | 8 bytes |
| halfvec | 2 | 8 bytes |
| sparsevec | Variable | 16 bytes |
| binaryvec | 0.125 (1 bit) | 8 bytes |
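The storage table translates into simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, binaryvec bits are assumed to be padded up to whole bytes, and sparsevec is left out because its per-entry cost is listed only as "Variable".

```c
#include <assert.h>
#include <stddef.h>

/* Fixed per-value overhead listed in the table for these three types. */
#define TYPE_OVERHEAD_BYTES 8u

/* vector: 4 bytes per dimension, up to 16,000 dimensions */
static size_t vector_bytes(size_t dim)
{
    assert(dim >= 1 && dim <= 16000);
    return TYPE_OVERHEAD_BYTES + 4u * dim;
}

/* halfvec: 2 bytes per dimension, up to 4,000 dimensions */
static size_t halfvec_bytes(size_t dim)
{
    assert(dim >= 1 && dim <= 4000);
    return TYPE_OVERHEAD_BYTES + 2u * dim;
}

/* binaryvec: 1 bit per dimension, assumed padded to whole bytes */
static size_t binaryvec_bytes(size_t dim)
{
    assert(dim >= 1);
    return TYPE_OVERHEAD_BYTES + (dim + 7u) / 8u;
}
```

For a 384-dimensional embedding these give 1,544, 776, and 56 bytes. Because of the fixed 8-byte header, the end-to-end ratio for binaryvec at this size is about 27.6x rather than the 32x that applies to the payload alone.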
Type Casting Rules
Implicit Casts
- vector → halfvec
Explicit Casts
- vector → sparsevec
Type casting examples
```sql
-- Vector to halfvec
SELECT '[1.0, 2.0, 3.0]'::vector::halfvec;
-- Vector to sparsevec
SELECT vector_to_sparsevec('[0, 0, 1.5, 0]'::vector);
-- Vector to binary
SELECT vector_to_binary('[1.0, -1.0, 0.5]'::vector);
```
Memory Layout
In-Memory Representation
- Vectors stored as contiguous float32 arrays
- Aligned to 8-byte boundaries for SIMD operations
- GPU transfers use same layout (zero-copy when possible)
TOAST Behavior
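PostgreSQL applies TOAST automatically when a row is wider than TOAST_TUPLE_THRESHOLD (normally about 2 KB; 2032 bytes with the default 8 kB block size):
- Inline storage: a float32 vector value takes 8 + 4 × dim bytes, so it stays under this limit up to roughly 500 dimensions; the tuple header and other columns count toward the same limit
- Extended storage: in wider rows, values are compressed first and moved out of line to the TOAST table if the row is still too wide
- Compression: enabled by default, since vector uses the extended storage strategy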
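As a worked example of the inline threshold, the sketch below computes how many dimensions fit before a value alone reaches the default TOAST_TUPLE_THRESHOLD of 2032 bytes. The constant is hardcoded as an assumption, since it depends on the block size. The calculation ignores the tuple header and other columns, so treat the result as an upper bound. max_inline_dim is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed default: TOAST_TUPLE_THRESHOLD is 2032 bytes with 8 kB blocks. */
#define TOAST_THRESHOLD_APPROX 2032u
/* 8-byte vector header (varlena length + dim + padding) */
#define VECTOR_HEADER_BYTES 8u

/* Largest dimension whose value alone stays within `budget` bytes for a
 * type that uses `bytes_per_dim` bytes per element. Ignores the tuple
 * header and other columns, so it is an upper bound. */
static size_t max_inline_dim(size_t budget, size_t bytes_per_dim)
{
    assert(bytes_per_dim > 0 && budget >= VECTOR_HEADER_BYTES);
    return (budget - VECTOR_HEADER_BYTES) / bytes_per_dim;
}
```

This gives 506 dimensions for vector and 1,012 for halfvec. A vector(384) value (1,544 bytes) therefore usually stays inline. A vector(768) value (3,080 bytes) pushes the row past the threshold and becomes a candidate for compression or out-of-line storage.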
Quantization Formats
Quantization Types
- Scalar Quantization: int8, uint8 (4x compression)
- Product Quantization (PQ): 8x-16x compression
- Binary Quantization: 32x compression (Hamming distance)
- Ternary Quantization: 16x compression
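Scalar quantization's 4x figure comes from replacing each 4-byte float32 with a 1-byte integer. The sketch below shows one common variant, symmetric per-vector int8 quantization with a single scale factor. It is an illustration, not necessarily the scheme NeuronDB uses for its int8/uint8 formats, and quantize_int8 and dequantize_int8 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric int8 quantization: one scale per vector, chosen so the
 * largest |x| maps to 127. Returns the scale needed to dequantize. */
static float quantize_int8(const float *x, size_t dim, int8_t *q)
{
    assert(x != NULL && q != NULL && dim > 0);
    float max_abs = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > max_abs)
            max_abs = a;
    }
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < dim; i++) {
        float r = x[i] / scale;
        int v = (int)(r < 0.0f ? r - 0.5f : r + 0.5f);  /* round half away from zero */
        if (v > 127) v = 127;                           /* clamp to symmetric range */
        if (v < -127) v = -127;
        q[i] = (int8_t)v;
    }
    return scale;
}

/* Approximate reconstruction: x[i] ~= q[i] * scale */
static float dequantize_int8(int8_t q, float scale)
{
    return (float)q * scale;
}
```

Dequantizing recovers each value to within half a quantization step (scale / 2). The 4-byte scale adds a small per-vector overhead, so the real ratio is slightly under 4x.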
When to Use Quantization
- Large datasets where storage is a concern
- Read-heavy workloads
- Applications that can accept some loss of precision in exchange for speed and storage savings