Skip to content

Vector search that leaks across tenants

Problem

You ship find_similar_tickets(ticket_id) so agents triaging a new ticket see similar past tickets. It runs as SECURITY DEFINER because it touches message_embeddings. Months later, an agent in org A pulls up a ticket and the “similar tickets” panel shows a customer-support ticket from org B that happens to embed close in vector space.

The code

CREATE FUNCTION find_similar_tickets(p_ticket_id UUID, p_k INT DEFAULT 5)
RETURNS TABLE (...) SECURITY DEFINER AS $$
SELECT m.ticket_id, (me.embedding <-> ...) AS distance
FROM message_embeddings me
JOIN messages m ON m.id = me.message_id
WHERE m.ticket_id <> p_ticket_id
ORDER BY distance ASC LIMIT p_k;
$$;

Why review misses it

Two failure modes compound here. First, SECURITY DEFINER bypasses caller RLS — so RLS on tickets doesn’t save you. Second, the function reads as a “find the nearest neighbors” query, and reviewers don’t typically read those queries asking “do they cross a security boundary?” The org filter belongs in the function body, not in the policy layer.

The example test that passes

def test_returns_some_neighbors(db, org_with_tickets):
rows = db.query("SELECT * FROM find_similar_tickets(%s)", org_with_tickets["tickets"][0]["id"])
assert len(rows) > 0

Seeds one org. No cross-tenant leak possible.

The SqlProof property

@given(data=st.data())
def test_similar_tickets_are_all_in_the_input_org(supabase_proof, data):
dataset = data.draw(supabase_proof.dataset_strategy(
sizes={"organizations": 2, "customers": 2, "org_members": 2,
"tickets": 4, "messages": 4, "message_embeddings": 2},
columns={"message_embeddings.embedding": vector_strategy(384)},
))
# ... run as a member of the input ticket's org; verify returned ticket_ids are all from that org
assert cross_tenant == []

Two orgs is the minimum that makes the bug visible. Hypothesis generates them.

The vector_strategy(384) helper (in examples/inbox/tests/_helpers.py) is a workaround for sqlproof issue #69 — once the schema parser handles vector(N) natively, the columns= override can be dropped.

The counterexample

Illustrative — Hypothesis would print the actual draw and assertion:

Property failed: vector search leaked across tenants
AssertionError: vector search leaked across tenants: [{'id': UUID('...'), 'org_id': UUID('...')}]
Returned ticket from org B while input ticket was in org A

The fix

Resolve the input ticket’s org_id in a CTE and filter the search to it:

WITH input AS (
SELECT t.org_id, me.embedding FROM tickets t ...
WHERE t.id = p_ticket_id LIMIT 1
)
... WHERE t.org_id = (SELECT org_id FROM input)

For the general “SECURITY DEFINER bypasses RLS” pattern, see the RLS bug-classes reference.