fix(sl): parse user filter expressions as predicates, not projections (#307)

* fix(sl): parse user filter expressions as predicates, not projections

User-authored filters and segments were parsed in a projection context
(`SELECT {expr}`). On T-SQL a top-level `col = 'value'` projection is the
`alias = expression` aliasing syntax, so an equality filter parsed this way
became `'value' AS col` — dropping the comparison entirely and silently
skipping computed-column expansion (the column hid behind the alias).

Parse user fragments as predicates (`SELECT * WHERE {expr}`) at every parse
site — the parser cache, measure-filter CASE WHEN generation, computed-column
expansion, and measure-filter/segment column qualification. For plain
non-condition expressions the column set is identical, so this is a no-op
everywhere except the T-SQL alias case it fixes.

Add cross-dialect regression tests (tsql, postgres, snowflake, bigquery)
locking equality filters/segments to comparison shape and confirming `= 'x'`
now matches `IN ('x')` on T-SQL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Shorten T-SQL predicate comments

* docs(sl): tighten T-SQL predicate docstrings and AGENTS docstring rule

Trim the parser and regression-test docstrings to the 1-3 line bar and
extend the AGENTS.md comment guidance to cover docstrings explicitly.

* refactor(sl): route all filter parsing through parse_predicate

Consolidate the predicate-context parse into a single parse_predicate
helper and route every filter-parsing call site through it: measure
CASE-WHEN filters, segments, computed-column-in-filter, the
aggregate-locality HAVING rewrite, and the planner OR-mixing /
top-level-AND split. The locality and split paths still parsed user
filters in projection context, so a named-measure equality filter
compiled to `0 AS measure` on T-SQL. Add a locality regression test
covering the HAVING rewrite path.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Luca Martial 2026-06-19 01:47:44 -07:00 committed by GitHub
parent 4dae8c34dd
commit fb50c11d16
GPG key ID: B5690EEEBB952194
5 changed files with 238 additions and 63 deletions

@@ -22,7 +22,11 @@ from semantic_layer.models import (
 SemanticQuery,
 SourceDefinition,
 )
-from semantic_layer.parser import ExpressionParser, quote_reserved_identifiers
+from semantic_layer.parser import (
+ExpressionParser,
+parse_predicate,
+quote_reserved_identifiers,
+)
 # DIALECT CONVENTION:
 # User-authored measure `expr`, `filter`, and computed-column fragments must
@@ -910,9 +914,7 @@ class QueryPlanner:
 for c in source.columns:
 col_to_source[c.name] = source_name
-tree = sqlglot.parse_one(
-f"SELECT {quote_reserved_identifiers(expr)}", read=self.dialect
-)
+condition = parse_predicate(expr, self.dialect)
 def _qualify_column(node):
 if (
@@ -926,8 +928,8 @@ class QueryPlanner:
 )
 return node
-transformed = tree.transform(_qualify_column)
-return transformed.expressions[0].sql(dialect=self.dialect)
+transformed = condition.transform(_qualify_column)
+return transformed.sql(dialect=self.dialect)
 def _detect_fan_out(
 self,
@@ -1254,14 +1256,7 @@ class QueryPlanner:
 ) -> None:
 """Raise an error if an OR expression mixes WHERE and HAVING conditions."""
 try:
-tree = sqlglot.parse_one(
-f"SELECT * WHERE {quote_reserved_identifiers(clause)}",
-dialect=self.dialect,
-)
-where = tree.find(exp.Where)
-if not where:
-return
-inner = where.this
+inner = parse_predicate(clause, self.dialect)
 # Only check if the top level contains OR
 or_parts: list[str] = []
@@ -1295,14 +1290,7 @@ class QueryPlanner:
 def _split_top_level_and(self, expr: str) -> list[str]:
 """Split a filter expression on top-level AND (not inside parentheses or strings)."""
 try:
-tree = sqlglot.parse_one(
-f"SELECT * WHERE {quote_reserved_identifiers(expr)}",
-dialect=self.dialect,
-)
-where = tree.find(exp.Where)
-if not where:
-return [expr]
-inner = where.this
+inner = parse_predicate(expr, self.dialect)
 parts: list[str] = []
 def _collect_and(node):