tsunami

log in
history
points to
BetterSQL ~ SQL Issues SQL Issues
add add

BetterSQL

Luke Breuer
2010-06-28 20:25 UTC

Overview
Purpose
To create a better SQL language that allows for metaprogramming-type tasks and removes some of the cruft required by Microsoft's TSQL.
Notes
  • This page assumes knowledge of SQL2005. In particular, Common Table Expressions (CTEs) and partition functions (like row_number()).
  • Suggestions further down on this page assume suggestions above have been implemented. For example, after I suggest == for comparison instead of =, I assume that is the case for later suggestions.
  • Issues that would be very hard to translate to TSQL have mostly been omitted from discussion.
  • Some of the issues mentioned in SQL Issues could be addressed in addition to the below items.
Details
Cursors
The = TABLE_NAME is only required if one wishes to name the variable differently than the column (case could probably be insensitive in most cases).
foreach @table_name[ = TABLE_NAME]
from    INFORMATION_SCHEMA.COLUMNS
where   ...
begin
    -- actual code goes here
end
Variable initialization (2008)
declare @table_name = 'My_Table'
Subqueries
Disallow returning a column from a subquery that exists only outside the subquery:
with A as (
    select  v = 1
),   B as (
    select  w = 2
)
select  count = count(*)
from    A
where   v in (select v from B)
(This returns a count of 1, which is almost surely unintended. The v in the subquery should be a w; if it is changed, a count of 0 will be returned.)
Row limiting
Allow this:
select  value
from    table
order by id
rows 10 to 19
Instead of requiring this:
select  value
from (
    select  value,
            row_number = row_number() over (order by id),
            id
    from    table
) t
where  row_number between 10 and 19
order by id
Windowing functions outside of the select clause
Let's say I want to show the top 10 [distinct] most recently hit pages on my website, displaying each page once, along with last access time and ip_address. I want to write this:
select  top 10
        item_fk,
        last_access,
        ip_address
from    History
where   row_number() over (partition by item_fk order by last_access desc) = 1
order by last_access desc
However, currently, I must write this:
select  top 10
        item_fk,
        last_access,
        ip_address
from (
    select  row_number = row_number() over (partition by item_fk order by last_access desc)
            item_fk,
            last_access,
            ip_address
    from    History
) t
where  row_number = 1
order by last_access desc
Implicit group by
group by is/should be implied when there are aggregated and non-aggregated columns selected in the same query. Perhaps the most common group by query type involves the non-aggregated columns in the select statement being exactly the columns in the group by statement. In other words, there is duplication. Instead of screaming when aggregated columns and non-aggregated columns are both selected without a group by, assume that lack of a group by indicates the user wants to group by all the non-aggregated columns that are selected.
Allow expressions in input lists to sprocs, raiserror
The following is invalid, which is ridiculous:
exec my_sproc @some_variable + 1
raiserror('prefix: ' + error_message(), 16, -1)
{} allowed to replace begin/end
Both should be allowed.
Column lists
  • allow inserting into a table by aliasing the columns of the select statement to match the columns of the table being inserted into, and removing the requirement for a column list
  • allow select * ~ id to select all columns but id
  • allow select col | col.DataType.IsDateTime to be used to grab all datetime/smalldatetime column values
  • allow insert syntax that allows for column-value pairs, just like update syntax
Joins
  • remove the need for join conditions if foreign keys are in place and the join is unambiguous
  • allow use of dot-notation instead of joins when the join would be unambiguous
== instead of =
This allows the elimination of set.
Booleans/bits
Allow boolean expressions and bits to be interchangeable. Instead of requiring
set @bit_val = case when @string like '%test%' then 1 else 0 end
allow
set @bit_val = @string like '%test%'
Regular Expressions
Perhaps make use of ~=, a la Perl.
  • RegexIsMatch
  • RegexMatchValue
  • RegexMatch (returns the table (group_num, group_name, index, value))
  • RegexSplit (returns the table (value, idx))
Strings
Make them 0-based!
Extension methods (like C#)
Allow @date_variable.FormatDateTime('arg'), which would invoke, say, a SQLCLR UDF that accepts (@dt datetime, @format varchar(32)). Allow this for built-in SQL functions (sometimes reordering arguments), like charindex, replace, datepart, etc.
Date/Time operations
@date_time.WithoutDays() truncates days and everything more granular (resulting in year & month)
Implicit grouping
If a statement has columns selected and aggregates selected, implicitly group on the non-aggregates.
Better treatment of null
  • introduce the operator =?, which would expand to
    (a == b or a is null and b is null)
  • introduce an equality operator that errors out if either side is null, or, introduce an equality operator that does not fail when either side is null (but returns unknown if either side is null) and change == to error out if either side is null
Ternary operator
a = b == c ? d : e
Trailing comma ,
create table allows a trailing comma after the last column definition; the same should be allowed for all other lists (select lists, in/not in predicates, among others)
Allow column aliases to be referenced
Currently, the only place a column alias can be referenced is in the order by clause. The reasoning is that a select-list column value is not known until that time. However, this is ridiculous — the query optimizer could clearly just duplicate the definition. To make analysis easy, it should be possible to reference column aliases anywhere after the alias definition (this prevents cyclic references).
Operators
  • bit shift operators << and >>
  • logical xor
Easier duplicate analysis
If I have a table with columns (a, b) and I want to find all rows where a shows up multiple times, I have to do something like the following:
;with A as (
    select a = 1, b = 1 union all
    select 1, 2 union all
    select 2, 1 union all
    select 3, 1 union all
    select 3, 1
)
select  a, b
from (
    select  *, count = count(*) over (partition by a)
    from    A
) t
where count > 1
order by a, b

-- OR

;with A as (
    select a = 1, b = 1 union all
    select 1, 2 union all
    select 2, 1 union all
    select 3, 1 union all
    select 3, 1
)
select  a, b
from (
    select  multi_a = a
    from    A
    group by a
    having count(*) > 1
) g
inner join A on a = multi_a
order by a, b
What I want is something like this:
;with A as (
    select a = 1, b = 1 union all
    select 1, 2 union all
    select 2, 1 union all
    select 3, 1 union all
    select 3, 1
)
select  a, b
from    A
order by a
duplicates a
(The duplicates clause would accept multiple columns, just like order by and group by.)
Scoped views
Common Table Expressions (CTEs) allow one to give a projection (select statement) a name; all the uses of with above are CTEs. What is missing is a way to make a CTE batch/procedure/function-scoped. An excellent example is when one wishes to use a CTE in multiple statements: currently this is not possible, requiring creating an actual view, creating a temporary table/table variable, or just duplicating the statement.
Scoped UDFs
See above section.
Simple auto-generated aliases
If an explicit alias is not specified:
  • max(col_name) -> max_col_name
  • row_number() over (...) -> row_number
  • isnull(column_name, 0) -> column_name
  • etc.
except ignore
Sometimes it is desirable to select n columns from one table, as long as, for any row, n - 1 of those values is not found in some other table. In code:
select  a,
        b,
        c
from    T
except ignore
select  a,
        c
from    S
The ignore is not required, but it does draw attention to the fact that the second select list is a strict subset of the first select list.
Character escapes
\t => char(9)
\r => char(13)
\n => char(10)
identity_insert
Instead of requiring the user to do this:
set identity_insert my_table on
insert my_table (id, name) values (1, 'muahaha')
set identity_insert my_table off
allow
insert with identity my_table (id, name) values (1, 'muahaha')
drop if exists
This gets old, very quickly:
if object_id('My_Table', 'U') is not null
    drop table My_Table
Why not allow this?
drop table My_Table if exists
Now, that syntax might cause issues, but a similar syntax would also be fine.
create or alter
Continuing with the above theme, it would be nice to have a statement that would primarily alter a DB object, but create it if it does not yet exist. An oft-used alternative is to delete the object if it exists, but that means that if the create statement contains invalid syntax, one will end up with the object dropped and not recreated — ugly!
zero_or_one(...) aggregation
There are multiple places where one desires to require that no more than one value shows up.
  1. there are situations with the pivot operator where one doesn't actually want to do any aggregation, but the syntax forces it
  2. one cannot easily convert a subquery to a left join without this; a subquery will error out if more than one result is returned, while a left join would just increase the number of results (which can be undesirable)
user aggregations that accept more than one parameter
JoinString(column, delimiter) would be nice.
subquery -> join
SQL Server often generates poor execution plans for subqueries where a left join would be more efficient. Why is the proper transformation not done more often? As a note, zero_or_one(...) will often be needed unless top is specified, in which case one can use row_number().
Hard to do with TSQL, but desirable
@db variables
This is invalid syntax, which can get annoying when I want to run a script against multiple DBs:
declare @db sysname = 'Northwind'

-- ...

select  *
from    @db..Person