Overview
Purpose
To create a better SQL language that allows for metaprogramming-type tasks and removes some of the cruft required by Microsoft's TSQL.
Notes
- This page assumes knowledge of SQL2005. In particular, Common Table Expressions (CTEs) and partition functions (like
row_number()).
- Suggestions further down on this page assume suggestions above have been implemented. For example, after I suggest
== for comparison instead of =, I assume that is the case for later suggestions.
- Issues that would be very hard to translate to TSQL have mostly been omitted from discussion.
- Some of the issues mentioned in SQL Issues could be addressed in addition to the below items.
Details
Cursors
The
= TABLE_NAME is only required if one wishes to name the variable differently than the column (case could probably be insensitive in most cases).
foreach @table_name[ = TABLE_NAME]
from INFORMATION_SCHEMA.COLUMNS
where ...
begin
-- actual code goes here
end
Variable initialization (2008)
declare @table_name = 'My_Table'
Subqueries
Disallow returning a column from a subquery that exists only outside the subquery:
with A as (
select v = 1
), B as (
select w = 2
)
select count = count(*)
from A
where v in (select v from B)
(This returns a count of
1, which is almost surely unintended. The
v in the subquery should be a
w; if it is changed, a count of
0 will be returned.)
Row limiting
Allow this:
select value
from table
order by id
rows 10 to 19
Instead of requiring this:
select value
from (
select value,
row_number = row_number() over (order by id),
id
from table
) t
where row_number between 10 and 19
order by id
Windowing functions outside of the select clause
Let's say I want to show the top 10 [distinct] most recently hit pages on my website, displaying each page once, along with last access time and ip_address. I want to write this:
select top 10
item_fk,
last_access,
ip_address
from History
where row_number() over (partition by item_fk order by last_access desc) = 1
order by last_access desc
However, currently, I must write this:
select top 10
item_fk,
last_access,
ip_address
from (
select row_number = row_number() over (partition by item_fk order by last_access desc)
item_fk,
last_access,
ip_address
from History
) t
where row_number = 1
order by last_access desc
Implicit group by
group by is/should be implied when there are aggregated and non-aggregated columns selected in the same query. Perhaps the most common
group by query type involves the non-aggregated columns in the
select statement being
exactly the columns in the group by statement. In other words, there is duplication. Instead of screaming when aggregated columns and non-aggregated columns are both selected without a
group by, assume that lack of a
group by indicates the user wants to
group by all the non-aggregated columns that are selected.
Allow expressions in input lists to sprocs, raiserror
The following is invalid, which is ridiculous:
exec my_sproc @some_variable + 1
raiserror('prefix: ' + error_message(), 16, -1)
{} allowed to replace begin/end
Both should be allowed.
Column lists
- allow inserting into a table by aliasing the columns of the select statement to match the columns of the table being inserted into, and removing the requirement for a column list
- allow
select * ~ id to select all columns but id
- allow
select col | col.DataType.IsDateTime to be used to grab all datetime/smalldatetime column values
- allow
insert syntax that allows for column-value pairs, just like update syntax
Joins
- remove the need for join conditions if foreign keys are in place and the join is unambiguous
- allow use of dot-notation instead of joins when the join would be unambiguous
== instead of =
This allows the elimination of
set.
Booleans/bits
Allow
boolean expressions and bits to be interchangeable. Instead of requiring
set @bit_val = case when @string like '%test%' then 1 else 0 end
allow
set @bit_val = @string like '%test%'
Regular Expressions
Perhaps make use of
~=, a la Perl.
- RegexIsMatch
- RegexMatchValue
- RegexMatch (returns the table
(group_num, group_name, index, value))
- RegexSplit (returns the table
(value, idx))
Strings
Make them 0-based!
Extension methods (like C#)
Allow
@date_variable.FormatDateTime('arg'), which would invoke, say, a SQLCLR UDF that accepts
(@dt datetime, @format varchar(32)). Allow this for built-in SQL functions (sometimes reordering arguments), like
charindex,
replace,
datepart, etc.
Date/Time operations
@date_time.WithoutDays() truncates days and everything more granular (resulting in year & month)
Implicit grouping
If a statement has columns selected and aggregates selected, implicitly group on the non-aggregates.
Better treatment of null
Ternary operator
a = b == c ? d : e
Trailing comma ,
create table allows a trailing comma after the last column definition; the same should be allowed for all other lists (
select lists,
in/
not in predicates, among others)
Allow column aliases to be referenced
Currently, the only place a column alias can be referenced is in the
order by clause. The reasoning is that a
select-list column value is not known until that time. However, this is ridiculous — the query optimizer could clearly just duplicate the definition. To make analysis easy, it should be possible to reference column aliases anywhere after the alias definition (this prevents cyclic references).
Operators
- bit shift operators
<< and >>
- logical xor
Easier duplicate analysis
If I have a table with columns
(a, b) and I want to find all rows where
a shows up multiple times, I have to do something like the following:
;with A as (
select a = 1, b = 1 union all
select 1, 2 union all
select 2, 1 union all
select 3, 1 union all
select 3, 1
)
select a, b
from (
select *, count = count(*) over (partition by a)
from A
) t
where count > 1
order by a, b
-- OR
;with A as (
select a = 1, b = 1 union all
select 1, 2 union all
select 2, 1 union all
select 3, 1 union all
select 3, 1
)
select a, b
from (
select multi_a = a
from A
group by a
having count(*) > 1
) g
inner join A on a = multi_a
order by a, b
What I want is something like this:
;with A as (
select a = 1, b = 1 union all
select 1, 2 union all
select 2, 1 union all
select 3, 1 union all
select 3, 1
)
select a, b
from A
order by a
duplicates a
(The
duplicates clause would accept multiple columns, just like
order by and
group by.)
Scoped views
Common Table Expressions (CTEs) allow one to give a projection (
select statement) a name; all the uses of
with above are CTEs. What is missing is a way to make a CTE batch/procedure/function-scoped. An excellent example is when one wishes to use a CTE in multiple statements: currently this is not possible, requiring creating an actual view, creating a temporary table/table variable, or just duplicating the statement.
Scoped UDFs
See above section.
Simple auto-generated aliases
If an explicit alias is not specified:
max(col_name) -> max_col_name
row_number() over (...) -> row_number
isnull(column_name, 0) -> column_name
- etc.
except ignore
Sometimes it is desirable to select
n columns from one table, as long as, for any row,
n - 1 of those values is not found in some other table. In code:
select a,
b,
c
from T
except ignore
select a,
c
from S
The
ignore is not required, but it does draw attention to the fact that the second
select list is a strict subset of the first
select list.
Character escapes
\t => char(9)
\r => char(13)
\n => char(10)
identity_insert
Instead of requiring the user to do this:
set identity_insert my_table on
insert my_table (id, name) values (1, 'muahaha')
set identity_insert my_table off
allow
insert with identity my_table (id, name) values (1, 'muahaha')
drop if exists
This gets old, very quickly:
if object_id('My_Table', 'U') is not null
drop table My_Table
Why not allow this?
drop table My_Table if exists
Now, that syntax might cause issues, but a similar syntax would also be fine.
create or alter
Continuing with the above theme, it would be nice to have a statement that would primarily
alter a DB object, but
create it if it does not yet exist. An oft-used alternative is to
delete the object if it exists, but that means that if the
create statement contains invalid syntax, one will end up with the object dropped and not recreated — ugly!
zero_or_one(...) aggregation
There are multiple places where one desires to require that no more than one value shows up.
- there are situations with the
pivot operator where one doesn't actually want to do any aggregation, but the syntax forces it
- one cannot easily convert a subquery to a
left join without this; a subquery will error out if more than one result is returned, while a left join would just increase the number of results (which can be undesirable)
user aggregations that accept more than one parameter
JoinString(column, delimiter) would be nice.
subquery -> join
SQL Server often generates poor execution plans for subqueries where a
left join would be more efficient. Why is the proper transformation not done more often? As a note,
zero_or_one(...) will often be needed unless
top is specified, in which case one can use
row_number().
Hard to do with TSQL, but desirable
@db variables
This is invalid syntax, which can get annoying when I want to run a script against multiple DBs:
declare @db sysname = 'Northwind'
-- ...
select *
from @db..Person