Tag Archives: boolean

Three valued logic – NULL in Sql Server


In SQL the logical expressions (predicates) can evaluate TRUE, FALSE and UNKNOWN. The third result is unique to “SQL world“ and is caused by NULL value.  This is called Three Valued Logic (3VL).   NULL represents the missing or UNKNOWN value.

The logic could be confusing because Boolean, two valued logic expressions can evaluate only to TRUE or FALSE – a comparison between two KNOWN values gives us TRUE or FALSE.

With NULL as a mark or a placeholder, we have an UNKNOWN value, so the question is how to compare a known with an unknown value or an unknown with an unknown.

The logical expressions with NULLs  can evaluate to NULL, TRUE and FALSE e.g.

SELECT IIF(1=NULL,'true','unknown')      AS  [1 = NULL]
      ,IIF(NULL=NULL,'true','unknown')   AS  [NULL = NULL]
      ,IIF(NULL+22<=33,'true','unknown') AS  [NULL*22 <= 33]

nullExpressionResult

The result of NULL negations can be confusing too

  • The opposite of True is False (NOT True = False)
  • The opposite of False is True (NOT False = True)
  • The opposite of UNKNOWN is UNKNOWN (NOT NULL = NULL)

Conjunctive(AND) predicates with Nulls evaluates:

  • True  AND NULL  => NULL
  • False AND NULL => False

and Disjunctive (OR) predicates with Null evaluates:

  • True OR NULL => True
  • False OR NULL => NULL
  • NULL OR NULL => NULL

The logic is confusing and it was source of an ongoing discussion about NULLS for a long time.
The important thing to note is to put the logic in the context of DB design and to understand how Sql Server treats NULLs in different scenarios. Knowing that we can  benefit from using the UNKNOWNS and avoid possible logical errors.

Sql Server is INCONSISTENT when evaluates the logical expressions with NULL values – The same expressions may produce different results TRUE, FALSE and sometimes UNKNOWN.
The following examples shows how SqlServer deals with NULLs in different elements of Tsql language.

 QUERY FILTERS (ON,WHERE,HAVING)

Create a couple of test tables:

CREATE TABLE #Products(ProductId INT IDENTITY(1,1)
                           PRIMARY KEY CLUSTERED
                       ,Name VARCHAR(500) NOT NULL
                       ,Price DECIMAL(8,2) NULL CHECK(Price > 0.00)           
                       ,ProductCategoryId INT NULL)
GO
CREATE TABLE #ProductCategories(ProductCategoryId INT
                                   UNIQUE CLUSTERED
                               ,Name VARCHAR(500) NOT NULL)
GO

..and add some data

INSERT INTO #Products(Name,Price, ProductCategoryId)
     SELECT tab.* 
     FROM (VALUES ('Lego',65.99,200)
                 ,('iPad',750.00,300)
                 ,('Gyroscope',NULL,NULL)
                 ,('Beer',3.55,NULL)
                 ,('PC',2500.00,300)) tab(name,price,catid)
GO
INSERT INTO #ProductCategories(ProductCategoryId,Name)
    SELECT tab.*
    FROM (VALUES (200,'KidsGames')
                ,(300,'Gadgets')
                ,(NULL,'Liquor')
                ,(500,'Camping')) tab(id,name)
GO

Test1: Select all the products over $500.00.
The product with the UNKNOWN(NULL) price will be excluded from the result-set as it was treated as FALSE. Only these rows for which the predicate evaluates to TRUE will be included in the final result-set.

SELECT * 
FROM #Products 
WHERE Price  > 500

Null1

We’ll get the similar result if we try to select all products with not unknown price like:

SELECT * 
FROM #Products 
WHERE Price <> NULL;

The result will be an empty set. Not NULL is still UNKNOWN and is treated as FALSE.

The UNKNOWN result is treated as FALSE when evaluated by ON filter.

SELECT p.*,pc.* 
FROM #Products p
  INNER JOIN #ProductCategories pc
    ON p.ProductCategoryId = pc.ProductCategoryId

As we can see from the result-set, the product “Gyroscope” is excluded from the result as NULL = NULL evaluated to UNKNOWN and was treated as FALSE.

Null2

CHECK Constraint

In case of CHECK constraint, SQL Server treats UNKNOWN result as TRUE.
The product “Gyroscope” has UNKNOWN price. The CHECK constraint on the price column allows only those values which evaluates to TRUE when compared with 0.  In this case Sql Server evaluated;

NULL >0 => True

opposed to the previous example when

.. NULL > 500 => False

To confirm the CHECK constraint behavior we can insert a new product into #Products table without getting a Check constraint error:

INSERT INTO #Products(Name,Price, ProductCategoryId)
    SELECT 'Microsoft Surface Pro',NULL,300

This can make sense if we put the NULL logic into the context of the DB design, e.g. All products must have price greater than $0 or NULL if the price hasn’t been decided yet.

Null3UNIQUE Constraint and Set operators

It was mentioned earlier that  NULL = NULL evaluates to UNKNOWN. This means that NULLs are distinct( no two NULLs are equal)

The UNIQUE constraint treats NULL as a distinct value in the context of all other values in the column i.e ProductCategoryId can have only one NULL value and the distinct integers. However, allowing only single NULL value(inserting another NULL value will throw Violation of UNIQUE KEY constraint error), the constraint treats all UNKNOWN values(NULLS) as equal or not distinct.

INSERT INTO #ProductCategories
        SELECT NULL, 'Musical Instruments'

UniqueConstraintError

The table #ProductCategories has an UNIQUE constraint (enforced by a clustered index) on ProductCategoryId column.

The SET operators UNION, UNION ALL* , EXCEPT,  INTERSECT

Sql Server’s  Set operators treat NULLs as non distinct, in other words  as equal.
e.g. Select the rows from tables @Tab1 and @Tab2 that are common for both tables.

DECLARE @Tab1 TABLE (ID INT, Val VARCHAR(5))
DECLARE @Tab2 TABLE (ID INT, Val VARCHAR(5))

INSERT INTO @Tab1(ID,Val)
    SELECT t1.id, t1.Val
    FROM (VALUES (1,NULL),(2,NULL),(3,NULL),(4,'Dean'),(5,'Peter'),(6,'Bill')) t1(id,Val)

INSERT INTO @Tab2(ID,Val)
    SELECT t2.id, t2.Val
    FROM (VALUES (1,NULL),(2,NULL),(3,NULL),(4,'Dean'),(5,'Bahar'),(6,'Nick')) t2(id,Val)

SELECT ID AS keyValue, Val AS Value FROM @Tab1
INTERSECT 
SELECT ID,Val FROM @Tab2

NullsIntersection

As we can see from the result-set, the SET operator treats NULL values as equal e.g  Element (1,NULL ) = Element (1,NULL) and therefore belongs to “the red zone”, whereas e.g Element(5,Bahar) <>(5,Peter) and therefore was left out from the result-set.

*UNION ALL is a multi set operator that combines two sets into one non distinct set (takes all elements from both sets)

 Aggregate Functions

All Aggregate functions except COUNT(*) ignore NULLs.

DECLARE @Tab3 TABLE (ID INT NULL)
INSERT INTO @Tab3
  SELECT t.* 
  FROM (VALUES(1),(2),(NULL),(NULL),(3) ) t(i)

  SELECT  AVG(ID)   AS [AVG(ID)]
         ,COUNT(ID) AS [COUNT(ID)]
         ,SUM(ID)   AS [SUM(ID)] 
         ,MAX(ID)   AS [MAX(ID)]
         ,MIN(ID)   AS [MIN(ID)]
         ,'***'AS [***],COUNT(*) AS [COUNT(*)]
          from @Tab3

NullAgg

Note: COUNT(col1) eliminate NULLs as the other aggregate functions. However, if we try something like: SELECT COUNT(NULL+1) the function returns 0 eliminating the result of the expression, which is NULL 🙂
Another interesting thing with this, is that SELECT COUNT(NULL) throws an error – Null does not have any data type (COUNT() always does implicit conversion to INT) and e.g. SELECT COUNT(CONVERT(varchar(max),NULL)) returns 0 – now COUNT() has a datatype to work with.

Cursors (Order by)

When a query requests sorting the result-set becomes cursor – sets are not ordered. Sql Server treats NULL values as distinct and sorts the unknowns first if the sort is Ascending (Descending sort order puts NULL values  last)

SELECT * 
FROM #Products
ORDER by ProductCategoryId

NullOrder

*Just a small digression. One way to show NULL values last when  sorting in Ascending direction is to use tsql’s ability to sort record-set by non-selected columns and/or expressions.

SELECT *  ---,[NullsLast]= IIF(ProductCategoryId IS NULL,1,0)
FROM #Products
ORDER BY  CASE 
            WHEN ProductCategoryId IS NULL THEN 1
            ELSE 0
          END 
          ,ProductCategoryId

The NULL expressions results can explain the “unexpected” results  when used in anti-semi joins, but that will be covered in a separate post.

Conclusion

Three valued logic is unique to DB wold. ANSI SQL Standard provides rules/guidelines about NULL values. However, different RDBMS may handle NULLs  differently in different situations. It is important to understand how Sql Server treats NULL values in different elements of tsql language. Using NULLs  in the context of DB design can make 3VL work for you.

Thanks for reading.

Dean Mincic

Short–Circuit evaluation in Sql Server


The Short-Circuit(SC) or The minimal evaluation is the way programing languages evaluates arguments involved in Boolean expressions.
SC means that the arguments are evaluated from left to right until the value of the expression is determined. The rest of the arguments (if any left) will not be evaluated.
For example, c# evaluates the arguments (if logical operators && and || are used) from left to right.The evaluation of the expression stops as soon as the outcome is determined.

..
if(object_A !=null && object_A.methodA()) == "some value"{
 //Perform some action ;
}

With SqlServer(and RDBMS systems in general) things are different when it comes to SC. In short, in Sql Server we can not relay on the minimal evaluation. Essentially, this is because of declarative nature of TSQL language. In TSQL we almost always describe only WHAT to do (e.g Give me the list of all orders processed on a particular day from the specific geographic area made by Customers who..etc) but not HOW to do it (Query optimiser-QO and other Sql Server engine components) will find the optimal way to fulfill your request.
Therefore, the order of evaluation of the query predicates may be sequential, from left to right, but it’s not guaranteed. QO may change the ordering and/or push the predicate evaluation down the query tree e.g to the data access operator (e.g index seek).
The following example shows a few case scenarios of how Sql Server evaluates predicates(logical expressions).

Create a test table:

USE tempdb
GO
IF EXISTS(SELECT 1 FROM tempdb.sys.objects 
          WHERE object_id=OBJECT_ID('TestShortCircuit'))
    DROP TABLE dbo.TestShortCircuit
GO
CREATE TABLE dbo.TestShortCircuit(
     CurrencyId      NCHAR(3) NOT NULL 
    ,Name            NVARCHAR(30) NOT NULL
    ,CONSTRAINT PK_CurrencyId PRIMARY KEY CLUSTERED(CurrencyId))
GO
INSERT INTO dbo.TestShortCircuit(CurrencyId,Name)
    SELECT tab.id, tab.name
    FROM (VALUES  (N'AUD',N'Australian Dollar')
                 ,(N'EUR',N'EURO')
                 ,(N'HA!',N'98765')
                 ,(N'USD',N'US Dollar')
                 ) tab(id,name)
GO

Test 1: Conjunctive predicates (multiple predicates separated by AND Logical operators) and predicate evaluation sequence.

SELECT * 
FROM dbo.TestShortCircuit
WHERE  Name = 98765 AND CurrencyId = N'HA!'
--WHERE  Name = 98765
/*Msg 245, Level 16, State 1, Line 28
Conversion failed when converting the nvarchar value 'Australian Dollar' to data type int.*/

The query executes with no error.

Test1Plan
Figure 1: Conjunctive predicates – second expression evaluated first

Switching the predicate places will have no effect to the QO’s reasoning.
If we ran the same query without the second expression the query would fail due to conversion error.

Successful query: Query optimiser knows that the second expression, CurrencyId = N’HA!’ includes Primary Key constraint on the column(enforced by an unique clustered index) and therefore can evaluate to true maximum once. QO decides to “push” second predicate (CurrencyId = N’HA!’e) down to the query tree to the Index Seek data access operator making it “Seek Predicate”.

If the Seek operator does not find clustered N’HA!’ the WHERE filter evaluates to false making the first expression redundant. The first expression is shown in the Predicate section of the execution plan only for consistency purpose.

On the other hand , if the operator finds the row, the first predicate will be evaluated (Name=98765). QO will then try to convert the column value(N’98765′) to INT and than to compare with the given argument(98765) . If the conversion succeed(like in the example) the query will execute with no errors. However, if the conversion fails, the query will return error

--this will return conversion error
SELECT * 
FROM dbo.TestShortCircuit 
WHERE  Name = 98765 AND CurrencyId = N'USD'

Test 1 shows that we cannot know which expression will be evaluated first(if evaluated at all) and we must not relay on the minimal evaluation.

Test 2: Disjunctive predicates (multiple predicates separated by OR Logical operators) and predicate evaluation sequence).

The query below will execute with no conversion errors.

SELECT * 
FROM dbo.TestShortCircuit 
WHERE N'A'= N'A' OR Name = 98765

The successful execution of the query may lead us to believe that Sql Server uses minimal evaluation, but it’s not the case.

If the predicates switch the places, the query will still run with no errors. This wouldn’t be possible if Name=98765 predicate evaluated first, due to conversion error. The answer lies, again, in the way QO handles the query. In this case, during the optimisation, QO simplifies the query by excluding all predicates knowing that A=A will always evaluate to true for all rows. It simply uses table scan operator to return all rows.

Test1PlanORFigure 2: Disjunctive predicates, query simplification and SC

Test2 again shows that we the sequence of predicate evaluation is controlled by QO and is not short-circuited.

Sidenote:  the query below will fail due to conversion error (the order of the predicates is irrelevant). The QO cannot use clustered index seek because of the OR operator.  Even if we had an index on Name column, the index wouldn’t be used because of the implicit conversion on the column.

SELECT * 
FROM dbo.TestShortCircuit 
WHERE CurrencyId = N'HA!' OR Name = 98765

Conclusion:

It is not recommended to relay on the minimal evaluation in Sql Server (I would say in any DB engine) because of declarative nature of SQL language. Query Optimiser,  one of Sql Server’s relational engine components, will transform/simplify the query and during the process may reorganise the “WHERE” predicates making the order of the logical expressions irrelevant.

Thanks for reading.

Dean Mincic