1. Introduction
Microsoft works hard to release new versions of SQL Server, packed with improvements and new features, every two years. In this article, I want to review five features that we have not been taking advantage of. I hesitate to call them new features, because some of them have existed for a long time; we just don't realize they are there, or we don't know how to use them.
2. Features
2.1 Feature #1: Table-Valued Parameter
Table-valued parameters were introduced in SQL Server 2008.
Table-valued parameters are declared by using user-defined table types. You can use table-valued parameters to send multiple rows of data to a Transact-SQL statement or a routine, such as a stored procedure or function, without creating a temporary table or many parameters.
Table-valued parameters are like parameter arrays in OLE DB and ODBC, but offer more flexibility and closer integration with Transact-SQL. Table-valued parameters also have the benefit of being able to participate in set-based operations.
Where to use it?
One place to use it is to pass an unlimited number of parameters to a stored procedure. For example, suppose we have a stored procedure usp_process_all that is designed to let the user delete selected documents. It has a @document_ids parameter of type varchar(max). A better design is to store all the selected document_id values in a table-valued parameter and pass that into the stored procedure.
CREATE PROCEDURE usp_process_all
    @document_ids varchar(max)
AS
BEGIN
    ...
END
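A minimal sketch of the table-valued-parameter version (the type name, column name, and target table here are illustrative assumptions, not from the original procedure):

```sql
-- Hypothetical user-defined table type to hold the selected ids.
CREATE TYPE dbo.DocumentIdList AS TABLE
(
    document_id int NOT NULL PRIMARY KEY
);
GO

-- The rewritten procedure receives the ids as a set.
-- Table-valued parameters must be declared READONLY.
CREATE PROCEDURE usp_process_all
    @document_ids dbo.DocumentIdList READONLY
AS
BEGIN
    DELETE d
    FROM dbo.documents AS d
    JOIN @document_ids AS ids
        ON ids.document_id = d.document_id;
END
GO
```

On the caller's side, the parameter can be filled like an ordinary table variable (DECLARE @ids dbo.DocumentIdList; INSERT INTO @ids VALUES (1), (2);) or bound directly from ADO.NET as a SqlDbType.Structured parameter.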
2.2 Feature #2: CLR in SQL Server
CLR integration was introduced in SQL Server 2005.
The common language
runtime (CLR) is the heart of the Microsoft .NET Framework and provides the
execution environment for all .NET Framework code. Code that runs within the
CLR is referred to as managed code. The CLR provides various functions and
services required for program execution, including just-in-time (JIT)
compilation, allocating and managing memory, enforcing type safety, exception
handling, thread management, and security.
With the CLR hosted in
Microsoft SQL Server (called CLR integration), you can author stored
procedures, triggers, user-defined functions, user-defined types, and
user-defined aggregates in managed code. Because managed code compiles to
native code prior to execution, you can achieve significant performance
increases in some scenarios.
You can create stored procedures, triggers, and user-defined functions in SQL Server. However, T-SQL has very limited ability to handle complicated situations, such as checking whether a date appears in a string. The solution is to create the stored procedure, trigger, or function in C#.NET.
To search a memo field with a regular expression, you can call a CLR scalar function (for example, a dbo.RegexMatch function exposed from a .NET assembly, as in reference (2)):

SELECT TOP 1000 memo
FROM [product]
WHERE dbo.RegexMatch(memo, 'abc.*') = 1
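Registering such a function is done entirely in T-SQL. A hedged sketch, where the assembly path, assembly name, and the .NET class/method names are all assumptions for illustration:

```sql
-- Enable CLR integration once per instance (it is off by default).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Load the compiled .NET assembly (hypothetical path and name).
CREATE ASSEMBLY RegexHelpers
FROM 'C:\assemblies\RegexHelpers.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Bind a T-SQL scalar function to a static method in that assembly.
CREATE FUNCTION dbo.RegexMatch (@input nvarchar(max), @pattern nvarchar(255))
RETURNS bit
AS EXTERNAL NAME RegexHelpers.[RegexHelpers.Functions].RegexMatch;
GO
```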
2.3 Feature #3: Improved Date and Time Types
Date, time, datetime2, and datetimeoffset were introduced in SQL Server 2008.
The date, time, datetime2, and datetimeoffset types were added in SQL Server 2008. time, datetime2, and datetimeoffset provide greater fractional-second precision, and datetimeoffset provides time zone support for globally deployed applications.
Date
Defines a date in SQL Server.
Time
Defines a time of day. The time is without time zone awareness and is based on a 24-hour clock.
DateTime2
Defines a date combined with a time of day based on a 24-hour clock. datetime2 can be considered an extension of the existing datetime type, with a larger date range, a larger default fractional precision, and optional user-specified precision.
DateTimeOffset
SQL Server added a new data type named datetimeoffset. This is similar to the old datetime data type, with the following significant differences:
- Internally, the time is stored in unambiguous UTC format.
- The local time zone offset is stored along with the UTC time, which allows the time to be displayed as a local time value (or converted to any other time zone offset).
- The data type is capable of storing more precise times than datetime.
Sometimes we need to store a datetime in UTC format; in that case, we can use datetimeoffset.
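A short sketch of the new types in action (the literal values are illustrative):

```sql
-- The four types added in SQL Server 2008, at maximum precision.
DECLARE @d   date              = '2015-06-01';
DECLARE @t   time(7)           = '13:45:30.1234567';
DECLARE @dt2 datetime2(7)      = '2015-06-01 13:45:30.1234567';
DECLARE @dto datetimeoffset(7) = '2015-06-01 13:45:30.1234567 -05:00';

-- SWITCHOFFSET preserves the UTC instant but changes the displayed offset,
-- so the same stored value can be reported in any time zone.
SELECT @dto                          AS local_value,
       SWITCHOFFSET(@dto, '+00:00')  AS utc_value;
```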
2.4 Feature #4: Table Partition
Table partitioning was introduced in SQL Server 2005 Enterprise Edition.
What is table partition?
In order to answer this
question, we need to answer following questions.
What are partitions and
why use them? The simple answer is: To improve the scalability and
manageability of large tables and tables that have varying access patterns.
Typically, you create tables to store information about an entity, such as
customers or sales, and each table has attributes that describe only that
entity. While a single table for each entity is the easiest to design and
understand, these tables are not necessarily optimized for performance,
scalability, and manageability, particularly as the table grows larger.
What constitutes a large
table? While the size of a very large database (VLDB) is measured in hundreds
of gigabytes, or even terabytes, the term does not necessarily indicate the
size of individual tables within the database. A large database is one that
does not perform as desired or one in which the operational or maintenance
costs have gone beyond predefined maintenance or budgetary requirements. These
requirements also apply to tables; a table can be considered large if the
activities of other users or maintenance operations have a limiting effect on
availability. For example, the sales table is considered large if performance
is severely degraded or if the table is inaccessible during maintenance for two
hours each day, each week, or even each month. In some cases, periodic downtime
is acceptable, yet it can often be avoided or minimized by better design and
partitioning implementations. While the term VLDB applies only to a database,
for partitioning, it is more important to look at table size.
In addition to size, a
table with varying access patterns might be a concern for performance and
availability when different sets of rows within the table have different usage
patterns. Although usage patterns may not always vary (and this is not a
requirement for partitioning), when usage patterns do vary, partitioning can
result in additional gains in management, performance, and availability. Again,
to use the example of a sales table, the current month's data might be
read-write, while the previous month's data (and often the larger part of the
table) is read-only. In a case like this, where data usage varies, or in cases
where the maintenance overhead is overwhelming as data moves in and out of the
table, the table's ability to respond to user requests might be impacted. This,
in turn, limits both the availability and the scalability of the server.
Additionally, when large
sets of data are being used in different ways, frequent maintenance operations
are performed on static data. This can have costly effects, such as performance
problems, blocking problems, backups (space, time, and operational costs) as
well as a negative impact on the overall scalability of the server.
How can partitioning
help? When tables and indexes become very large, partitioning can help by
partitioning the data into smaller, more manageable sections. The table
partition feature focuses on horizontal partitioning, in which large groups of
rows will be stored in multiple separate partitions. The definition of the
partitioned set is customized, defined, and managed by your needs. Microsoft
SQL Server 2005 allows you to partition your tables based on specific data
usage patterns using defined ranges or lists. SQL Server 2005 also offers
numerous options for the long-term management of partitioned tables and indexes
by the addition of features designed around the new table and index structure.
Furthermore, if a large
table exists on a system with multiple CPUs, partitioning the table can lead to
better performance through parallel operations. The performance of large-scale
operations across extremely large data sets (for instance many million rows)
can benefit by performing multiple operations against individual subsets in
parallel. An example of performance gains over partitions can be seen in
previous releases with aggregations. For example, instead of aggregating a
single large table, SQL Server can work on partitions independently, and then
aggregate the aggregates. In SQL Server 2005, queries joining large datasets
can benefit directly from partitioning; SQL Server 2000 supported parallel join
operations on subsets, yet needed to create the subsets on the fly. In SQL
Server 2005, related tables (such as Order and OrderDetails tables) that are
partitioned to the same partitioning key and the same partitioning function are
said to be aligned. When the optimizer detects that two partitioned and aligned
tables are joined, SQL Server 2005 can join the data that resides on the same
partitions first and then combine the results. This allows SQL Server 2005 to
more effectively use multiple-CPU computers.
History of partitioning:
- Partitioned objects in releases before SQL Server 7.0 (this is designed by the application designer; we are still doing this today)
- Partitioned views in SQL Server 7.0 (only allow SELECT)
- Partitioned views in SQL Server 2000 (also allow INSERT, UPDATE, DELETE)
- Partitioned tables in SQL Server 2005
An application may store a lot of data. For example, one of our applications has a table that contains more than 1 million records. Application performance becomes unbearable when the user tries to review data in those big tables. We did a lot of table optimization and designed an archive database to store old data. If we use table partitioning, we no longer need to create an archive database and manually move old data into it.
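As a hedged sketch, partitioning that kind of table by date might look like this (all object names and boundary values are hypothetical):

```sql
-- 1) A partition function maps sale_date values to partition numbers.
CREATE PARTITION FUNCTION pf_sales_by_year (datetime)
AS RANGE RIGHT FOR VALUES ('2013-01-01', '2014-01-01', '2015-01-01');
GO

-- 2) A partition scheme maps each partition to a filegroup.
CREATE PARTITION SCHEME ps_sales_by_year
AS PARTITION pf_sales_by_year ALL TO ([PRIMARY]);
GO

-- 3) The table is created on the scheme; old data stays in the same
--    table but lives in its own partition, so no archive database is needed.
CREATE TABLE dbo.sales
(
    sale_id   int IDENTITY NOT NULL,
    sale_date datetime     NOT NULL,
    amount    money        NOT NULL
) ON ps_sales_by_year (sale_date);
```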
2.5 Feature #5: Pagination
When is it introduced?
OFFSET / FETCH pagination was introduced in SQL Server 2012.
Pagination is a common use case throughout
client and web applications everywhere. Google shows you 10 results at a time,
your online bank may show 20 bills per page, and bug tracking and source
control software might display 50 items on the screen.
Based on the indexing of
the table, the columns needed, and the sort method chosen, paging can be
relatively painless. If you're looking for the "first" 20 customers
and the clustered index supports that sorting (say, a clustered index on an
IDENTITY column or DateCreated column), then the query is going to be pretty
efficient. If you need to support sorting that requires non-clustered indexes,
and especially if you have columns needed for output that aren't covered by the
index (never mind if there is no supporting index), the queries can get more
expensive. And even the same query (with a different @PageNumber parameter) can
get much more expensive as the @PageNumber gets higher – since more reads may
be required to get to that "slice" of the data.
Some will say that
progressing toward the end of the set is something that you can solve by
throwing more memory at the problem (so you eliminate any physical I/O) and/or
using application-level caching (so you're not going to the database at all).
Let's assume for the purposes of this post that more memory isn't always
possible, since not every customer can add RAM to a server that's out of memory
slots, or just snap their fingers and have newer, bigger servers ready to go.
Especially since some customers are on Standard Edition, so are capped at 64GB
(SQL Server 2012) or 128GB (SQL Server 2014), or are using even more limited editions
such as Express (1GB) or whatever they're calling Azure SQL Database this week
(many different servicing tiers).
SQL Server 2012 added OFFSET / FETCH, which provides more linear paging performance across the entire set, instead of being optimal only at the beginning.
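A minimal OFFSET / FETCH paging sketch (the table and column names are assumptions; note that an ORDER BY clause is required):

```sql
DECLARE @PageNumber int = 3,
        @PageSize   int = 20;

-- Skip the earlier pages, then fetch one page worth of rows.
SELECT customer_id, name, date_created
FROM dbo.customers
ORDER BY date_created, customer_id
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```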
We have a long history of trying to tackle the pagination challenge in a web application.
In the beginning, there was ASP.NET default pagination. We retrieved all the data from the database and loaded it into a DataGrid. If the search result had 100K records, even if we just wanted to show the first page, we would transfer the 100K records from the database server to the application, load them into the DataGrid, and then transfer the 100K records from the application to the client browser. This generates a lot of network traffic and makes the application very slow.
Then we started to use custom paging in the DataGrid, so we would load only the first page of data into the DataGrid and tell the DataGrid the total number of records. In order to do so, we created two routines: one to get the current page and one to get the total count. Because there is not always a record number that lets us locate the current page, we needed to load the data into a temp table in SQL Server and then use the auto-generated record number to get the current page.
SQL Server 2005 introduced window functions, such as ROW_NUMBER(), which can be used to avoid loading data into a temp table to get record numbers, so we are able to get the current page much more quickly. However, window functions are not optimized for pagination, so there is still a performance impact.
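The pre-2012 pattern described above can be sketched with ROW_NUMBER() in a common table expression instead of a temp table (names are illustrative):

```sql
DECLARE @PageNumber int = 3,
        @PageSize   int = 20;

-- Number every row once, then keep only the rows for the requested page.
WITH numbered AS
(
    SELECT customer_id, name, date_created,
           ROW_NUMBER() OVER (ORDER BY date_created, customer_id) AS rn
    FROM dbo.customers
)
SELECT customer_id, name, date_created
FROM numbered
WHERE rn BETWEEN (@PageNumber - 1) * @PageSize + 1
             AND @PageNumber * @PageSize;
```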
3. References
(1) Table Value Parameters in SQL Server 2008 and .NET (C#) (https://www.mssqltips.com/sqlservertip/2112/table-value-parameters-in-sql-server-2008-and-net-c/)
(2) CLR Assembly RegEx Functions for SQL Server by Example (https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/)
(3) Choosing Between DateTime, DateTimeOffset, TimeSpan, and TimeZoneInfo (https://msdn.microsoft.com/en-us/library/bb384267%28v=vs.110%29.aspx)
(4) Partitioned Tables and Indexes in SQL Server 2005 (https://technet.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx#sql2k5parti_topic1)