2/9/2012 9:11:50 AM
 Eric Moreau Posts: 2
|
Hi
I have downloaded and installed SQL Sentry 1.3 (was using 1.2 before).
I was looking at a particular query and saw a big number in the "Actual Data Size" column that I don't know how to interpret.
I have a table defined like this (column name obfuscated):
Column_name Type Computed Length Prec Scale Nullable A UT_icID no 4 10 0 no B int no 4 10 0 no C int no 4 10 0 no D UT_icID no 4 10 0 no E int no 4 10 0 no F datetime no 8 yes G datetime no 8 yes H UT_Name no 512 yes I UT_Symbol no 64 yes J nvarchar no -1 yes K datetime no 8 no L datetime no 8 no
When I run a "select top 1" on this table, I can see 4,368 in the "Actual Data Size" column. The DataLength of my column H is 8, column I is 8 and column J is 0. How do you come up to 4,368?
if I run a "select top 100", the size shows 436,800 when I know that 99% of the column J is NULL.
If I run a "select top 1" on this table commenting out the column J (which is defined as a nvarchar(max)), the size shows 354.
How to interpret this value?
|
|
0
• permalink
|
2/9/2012 4:52:57 PM
 Aaron Bertrand Moderator Posts: 57
|
Hi Eric,
Can you describe what base types are used to define the alias types UT_icID, UT_Name and UT_Symbol? Can you show a few rows of sample data so I'm sure we're looking at the same thing?
The 4,368 is the *estimated* row size, and this comes directly from SQL Server in the plan XML that we retrieve (it is not something we calculate). For variable (<8000 bytes) SQL Server is going to assume that it is 1/2 the declared size on average; in the case of max I think it will always show 4,000b as the average. When generating the plan SQL Server does not go look at the table, inspect all of the pages, and calculate how much data is actually there - it just assumes that all rows will be populated with this average size (and with the max size for fixed data types, plus some negligible row overhead that must be read off the page when reading rows). I'm not going to profess to be able to account for every single byte, but this explains why when you comment out column J you lose *about* 4,000b - regardless of how much data is actually in that row, the figure is based on the estimates/averages.
I am looking into why we don't show a different number under Actual Data Size.
Hope this helps (as a start), Aaron
-- Aaron Bertrand, Senior Consultant SQL Sentry, Inc.
|
|
0
• permalink
|
2/10/2012 10:20:19 AM
 Aaron Bertrand Moderator Posts: 57
|
Hi Eric,
On further investigation, the "Actual Data Size" was not calculated the way I expected, where we would make some interpolation into the raw data coming across (and admittedly it would be outrageously expensive to do so). It is actually just taking the estimated row size (provided to us in the XML from SQL Server) and multiplying by the number of *actual* rows. So if you have a VARCHAR(MAX) column, it's is always going to calculate estimated and actual data size at 4K * each row count - so the actual data size and estimated data size will only be different when the actual row count varies from the estimated row count. Unfortunately even in an actual plan SQL Server makes no effort to determine the size of the data it is producing, for the same reasons we don't attempt it - data profiling is very expensive and complicated.
So I'll concede that "Actual Data Size" might imply the wrong thing, since it actually represents "Estimated Data Size Based on Actual Row Count" but I think the latter is a horribly hard to read label, and I can't think of a more concise way to express the definition. We include the data to serve as a more realistic guideline than what you have with estimated, but I do acknowledge that it is not perfectly accurate.
Hope the information is useful. Aaron
-- Aaron Bertrand, Senior Consultant SQL Sentry, Inc.
|
|
0
• permalink
|
2/10/2012 10:23:28 AM
 Eric Moreau Posts: 2
|
Ok at least I have an explanation. I will ignore/hide this column as it is misleading!
|
|
0
• permalink
|
2/10/2012 10:26:32 AM
 Aaron Bertrand Moderator Posts: 57
|
Eric,
I understand. I will repeat that it is more accurate than just relying on estimated data size, especially in cases where the estimated and actual row counts vary greatly. I know that it can be helpful in some cases, particularly when you do populate the columns with relatively average amounts of data (and don't have a high proportion of NULLs). I'm hoping that SQL Server develops more features on data profiling in future releases.
Cheers, Aaron
-- Aaron Bertrand, Senior Consultant SQL Sentry, Inc.
|
|
0
• permalink
|