I have a table with 490 M rows and 55 GB of table space, so about 167 bytes per row. The table has three columns: a VARCHAR, a DATETIME2(0), and a SMALLINT. The average length of the text in the VARCHAR field is about 21.5 characters, so the raw data should be around 32 bytes per row: 22 + 2 for the VARCHAR, 6 for the DATETIME2, and 2 for the 16-bit SMALLINT.
Note that the space above is data only, not indices. I'm using the value reported under Properties | Storage | General | Data space.
Of course there must be some overhead, but 135 bytes per row seems like a lot, especially for a large table. Why might this be? Has anyone else seen similar multipliers? What factors can influence the amount of extra space required?
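To make the mismatch concrete, here is the arithmetic from the paragraphs above as a quick sketch (column sizes follow SQL Server's documented storage sizes; the 167-byte figure is the one reported in the question):

```python
# Raw per-row data size for the three columns described above.
varchar_data = 22      # average text length (~21.5, rounded up)
varchar_length = 2     # 2 bytes stored per variable-length value
datetime2_0 = 6        # DATETIME2(0) is stored in 6 bytes
smallint = 2           # SMALLINT is a 16-bit integer

raw_per_row = varchar_data + varchar_length + datetime2_0 + smallint
print(raw_per_row)                     # 32 bytes of raw data per row

# Reported size minus raw data: the unexplained per-row overhead.
reported_per_row = 167
print(reported_per_row - raw_per_row)  # 135 bytes of apparent overhead
```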
For comparison, I tried creating a table with two INT fields and 1 M rows. The data space required was 16.4 MB: about 17 bytes per row, compared to 8 bytes of raw data. Another test table with an INT and a VARCHAR(100), populated with the same text as the real table, uses 39 bytes per row (44 K rows), where I would expect 28 plus a little.
So the production table has considerably more overhead. Is this because it's larger? I'd expect index sizes to be roughly N * log(N), but I don't see why the space required for the actual data would be non-linear.
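The same per-row arithmetic for the two test tables, plus the overhead multipliers that make the comparison stark (all figures are the ones quoted above; 16.4 MB is taken as 16.4 * 1024^2 bytes):

```python
MB = 1024 * 1024

# Test 1: two INT columns (8 bytes raw), 1 M rows, 16.4 MB of data space.
test1_per_row = 16.4 * MB / 1_000_000
print(round(test1_per_row, 1))   # ~17.2 bytes/row vs 8 bytes of raw data

# Overhead multipliers: reported size divided by raw data size.
print(round(test1_per_row / 8, 2))  # ~2.15x on test 1
print(round(39 / 28, 2))            # ~1.39x on test 2 (INT + VARCHAR)
print(round(167 / 32, 2))           # ~5.22x on the production table
```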
Thanks in advance for any pointers!
All of the fields listed are NOT NULL. The real table has a clustered PK on the VARCHAR field and the DATETIME2 field, in that order. For the two tests, the first INT was the (clustered) PK.
If it matters: the table is a record of ping results. The fields are URL, ping date/time, and latency in milliseconds. Data is constantly appended, and never updated, but data is deleted periodically to cut it down to just a few records per hour per URL.
A very interesting answer here suggests that, for an index with much reading and writing, rebuilding may not be beneficial. In my case, the space consumed is a concern, but if write performance is more important, one may be better off with flabby indices.
Have the data types changed over time? Have variable-length columns been removed? Have the indexes been defragmented often but never rebuilt? Have a lot of rows been deleted or have a lot of variable-length columns been updated significantly? Some good discussion here.
The on-disk structures have overhead. Each data row carries: 2 bytes of status bits, 2 bytes for the fixed-length data size, the fixed-length data itself, 2 bytes for the column count, a NULL bitmap of 1 byte per 8 columns (present even when all columns are NOT NULL), and a 2-byte entry in the page's slot array.
Taking 2 x 4-byte INT columns, you have 2 + 2 + 8 + 2 + 1 + 2 = 17 bytes. Wow, 17 bytes!
You can do the same for your second test table, which has more overhead than the first because of the variable-length column: a 2-byte variable-column count and a 2-byte offset-array entry per variable-length column, then the data itself. That gives 2 + 2 + 4 + 2 + 1 + 2 + 2 + 22 + 2 = 39 bytes, matching the observed 39 bytes per row.
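That accounting can be wrapped in a small helper. This is a sketch of SQL Server's in-row data record layout (status bits, fixed-length size field, fixed data, column count, NULL bitmap, optional variable-length section, plus the 2-byte slot-array entry), assuming no NULLs and no off-row storage; it reproduces both test tables and estimates the production row:

```python
import math

def row_size(fixed_bytes, n_cols, var_cols=0, var_bytes=0):
    """Approximate size of a SQL Server in-row data record plus its
    2-byte slot-array entry. Assumes NOT NULL columns and in-row storage."""
    size = 2                        # status bits A and B
    size += 2                       # fixed-length data size field
    size += fixed_bytes             # fixed-length column data
    size += 2                       # number of columns
    size += math.ceil(n_cols / 8)   # NULL bitmap (always present in data rows)
    if var_cols:
        size += 2                   # number of variable-length columns
        size += 2 * var_cols        # column offset array
        size += var_bytes           # variable-length data
    return size + 2                 # row-offset entry in the page's slot array

print(row_size(8, 2))                             # 17: the two-INT test table
print(row_size(4, 2, var_cols=1, var_bytes=22))   # 39: the INT + VARCHAR test table
print(row_size(8, 3, var_cols=1, var_bytes=22))   # 43: the production row
```

With only ~43 bytes actually needed per production row, record overhead alone cannot explain the 167 observed bytes; the remainder has to be page-level waste.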
Why the difference? In addition (I won't link to these), make sure the figures you're reading are current by running sp_spaceused with @updateusage set, e.g.:
EXEC sys.sp_spaceused @objname = N'dbo.YourTable', @updateusage = 'true';
After discussions in the comments on the original question, it appears in this case the lost space is caused by the choice of clustered key, which has led to massive fragmentation.
Always worth checking the state of fragmentation via sys.dm_db_index_physical_stats in these situations.
Edit: following the update in the comments:
The average page density (prior to rebuild of the clustered index) was 24%, which fits perfectly with the original question. The pages were only 1/4 full, so the total size was 4x the raw data size.
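As a sanity check on that 24% figure, a hypothetical back-calculation using the numbers from the question:

```python
page_density = 0.24       # average page fullness before the rebuild
reported_per_row = 167    # observed bytes per row on the production table

# At 100% page density, each row would occupy roughly:
print(round(reported_per_row * page_density, 1))  # ~40.1 bytes
# i.e. ~32 bytes of raw data plus per-row record overhead.

# And the inflation factor due to empty space on the pages:
print(round(1 / page_density, 1))                 # ~4.2x
```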