January 19, 2011

Difference between DELETE and TRUNCATE

DELETE logs the data for each row affected by the statement in the transaction log and physically removes the row from the file, one row at a time. The recording of each affected row can cause your transaction log grow massively if you are deleting huge numbers of rows. However, when you run your databases in full recovery mode, detailed logging is necessary for SQL Server to be able to recover the database to the most recent state, should a problem arise. The fact that each row is logged explains why DELETE statements can be slow.

TRUNCATE is faster than DELETE due to the way TRUNCATE "removes" rows. Actually, TRUNCATE does not remove data, but rather deallocates whole data pages and removes pointers to indexes. The data still exists until it is overwritten or the database is shrunk. This action does not require a lot of resources and is therefore very fast. It is a common mistake to think that TRUNCATE is not logged. This is wrong. The deallocation of the data pages is recorded in the log file. Therefore, BOL refers to TRUNCATE operations as "minimally logged" operations. You can use TRUNCATE within a transaction, and when this transaction is rolled-back, the data pages are reallocated again and the database is again in its original, consistent state.

Some limitations do exist for using TRUNCATE.

· You need to be db_owner, ddl_admin, or owner of the table to be able to fire a TRUNCATE statement.

· TRUNCATE will not work on tables, which are referenced by one or more FOREIGN KEY constraints.

So if TRUNCATE is so much faster than DELETE, should one use DELETE at all? Well, TRUNCATE is an all-or-nothing approach. You can't specify just to truncate those rows that match a certain criteria. It's either all rows or none.

We can have triggers associated with DELETE operation.
We cannot have triggers associated with TRUNCATE operation.

You can, however, use a workaround here. Suppose you want to delete more rows from a table than will remain. In this case you can export the rows that you want to keep to a temporary table, run the TRUNCATE statement, and finally reimport the remaining rows from the temporary table. If your table contains a column with the IDENTITY property defined on it, and you want to keep the original IDENTITY values, be sure to enabled IDENTITY_INSERT on the table before you reimport from the temporary table. Chances are good that this workaround is still faster than a DELETE operation.

You can also set the recovery mode to "Simple" before you start this workaround, and then back to "Full" one it is done. However, keep in mind that is this case, you might only be able to recover to the last full backup.

No comments:

Creating DataFrames from CSV in Apache Spark

 from pyspark.sql import SparkSession spark = SparkSession.builder.appName("CSV Example").getOrCreate() sc = spark.sparkContext Sp...