T-SQL Bulk Insert and Indexes

One of the projects I’m working on involves Microsoft SQL Server’s dialect of SQL, T-SQL. I have some large files that I need to insert into a table, and I was wondering about the performance implications of the ordering of two steps: should I create column indexes first (on my empty table) and then use BULK INSERT to transfer data from file to table, or should I insert the data first (into the initially empty table) and then create the indexes on the populated table?

I ran a few small experiments and found that inserting data first and then creating indexes was, in general, significantly faster than creating indexes first and then inserting data. However, with the insert-first approach, as I increased the size of the data being inserted, at some point I eventually ran out of machine memory. So my hypothesis is that for "moderate" amounts of data (where the definition of moderate depends on how much memory your SQL Server machine has), it is better, performance-wise, to insert the data first and create the indexes afterwards. But for "large" amounts of data that approach isn't feasible, so you should create the indexes first and then insert the data.
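To make the two orderings concrete, here is a minimal sketch of the insert-first approach, with the index-first alternative shown in comments. The table name, column definitions, file path, and index name are all hypothetical placeholders, not the actual ones from my project:

   -- Approach 1: bulk insert into an empty, unindexed table, then build the index.
   CREATE TABLE dbo.People (
     id INT NOT NULL,
     lastName VARCHAR(50) NOT NULL,
     score FLOAT NOT NULL
   );

   BULK INSERT dbo.People
   FROM 'C:\Data\people.txt'
   WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

   CREATE CLUSTERED INDEX idxPeopleId ON dbo.People(id);

   -- Approach 2: create the index on the empty table first,
   -- then run the same BULK INSERT statement. The index is
   -- maintained row by row as the data arrives, which was
   -- slower in my experiments but avoided the memory problem.

In my experiments, approach 1 was faster for moderate file sizes because the index is built in a single sorting pass after the data is loaded, but that sort is exactly where I ran out of memory on the largest files.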