Using SQL Bulk Insert with a Format File

Every once in a while I want to insert data from a large text file into SQL table and the structure of the text file does not exactly match the structure of the SQL table. It always takes me a little bit of time to figure out how to construct a format file. For example suppose I had a SQL table like:
create table tblCars
[cid] char(3) primary key,
[make] varchar(20) not null,
[model] varchar(20) null,
[year] int null
And suppose I had a text file like:
Notice that the text file has 5 fields, including a color field and a transmission field which are not in the table, and that the table has a model column not represented in the file, and that the order of the table columns does not match the order of the file fields. A possible .fmt format file to map fields to columns is:
1 SQLCHAR  0  3   ","     1   cid     SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR  0  12  ","     4   year    ""
3 SQLCHAR  0  20  ","     0   dummy   SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR  0  20  ","     2   make    SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR  0  12  "\r\n"  0   dummy   SQL_Latin1_General_CP1_CI_AS
And the format file could be used like:
bulk insert dbCars..tblCars from ‘C:\cars.txt’
with (formatfile = ‘C:\cars.fmt’)
In the format file, the 10.0 is the SQL Server version (SQL Server 2008). The 5 is the number of fields in each line of the textg file. The 1,2,3,4,5 are each field in the file. The SQLCHAR means all data in the source file is text data. The 0,0,0,0,0 are "prefix lengths" which is a tricky topic but for inserting text file data can all be 0. The 3,12,20,20,12 are maximum number of characters in each field. Bedcause the text file is delimited these values could all be 0 too. The 1,4,0,2,0 tell Bulk Insert which column the field goes in, so field 2 goes into column 4. A value of 0 means do not put the field into the table. The cid,year,dummy,make,dummy are the column names. Dummy is not a keyword, but you have to put some name there so ‘dummy’ or ‘notused’ are reasonable choices. The last column of the format file is the collating sequence, and I don’t fully understand it, especially because you can use "" for all values.
This entry was posted in Software Test Automation. Bookmark the permalink.