Storing Test Case Data in Excel 2007

When writing software test automation, some of the main ways you can store test case data include embedding the data directly into your test harness, storing as a simple text file, storing as an XML file, storing in a SQL database, and storing in an Excel file. Excel 2007 uses an interesting new file format called the "Open XML File Format" that changes the way in which you access data programmatically. The new file format is essentially XML (which provides a non-proprietary, easy to use format) which is compressed (which provides an efficient small size). Suppose I have test case data stored in an Excel 2007 worksheet named TestCases.xlsx and which looks like this:
 
caseID arg1 arg2 run      expected
001      3.5   4.1   TRUE   7.6
002      2.4   1.3   FALSE  3.7
003      5.2   3.3   TRUE   8.5
 
The .xlsx file is actually a .zip file so if I rename the file as TestCases.zip and then extract the compressed data using WinZip or some other utility, I will see three directories (_rels, docProps, xl) and one file named [Content_Types].xml. The actual data is contained in files sharedStrings.xml (located in the xl directory) and sheet1.xml (located in a subdirectory named \xl\worksheets). The sheet1.xml file contains XML including:
 
<sheetData>
  <row r="1" spans="1:5">
  <c r="A1" t="s">
    <v>0</v>
    </c>
  <c r="B1" t="s">
    <v>1</v>
    </c>
(etc.)
 
This means row 1 has 5 columns. The column at row A1 contains a string (t="s") whose value is index 0 (<v>0</v>). All string values are stored in a sharedStrings.xml file so that duplicate string need only be stored once. The sharedStrings.xml contains in part:
 
<si>
  <t>caseID</t>
</si>
<si>
  <t>arg1</t>
</si>
(etc.)
 
This means the string at [0] is "caseID", the string at [1] is "arg1", and so on. File sheet1.xml also contains:
 
<row r="2" spans="1:5">
  <c r="A2" s="1" t="s">
    <v>5</v>
  </c>
  <c r="B2">
    <v>3.5</v>
  </c>
(etc.)
 
This means the value at cell B2 is 3.5 — non-strings are simply stored as is. In short, Excel 2007 data is stored as compressed XML. To programmatically access Excel 2007 data you can use standard XML parsing techniques, and although the process is simple in principle the details can be a bit messy. 
This entry was posted in Software Test Automation. Bookmark the permalink.