SEER*Prep reads data from text files and creates a SEER*Stat database containing that data. Future versions of the software will allow you to define your own format for the input files. For the current version, input files must be created in one of the supported formats documented below. The text data files must also adhere to the following rules:
- Text data files must be fixed length, with the exception of the NAACCR 22 CSV formats. The record length is specified below for each of the supported fixed-width file formats.
  If your data files are not in the correct format, you must modify the files before using SEER*Prep.
  - For fixed-width records, you may be able to use the Windows FixLen program provided in SEER*Prep Utilities.
  - For NAACCR XML format records, there are several options for converting NAACCR XML to CSV:
    - Refer to Example 1 in How to Use SEER*Prep for instructions using SAS, or
    - Use File*Pro software, which can also convert to CSV.
- All numeric variables must be formatted using a defined length, with leading zeros when appropriate. For example, a value of 1 in a variable with length=2 must be stored as "01".
- Fixed-width input data must be stored in either text or compressed text files.
  - If the input file is a text file, it must be named with a .txd extension.
  - A compressed format may be used to reduce the disk space required to store the data. Gzip, a free utility, produces the only compression format that SEER*Prep supports.
    - SEER*Prep requires gzipped data files to have a .txd.gz extension.
- NAACCR CSV input data requires NAACCR XML IDs in the header row. CSV files cannot be compressed.
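
The padding, fixed-length, and naming rules above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names, the two-field layout in the usage note, and the choice of a line feed as the record terminator are hypothetical and not taken from SEER*Prep. Real field positions and lengths come from the DD file for your format.

```python
import gzip


def pad_numeric(value: int, length: int) -> str:
    """Zero-pad a non-negative integer to a fixed field length, e.g. 1 -> "01"."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    text = str(value).zfill(length)
    if len(text) != length:
        raise ValueError(f"{value} does not fit in a field of length {length}")
    return text


def write_gzipped_txd(records, path):
    """Write equal-length records, one per line, to a gzip-compressed text file."""
    # The document states that gzipped input must use the .txd.gz extension.
    if not path.endswith(".txd.gz"):
        raise ValueError("gzipped input files need a .txd.gz extension")
    # Fixed-length rule: every record must have the same number of characters.
    if len({len(record) for record in records}) > 1:
        raise ValueError("all records must have the same length")
    # Assumption: records are ASCII and terminated by a single line feed.
    with gzip.open(path, "wt", encoding="ascii", newline="\n") as out:
        for record in records:
            out.write(record + "\n")
```

For example, `pad_numeric(1, 2)` returns `"01"`, and `write_gzipped_txd([pad_numeric(1, 2) + pad_numeric(2024, 4)], "example.txd.gz")` writes a single 6-character record to a hypothetical file named `example.txd.gz`.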
SEER*Prep Database Description Files
To convert text data into a SEER*Stat database, SEER*Prep requires a complete description of the text files. This information, including variable locations and the valid values for each variable, is stored in a SEER*Prep Database Description (DD) file. Incidence and mortality description files also contain file format information for the optional population data that are used to generate rates. DD files for the currently supported file formats are installed with the SEER*Prep software.
SEER*Prep can also generate input file documentation from the DD files. At any time, select Generate Input File Description from the File menu to produce a report containing the file descriptions.
Supported File Formats
The available formats and required record lengths for the fixed-width input data are described below. The latest version of SEER*Prep includes database description files with the installation.
Incidence:
- NAACCR 23 CSV Format: For County Populations (DD, 1.3 MB) (updated 04/03/2024) and Census-Tract Populations (DD, 1.3 MB) (updated 04/03/2024)
- Global (DD, 392 KB) (updated 04/03/2024) - 334 Byte Format (for the difference between the NAACCR and Global formats, see the FAQs)
Mortality:
- Mortality (DD, 311 KB) (updated 11/28/2022) - 58 Byte Format (66 Byte if the file contains summarized mortality counts rather than individual death records)
- NAACCR 23 CSV Format: For Incidence-Based Mortality (DD, 1.3 MB) (updated 04/03/2024)
Auxiliary Data:
- Expected Survival (DD, 11 KB) (updated 5/19/2006) - 29 Byte Format
- Standard Populations (DD, 14 KB) (updated 12/29/2020) - 14 Byte Format
- Populations Only (DD, 254 KB) (updated 11/28/2022) - 26 Byte Format
- Census tract attributes (DD, 4.0 MB) (updated 2/28/2023) - 246 Byte Format
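
Before running SEER*Prep, it can help to confirm that every record in a fixed-width file matches the length listed above. The following Python sketch shows one way to do that. The dictionary keys and the function name are hypothetical labels chosen for this example, and the sketch assumes that the listed byte counts exclude the line terminator and that the data are single-byte (ASCII) text; neither assumption is confirmed by the list above.

```python
# Record lengths copied from the fixed-width formats listed above.
# The key names are hypothetical labels, not SEER*Prep identifiers.
EXPECTED_RECORD_LENGTHS = {
    "global_incidence": 334,
    "mortality": 58,
    "mortality_summarized": 66,
    "expected_survival": 29,
    "standard_populations": 14,
    "populations_only": 26,
    "census_tract_attributes": 246,
}


def find_bad_records(lines, format_name):
    """Return (line_number, actual_length) for each record of the wrong length."""
    expected = EXPECTED_RECORD_LENGTHS[format_name]
    bad = []
    for number, line in enumerate(lines, start=1):
        # Assumption: the line terminator is not counted in the record length.
        record = line.rstrip("\r\n")
        if len(record) != expected:
            bad.append((number, len(record)))
    return bad
```

Passing an open text file (or any iterable of lines) together with a format label, for example `find_bad_records(open("data.txd"), "populations_only")` with a hypothetical file name, returns an empty list when every record has the expected length.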