Spreadsheet Data Upload

Eureka! currently supports uploading data into i2b2 from an Excel spreadsheet with the xlsx extension. The spreadsheet's contents must conform to a specific format and the data must be represented in a certain way for Eureka! to recognize the data and load it into i2b2 properly.

Creating a Spreadsheet

Eureka! comes with a sample spreadsheet that conforms to the proper format. To get it, click the Submit Job button in the button bar, and then click the Download Sample Spreadsheet link. This spreadsheet contains synthetically generated data. Little attempt was made to make the data look like it came from real patients - it is there purely as an example. We highly recommend that when creating your own datasets that, instead of beginning with a blank spreadsheet, you start with this spreadsheet and replace the data with your own.

The sample spreadsheet has multiple tabs. The tabs each represents a kind of database table. Each row is a record and each column represents an attribute. The tabs are "linked" to each other through primary and foreign key relationships as described below that link patients to encounters, encounters to providers, etc. Note that while some tabs contain columns for storing data that are HIPAA identifiers (e.g., patient name), it is YOUR responsibility not to put real patient data in those fields. For more information on what constitutes a HIPAA identifier, consult your institution's Institutional Review Board (IRB).

The tabs are as follows:

Patient
contains patient name (name_first and name_last) and demographics (dob, language, marital_status, race and gender). The patient_key field represents an unique identifier or key for each patient. There should be one row or record per patient.
Provider
represents healthcare providers in the dataset. It has columns for each provider's first name (name_first) and last name (name_last). The provider_key column represents a unique primary key for each provider. There should be one row or record per provider.
Encounter
contains the encounter start and end date/time (start_ts and end_ts), the type of the encounter (encounter_type) and the encounter's discharge status (discharge_disp). The encounter_key column contains a unique id for each encounter. The patient_key column contains the key of the patient in the patient tab whose encounter this was. The provider_key column contains the key of the provider (from the provider tab) that was the healthcare provider of record for the encounter. There should be one row or record per encounter in your dataset. There may be multiple encounters for a given patient.
eCPT
contains billing codes for procedures from the Current Procedural Terminology (CPT). There are columns for the procedure time (obx_ts) and the procedure code (represented as the code prefixed by CPT:, e.g., CPT:75505). The event_key column represents a primary key for each procedure. The encounter_key contains the key of the encounter in which the procedure occurred. Eureka! knows about the CPT codes listed in the metadataCPT tab. There may be multiple CPT codes for a given encounter.
eICD9D
contains billing codes for diagnoses from the International Classification of Diseases (ICD-9-CM). There are columns for the diagnosis time (obx_ts) and the diagnosis code (represented as the code prefixed by ICD9:, e.g., ICD9:V44.1). The event_key column represents a primary key for each diagnosis code. The encounter_key contains the key of the encounter in which the diagnosis was recorded. Eureka! knows about the ICD9-CM diagnosis codes listed in the metadataICD9D tab. There may be multiple ICD-9-CM diagnosis codes for a given encounter.
eICD9P
contains billing codes for procedures from the Internal Classification of Diseases (ICD-9-CM). There are columns for the procedure time (obx_ts) and the procedure code (represented as the code prefixed by ICD9:, e.g., ICD9:55.02). The event_key column represents a primary key for each procedure. The encounter_key column contains the key of the encounter in which the procedure occurred. Eureka! knows about the ICD9-CM procedure codes listed in the metadataICD9P tab. There may be multiple ICD-9-CM procedure codes for a given encounter.
eMEDS
contains medication orders. The coding system is non-standard. Available codes, descriptive strings and the classes of medications that they represent may be found in the metadataMEDS tab. There are columns for the order time (obx_ts) and the code for the order (entity_id). The event_key column represents a primary key for each order. The encounter_key column represents the key for the encounter in which the order was made. There may be multiple medication orders for a given encounter.
eLABS
contains laboratory test results. The coding system is non-standard. Available codes, descriptive strings and classes of tests may be found in the metadataLABS tab. There are columns for the time of the test (obx_ts), the test code (entity_id), the result in string (result_str) and numerical (result_num) formats, the units of the result (units), and a flag (flag, may be blank, high, normal or low). The event_key column represents a primary key for each test result. The encounter_key column represents the key for the encounter in which the laboratory test was performed. There may be multiple laboratory test results for a given encounter.
eVITALS
contains vital signs. The coding system is non-standard. Available codes, descriptive strings and classes of vital signs may be found in the metadataVITALS tab. There are columns for the time of the observation (obx_ts), the vital sign code (entity_id), the result in string (result_str) and numerical (result_num) formats, the units (units) and a flag (flag, may be blank, high, normal or low). The event_key column represents a primary key for each vital sign observation. The encounter_key column represents the key for the encounter in which the vital sign was observed. There may be multiple vital sign observations for a given encounter.
Metadata Tabs
contain the coding systems and terminologies that are supported by Eureka! for representing the various data types. These tabs all have the same column structure. There is one row per code in the terminology. The entity_id column represents the code as represented in Eureka!. They follow a convention of coding_system_name:code. The description column contains the name of each code as it will appear in i2b2. The hierarchical_location column represents the coding system's hierarchy as it will appear in i2b2's query user interface.

The spreadsheets must have these tabs and column names, or the upload will fail and will not be loaded into i2b2. The cells in each column must have the same value category (e.g. Text, Number) as in the sample preadsheet, or the upload may fail. If codes are used that are not in the appropriate metadata tab, you will be shown a warning on the upload page but the upload will succeed except with the data with unknown codes omitted.

Uploading a Spreadsheet

Once your spreadsheet is created, upload the spreadsheet as described in Working with Phenotypes.