Working with Phenotypes

What are Phenotypes?

Clinical phenotypes are manifestations of disease, treatment, and treatment response as represented in the electronic health record (EHR). EHRs contain many data elements representing diagnoses, procedures, laboratory test results, medication orders, and more, which together reflect the diagnoses and treatments a patient has had and how the patient has responded to treatment. However, the medical record typically does not represent these phenotypes as distinct data elements; they must be inferred from combinations of data. Clinicians do this in their heads as they browse and scan a medical record. Personnel who use the medical record for secondary purposes, such as research and quality improvement, do this during chart abstraction. Because chart abstraction is manual, it inherently limits the volume of EHR data that can be used in a research or quality analysis.

Eureka! aims in part to automate chart abstraction. In other words, it automatically computes the data elements defined for a research or quality improvement study from the data elements in the medical record. We call such study data elements derived data elements. The Phenotype Editor screens in Eureka! are where you specify your study's data elements and how they should be computed, either from EHR data elements or from other study data elements that you have already specified. The Phenotype Editor supports creating four types of user-defined data elements (Category, Sequence, Frequency, and Value Threshold), described below, which may be combined to specify clinical phenotypes. For example, you might define a custom category of diagnosis codes and then a frequency threshold on the number of codes from that category that appear in a patient's data, or you might look for a high blood pressure result together with at least two hypertension diagnosis codes within 6 months. The Phenotype Editor screens guide you through creating these derived data elements. Once you have defined them, you can compute them in a dataset of interest and load both the dataset's data and the computed values into i2b2, a database system with an easy-to-use interface for querying and exporting data.

Editing Phenotypes

You can access the Phenotype Editor screen by clicking the Editor link at the top of the Eureka! user interface. This screen lists the derived data elements that you have already specified and provides a Create New Element link for creating new ones. Icons to the left of each derived data element in the list let you edit or delete that element.

Select derived data element type

Clicking Create New Element opens a dropdown menu for selecting a data element type. There are four types:

Category
Category data elements allow specifying clinically significant groupings and hierarchies of data elements.
Sequence
Sequence data elements allow specifying two or more data elements that must occur in a specified temporal order.
Frequency
Frequency data elements allow specifying the number of times a data element must be present in or computed from a patient’s data.
Value threshold
Value threshold data elements allow specifying lower and/or upper limits on one or more numerical observation data elements such as laboratory test results or vital signs.

Name and Describe your Derived Data Element

After you select the data element type, a form specific to that type opens for specifying your derived data element. All of these forms, however, start with fields for a name and an optional description.

Select Elements from the Concept Explorer

All of these forms have a concept explorer on the left side of the screen for selecting concepts that represent EHR data and any derived data elements you have already created; these appear in the System and User Defined tabs, respectively. The System tab displays EHR data as a hierarchy of clinical concepts, and the User Defined tab lists the derived data elements that you have previously specified. Because you can define derived data elements in terms of other derived data elements, you can build up data elements that represent fairly sophisticated interpretations of EHR data. For example, you can specify a frequency derived data element on a previously defined value threshold data element, such as at least two blood pressure values over 130/80.

In general, you drop concepts from the concept explorer into one or more blue boxes on the right side of the form. Clicking the icon to the left of the dropped concept will remove the concept from the box. Underneath those blue boxes, you may specify additional constraints on the concept. You may specify a duration constraint (for example, 2 days to 1 month). You also may specify a particular property value, if the dropped concept has properties (for example, Encounter with type INPATIENT). Make sure to check the checkboxes next to the constraint fields that you intend to use. You may drop only one concept into these blue boxes unless otherwise specified below.
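To make the constraint semantics concrete, here is a minimal Python sketch, assuming each data value has a start time, a finish time, and a dictionary of properties. The `DataValue` class and the `passes_constraints` function are hypothetical illustrations, not part of Eureka!. A constraint left as `None` stands in for an unchecked checkbox, and 1 month is approximated as 30 days.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


@dataclass
class DataValue:
    """Hypothetical stand-in for one EHR data value; not a Eureka! class."""
    concept: str
    start: datetime
    finish: datetime
    properties: Dict[str, Any] = field(default_factory=dict)


def passes_constraints(
    value: DataValue,
    min_duration: Optional[timedelta] = None,
    max_duration: Optional[timedelta] = None,
    property_name: Optional[str] = None,
    property_value: Any = None,
) -> bool:
    """Return True if the value satisfies every constraint that is set.

    A constraint left as None plays the role of an unchecked checkbox.
    """
    duration = value.finish - value.start
    if min_duration is not None and duration < min_duration:
        return False
    if max_duration is not None and duration > max_duration:
        return False
    if property_name is not None and value.properties.get(property_name) != property_value:
        return False
    return True


# A 3-day inpatient encounter checked against "2 days to 1 month"
# (1 month approximated as 30 days) and "type INPATIENT".
stay = DataValue("Encounter", datetime(2012, 3, 1), datetime(2012, 3, 4), {"type": "INPATIENT"})
print(passes_constraints(stay, timedelta(days=2), timedelta(days=30), "type", "INPATIENT"))
```

Only constraints whose checkbox is checked take part in the test, which is why every parameter defaults to `None`.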

Category data element

Drag and drop one or more concepts from the concept explorer into the blue box on the right to specify the members of your category. You may drop in previously created categories from the User Defined tab to create category hierarchies. If you make a mistake, just click the icon to the left of a category member to remove it. Once created, a category may be dropped into any blue box where one of its members is allowed.
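Category hierarchies can be pictured as nested sets. The sketch below is a hypothetical model, not Eureka!'s implementation: it assumes categories are stored as a dictionary from category name to member names, and it expands a category into the non-category concepts it ultimately contains. The ICD-9 concept identifiers and the `expand_category` function are illustrative only.

```python
from typing import Dict, List, Optional, Set


def expand_category(
    name: str,
    categories: Dict[str, List[str]],
    _visited: Optional[Set[str]] = None,
) -> Set[str]:
    """Return the non-category concepts reachable from a category.

    Members that are themselves categories are expanded recursively; the
    visited set prevents infinite loops if a hierarchy refers back to itself.
    """
    visited = set() if _visited is None else _visited
    if name in visited:
        return set()
    visited.add(name)
    leaves: Set[str] = set()
    for member in categories.get(name, []):
        if member in categories:
            leaves |= expand_category(member, categories, visited)
        else:
            leaves.add(member)
    return leaves


# Hypothetical user-defined categories keyed by name; "Cardiometabolic"
# nests the "Diabetes" category inside it.
categories = {
    "Diabetes": ["ICD9:250.00", "ICD9:250.01"],
    "Cardiometabolic": ["Diabetes", "ICD9:401.9"],
}
print(sorted(expand_category("Cardiometabolic", categories)))
# ['ICD9:250.00', 'ICD9:250.01', 'ICD9:401.9']
```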

Sequence data element

Drag and drop concepts from the concept explorer into the blue boxes for the primary data element (at the top of the form) and the first related data element. Optionally select duration and property constraints as described above. Then, specify the temporal relationship between these concepts (before or after) and whether the related concept must be a specified minimum and/or maximum time distance from the primary one (for example, before by 2 days to 2 months). Click the Add to sequence link to specify additional temporal relationships between the primary and related data elements. Computed sequence derived data values have the start and finish times of the primary data element value from which they are computed.
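The following Python sketch models a single before/after relationship with a minimum and maximum time distance. It is a simplification under stated assumptions: "before" is taken to mean that the primary value finishes before the related value starts, the distance is measured between those two time points, 2 months is approximated as 60 days, and only one related data element is handled. All names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Interval = Tuple[datetime, datetime]  # (start, finish) of one data value


def satisfies_relation(
    primary: Interval,
    related: Interval,
    relation: str,
    min_gap: Optional[timedelta] = None,
    max_gap: Optional[timedelta] = None,
) -> bool:
    """Check one temporal relationship between a primary and a related value.

    Assumption: "before" means the primary finishes before the related value
    starts, with the gap measured from primary finish to related start;
    "after" is the mirror image.
    """
    if relation == "before":
        gap = related[0] - primary[1]
    elif relation == "after":
        gap = primary[0] - related[1]
    else:
        raise ValueError(f"unknown relation: {relation!r}")
    if gap < timedelta(0):
        return False
    if min_gap is not None and gap < min_gap:
        return False
    if max_gap is not None and gap > max_gap:
        return False
    return True


def compute_sequence(
    primaries: List[Interval],
    relateds: List[Interval],
    relation: str,
    min_gap: Optional[timedelta] = None,
    max_gap: Optional[timedelta] = None,
) -> List[Interval]:
    """Return one derived value per primary value that has a matching related value.

    Each derived value reuses the primary value's start and finish times.
    """
    return [
        p for p in primaries
        if any(satisfies_relation(p, r, relation, min_gap, max_gap) for r in relateds)
    ]


def on(day: int) -> Interval:
    """A point-in-time value on the given day of January 2012."""
    moment = datetime(2012, 1, day)
    return (moment, moment)


# "Primary before related by 2 days to 2 months (about 60 days)."
matches = compute_sequence([on(1), on(20)], [on(10)], "before", timedelta(days=2), timedelta(days=60))
print(matches)
```

Here only the January 1 primary value qualifies, because the related value on January 10 falls 9 days after it, while the January 20 primary value has no related value after it.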

Frequency data element

Drag and drop a concept from the concept explorer into the blue box, and select the count n of data values represented by the dropped concept. Then, select whether you are interested only in the first n data values or in any time at least n values occur. If you drop a value threshold into the blue box, a consecutive checkbox will appear; if it is checked, the count applies only to consecutive values of the data element being thresholded. Optionally select duration and property constraints as described above. Finally, you may specify a minimum and/or maximum time distance between consecutive values among the n data values. The start and finish times of frequency derived data values match the temporal extent of the n data values from which they were computed.
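As a rough illustration of the frequency options, this hypothetical Python sketch looks for runs of n values, optionally restricted to the first n, with optional minimum and maximum gaps between consecutive values. Measuring gaps start-to-start is an assumption, and the consecutive checkbox for value thresholds is not modeled. The example readings could stand for the output of a value threshold such as high blood pressure.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Interval = Tuple[datetime, datetime]  # (start, finish) of one data value


def frequency_values(
    values: List[Interval],
    n: int,
    first_only: bool = False,
    min_gap: Optional[timedelta] = None,
    max_gap: Optional[timedelta] = None,
) -> List[Interval]:
    """Return one derived interval per qualifying run of n time-adjacent values.

    first_only=True tests only the first n values; otherwise every run of n
    adjacent values is tested. Gaps are measured start-to-start between
    consecutive values (an assumption). Each derived interval spans from the
    earliest start to the latest finish of its n values.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(values)
    starts = [0] if first_only else range(len(ordered) - n + 1)
    results: List[Interval] = []
    for i in starts:
        window = ordered[i:i + n]
        if len(window) < n:
            continue  # fewer than n values available
        gaps = [b[0] - a[0] for a, b in zip(window, window[1:])]
        if min_gap is not None and any(g < min_gap for g in gaps):
            continue
        if max_gap is not None and any(g > max_gap for g in gaps):
            continue
        results.append((window[0][0], max(f for _, f in window)))
    return results


# Point-in-time readings; the last one falls 225 days after the second.
jan1, jan20, sep1 = datetime(2012, 1, 1), datetime(2012, 1, 20), datetime(2012, 9, 1)
readings = [(sep1, sep1), (jan1, jan1), (jan20, jan20)]
print(frequency_values(readings, n=2, max_gap=timedelta(days=180)))
```

With a 180-day maximum gap, only the January 1 and January 20 readings form a qualifying pair, so a single derived value spanning those two dates is returned.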

Value threshold data element

Drag and drop a concept representing a data element that has a numerical observation from the concept explorer into the blue box labeled Drop Thresholded Data Element Here. Specify upper and/or lower thresholds on the value of that data element in the provided form fields. Click the Add threshold link to specify thresholds on additional data elements. If you specify more than one threshold, use the Value thresholds selector at the top of the form to specify whether your derived data element should be computed if any of the specified thresholds is met or only if all of them are met. In some situations, you may want a threshold to apply only in one or more clinical contexts, such as in patients with a particular diagnosis. You may specify a context by dragging one or more data elements from the concept explorer into the Drop Contextual Data Element Here boxes, and you may require these contextual data elements to fall within a certain time distance before or after the data element values being thresholded. Computed value threshold derived data values have the same timestamp as the data from which they were computed.
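The any/all choice can be sketched as follows. This hypothetical Python model assumes that thresholds are inclusive and that, in "all" mode, every threshold must be met by observations sharing one timestamp; contextual data elements are not modeled. The class and function names and the concept names `systolic` and `diastolic` are placeholders, not Eureka! identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical numeric observation, such as a lab result or vital sign."""
    concept: str
    value: float
    time: datetime


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold; both limits are treated as inclusive (an assumption)."""
    concept: str
    lower: Optional[float] = None
    upper: Optional[float] = None

    def met_by(self, obs: Observation) -> bool:
        if obs.concept != self.concept:
            return False
        if self.lower is not None and obs.value < self.lower:
            return False
        if self.upper is not None and obs.value > self.upper:
            return False
        return True


def value_threshold_times(
    observations: List[Observation],
    thresholds: List[Threshold],
    mode: str = "any",
) -> List[datetime]:
    """Return the sorted timestamps at which the thresholds are met.

    Assumption: in "all" mode every threshold must be met by observations
    sharing one timestamp. Derived values keep that timestamp.
    """
    if mode not in ("any", "all"):
        raise ValueError(f"unknown mode: {mode!r}")
    by_time: Dict[datetime, List[Observation]] = defaultdict(list)
    for obs in observations:
        by_time[obs.time].append(obs)
    combine = any if mode == "any" else all
    return sorted(
        t for t, group in by_time.items()
        if combine(any(th.met_by(o) for o in group) for th in thresholds)
    )


t1, t2, t3 = datetime(2012, 1, 1), datetime(2012, 2, 1), datetime(2012, 3, 1)
observations = [
    Observation("systolic", 142, t1), Observation("diastolic", 78, t1),
    Observation("systolic", 120, t2), Observation("diastolic", 75, t2),
    Observation("systolic", 150, t3), Observation("diastolic", 95, t3),
]
thresholds = [Threshold("systolic", lower=130), Threshold("diastolic", lower=80)]
print(value_threshold_times(observations, thresholds, "any"))
print(value_threshold_times(observations, thresholds, "all"))
```

In "any" mode the January and March readings qualify; in "all" mode only the March reading, where both limits are met, does.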

Review and Save

After saving, your specified derived data element will appear in your derived data element list on the main editor screen and in the concept explorer. It also will be computed for any subsequent data processing job that you submit.

Phenotyping a Dataset

Phenotyping a dataset involves selecting a data source, selecting a data destination, optionally specifying a date range, and clicking the Start button. The time required to complete a phenotyping job depends on the size of the source data, the selected date range, and the number of derived data elements that you have defined. To phenotype a dataset, navigate to the Submit Job screen by clicking the Submit Job link at the top of the Eureka! user interface.

Select a Data Source

Your Eureka! deployment's administrator is responsible for configuring your user account to access the data sources you need. A data source may be an Excel spreadsheet (see Spreadsheet Data Upload) or a relational database system that your administrator has configured Eureka! to access. When you select a data source from the dropdown list, form fields may appear for you to complete, such as the name of a spreadsheet file on your local computer. Required form fields are marked with an asterisk.

Select a Data Destination

Data destinations are where your data and computed derived data element values will be stored. Again, your Eureka! deployment's administrator is responsible for configuring your user account to have access to the data destinations you need. Currently, the only data destination type supported is an i2b2 project (see below).

Specify a Date Range

Specifying a date range is optional. If you do not specify one, the entire dataset from the selected data source will be loaded into the selected data destination. The same concept explorer used in the phenotype editing screens appears on the Submit Job screen so that you can select the data element to which the date range applies. Drag that element into the blue box, then specify your date range of interest using the form fields.

Start a Phenotyping Job

Click Start to begin a job. The job's status will be shown and updated at the top of the screen. You may leave this screen or even log out of Eureka!, and the job will continue running in the background. When the job is complete, you may click the I2b2 link at the top of the screen to go to i2b2 and see your phenotyped data. The Eureka! demonstration website synchronizes account information between Eureka! and i2b2, so there you can use your Eureka! username and password to sign in to i2b2. This may not be the case for a local deployment of Eureka!, so check with your support personnel.

Browse your Data and Phenotypes in i2b2

I2b2 is a database system with an easy-to-use web interface for selecting patients who meet specified criteria and for downloading and visualizing data from those patients. Like Eureka!'s phenotype editing screens, its user interface has a concept tree, and you construct queries by dragging and dropping concepts into boxes. I2b2's query tool lacks many of Eureka!'s temporal pattern features; it expects complex derived variables and phenotypes to have been computed before or during data loading. When you load data into i2b2 using Eureka!, any derived data elements that you have defined in Eureka! will appear in the i2b2 concept tree in the User-defined Derived Variables folder. You may drag and drop them into i2b2 queries just like any other concepts in the tree.