AIW Project Description


The Analytic Information Warehouse (AIW) is the umbrella project for Eureka! and other software systems for performing analytics with electronic health record data. The AIW provides an architecture and implementation for integrating clinical data warehouses with clinical, administrative and research datasets. It enables complex clinical data to be loaded into widely available tools for statistical analysis, analytics and query. The AIW works with existing relational databases and supports mapping data to standard or locally developed terminologies and ontologies. It can compute derived variables representing clinical phenotypes, which may include categories, classifications, and temporal patterns. The AIW can load data and derived variables into Excel and SAS, or into an instance of i2b2. The AIW has been applied in quality improvement efforts at Emory Healthcare aimed at reducing rates of hospital readmissions. It has also been applied in research to compute phenotypes of interest in cardiovascular and cancer studies.

Background and Aims

Changes in the United States’ healthcare delivery model have led to renewed interest in data-driven methods for managing quality using Electronic Health Record (EHR) systems. Hospitals are evaluated using metrics such as mortality, length of stay, and hospital readmissions within 30 days. To improve, hospitals need to track and understand their performance on these metrics.

EHR data are difficult to leverage for quality improvement. Comorbidities and clinical conditions that may be associated with metric scores are often represented only indirectly, as billing codes and as patterns in clinical observations and events.

To address these challenges, the AIW project aims to:

  • Develop methods for representing and computing the existence of clinical phenotypes as patterns in EHR data.
  • Develop a data mining pipeline for rapid identification of associations between outcomes of interest and phenotypes.
  • Develop predictive models of outcomes of interest that leverage clinical phenotyping.

Clinical research projects could similarly leverage phenotyping in recruitment and data analysis.

Clinical Phenotypes

We define clinical phenotypes in a temporal abstraction ontology that supports specifying categories of codes and clinical events, classifications of numerical values, and frequency and sequential patterns in clinical data.

A phenotype, "Chemotherapy within 180 days before Surgery", specified in the Protégé ontology editor
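
To make the idea of a sequential pattern concrete, the following is a minimal Python sketch of how a phenotype such as "Chemotherapy within 180 days before Surgery" could be evaluated over a list of events. It is not the AIW's implementation or its ontology format: the `Event` class, the `within_before` function, and the sample data are hypothetical, and the sketch assumes that a same-day event counts as "before".

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    """A single clinical event for one patient (hypothetical structure)."""
    patient_id: str
    category: str  # e.g. "Chemotherapy" or "Surgery"
    when: date


def within_before(events, first, second, max_days):
    """Return (first_event, second_event) pairs for the same patient where
    `first` occurs between 0 and `max_days` days before `second`.
    Assumption: same-day events count as a match."""
    window = timedelta(days=max_days)
    matches = []
    for a in events:
        if a.category != first:
            continue
        for b in events:
            if (b.category == second
                    and b.patient_id == a.patient_id
                    and timedelta(0) <= b.when - a.when <= window):
                matches.append((a, b))
    return matches


events = [
    Event("p1", "Chemotherapy", date(2012, 1, 10)),
    Event("p1", "Surgery", date(2012, 5, 1)),   # 112 days later: match
    Event("p2", "Chemotherapy", date(2011, 1, 10)),
    Event("p2", "Surgery", date(2012, 5, 1)),   # 477 days later: no match
]

# Patients who have the phenotype.
patients = sorted({a.patient_id for a, _ in within_before(
    events, "Chemotherapy", "Surgery", 180)})
print(patients)
```

The nested loop is quadratic in the number of events, which keeps the sketch short; a real system would likely index events by patient and time.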

System Architecture

The AIW supports querying clinical and administrative data through a semantic layer that generates queries of source systems, transforms source data into a common data model, computes phenotypes specified as above, and supports loading or exporting data and phenotypes into existing data analysis and query systems. A processing pipeline supports leveraging data and phenotypes in data mining tools.

AIW architecture
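
The role of the semantic layer described above can be illustrated with a small Python sketch: a declarative mapping drives query generation against a source table, and the result rows come back in a common record shape. Everything here is an assumption made for illustration, including the `MAPPING` structure, the table and column names, and the field names of the common model; the AIW's actual mapping format and data model are not shown in this description. SQLite stands in for a source relational database.

```python
import sqlite3

# Hypothetical mapping from one source table to a common record shape.
MAPPING = {
    "table": "lab_results",
    "columns": {  # common-model field -> source column
        "patient_id": "mrn",
        "code": "test_code",
        "value": "result_num",
        "observed_at": "collected_at",
    },
}


def build_query(mapping):
    """Generate a SELECT that renames source columns to common-model fields."""
    select_list = ", ".join(
        f"{src} AS {field}" for field, src in mapping["columns"].items())
    return f"SELECT {select_list} FROM {mapping['table']}"


def extract(conn, mapping):
    """Run the generated query and return rows as common-model dicts."""
    cursor = conn.execute(build_query(mapping))
    fields = [d[0] for d in cursor.description]
    return [dict(zip(fields, row)) for row in cursor]


# An in-memory database stands in for a source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_results "
             "(mrn TEXT, test_code TEXT, result_num REAL, collected_at TEXT)")
conn.execute("INSERT INTO lab_results VALUES ('p1', 'HGB', 13.5, '2012-01-10')")

records = extract(conn, MAPPING)
print(records)
```

Keeping the mapping as data rather than code means a new source system can be supported by writing a new mapping, without changing the extraction logic.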

Extract, Compute Phenotypes, and Load

A front-end web interface to AIW, Eureka! Clinical Analytics, manages the phenotyping process:

  • The AIW extracts data from databases or from an Excel spreadsheet.
  • The AIW computes phenotypes that are specified in the temporal abstraction ontology.
  • The AIW loads data and phenotypes into an i2b2 project or exports them to a delimited file.
  • Delimited file output is flexible and supports specifying columns for aggregations and basic statistics (e.g., counts, mean, max, min).
AIW data processing
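
As an illustration of delimited output with aggregation columns, the sketch below groups numeric observations by patient and variable and writes count, mean, max, and min to a tab-delimited file. It uses only the Python standard library; the function name `write_summary`, the column layout, and the sample observations are hypothetical and do not reflect the AIW's actual output configuration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (patient_id, variable, numeric value).
observations = [
    ("p1", "HGB", 13.5),
    ("p1", "HGB", 12.5),
    ("p2", "HGB", 10.0),
]


def write_summary(rows, out, delimiter="\t"):
    """Write one line per (patient, variable) with basic statistics."""
    grouped = defaultdict(list)
    for patient_id, variable, value in rows:
        grouped[(patient_id, variable)].append(value)

    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["patient_id", "variable", "count", "mean", "max", "min"])
    for (patient_id, variable), values in sorted(grouped.items()):
        writer.writerow([patient_id, variable, len(values),
                         mean(values), max(values), min(values)])


# An in-memory buffer stands in for an output file.
buffer = io.StringIO()
write_summary(observations, buffer)
print(buffer.getvalue())
```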

Prepare Data and Phenotypes for Data Mining Tools

A Python-based pipeline implements transforms that data mining scientists can use to prepare AIW-generated delimited files for loading into data mining tools (SAS, R, Orange, etc.):

  • Extract columns: removes columns from dataset.
  • Code to category: replaces codes with categories or adds categories from coding hierarchies (e.g., replace medication name with medication class, replace specific diagnosis with more general diagnosis).
  • Generate statistics: calculates frequencies for distinct column values.
  • Expand columns: pivots one or more columns and appends to dataset.
  • Truncate headers: creates unique but shorter column headers for compatibility with SAS.
  • Replace nulls: replaces null values with another representation for compatibility with R, SAS, etc.
AIW post-processing
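
Three of the transforms listed above (code to category, replace nulls, and truncate headers) can be sketched in plain Python as follows. This is an illustration only, not the pipeline's code: the function names, the `MED_CLASS` lookup, and the collision-handling scheme for shortened headers are assumptions. The default limit of 32 reflects the maximum length of a SAS variable name.

```python
# Hypothetical code-to-category lookup (medication name -> medication class).
MED_CLASS = {"metoprolol": "beta blocker", "lisinopril": "ACE inhibitor"}


def code_to_category(rows, column, lookup, new_column):
    """Add `new_column` holding the category for each code in `column`.
    Codes missing from `lookup` get None."""
    return [{**row, new_column: lookup.get(row[column])} for row in rows]


def replace_nulls(rows, replacement):
    """Replace None and empty strings with `replacement` (e.g. "NA" for R)."""
    return [{k: (replacement if v in (None, "") else v) for k, v in row.items()}
            for row in rows]


def truncate_headers(headers, max_len=32):
    """Shorten headers to at most `max_len` characters. When two shortened
    headers collide, the tail is replaced with a numeric suffix so that
    every header stays unique."""
    seen, result = set(), []
    for header in headers:
        short, n = header[:max_len], 1
        while short in seen:
            suffix = f"_{n}"
            short = header[:max_len - len(suffix)] + suffix
            n += 1
        seen.add(short)
        result.append(short)
    return result


rows = [
    {"patient_id": "p1", "medication": "metoprolol"},
    {"patient_id": "p2", "medication": None},
]
rows = code_to_category(rows, "medication", MED_CLASS, "medication_class")
rows = replace_nulls(rows, "NA")
print(rows)

# A small max_len makes the collision handling visible.
headers = truncate_headers(
    ["systolic_blood_pressure_first", "systolic_blood_pressure_last"],
    max_len=12)
print(headers)
```

Each transform returns a new list rather than modifying its input, so the steps can be chained in any order.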