5 - Databases to Support OMOP Analysis

Task: Creation and validation of databases to support OMOP analyses

    a. For some analyses of specific outcomes of interest, access to an entire database will not be necessary. In these instances, OMOP will fund the creation of datasets that conform to the common data model, drawn from multiple data owners selected from a subset of those characterized above.

    Each of these datasets will typically be a subset of the data owner’s full source database and will include: a) a stratified random sample of the entire population, b) all individuals exposed to drugs of interest, and c) all individuals with outcomes of interest. This construction avoids the need to work with entire databases, since most of the data is not needed to support the aims of a particular experiment.
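    The three-part construction described above can be sketched as follows. This is a minimal illustration only, not an OMOP-specified procedure: the function name, the `strata` mapping, and the `fraction` parameter are all assumptions introduced here, and a real extraction would run against the data owner’s source tables.

```python
import random
from collections import defaultdict


def build_analysis_subset(strata, exposed_ids, outcome_ids, fraction, seed=0):
    """Return the person IDs to include in an analysis dataset.

    All names here are hypothetical, chosen for illustration:
    strata      -- dict mapping person ID -> stratum label (e.g. age/sex group)
    exposed_ids -- IDs of persons exposed to a drug of interest
    outcome_ids -- IDs of persons with an outcome of interest
    fraction    -- sampling fraction applied within each stratum
    """
    rng = random.Random(seed)

    # Group the population by stratum.
    members = defaultdict(list)
    for person_id, stratum in strata.items():
        members[stratum].append(person_id)

    # (a) Stratified random sample: draw the same fraction from every stratum,
    # keeping at least one person per stratum.
    sample = set()
    for stratum in sorted(members):
        ids = sorted(members[stratum])
        k = min(len(ids), max(1, round(fraction * len(ids))))
        sample.update(rng.sample(ids, k))

    # (b) and (c): everyone exposed and everyone with an outcome is kept,
    # regardless of whether the random sample selected them.
    return sample | set(exposed_ids) | set(outcome_ids)
```

    Because the exposed and outcome groups are added by set union, a person who falls into more than one group appears only once in the resulting dataset.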

    In other cases, however, such as hypothesis-generating (data mining) experiments, access to the entire database is desirable, and OMOP will pursue it.

    Each data owner may choose between two arrangements: OMOP takes possession of the datasets and performs the analyses itself, or the data owner retains possession of its data, provided it agrees to timely execution of programs supplied by OMOP investigators. These programs will return HIPAA-compliant data in a form that protects patient privacy and conforms to the data owner’s privacy policies. Data owners can inspect the output to ensure that it contains no other information. If the data owner retains possession and performs the analyses, OMOP will fund the data owner’s storage of the data and execution of the required research. Data owners who retain possession must allow on-site audits of the datasets at OMOP’s request.

    b. Creation of a testing environment
    OMOP will fund the creation of a testing environment for investigators and analysts to use when developing data-checking and analytic programs to run against the common data model. The testing environment will contain a database of a few thousand synthetic records that adhere to the common data model, and this database will be made available to authorized investigators. The database will not contain any identifiable or linkable information.
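    A synthetic test database of this kind could be generated along the following lines. This is a sketch under stated assumptions: the record layout (`person_id`, `year_of_birth`, `sex`, `drug_code`, `start_date`) and the drug codes are placeholders invented for illustration and are not taken from the common data model. Every value comes from a seeded random number generator, so no record is derived from or linkable to a real patient.

```python
import random
from datetime import date, timedelta


def make_synthetic_records(n_persons, seed=0):
    """Generate purely synthetic person and drug-exposure records.

    The field names and drug codes are hypothetical placeholders,
    not the actual common data model layout.
    """
    rng = random.Random(seed)
    persons, exposures = [], []

    for person_id in range(1, n_persons + 1):
        persons.append({
            "person_id": person_id,
            "year_of_birth": rng.randint(1920, 2005),
            "sex": rng.choice(["F", "M"]),
        })

        # Each synthetic person receives between zero and three exposures.
        for _ in range(rng.randint(0, 3)):
            start = date(2000, 1, 1) + timedelta(days=rng.randrange(365 * 5))
            exposures.append({
                "person_id": person_id,
                "drug_code": rng.choice(["DRUG_A", "DRUG_B", "DRUG_C"]),
                "start_date": start.isoformat(),
            })

    return persons, exposures
```

    Fixing the seed makes the generated database reproducible, so analysts developing data-checking programs can compare results against the same test records.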