Anonymised Datasets Available On TRUST

Anonymised Real-world Datasets

Population datasets (e.g. socio-economic, birth and death data)

Clinical data (e.g. diagnosis data, medication data, laboratory results, healthcare financing data and radiology data)

Lifestyle data (e.g. data from wearables)

Chronic disease screening data

Anonymised Strategic Research Datasets

Genomic data

Longitudinal population cohorts

Longitudinal disease cohorts

To browse the TRUST data catalogue (which describes the data elements, formats, time periods, etc.), researchers must first register as TRUST members. Data concierge and analytics support services are available to assist TRUST users.

To gain access to the TRUST database:
1. Apply to become a TRUST member. Please check that your organisation has a Data Request Agreement with TRUST.

2. Access the TRUST data catalogue in the members' area.

3. Submit a data request from the data catalogue in the members' area. The request will be reviewed by the TRUST Data Access Committee (DAC). Requestors must have the relevant IRB approval (or waiver) before submitting a data request.

4. Upon approval by the DAC, receive a separate TRUST portal user account, which allows access to and analysis of the requested data.

5. Analytical outputs can only be extracted after the TRUST data concierge has vetted them against what the DAC approved. All analytical outputs must be aggregated (e.g. visualisations, charts).

6. Notify TRUST before submitting any manuscript (including news, articles, publications and speeches) to a journal publisher, so that it can be verified.
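Step 5 means that only aggregated results leave the TRUST environment. A minimal sketch of what that aggregation can look like, assuming hypothetical field names (`patient_id`, `age_band`, `diagnosis`) rather than actual TRUST data elements: record-level rows are collapsed into one count per group before being submitted for vetting. The criteria the data concierge actually applies are set by what the DAC approved, not by this code.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical record-level rows inside the TRUST workspace; the field
# names are illustrative and not taken from the TRUST data catalogue.
records: List[Dict[str, str]] = [
    {"patient_id": "p1", "age_band": "40-49", "diagnosis": "T2DM"},
    {"patient_id": "p2", "age_band": "40-49", "diagnosis": "T2DM"},
    {"patient_id": "p3", "age_band": "50-59", "diagnosis": "T2DM"},
    {"patient_id": "p4", "age_band": "50-59", "diagnosis": "Hypertension"},
]


def aggregate_counts(rows: Sequence[Dict[str, str]], keys: Sequence[str]) -> List[dict]:
    """Collapse record-level rows into one count per combination of `keys`.

    Only the grouping columns and the count survive, so row-level
    identifiers such as patient_id never appear in the output.
    """
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [dict(zip(keys, combo), n=n) for combo, n in sorted(counts.items())]


summary = aggregate_counts(records, ["age_band", "diagnosis"])
for row in summary:
    print(row)
```

Grouping on coarse categories (such as age bands) rather than exact values keeps the exported table at the aggregate level that step 5 asks for.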

TRUST Features

Operating Environment: Windows 10 Desktop.

On-demand Jupyter notebook interfaces and AWS S3 storage.

A range of compute resources is available via the notebook interface, including multicore virtual CPUs and Spark clusters.

Analytical Tools Available

Python, R, PySpark and SparkR notebook kernels.

Hail, a genomic data exploration and analysis framework (e.g. for GWAS and gender prediction).

Python, R and Spark libraries for data science (e.g. scikit-learn, caret, pyspark.sql).

OMOP-ATLAS for UI-based clinical data exploration.

Scale of Analyses That Can Be Supported

Analyses requiring average compute resources

e.g. Variant reclassification for hereditary diseases

  • Data: Longitudinal population cohort genomic and clinical data (e.g. diagnosis, lab tests)
  • Analysis: Data filtering, aggregation, statistical analyses, clustering
  • Analytical insights generated: Known and novel clinical associations with population-specific disease variants; distributions and visualisations
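The filtering, aggregation and statistical steps listed for variant reclassification can be illustrated with a small sketch: variants with a poor genotype call rate are dropped, and alternate-allele frequencies are then compared between individuals with and without a diagnosis of interest. The variant IDs, the 0/1/2 dosage coding and the 0.95 call-rate threshold are assumptions for illustration, not TRUST conventions.

```python
from typing import Dict, Optional, Sequence

# Alt-allele dosage per person: 0, 1, 2, or None if the call is missing.
Genotypes = Sequence[Optional[int]]


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of individuals with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def allele_frequency(genotypes: Genotypes) -> float:
    """Alternate-allele frequency among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    return sum(called) / (2 * len(called))


def summarise_variants(
    variants: Dict[str, Genotypes],
    has_diagnosis: Sequence[bool],
    min_call_rate: float = 0.95,  # illustrative QC threshold, not a TRUST setting
) -> Dict[str, Dict[str, float]]:
    """Filter variants by call rate, then compare allele frequency by diagnosis status."""
    summary: Dict[str, Dict[str, float]] = {}
    for variant_id, genotypes in variants.items():
        if call_rate(genotypes) < min_call_rate:
            continue  # filtering step: drop poorly genotyped variants
        cases = [g for g, d in zip(genotypes, has_diagnosis) if d]
        others = [g for g, d in zip(genotypes, has_diagnosis) if not d]
        summary[variant_id] = {
            "af_with_diagnosis": allele_frequency(cases),
            "af_without_diagnosis": allele_frequency(others),
        }
    return summary


# Hypothetical toy data: two variants genotyped in six individuals.
variants = {
    "var1": [2, 1, 1, 0, 0, 0],
    "var2": [None, None, 1, 0, 0, 0],  # call rate 4/6, so it is filtered out
}
has_diagnosis = [True, True, True, False, False, False]
result = summarise_variants(variants, has_diagnosis)
print(result)
```

At cohort scale the same logic would normally be expressed with PySpark or Hail on the Spark cluster rather than in plain Python.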

Analyses requiring higher compute resources

e.g. GWAS analysis

  • Data: Longitudinal population cohort genomic and trait data
  • Analysis: Running GWAS
  • Analytical insights generated: Associated genetic factors, their statistical significance and visualisations
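A GWAS tests each variant separately for association with a trait. On TRUST this would normally run in Hail (for instance via `hl.linear_regression_rows`) on the Spark cluster; the standard-library sketch below only illustrates the per-variant computation on hypothetical toy data. It assumes a quantitative trait, additive dosage coding (0, 1, 2) and no covariates, whereas a real analysis would usually adjust for covariates such as age, sex and principal components. The p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math
from typing import Dict, List, Optional, Sequence


def assoc_test(dosages: Sequence[float], trait: Sequence[float]) -> Optional[Dict[str, float]]:
    """Simple linear regression of a quantitative trait on one variant's dosage.

    Returns None for monomorphic variants (no variation in dosage).
    """
    n = len(trait)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0 or n < 3:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx                      # effect size per extra alt allele
    alpha = mean_y - beta * mean_x        # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    if se == 0:
        return {"beta": beta, "se": 0.0, "p": 0.0}
    # Two-sided p-value, normal approximation: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return {"beta": beta, "se": se, "p": p}


def run_gwas(genotypes: Dict[str, Sequence[float]], trait: Sequence[float]) -> List[dict]:
    """Test every variant and return results ordered from most to least significant."""
    results = []
    for variant_id, dosages in genotypes.items():
        stats = assoc_test(dosages, trait)
        if stats is not None:
            results.append({"variant": variant_id, **stats})
    return sorted(results, key=lambda r: r["p"])


# Hypothetical toy data: six individuals, three variants.
trait = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
genotypes = {
    "varA": [0, 0, 1, 1, 2, 2],
    "varB": [1, 0, 2, 1, 0, 2],
    "varC": [1, 1, 1, 1, 1, 1],  # monomorphic, so it is skipped
}
results = run_gwas(genotypes, trait)
for row in results:
    print(row)
```

Sorting by p-value mirrors how GWAS results are usually reviewed, with the most strongly associated variants first; in practice these results feed Manhattan and QQ plots for visualisation.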