OMIC Data Service

 

Introduction

In conjunction with the purchase of Array Suite (Array Studio and Array Server), Omicsoft offers an OMIC data service. This service comprises three main components:

Affymetrix GEO Data service

This service currently includes over 6000 fully analyzed public domain datasets from the Gene Expression Omnibus (GEO). For each GEO Series, Omicsoft's statisticians and biologists have:
  • Parsed the sample annotation for the study. Omicsoft tries to use standardized language, but mainly cleans up the sample annotation uploaded by the original biologist.
  • Performed both a PCA and a correlation QC. Omicsoft has not removed any samples from a GEO series, and leaves this to the user's discretion.
  • Performed a statistical analysis based on the design of the experiment. The design is determined by manually curating the sample annotation and by reading the abstracts and original papers.
  • Performed RMA normalization, scaled to a value of 500 for each project (the normalization type can be customized for an additional cost).
  • Created default visualizations based on the sample annotation and the design of the experiment.
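
As a rough illustration of the QC and scaling steps listed above, the Python sketch below rescales an expression matrix to a target value of 500 and computes a simple correlation QC. It is not Omicsoft's pipeline. It assumes a genes x samples matrix of linear-scale intensities, interprets "scaled to a value of 500" as setting the overall mean intensity to 500, and uses hypothetical function names and a hypothetical 0.8 correlation threshold. Samples are only flagged, never removed, in keeping with the policy described above.

```python
import numpy as np


def scale_to_target(expr, target=500.0):
    """Rescale a genes x samples matrix so that its overall mean equals `target`.

    Assumption: intensities are on the linear (not log) scale and positive.
    """
    expr = np.asarray(expr, dtype=float)
    return expr * (target / expr.mean())


def correlation_qc(expr, threshold=0.8):
    """Flag samples that correlate poorly with the rest of the project.

    Returns the median Pearson correlation of each sample with all other
    samples, plus a boolean mask marking samples below `threshold`.
    """
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr, rowvar=False)  # samples x samples matrix
    np.fill_diagonal(corr, np.nan)          # ignore self-correlation
    median_corr = np.nanmedian(corr, axis=0)
    return median_corr, median_corr < threshold


if __name__ == "__main__":
    base = np.arange(1, 101, dtype=float)
    # Four well-behaved samples and one outlier with a reversed profile.
    project = np.column_stack(
        [base, base * 1.1, base * 1.2, base * 0.9, base[::-1]]
    )
    scaled = scale_to_target(project)
    median_corr, flagged = correlation_qc(scaled)
    print("Overall mean after scaling:", scaled.mean())
    print("Flagged sample indices:", np.where(flagged)[0])
```

The median is used instead of the mean so that a single outlying sample does not drag down the score of every other sample in a small project.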

 

The deliverable from the processing described above is the set of 6000+ GEO Series (GSE) projects in Omicsoft's format. Each GSE is delivered as an Omicsoft project and is uploaded to the Array Server hosted by the user's company or university. Each customer can, at their discretion, further modify the projects as needed.

 

Oncology OMIC Data Service

Omicsoft's Oncology OMIC Data Service gives the user's installation of Array Suite access to the following OMIC-based oncology datasets (NGS datasets are excluded):
  • TCGA Expression, Methylation, CNV/CGH, and miRNA Datasets
  • COSMIC dataset
  • GSK Cell Line Project dataset

Each TCGA dataset is processed from Level 3 data, while the COSMIC and GSK Cell Line Project datasets were processed from the raw data. Results are provided in Omicsoft's .osprj format and are fully indexed and searchable via Array Suite.

 

Omicsoft also provides correlation "duplex" datasets. These include comparisons of copy number with expression, and of methylation with expression.
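
To make the idea of a copy number versus expression comparison concrete, here is a minimal Python sketch that computes a per-gene Pearson correlation between two matrices. It is only an illustration, not the method Omicsoft uses to build its duplex datasets. It assumes two genes x samples matrices with identical gene and sample ordering, and the function name `per_gene_correlation` is hypothetical.

```python
import numpy as np


def per_gene_correlation(copy_number, expression):
    """Pearson correlation between copy number and expression for each gene.

    Both inputs are genes x samples matrices with matching row and column
    order (an assumption of this sketch). Genes with no variation in either
    data type get NaN, because the correlation is undefined.
    """
    cn = np.asarray(copy_number, dtype=float)
    ex = np.asarray(expression, dtype=float)
    if cn.shape != ex.shape:
        raise ValueError("copy_number and expression must have the same shape")

    # Center each gene (row) across samples.
    cn_c = cn - cn.mean(axis=1, keepdims=True)
    ex_c = ex - ex.mean(axis=1, keepdims=True)

    numerator = (cn_c * ex_c).sum(axis=1)
    denominator = np.sqrt((cn_c ** 2).sum(axis=1) * (ex_c ** 2).sum(axis=1))

    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denominator > 0, numerator / denominator, np.nan)


if __name__ == "__main__":
    cn = [[1, 2, 3, 4], [1, 2, 3, 4], [2, 2, 2, 2]]
    ex = [[2, 4, 6, 8], [8, 6, 4, 2], [1, 5, 3, 2]]
    print(per_gene_correlation(cn, ex))
```

The same function applies unchanged to a methylation versus expression comparison by passing the methylation matrix as the first argument.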

 

Access to the TCGA clinical data is provided, on the condition that the customer has access to the TCGA controlled data.

 

Omicsoft's OMIC data services are updated every 6 months, or sooner as needed.

 

Customized Data Service

Omicsoft can deliver customized OMIC data services for projects not covered by either of the two services above. This is done on a per-customer basis. It allows pipelines to be modified (at an additional cost) and extends to platforms not covered above. In the past, Omicsoft has delivered thousands of OMIC datasets via its customization pipeline. Users can manually choose projects from a variety of public domain sources, and the data is delivered using the same methods described above.

 

For more information on cost and requirements, contact sales@omicsoft.com.