Biological data warehouse

InterMine is an open source data warehouse built specifically for the integration and analysis of complex biological data. Developed by the Micklem lab at the University of Cambridge, InterMine enables the creation of biological databases accessed by sophisticated web query tools. Parsers are provided for integrating data from many common biological data sources and formats, and there is a framework for adding your own data. InterMine includes an attractive, user-friendly web interface that works 'out of the box' and can be easily customised for your specific needs, as well as a powerful, scriptable web-service API to allow programmatic access to your data.


    Complex data integration

    InterMine was developed with the complexity of biological data in mind. The data model is flexible and extensible, and a range of data parsers is provided to simplify data loading. A sophisticated identifier resolution system uses a priority system to update all identifiers to their most current version, and multiple post-processing checks ensure that the integrated data is consistent.
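
    As a rough illustration of priority-based identifier resolution, the toy sketch below maps outdated identifiers to current ones and lets a higher-priority source win when two sources disagree. The source names, identifiers and function names are all hypothetical; this is not InterMine's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str      # hypothetical data source name
    old_id: str      # identifier as it appears in incoming data
    current_id: str  # identifier that the source considers current


# Hypothetical priority order: earlier sources win when they disagree.
SOURCE_PRIORITY = ["curated", "imported"]


def build_resolver(mappings, priority=SOURCE_PRIORITY):
    """Build an old-id -> current-id lookup, preferring higher-priority sources."""
    rank = {name: i for i, name in enumerate(priority)}
    resolver = {}
    best_rank = {}
    for m in mappings:
        r = rank.get(m.source, len(priority))  # unknown sources rank last
        if m.old_id not in best_rank or r < best_rank[m.old_id]:
            best_rank[m.old_id] = r
            resolver[m.old_id] = m.current_id
    return resolver


def resolve(resolver, identifier):
    """Return the current identifier, or the input unchanged if it is unknown."""
    return resolver.get(identifier, identifier)


mappings = [
    Mapping("imported", "GENE_OLD_1", "GENE_B"),
    Mapping("curated", "GENE_OLD_1", "GENE_A"),  # outranks "imported"
    Mapping("imported", "GENE_OLD_2", "GENE_C"),
]
resolver = build_resolver(mappings)
```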

    Fast and flexible querying

    Complex queries can be constructed flexibly to mine across the integrated datasets, enabling researchers to answer sophisticated biological questions. Query optimisation is built around precomputed tables, so the data schema does not need to be denormalised to make queries fast. A user's query workflow can also be automated using InterMine web services.
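
    The general idea behind precomputed tables can be sketched with Python's built-in sqlite3 module. The schema stays normalised, and a join is materialised once into a separate table that later queries can read directly. The table and column names here are invented for illustration, and the sketch does not reflect how InterMine itself stores data or chooses which tables to precompute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised schema: genes and proteins live in separate tables.
    CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE protein (
        id INTEGER PRIMARY KEY,
        name TEXT,
        gene_id INTEGER REFERENCES gene(id)
    );
    INSERT INTO gene VALUES (1, 'geneA'), (2, 'geneB');
    INSERT INTO protein VALUES
        (10, 'protA1', 1), (11, 'protA2', 1), (12, 'protB1', 2);

    -- Precomputed table: the join is materialised once, ahead of time.
    CREATE TABLE precomp_gene_protein AS
        SELECT g.symbol AS symbol, p.name AS name
        FROM gene g JOIN protein p ON p.gene_id = g.id;
    CREATE INDEX precomp_symbol ON precomp_gene_protein (symbol);
""")


def proteins_via_join(symbol):
    """Answer the query by joining the normalised tables at query time."""
    rows = conn.execute(
        "SELECT p.name FROM gene g JOIN protein p ON p.gene_id = g.id "
        "WHERE g.symbol = ? ORDER BY p.name",
        (symbol,),
    )
    return [r[0] for r in rows]


def proteins_via_precomputed(symbol):
    """Answer the same query from the precomputed table, with no join."""
    rows = conn.execute(
        "SELECT name FROM precomp_gene_protein WHERE symbol = ? ORDER BY name",
        (symbol,),
    )
    return [r[0] for r in rows]
```

    Both functions return the same rows; the second simply avoids repeating the join. The original tables remain the source of truth, so the precomputed table would need rebuilding whenever they change.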

    Existing mines

    A number of different data warehouses powered by InterMine already exist.

    Getting Started

    by Julie

    Setting up your own InterMine instance is easy. Check out our step-by-step guide to creating your own Mine, loading and processing datasets, and using the web app to query and export the data. You can also try out an existing InterMine instance on the Amazon cloud.

    Using Web Services

    by Alex

    The InterMine web services expose our complete API, and can be used to automate workflows and to develop external applications. While any language that can make HTTP requests can be used, we provide client libraries in five programming languages: Python, Perl, Java, JavaScript and Ruby.
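
    To show what calling the web services over plain HTTP might look like, the sketch below builds a request URL using only the Python standard library. The service root is a placeholder. The `/query/results` path, the `query` and `format` parameters, and the shape of the query XML are assumptions that should be checked against the web service documentation for the mine you are using.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_query_url(service_root, views, constraint, model="genomic"):
    """Build a results URL for a single-constraint query (assumed API shape)."""
    path, op, value = constraint
    # Assumed query format: a <query> element listing the output columns,
    # with one <constraint> child element.
    query = ET.Element("query", {"model": model, "view": " ".join(views)})
    ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    xml = ET.tostring(query, encoding="unicode")
    params = urlencode({"query": xml, "format": "json"})
    return service_root.rstrip("/") + "/query/results?" + params


url = build_query_url(
    "https://example.org/mymine/service",  # hypothetical service root
    ["Gene.symbol", "Gene.primaryIdentifier"],
    ("Gene.symbol", "=", "zen"),
)
print(url)
```

    The resulting URL could then be fetched with `urllib.request.urlopen` against a real mine. In practice the Python client library is likely to be the more convenient route, since it hides the URL and XML handling.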