This page is designed to help new users understand the GBrowse interface. The main interface is broken into several sections which we will discuss in more detail. Opossumbase GBrowse tutorial. Opossumbase GBrowse has eight top-level sections: Instructions - Basic search and navigation instructions.
GBrowse is a mature web-based genome browser that is suitable for deployment on both public and private web sites.
It supports most genome browser features, including qualitative and quantitative wiggle tracks, track uploading, track sharing, interactive track configuration, semantic zooming and limited smooth track panning. As of version 2. GBrowse was among the first web-based genome browsers [1] and was the first to be widely used outside its site of origin. Originally developed for use with WormBase www. Support for next-generation sequencing (NGS) data was introduced in version 2.
NGS data can be uploaded directly to the browser, linked to via a URL or manually added to the server. Uploaded and linked sequencing data can be made public or shared selectively with collaborators. Version 2. Places where the read sequence differs from the reference sequence are highlighted.
GBrowse is intended for environments in which groups wish to display and share genome annotations in a format that can be accessed casually without preinstallation of desktop software. Hence, it is suitable for installation on public web sites, as well as the web sites of small-to-medium collaborations among several geographically separate groups.
It is particularly well suited to collaborative environments in which some annotation tracks are public while others are restricted to individuals or groups, as GBrowse provides a highly configurable track-level security model that is able to integrate with a variety of popular enterprise authentication systems gmod. These include Chado [2], a database schema for genomic data, several genome synteny browsers [3-5], the Galaxy workflow engine [6], the Apollo genome editor [7], the MAKER genome annotation pipeline [8] and the BioMart federated data mining engine [9].
GBrowse is well supported by a mailing list, a wiki, a help desk and both in-person and online tutorials. As of , major new features were not being added to GBrowse and development prioritized bug, performance and stability fixes. GBrowse is a web application whose code is divided between the web server and the web browser client.
The server side of GBrowse is written in Perl with a little C code thrown in to accelerate critical functions. The server manages a series of databases containing genome annotation information, receives requests from the web browser to view regions of interest and renders these regions as PNG, SVG or PDF images.
These VMs allow the user to bring up a starter genome in minutes and to start building on top of it immediately. This section describes the process of installing GBrowse, configuring a data source and loading NGS tracks. GBrowse will run on any recent Linux distribution and hardware.
These provide you with basic setups for the human, worm, fly and yeast genomes which you can then build on. The most hassle-free installation method is to run GBrowse in one of the prebuilt virtual machines.
This will provide you with full functionality and performance without making any modifications to your own system. Local installation requires you to have the VirtualBox machine virtualization software installed. To obtain it, go to www. During import you may receive a warning message about the VM not meeting compliance checks, but this may be safely ignored. Go to gmod. Download the file to your local disk. This will take you to a login window. This will take you to a desktop in which all administrative functions are enabled.
You may use secure shell (ssh) to log into the VM from the host machine using the IP address Remote ssh access from non-host machines is disabled by default, but you can enable it by configuring an Ethernet bridge adaptor as described in Chapter 6 of the VirtualBox manual www. This gives you an Internet-connected server with essentially no setup required and considerable flexibility.
The downside is that you pay a fixed charge for every hour the server is running. However, the cost is low (8-12 cents per hour), so this method is a good way to try the system out with little investment of time or effort, particularly if you are already an EC2 user. You will need to have an EC2 account, which you can set up in a few minutes by visiting aws.
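To put the hourly charge in perspective, the following is a minimal sketch of the arithmetic, assuming only the 8-12 cents per hour range quoted above. The function name `ec2_cost_range` is hypothetical, and actual pricing depends on the instance type and may differ from these figures.

```python
def ec2_cost_range(hours, low_cents=8, high_cents=12):
    """Return the (low, high) cost in dollars of keeping the server up.

    The default rates are the 8-12 cents per hour quoted in the text;
    they are an assumption here, not a statement of current pricing.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return (hours * low_cents / 100, hours * high_cents / 100)


if __name__ == "__main__":
    # One week with the instance left running continuously.
    low, high = ec2_cost_range(24 * 7)
    print(f"One week: ${low:.2f} to ${high:.2f}")
```

Because the charge accrues for every hour the server is running, stopping the instance when it is not in use is the simplest way to keep the total down.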
You will need an ssh client to log into the GBrowse server. If your desktop runs Windows, then you will need to install a suitable ssh client. I recommend PuTTY www. Go to the GBrowse VMs page at gmod. The wizard will prompt you for a number of properties of the virtual machine to launch. The most important of these is the Instance Type, which controls the number and speed of CPUs and the amount of memory that the VM will have.
Faster instances cost more per hour. Later during the instance creation process, you will be asked to select the ssh keypair to use for logins; choose the one you created during registration. This will be the hostname you use for web access to GBrowse2 and ssh access to the server. This will bring you to a page that lists the starter genomes as well as pointers to the GBrowse tutorial and documentation. To log into the machine in order to administer GBrowse, you will use ssh.
Find the location of your public ssh keypair and log in like this: This will take you to a command line prompt. This can be used to add the human hg19 genome build to the VirtualBox edition, which, because of the size of the data, does not include a preinstalled version.
The command to use is. Where the first argument is the UCSC build name, and the second optional argument is a description to use for the database. This is recommended if you work frequently with non-UCSC data sources, such as the model organism databases or Ensembl, and is how the default databases on the Amazon and VirtualBox VMs were created.
The process for doing this is described in detail in the GBrowse online documentation at gmod. The rest of this article focuses on the process of installing NGS files. We will first discuss uploading. For this example, we use a small 5. Depending on your network speed, it will take about 20 s to upload and fully process this file. When the processing is finished, summary information about the upload will appear Figure 2.
Summary information about an uploaded SAM file. The summary information includes the name and description of the uploaded data, which can be edited by clicking on the respective fields, and information about the date and size of the upload. You may now click on the Browser menu item to return to the main genome browser view. This is a low coverage RNA sequencing experiment, and so you may have to zoom out a bit in order to see the data. This will display the gene icl-l as well as a histogram of coverage of the uploaded SAM file Figure 3.
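To make the idea of a coverage histogram concrete, here is a minimal sketch of how per-bin coverage could be computed from a list of aligned read intervals. It is an illustration only, not GBrowse's own implementation (the server side is written in Perl); the function name, the 1-based inclusive coordinates and the fixed bin width are all assumptions made for the example.

```python
def coverage_histogram(reads, region_start, region_end, bin_size):
    """Average per-base read coverage in fixed-width bins.

    reads: iterable of (start, end) alignment intervals, 1-based inclusive.
    Returns one float per bin across region_start..region_end.
    """
    if bin_size <= 0 or region_end < region_start:
        raise ValueError("invalid region or bin size")
    region_len = region_end - region_start + 1
    n_bins = -(-region_len // bin_size)  # ceiling division
    covered = [0] * n_bins               # covered bases summed per bin

    for start, end in reads:
        s = max(start, region_start)     # clip the read to the region
        e = min(end, region_end)
        if s > e:
            continue                     # read lies outside the region
        first = (s - region_start) // bin_size
        last = (e - region_start) // bin_size
        for b in range(first, last + 1):
            bin_lo = region_start + b * bin_size
            bin_hi = min(bin_lo + bin_size - 1, region_end)
            covered[b] += min(e, bin_hi) - max(s, bin_lo) + 1

    widths = [
        min(region_start + (b + 1) * bin_size - 1, region_end)
        - (region_start + b * bin_size) + 1
        for b in range(n_bins)
    ]
    return [c / w for c, w in zip(covered, widths)]


if __name__ == "__main__":
    # Two overlapping reads over a 20 bp region, in 5 bp bins.
    print(coverage_histogram([(1, 10), (6, 15)], 1, 20, 5))
```

With low-coverage data such as the RNA sequencing example, most bins will be zero at high magnification, which is why zooming out makes the signal easier to find.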
Uploaded SAM file in histogram mode. To view this region in more detail, zoom in on it by clicking on the ruler at the top of the panel and dragging across the coverage region. Do this repeatedly until the histogram is replaced by the reads themselves. Mismatches and deletions relative to the reference genome are shown in red, while insertions are shown in green. Clicking on one of the reads brings up an information page which shows details about the read and a text representation of the alignment.
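The mismatch highlighting described above can be pictured with a small sketch that compares a read against the reference at its aligned position. This is a simplified illustration, assuming an ungapped alignment (no insertions or deletions) and 0-based coordinates; the function name `mismatch_positions` is hypothetical and does not correspond to any GBrowse API.

```python
def mismatch_positions(reference, read, offset=0):
    """Return 0-based reference positions where an ungapped read differs.

    reference: reference sequence as a string.
    read:      read sequence, aligned without gaps starting at `offset`.
    """
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit within the reference")
    return [
        offset + i
        for i, base in enumerate(read)
        if base.upper() != reference[offset + i].upper()
    ]


if __name__ == "__main__":
    # The read aligns at reference position 2 and differs at position 4.
    print(mismatch_positions("ACGTACGT", "GTTC", offset=2))
```

A real alignment viewer also has to account for insertions and deletions recorded in the alignment, which this sketch deliberately leaves out.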
This will bring up a dialog that allows you to change colors, size, the presence or absence of read names and various other features. The advantage of this over SAM format is that processing will be quicker because the server does not have to convert it into BAM internally.
Uploaded files are inaccessible to other GBrowse users unless they are explicitly shared. However, the files are readable by anyone who can log into the virtual machine. They can get access to the track by clicking on this link in the received email. In this mode, you can share the track with a specific set of named collaborators. You may repeat this process multiple times to add additional collaborators. With a slight modification of the above recipe, you can upload an NGS file in a way that allows it to become listed as a public track.
The only difference is that you must log into GBrowse as the administrator before uploading the file(s). Note that you should change the admin password before making the server public. The GBrowse VM page tells you how to do this. This feature allows the browser to fetch alignment data on an as-needed basis, allowing you to view the data right away.
For this to work, the alignment data must be in sorted BAM format, must have been indexed against the correct genome build and must be placed on a Web or FTP server at a location where the GBrowse server can reach it via the network. If you are using the VirtualBox VM, this means that the Web or FTP server may either be a public internet site, or may be located on your LAN including on the host machine that runs the virtual machine.
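Before linking a remote file, it can help to confirm the first of these requirements locally. The sketch below checks whether a SAM/BAM header declares coordinate sorting, using the `SO:coordinate` tag that the SAM format defines on the `@HD` line. The header text could be obtained with, for example, `samtools view -H`. The helper names are hypothetical, and the `.bai` suffix follows the common samtools naming convention; whether a particular server expects exactly that index name is an assumption.

```python
def is_coordinate_sorted(sam_header):
    """Return True if the header's @HD line declares SO:coordinate."""
    for line in sam_header.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            fields = dict(
                field.split(":", 1)
                for field in line.split("\t")[1:]
                if ":" in field
            )
            return fields.get("SO") == "coordinate"
    return False  # no @HD line, so the sort order is not declared


def conventional_index_url(bam_url):
    """Return the index URL under the usual 'file.bam.bai' convention."""
    return bam_url + ".bai"


if __name__ == "__main__":
    header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:1\tLN:1000"
    print(is_coordinate_sorted(header))
    # example.org is a placeholder host, not a real data location.
    print(conventional_index_url("http://example.org/data/sample.bam"))
```

A header that passes this check only declares the sort order; it does not confirm that the file was aligned against the correct genome build, which still has to be verified separately.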
For this example, we are going to use an indexed BAM file from the genomes project, a high-coverage Illumina sequence from an anonymous individual, mapped onto chromosome 1 of the GRCh37 build of the reference genome. After the data are installed, the script will restart the web server. If you refresh your browser, you will find the database installed. If you are using the Amazon VM, then the human reference data have already been installed for you, and the preceding step is not required.
With your web browser, navigate to GBrowse and select H. This will pull down a text box. Cut and paste the following URL into the box: You may now view any region of chromosome 1, although I suggest that you limit the region to less than kb to avoid network timeouts. This will show a coverage histogram across the gene. This will show the paired-end read alignment details (Figure 4). Note that the paired-end read relationships are shown by default.
Opossumbase GBrowse tutorial
This is a quick tutorial to take you through the main features and gotchas of GBrowse. During most of the tutorial, we will be using the "in-memory" GBrowse database (no relational database required!). Later we will show how to set up a genome-sized database using the BerkeleyDB and MySQL adaptors. We will be working with simulated Volvox genome annotation data. The Basics We will be using a file-based database which allows GBrowse to run directly off text files. To prepare this database for use, find the GBrowse databases directory which was created in your Apache web server directory at the time of installation.
Using GBrowse 2.0 to visualize and share next-generation sequence data
This is an extensive tutorial to take you through the main features and gotchas of configuring GBrowse as a server. Later we will show how to set up a genome-sized database using the BerkeleyDB and MySQL adaptors. Important: This tutorial is designed for GBrowse 2. The Basics We will be working with simulated Volvox genome annotation data.
Generic Genome Browser: A Tutorial
Genomes Viewable in GBrowse