Frequently Asked Questions: Assembly Releases and Versions

List of UCSC genome releases

Question:
"How do UCSC's release numbers correspond to those of other organizations, such as NCBI?"
Response:
SPECIES | UCSC VERSION | RELEASE DATE | RELEASE NAME | STATUS |
VERTEBRATES | | | | |
Human | hg19 | Feb. 2009 | Genome Reference Consortium GRCh37 | Available |
| hg18 | Mar. 2006 | NCBI Build 36.1 | Available |
| hg17 | May 2004 | NCBI Build 35 | Available |
| hg16 | Jul. 2003 | NCBI Build 34 | Available |
| hg15 | Apr. 2003 | NCBI Build 33 | Archived |
| hg13 | Nov. 2002 | NCBI Build 31 | Archived |
| hg12 | Jun. 2002 | NCBI Build 30 | Archived |
| hg11 | Apr. 2002 | NCBI Build 29 | Archived |
| hg10 | Dec. 2001 | NCBI Build 28 | Archived |
| hg8 | Aug. 2001 | UCSC-assembled | Archived |
| hg7 | Apr. 2001 | UCSC-assembled | Archived |
| hg6 | Dec. 2000 | UCSC-assembled | Archived |
| hg5 | Oct. 2000 | UCSC-assembled | Archived |
| hg4 | Sep. 2000 | UCSC-assembled | Archived |
| hg3 | Jul. 2000 | UCSC-assembled | Archived |
| hg2 | Jun. 2000 | UCSC-assembled | Archived (data set only) |
| hg1 | May 2000 | UCSC-assembled | Archived (data set only) |
Cat | felCat4 | Dec. 2008 | NHGRI catChrV17e | Available |
| felCat3 | Mar. 2006 | Broad Institute Release 3 | Available |
Chicken | galGal3 | May 2006 | WUSTL Gallus-gallus-2.1 | Available |
| galGal2 | Feb. 2004 | WUSTL Gallus-gallus-1.0 | Available |
Chimp | panTro3 | Oct. 2010 | CGSC Build 2.1.3 | Available |
| panTro2 | Mar. 2006 | CGSC Build 2.1 | Available |
| panTro1 | Nov. 2003 | CGSC Build 1.1 | Available |
Cow | bosTau6 | Nov. 2009 | University of Maryland v3.1 | Available |
| bosTau4 | Oct. 2007 | Baylor College of Medicine HGSC Btau_4.0 | Available |
| bosTau3 | Aug. 2006 | Baylor College of Medicine HGSC Btau_3.1 | Available |
| bosTau2 | Mar. 2005 | Baylor College of Medicine HGSC Btau_2.0 | Available |
| bosTau1 | Sep. 2004 | Baylor College of Medicine HGSC Btau_1.0 | Archived |
Dog | canFam2 | May 2005 | Broad Institute v2.0 | Available |
| canFam1 | Jul. 2004 | Broad Institute v1.0 | Available |
Elephant | loxAfr3 | Jul. 2009 | Broad loxAfr3 | Available |
Fugu | fr3 | Oct. 2011 | JGI v5.0 | Available |
| fr2 | Oct. 2004 | JGI v4.0 | Available |
| fr1 | Aug. 2002 | JGI v3.0 | Available |
Gibbon | nomLeu1 | Jan. 2010 | Gibbon Genome Sequencing Consortium Nleu1.0 | Available |
Gorilla | gorGor3 | May 2011 | Wellcome Trust Sanger Institute gorGor3.1 | Available |
Guinea pig | cavPor3 | Feb. 2008 | Broad cavPor3 | Available |
Horse | equCab2 | Sep. 2007 | Broad EquCab2 | Available |
| equCab1 | Jan. 2007 | Broad EquCab1 | Available |
Lamprey | petMar1 | Mar. 2007 | WUSTL v3.0 | Available |
Lizard | anoCar2 | May 2010 | Broad AnoCar2 | Available |
| anoCar1 | Feb. 2007 | Broad AnoCar1 | Available |
Marmoset | calJac3 | Mar. 2009 | WUSTL Callithrix_jacchus-v3.2 | Available |
| calJac1 | Jun. 2007 | WUSTL Callithrix_jacchus-v2.0.2 | Available |
Medaka | oryLat2 | Oct. 2005 | NIG v1.0 | Available |
Microbat | myoLuc2 | Jul. 2010 | Broad myoLuc2.0 | Available |
Mouse | mm10 | Dec. 2011 | Genome Reference Consortium GRCm38 | Available |
| mm9 | Jul. 2007 | NCBI Build 37 | Available |
| mm8 | Feb. 2006 | NCBI Build 36 | Available |
| mm7 | Aug. 2005 | NCBI Build 35 | Available |
| mm6 | Mar. 2005 | NCBI Build 34 | Archived |
| mm5 | May 2004 | NCBI Build 33 | Archived |
| mm4 | Oct. 2003 | NCBI Build 32 | Archived |
| mm3 | Feb. 2003 | NCBI Build 30 | Archived |
| mm2 | Feb. 2002 | MGSCv3 | Archived |
| mm1 | Nov. 2001 | MGSCv2 | Archived |
Naked mole-rat | hetGla1 | Jul. 2011 | Beijing Genomics Institute HetGla_1.0 | Available |
Opossum | monDom5 | Oct. 2006 | Broad Institute release MonDom5 | Available |
| monDom4 | Jan. 2006 | Broad Institute release MonDom4 | Available |
| monDom1 | Oct. 2004 | Broad Institute release MonDom1 | Available |
Orangutan | ponAbe2 | Jul. 2007 | WUSTL Pongo_albelii-2.0.2 | Available |
Panda | ailMel1 | Dec. 2009 | BGI-Shenzhen AilMel 1.0 | Available |
Pig | susScr2 | Nov. 2009 | SGSC Sscrofa9.2 | Available |
Platypus | ornAna1 | Mar. 2007 | WUSTL v5.0.1 | Available |
Rabbit | oryCun2 | Apr. 2009 | Broad Institute release oryCun2 | Available |
Rat | rn4 | Nov. 2004 | Baylor College of Medicine HGSC v3.4 | Available |
| rn3 | Jun. 2003 | Baylor College of Medicine HGSC v3.1 | Available |
| rn2 | Jan. 2003 | Baylor College of Medicine HGSC v2.1 | Archived |
| rn1 | Nov. 2002 | Baylor College of Medicine HGSC v1.0 | Archived |
Rhesus | rheMac2 | Jan. 2006 | Baylor College of Medicine HGSC v1.0 Mmul_051212 | Available |
| rheMac1 | Jan. 2005 | Baylor College of Medicine HGSC Mmul_0.1 | Archived |
Sheep | oviAri1 | Feb. 2010 | ISGC Ovis aries 1.0 | Available |
Stickleback | gasAcu1 | Feb. 2006 | Broad Release 1.0 | Available |
Tetraodon | tetNig2 | Mar. 2007 | Genoscope v7 | Available |
| tetNig1 | Feb. 2004 | Genoscope v7 | Available |
Turkey | melGal1 | Dec. 2009 | Turkey Genome Consortium v2.01 | Available |
Wallaby | macEug2 | Sep. 2009 | Tammar Wallaby Genome Sequencing Consortium Meug_1.1 | Available |
X. tropicalis | xenTro3 | Nov. 2009 | JGI v.4.2 | Available |
| xenTro2 | Aug. 2005 | JGI v.4.1 | Available |
| xenTro1 | Oct. 2004 | JGI v.3.0 | Available |
Zebra finch | taeGut1 | Jul. 2008 | WUSTL v3.2.4 | Available |
Zebrafish | danRer7 | Jul. 2010 | Sanger Institute Zv9 | Available |
| danRer6 | Dec. 2008 | Sanger Institute Zv8 | Available |
| danRer5 | Jul. 2007 | Sanger Institute Zv7 | Available |
| danRer4 | Mar. 2006 | Sanger Institute Zv6 | Available |
| danRer3 | May 2005 | Sanger Institute Zv5 | Available |
| danRer2 | Jun. 2004 | Sanger Institute Zv4 | Archived |
| danRer1 | Nov. 2003 | Sanger Institute Zv3 | Archived |
| | | | |
DEUTEROSTOMES | | | | |
C. intestinalis | ci2 | Mar. 2005 | JGI v2.0 | Available |
| ci1 | Dec. 2002 | JGI v1.0 | Available |
Lancelet | braFlo1 | Mar. 2006 | JGI v1.0 | Available |
S. purpuratus | strPur2 | Sep. 2006 | Baylor College of Medicine HGSC v. Spur 2.1 | Available |
| strPur1 | Apr. 2005 | Baylor College of Medicine HGSC v. Spur_0.5 | Available |
| | | | |
INSECTS | | | | |
A. mellifera | apiMel2 | Jan. 2005 | Baylor College of Medicine HGSC v.Amel_2.0 | Available |
| apiMel1 | Jul. 2004 | Baylor College of Medicine HGSC v.Amel_1.2 | Available |
A. gambiae | anoGam1 | Feb. 2003 | IAGP v.MOZ2 | Available |
D. ananassae | droAna2 | Aug. 2005 | Agencourt Arachne release | Available |
| droAna1 | Jul. 2004 | TIGR Celera release | Available |
D. erecta | droEre1 | Aug. 2005 | Agencourt Arachne release | Available |
D. grimshawi | droGri1 | Aug. 2005 | Agencourt Arachne release | Available |
D. melanogaster | dm3 | Apr. 2006 | BDGP Release 5 | Available |
| dm2 | Apr. 2004 | BDGP Release 4 | Available |
| dm1 | Jan. 2003 | BDGP Release 3 | Available |
D. mojavensis | droMoj2 | Aug. 2005 | Agencourt Arachne release | Available |
| droMoj1 | Aug. 2004 | Agencourt Arachne release | Available |
D. persimilis | droPer1 | Oct. 2005 | Broad Institute release | Available |
D. pseudoobscura | dp3 | Nov. 2004 | Flybase Release 1.0 | Available |
| dp2 | Aug. 2003 | Baylor College of Medicine HGSC Freeze 1 | Available |
D. sechellia | droSec1 | Oct. 2005 | Broad Release 1.0 | Available |
D. simulans | droSim1 | Apr. 2005 | WUSTL Release 1.0 | Available |
D. virilis | droVir2 | Aug. 2005 | Agencourt Arachne release | Available |
| droVir1 | Jul. 2004 | Agencourt Arachne release | Available |
D. yakuba | droYak2 | Nov. 2005 | WUSTL Release 2.0 | Available |
| droYak1 | Apr. 2004 | WUSTL Release 1.0 | Available |
| | | | |
NEMATODES | | | | |
C. brenneri | caePb2 | Feb. 2008 | WUSTL 6.0.1 | Available |
| caePb1 | Jan. 2007 | WUSTL 4.0 | Available |
C. briggsae | cb3 | Jan. 2007 | WUSTL Cb3 | Available |
| cb1 | Jul. 2002 | WormBase v. cb25.agp8 | Available |
C. elegans | ce6 | May 2008 | WormBase v. WS190 | Available |
| ce4 | Jan. 2007 | WormBase v. WS170 | Available |
| ce2 | Mar. 2004 | WormBase v. WS120 | Available |
| ce1 | May 2003 | WormBase v. WS100 | Archived |
C. japonica | caeJap1 | Mar. 2008 | WUSTL 3.0.2 | Available |
C. remanei | caeRem3 | May 2007 | WUSTL 15.0.1 | Available |
| caeRem2 | Mar. 2006 | WUSTL 1.0 | Available |
P. pacificus | priPac1 | Feb. 2007 | WUSTL 5.0 | Available |
| | | | |
OTHER | | | | |
Sea Hare | aplCal1 | Sep. 2008 | Broad Release Aplcal2.0 | Available |
Yeast | sacCer3 | Apr. 2011 | SGD April 2011 sequence | Available |
| sacCer2 | Jun. 2008 | SGD June 2008 sequence | Available |
| sacCer1 | Oct. 2003 | SGD 1 Oct 2003 sequence | Available |
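For readers who want to look up entries in the table above programmatically, here is a minimal Python sketch. It is not a UCSC tool. It assumes the rows have been copied as pipe-delimited text in the layout shown here, without the column-header row. The function name `parse_release_rows` and the dictionary layout are illustrative choices. Rows with an empty SPECIES cell inherit the species from the row above, and group headings such as VERTEBRATES are skipped.

```python
def parse_release_rows(lines):
    """Map each UCSC version (e.g. 'hg18') to its species, date, release name, and status."""
    releases = {}
    species = None
    for line in lines:
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 5:
            continue  # not a five-column table row
        name, version, date, release, status = cells[:5]
        if not version:
            continue  # group heading (e.g. VERTEBRATES) or blank separator row
        if name:
            species = name  # a new species starts; later rows with an empty cell inherit it
        releases[version] = {
            "species": species,
            "date": date,
            "release": release,
            "status": status,
        }
    return releases


# A few rows copied from the table above.
SAMPLE_ROWS = [
    "VERTEBRATES | | | | |",
    "Human | hg19 | Feb. 2009 | Genome Reference Consortium GRCh37 | Available |",
    "| hg18 | Mar. 2006 | NCBI Build 36.1 | Available |",
    "Mouse | mm10 | Dec. 2011 | Genome Reference Consortium GRCm38 | Available |",
    "| mm9 | Jul. 2007 | NCBI Build 37 | Available |",
    "| | | | |",
]

releases = parse_release_rows(SAMPLE_ROWS)
print(releases["hg18"]["species"], "-", releases["hg18"]["release"])
```

Keying the result by UCSC version makes the most common lookup (from a database name to the corresponding release name) a single dictionary access.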

Initial assembly release dates

Question:
"When will the next assembly be out?"
Response:
UCSC does not produce its own genome assemblies, but instead obtains them from
standard sources. For example, the human assembly is obtained
from NCBI. Because of this, you can expect us to release a new version of a
genome soon after the assembling organization has released the version.
A new assembly release initially consists of the genome sequence and a
small set of aligned annotation tracks. Additional annotation tracks are added
as they are obtained or generated. Bulk downloads of the data are typically
available in the first week after the assembly is released in the browser.

Data sources - UCSC assemblies

Question:
"Where does UCSC obtain the assembly and annotation data
displayed in the Genome Browser?"
Response:
All the assembly data displayed in the UCSC Genome
Browser are obtained from external sequencing centers.
To determine the data source and version for a given
assembly, see the assembly's description on the Genome
Browser Gateway page or
the List of UCSC Genome Releases.
The annotations accompanying an assembly are obtained
from a variety of sources. The UCSC Genome Bioinformatics
Group generates several of the tracks; the remainder are
contributed by collaborators at other sites. Each track
has an associated description page that credits the
authors of the annotation.
For detailed information about the individuals and
organizations who contributed to a specific assembly,
see the Credits
page.

Comparison of UCSC and NCBI human assemblies

Question:
"How do the human assemblies displayed in the UCSC
Genome Browser differ from the NCBI human assemblies?"
Response:
Recent human assemblies displayed in the Genome
Browser (hg10 and higher) are identical to the NCBI
assemblies.

Differences between UCSC and NCBI mouse assemblies

Question:
"Is the mouse genome assembly displayed in the UCSC
Genome Browser the same as the one on the NCBI website?"
Response:
The mouse genome assemblies featured in the UCSC
Genome Browser are the same as
those on the NCBI web site with one difference: the UCSC
versions contain
only the reference strain data (C57BL/6J). NCBI provides
data for several additional strains in their builds.

Accessing older assembly versions

Question:
"I need to access an older version of a genome assembly that's no
longer listed in the Genome Browser menu. What should I do?"
Response:
In addition to the assembly versions currently available in the Genome Browser,
you can access older versions of the browser through our archives. To view an
older version, click the Archives link on the Genome Browser home page.

Frequency of GenBank data updates

Question:
"How frequently does UCSC update its databases with new
data from GenBank?"
Response:
Daily and weekly incremental updates of mRNA, RefSeq,
and EST data are in place for several of the
more recent Genome Browser assemblies.
Assemblies that are not on an incremental update
schedule are updated whenever we load a new assembly or
make a major revision to a table.
Data are updated on the following schedule:
- Native and xeno mRNA and RefSeq tracks: updated daily
- EST data: updated weekly on Saturday morning
- Downloadable data files: updated weekly on Saturday morning
- Outdated sequences: removed once per quarter
Mirror sites are not required to use an
incremental update process, and should not experience
problems as a result of these updates.

Coordinate changes between assemblies

Question:
"I noticed that the chromosomal coordinates for a particular gene that I'm
looking at have changed since the last time I used your browser. What happened?"
Response:
A common source of confusion is mixing up different assemblies, so it is
important to know which assembly you are viewing. Within the Genome
Browser display, assemblies are labeled by organism and date. To look up the
corresponding UCSC database name or NCBI build number, use the
release table.
UCSC database labels are of the form hgN, panTroN, etc.,
where N is a number. The letters designate the organism,
e.g. hg for human genome or panTro for
Pan troglodytes. The number denotes the UCSC
assembly version for that organism. For example, ce1
refers to the first UCSC assembly of the
C. elegans genome.
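The naming convention described above can be illustrated with a short Python sketch. The pattern used here (a run of letters followed by a run of digits) is an assumption inferred from the database names listed in this document, not an official UCSC rule, and `split_db_name` is a hypothetical helper name.

```python
import re

# Assumed pattern: organism letters followed by the assembly version number.
_DB_NAME = re.compile(r"([A-Za-z]+)(\d+)")


def split_db_name(db_name):
    """Split a label such as 'panTro3' into ('panTro', 3)."""
    match = _DB_NAME.fullmatch(db_name)
    if match is None:
        raise ValueError(f"not of the form <letters><number>: {db_name!r}")
    return match.group(1), int(match.group(2))


for name in ("hg19", "panTro3", "ce1"):
    print(name, split_db_name(name))
```

Comparing the numeric part is only meaningful between labels that share the same organism prefix, since each organism has its own version sequence.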
The coordinates of your favorite gene in one assembly may
not be the same as those in the next release of the
assembly unless the gene happens to lie on a completely
sequenced and unrevised chromosome. For information on
integrating data from one assembly into another, see the
Converting positions between
assembly versions section.

Converting positions between assembly versions

Question:
"I've been researching a specific area of the human genome
on the current assembly, and now you've just released a
new version. Is there an easy way to locate
my area of interest on the new assembly?"
Response:
See the section on converting coordinates for information on assembly migration tools.

Missing annotation tracks

Question:
"Why is my favorite annotation track missing from your latest release?"
Response:
The initial release of a new genome assembly typically contains a small subset
of core annotation tracks. New tracks are added as they are generated. In many
cases, our annotation tracks are contributed by scientists not affiliated with
UCSC who must first obtain the sequence, repeatmasked data, etc. before they
can produce their tracks. If you need an annotation that has not appeared
on an assembly within a month or so of its release, feel free to send an inquiry
to genome@soe.ucsc.edu.
Messages sent to this address will be posted to the
moderated genome mailing list, which is archived on a public
Web-accessible pipermail archive. This archive may be
indexed by non-UCSC sites such as Google.

What next with the human genome?

Question:
"Now that the human genome is 'finished', will there be any more releases?"
Response:
Rest assured that work will continue. There will be updates to the assembly over the next
several years. This has been the case for all other finished (i.e. essentially complete) genome
assemblies as gaps are closed. For example, the C. elegans genome has been
"finished" for several years, but small bits of sequence are still being
added and corrections are being made. NCBI will continue to coordinate the human
genome assemblies in collaboration with the individual chromosome coordinators, and
UCSC will continue to QC the assembly in conjunction with NCBI (and, to a lesser extent,
Ensembl). UCSC, NCBI, Ensembl, and others will display the new releases on their
sites as they become available.

Mouse strain used for mouse genome sequence

Question:
"What strain of mouse was used for the Mus musculus genome?"
Response:
C57BL/6J.

UniProt (Swiss-Prot/TrEMBL) display changes

Question:
"What has UCSC done to accommodate the changes to
display IDs recently introduced by UniProt (aka
Swiss-Prot/TrEMBL)?"
Response:
Here is a detailed description of the database changes
we have made to accommodate the UniProt changes. If
you are using the proteinID field in our
knownGene
table or the Swiss-Prot/TrEMBL display ID for indexing
or cross-referencing other data, we strongly suggest
you transition to the UniProt accession number.
These changes will also affect anyone who is
mirroring our site.
- The latest UniProt Knowledgebase (Release 46.0, Feb. 1st, 2005) was
  parsed and the results were stored in a newly created database,
  sp050201.
- A corresponding database, proteins050201, was constructed based on
  data in sp050201 and other protein data sources.
- Two new symbolic database pointers, uniProt and proteome, have been
  created to point to the two new databases mentioned above. Some parts
  of our programs use the data in these two databases.
    uniProt ---> sp050201
    proteome ---> proteins050201
- The existing protein symbolic database pointers, swissProt and
  proteins, remain unchanged. Some parts of our programs still use these
  two pointers and the data in their associated protein databases.
    swissProt ---> sp041115
    proteins ---> proteins041115
- Two new tables, spOldNew and uniProtAlias, have been added to the
  proteome database.
  The spOldNew table contains three columns:
    - acc -- primary accession number
    - oldDisplayId -- old display ID
    - newDisplayId -- new display ID
  The uniProtAlias table contains four columns:
    - acc -- UniProt accession number
    - alias -- alias (could be acc, old and new display IDs, etc.)
    - aliasSrc -- source of the alias type
    - aliasSrcDate -- date of the source data
  The aliases include primary accessions, secondary accessions, new
  display IDs, old display IDs, and old display IDs corresponding to new
  secondary accessions.
- Three new functions have been added to kent/src/hg/spDb.c:
    char *oldSpDisplayId(char *newSpDisplayId);
    /* Convert from new Swiss-Prot display ID to old display ID */
    char *newSpDisplayId(char *oldSpDisplayId);
    /* Convert from old Swiss-Prot display ID to new display ID */
    char *uniProtFindPrimAcc(char *id);
    /* Return primary accession given an alias. */
  The uniProtFindPrimAcc() function is enabled by the new uniProtAlias
  table.
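As a rough illustration of how the two new tables support these lookups, the Python sketch below rebuilds them in an in-memory SQLite database. It is a stand-in, not the UCSC implementation. The column names come from the description above, but the column types, the `aliasSrc` values, and every data row are invented placeholders. The Python function names are hypothetical and only mirror the purpose of the C functions in spDb.c.

```python
import sqlite3

# In-memory stand-in for the proteome database; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spOldNew (acc TEXT, oldDisplayId TEXT, newDisplayId TEXT);
    CREATE TABLE uniProtAlias (acc TEXT, alias TEXT, aliasSrc TEXT, aliasSrcDate TEXT);
""")

# Placeholder rows: the accession and display IDs below are not real UniProt entries.
conn.execute(
    "INSERT INTO spOldNew VALUES (?, ?, ?)",
    ("X00001", "OLDID_HUMAN", "NEWID_HUMAN"),
)
conn.executemany(
    "INSERT INTO uniProtAlias VALUES (?, ?, ?, ?)",
    [
        ("X00001", "X00001", "acc", "2005-02-01"),
        ("X00001", "OLDID_HUMAN", "oldDisplayId", "2005-02-01"),
        ("X00001", "NEWID_HUMAN", "newDisplayId", "2005-02-01"),
    ],
)


def find_primary_acc(alias):
    """Return the primary accession for any alias, or None if it is unknown."""
    row = conn.execute(
        "SELECT acc FROM uniProtAlias WHERE alias = ?", (alias,)
    ).fetchone()
    return row[0] if row else None


def new_display_id(old_display_id):
    """Return the new display ID for an old display ID, or None if it is unknown."""
    row = conn.execute(
        "SELECT newDisplayId FROM spOldNew WHERE oldDisplayId = ?", (old_display_id,)
    ).fetchone()
    return row[0] if row else None


print(find_primary_acc("OLDID_HUMAN"))
print(new_display_id("OLDID_HUMAN"))
```

Resolving every identifier to the accession first, as `find_primary_acc` does, is in line with the recommendation above to index and cross-reference by UniProt accession number rather than by display ID.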
We anticipate additional changes down the road and
may eventually merge the two sets of protein DB
pointers into one set.
Currently, the proteinID field of the
knownGene table
for existing genome releases (hg15, hg16,
hg17, mm3, mm4, mm5, rn2, and rn3) uses old
Swiss-Prot/TrEMBL display IDs (from before Feb. 1, 2005). In
the future, we may change this field to show the
UniProt accession number. Should we choose not to
change the content of the proteinID field,
we may consider adding a new field,
uniProtAcc.
If you have any questions about these changes and
their impact on your work, please email us at
genome@soe.ucsc.edu.
Mirror sites may send questions to
genome-mirror@soe.ucsc.edu.
Messages sent to these addresses will be posted to the
moderated mailing lists, which are archived on a public
Web-accessible pipermail archive. This archive may be
indexed by non-UCSC sites such as Google.