
WormBase
 

Content
Description: WormBase: a comprehensive resource for nematode research.
Organisms: Caenorhabditis elegans
Contact
Primary citation: PMID 19910365
Access
Website: www.wormbase.org

WormBase is an online biological database about the biology and genome of the nematode model organism Caenorhabditis elegans and contains information about other related nematodes.[1][2] WormBase is used by the C. elegans research community both as an information resource and as a place to publish and distribute their results. The database is regularly updated with new versions being released every two months. WormBase is one of the organizations participating in the Generic Model Organism Database (GMOD) project.

Contents

WormBase comprises the following main data sets:

In addition, WormBase contains an up-to-date searchable bibliography of C. elegans research and is linked to the WormBook project.

Tools

WormBase offers many ways of searching and retrieving data from the database:

  • WormMart - was[3] a tool for retrieving varied information on many genes (or the sequences of those genes). It was the WormBase implementation of BioMart.[4]
  • WormMine - as of 2016,[3] the primary data mining facility. It is the WormBase implementation of InterMine.[5]
  • Genome Browser - browse the genes of C. elegans (and other species) in their genomic context
  • Textpresso - a search tool that queries published C. elegans literature (including meeting abstracts) and a subset of nematode literature.

Sequence curation

Sequence curation at WormBase refers to the maintenance and annotation of the primary genomic sequence and a consensus gene set.

Genome sequence

Even though the C. elegans genome sequence is the most accurate and complete eukaryotic genome sequence, it has continually needed refinement as new evidence has emerged. Many of these changes were single-nucleotide insertions or deletions; however, several large mis-assemblies have also been uncovered. For example, in 2005 a 39 kb cosmid had to be inverted. Other improvements have come from comparing genomic DNA to cDNA sequences and from analysis of high-throughput RNA-Seq data. When differences between the genomic sequence and transcripts are identified, re-analysis of the original genomic data often leads to modifications of the genomic sequence. These changes make it difficult to compare chromosomal coordinates of data derived from different releases of WormBase. A coordinate re-mapping program and mapping data are available to aid these comparisons.[6]
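The idea behind coordinate re-mapping can be illustrated with a minimal sketch. This is not the WormBase re-mapping program: the change-list format, the names `Change` and `remap`, and the example positions are all assumptions made for illustration, and only simple insertions and deletions are handled (an inversion such as the cosmid mentioned above would need extra logic).

```python
from typing import List, Optional, Tuple

# Hypothetical change record, expressed in OLD-release coordinates:
#   (position, delta)
#   delta > 0: `delta` bases were inserted immediately after `position`
#   delta < 0: the `-delta` bases following `position` were deleted
Change = Tuple[int, int]


def remap(coord: int, changes: List[Change]) -> Optional[int]:
    """Map a 1-based coordinate from the old release to the new one.

    Returns None when the coordinate falls inside a deleted stretch.
    """
    shift = 0
    for pos, delta in sorted(changes):
        if coord <= pos:
            break  # this and all later changes lie downstream of coord
        if delta < 0 and coord <= pos - delta:
            return None  # coord was one of the deleted bases
        shift += delta
    return coord + shift


# Made-up example: 1 base inserted after position 100,
# 2 bases deleted after position 200.
example_changes = [(100, 1), (200, -2)]
for old in (50, 101, 201, 203):
    print(old, "->", remap(old, example_changes))
```

Features with a start and an end would be remapped by applying the function to both coordinates and treating a None result as a feature that needs manual review.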

Gene structure models

All the gene sets of the WormBase species were initially generated by gene prediction programs. Gene prediction programs give a reasonable set of gene structures, but the best of them only predict about 80% of the complete gene structures correctly. They have difficulty predicting single-exon genes and genes with unusual structures, a weak translation start signal or weak splice sites. They can incorrectly predict a coding gene model where the gene is a pseudogene, and they predict the isoforms of a gene poorly, if at all.

The gene models of C. elegans, C. briggsae, C. remanei, and C. brenneri are manually curated. The majority of gene structure changes have been based on transcript data from large-scale projects such as Yuji Kohara's EST libraries, Marc Vidal's ORFeome project (worfdb.dfci.harvard.edu/), Waterston and Hillier's Illumina data and Makedonka Mitreva's 454 data. However, other data types (e.g. protein alignments, ab initio prediction programs, trans-splice leader sites, poly-A signals and addition sites, SAGE and TEC-RED transcript tags, mass-spectroscopic peptides, and conserved protein domains) are useful in refining the structures, especially where expression is low and transcripts are therefore scarce. When genes are conserved between the available nematode species, comparative analysis can also be very informative.

WormBase encourages researchers to contact its help desk if they have evidence for an incorrect gene structure. Any cDNA or mRNA sequence evidence for the change should be submitted to EMBL/GenBank/DDBJ; this helps to confirm and document the gene model, as WormBase routinely retrieves sequence data from these public databases. This also makes the data public, allowing appropriate reference and acknowledgement to the researchers.

When any change is made to a CDS (or Pseudogene), the old gene model is preserved as a ‘history’ object. This will have a suffix name like: “AC3.5:wp119”, where ‘AC3.5’ is the name of the CDS and the ‘119’ refers to the database release in which the change was made. The reason for the change and the evidence for the change are added to the annotation of the CDS – these can be seen in the Visible/Remark section of the CDS's ‘Tree Display’ section on the WormBase web site.
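The history-object naming scheme can be illustrated with a short parsing sketch. The regular expression is inferred only from the single example "AC3.5:wp119" given above, so the exact pattern (a lowercase "wp" followed by digits) is an assumption, and `HistoryName` and `parse_history_name` are hypothetical names, not part of any WormBase software.

```python
import re
from typing import NamedTuple


class HistoryName(NamedTuple):
    cds_name: str  # name of the CDS (or Pseudogene) that was changed
    release: int   # database release in which the change was made


# Assumed pattern: "<CDS name>:wp<release number>", as in "AC3.5:wp119".
_HISTORY_RE = re.compile(r"^(?P<cds>[^:]+):wp(?P<release>\d+)$")


def parse_history_name(name: str) -> HistoryName:
    """Split a history object name into CDS name and release number."""
    match = _HISTORY_RE.match(name)
    if match is None:
        raise ValueError(f"not a history object name: {name!r}")
    return HistoryName(match.group("cds"), int(match.group("release")))


print(parse_history_name("AC3.5:wp119"))
# HistoryName(cds_name='AC3.5', release=119)
```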

Gene nomenclature

Genes

In WormBase, a Gene is a region that is expressed or a region that has been expressed and is now a Pseudogene. Genes have unique identifiers like ‘WBGene00006415’. All C. elegans WormBase genes also have a Sequence Name, which is derived from the cosmid, fosmid or YAC clone on which they reside, for instance F38H4.7, indicating that it is on the cosmid ‘F38H4’ and that there are at least 6 other genes on that cosmid. If a gene produces a protein that can be classified as a member of a family, the gene may also be assigned a CGC name like tag-30, indicating that this is the 30th member of the tag gene family. Assignment of gene family names is controlled by WormBase.[7] Before publication, requests for names should be submitted to WormBase.[8]

There are a few exceptions to this format, like the genes cln-3.1, cln-3.2, and cln-3.3, which are all equally similar to the human gene CLN3. CGC gene names for non-elegans species in WormBase have the 3-letter species code prepended, like Cre-acl-5, Cbr-acl-5, Cbn-acl-5.
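The three kinds of gene name described above can be told apart with simple patterns. The sketch below is illustrative only: the regular expressions are generalised from the examples in the text (WBGene00006415, F38H4.7, tag-30, cln-3.1, Cre-acl-5) and are assumptions, not official WormBase validation rules, and `classify_gene_name` is a hypothetical helper.

```python
import re

# Assumed: "WBGene" followed by exactly 8 digits, as in WBGene00006415.
WBGENE_RE = re.compile(r"^WBGene\d{8}$")
# Assumed: clone name, a dot, then a number, as in F38H4.7.
SEQUENCE_NAME_RE = re.compile(r"^[A-Z][A-Z0-9]*\.\d+$")
# Assumed: optional 3-letter species code, gene family, number, and an
# optional ".N" suffix for exceptions such as cln-3.1.
CGC_NAME_RE = re.compile(
    r"^(?:(?P<species>[A-Z][a-z]{2})-)?"
    r"(?P<family>[a-z]+)-(?P<number>\d+)(?:\.\d+)?$"
)


def classify_gene_name(name: str) -> str:
    """Return 'gene_id', 'sequence_name', 'cgc_name' or 'unknown'."""
    if WBGENE_RE.match(name):
        return "gene_id"
    if SEQUENCE_NAME_RE.match(name):
        return "sequence_name"
    if CGC_NAME_RE.match(name):
        return "cgc_name"
    return "unknown"


for example in ("WBGene00006415", "F38H4.7", "tag-30", "cln-3.1", "Cre-acl-5"):
    print(example, "->", classify_gene_name(example))
```

The named groups in `CGC_NAME_RE` also give access to the species code and family, for example `CGC_NAME_RE.match("Cre-acl-5").group("species")` yields "Cre".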

A gene can be a Pseudogene, or can express one or more non-coding RNA genes (ncRNA) or protein-coding sequences (CDS).

Pseudogenes

Pseudogenes are genes that do not produce a reasonable, functional transcript. They may be pseudogenes of coding genes or of non-coding RNA, may be whole genes or fragments of a gene, and may or may not express a transcript. The boundary of what is considered a reasonable coding transcript is sometimes subjective because, in the absence of other evidence, the use of weak splice sites or short exons can often produce a putative, though unsatisfactory, model of a CDS. Pseudogenes and genes with a problematic structure are constantly under review in WormBase, and new evidence is used to try to resolve their status.

CDSs

Coding Sequences (CDSs) are the only part of a Gene's structure that is manually curated in WormBase. The structure of the Gene and its transcripts are derived from the structure of their CDSs.

CDSs have a Sequence Name that is derived from the same Sequence Name as their parent Gene object, so the gene ‘F38H4.7’ has a CDS called ‘F38H4.7’. The CDS specifies coding exons in the gene from the START (Methionine) codon up to (and including) the STOP codon.

Any gene can code for multiple proteins as a result of alternative splicing. These isoforms have a name that is formed from the Sequence Name of the gene with a unique letter appended. In the case of the gene bli-4 there are 6 known CDS isoforms, called K04F10.4a, K04F10.4b, K04F10.4c, K04F10.4d, K04F10.4e and K04F10.4f.

It is common to refer to isoforms in the literature using the CGC gene family name with a letter appended, for example pha-4a; however, this has no meaning within the WormBase database, and searches for pha-4a in WormBase will not return anything. The correct name of this isoform is either the CDS/Transcript name, F38A6.1a, or, even better, the Protein name, WP:CE15998.

Gene transcripts

The transcripts of a gene in WormBase are automatically derived by mapping any available cDNA or mRNA alignments onto the CDS model. These gene transcripts will therefore often include the UTR exons surrounding the CDS. If there are no available cDNA or mRNA transcripts, then the gene transcripts will have exactly the same structure as the CDS that they are modelled on.

Gene transcripts are named after the Sequence Name of the CDS used to create them, for example, F38H4.7 or K04F10.4a.

However, if there is alternative splicing in the UTRs, which would not change the protein sequence, the alternatively spliced transcripts are named with a digit appended, for example: K04F10.4a.1 and K04F10.4a.2. If there are no isoforms of the coding gene, for example AC3.5, but there is alternative splicing in the UTRs, there will be multiple transcripts named AC3.5.1 and AC3.5.2, etc. If there are no alternate UTR transcripts the single coding_transcript is named the same as the CDS and does not have the .1 appended, as in the case of K04F10.4f.
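The naming rules for CDS isoforms and UTR variants can be summarised by splitting a transcript name into its parts. This is a minimal sketch based only on the examples above (F38H4.7, K04F10.4f, K04F10.4a.1, AC3.5.1); the pattern and the names `TranscriptName` and `parse_transcript_name` are assumptions for illustration.

```python
import re
from typing import NamedTuple, Optional


class TranscriptName(NamedTuple):
    sequence_name: str          # Sequence Name of the gene, e.g. K04F10.4
    isoform: Optional[str]      # CDS isoform letter, e.g. "a", or None
    utr_variant: Optional[int]  # digit for alternative UTR splicing, or None

    @property
    def cds_name(self) -> str:
        """Name of the CDS the transcript is modelled on."""
        return self.sequence_name + (self.isoform or "")


# Assumed layout: <clone>.<number>[isoform letter][.<UTR variant digit>]
_TRANSCRIPT_RE = re.compile(
    r"^(?P<gene>[A-Z][A-Z0-9]*\.\d+)"
    r"(?P<isoform>[a-z])?"
    r"(?:\.(?P<variant>\d+))?$"
)


def parse_transcript_name(name: str) -> TranscriptName:
    match = _TRANSCRIPT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised transcript name: {name!r}")
    variant = match.group("variant")
    return TranscriptName(
        match.group("gene"),
        match.group("isoform"),
        int(variant) if variant is not None else None,
    )


for example in ("F38H4.7", "K04F10.4f", "K04F10.4a.1", "AC3.5.1"):
    parsed = parse_transcript_name(example)
    print(example, "->", parsed, "CDS:", parsed.cds_name)
```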

Operons

Groups of genes which are co-transcribed as operons are curated as Operon objects. These have names like CEOP5460 and are manually curated using evidence from the SL2 trans-spliced leader sequence sites.

Non-coding RNA genes

There are several classes of non-coding RNA genes in WormBase:

  • tRNA genes are predicted by the program ‘tRNAscan-SE’.
  • rRNA genes are predicted by homology with other species.
  • snRNA genes are mainly imported from Rfam.
  • piRNA genes are from an analysis of the characteristic motif in these genes.
  • miRNA genes have mainly been imported from miRBase. They have the primary transcript and the mature transcript marked up. The primary transcript will have a Sequence name like W09G3.10 and the mature transcript will have a letter added to this name like W09G3.10a (and if there are alternative mature transcripts, W09G3.10b, etc.).
  • snoRNA genes are mainly imported from Rfam or from papers.
  • ncRNA genes that have no obvious other function but which are clearly not protein-coding and are not pseudogenes are curated. Many of these have conserved homologues in other species. A few of these are expressed in the reverse sense to protein-coding genes.

There is also one scRNA gene.

Transposons

Transposons are not classed as genes and so do not have a parent gene object. Their structure is curated as a Transposon_CDS object with a name like C29E6.6.

Other species

The non-elegans species in WormBase have genomes that have been assembled from sequencing technologies that do not involve sequencing cosmids or YACs. These species therefore do not have sequence names for CDSs and gene transcripts that are based on cosmid names. Instead they have unique alphanumeric identifiers constructed like the names in the table below.

Gene names
Species                   Example gene name
C. briggsae               CBG00001
C. remanei                CRE00001
C. brenneri               CBN00001
C. japonica               CJA00001
Pristionchus pacificus    PPA00001

Proteins

The protein products of genes are created by translating the CDS sequences. Each unique protein sequence is given a unique identifying name like WP:CE40440. Examples of the protein identifier names for each species in WormBase are given in the table below.

Protein names
Species                          Example protein name
C. elegans                       WP:CE00001
C. briggsae                      BP:CBP00001
C. remanei                       RP:RP00001
C. brenneri                      CN:CN00001
C. japonica                      JA:JA00001
Pristionchus pacificus           PP:PP00001
Heterorhabditis bacteriophora    HB:HB00001
Brugia malayi                    BM:BM00001
Meloidogyne hapla                MH:MH00001
Meloidogyne incognita            MI:MI00001
Haemonchus contortus             HC:HC00001
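
The prefixes in the table above can be used to infer the species of a protein identifier. The following is a minimal sketch of such a lookup; the mapping is copied from the table, while the function name `species_for_protein` and the choice to return None for unknown prefixes are assumptions.

```python
from typing import Optional

# Prefix (the part before the colon) -> species, taken from the table above.
PROTEIN_PREFIX_TO_SPECIES = {
    "WP": "C. elegans",
    "BP": "C. briggsae",
    "RP": "C. remanei",
    "CN": "C. brenneri",
    "JA": "C. japonica",
    "PP": "Pristionchus pacificus",
    "HB": "Heterorhabditis bacteriophora",
    "BM": "Brugia malayi",
    "MH": "Meloidogyne hapla",
    "MI": "Meloidogyne incognita",
    "HC": "Haemonchus contortus",
}


def species_for_protein(protein_name: str) -> Optional[str]:
    """Return the species for a name like 'WP:CE40440', or None if unknown."""
    prefix, separator, _ = protein_name.partition(":")
    if not separator:
        return None  # no colon, so not in the expected "XX:ID" form
    return PROTEIN_PREFIX_TO_SPECIES.get(prefix)


print(species_for_protein("WP:CE40440"))  # C. elegans
```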

It is possible for two CDS sequences from separate genes, within a species, to be identical and so it is possible to have identical proteins coded for by separate genes. When this happens, a single, unique identifying name is used for the protein even though it is produced by two genes.
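
The one-name-per-unique-sequence rule can be sketched as a grouping step. This is only an illustration of the principle, not how WormBase assigns identifiers: the numbering scheme, the CDS names and the peptide strings below are invented for the example, and `assign_protein_names` is a hypothetical function.

```python
from typing import Dict


def assign_protein_names(
    cds_to_peptide: Dict[str, str], prefix: str = "WP:CE"
) -> Dict[str, str]:
    """Give every CDS a protein name, reusing names for identical peptides.

    The counter-based numbering is a placeholder for illustration only.
    """
    peptide_to_name: Dict[str, str] = {}
    cds_to_name: Dict[str, str] = {}
    for cds, peptide in cds_to_peptide.items():
        if peptide not in peptide_to_name:
            # First time this exact sequence is seen: mint a new name.
            peptide_to_name[peptide] = f"{prefix}{len(peptide_to_name) + 1:05d}"
        cds_to_name[cds] = peptide_to_name[peptide]
    return cds_to_name


# Invented CDS names and peptides; two separate genes encode the same protein.
example = {
    "X01A1.1": "MSTLK",
    "X02B2.3": "MSTLK",
    "X03C3.5": "MKVAA",
}
names = assign_protein_names(example)
print(names)
```

In this example the first two CDSs receive the same protein name because their translated sequences are identical, while the third receives a different one.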

ParaSite

WormBase ParaSite[9] is a sub-portal for approximately 100 draft genomes of parasitic helminths (nematodes and platyhelminthes) developed at the European Bioinformatics Institute and Wellcome Trust Sanger Institute. All genomes are assembled and annotated. Additional information such as protein domains and Gene Ontology terms is also available. Gene trees allow the alignment of orthologues between parasitic worms, other nematodes and non-worm comparator species. A BioMart data-mining tool is offered to permit large-scale access to the data.

WormBase management

WormBase is a collaboration among the European Bioinformatics Institute, Wellcome Trust Sanger Institute, Ontario Institute for Cancer Research, Washington University in St. Louis, and the California Institute of Technology. It is supported by the grant P41-HG002223 from the National Institutes of Health and the grant G0701197 from the British Medical Research Council.[10] Caltech carries out the biological curation and develops the underlying ontologies, the EBI carries out sequence curation and computation as well as database builds, the Sanger is primarily involved in curation and display of parasitic nematode genomes and genes, and the OICR develops the website and main data mining tools.

Notes and references

  1. ^ Harris, TW; et al. (12 November 2009). "WormBase: a comprehensive resource for nematode research". Nucleic Acids Res. 38 (Database issue): D463–7. doi:10.1093/nar/gkp952. PMC 2808986. PMID 19910365.
  2. ^ Williams, G. W.; Davis, P. A.; Rogers, A. S.; Bieri, T.; Ozersky, P.; Spieth, J. (2011). "Methods and strategies for gene structure curation in WormBase". Database. 2011: baq039. doi:10.1093/database/baq039. PMC 3092607. PMID 21543339.
  3. ^ a b "WormMart Sunset Period: to be retired 01 Jan 2016". Blog. WormBase. 13 November 2015.
  4. ^ "WormMart". Data mining. WormBase.
  5. ^ "WormMine". Data mining. WormBase.
  6. ^ "Converting Coordinates between releases". Retrieved 21 September 2023.
  7. ^ "WormBase Gene Nomenclature". WormBase.
  8. ^ "Gene Name / Gene Class Name Proposal Submission Form". Retrieved 21 September 2023.
  9. ^ "WormBase ParaSite". Retrieved 21 September 2023.
  10. ^ "WormBaseWiki:Copyrights - WormBaseWiki". www.wormbase.org. Archived from the original on 27 September 2006.


Source: https://en.wikipedia.org?pojem=WormBase
Text is available under the Creative Commons Attribution-ShareAlike license; additional terms may apply. See the Terms of Use page for details.
