PMC Full-Text Search Results

Items: 1 to 20 of 7473

1.
Figure 4

Figure 4. From: Identification of novel differentially expressed genes in retinas of STZ‐induced long‐term diabetic rats through RNA sequencing.

Validation of DEGs. (a) Up‐regulated DEGs. (b) Down‐regulated DEGs. GeneBank ID and version number: Hba‐a1: Gene ID: 287167, Refseq: NC_005109.4 Tnnt2: Gene ID: 24837, Refseq: NC_005112.4 Edn2: Gene ID: 24324, Refseq: NC_005104.4, F9: Gene ID: 24946, Refseq: NC_005120.4, Eqtn: Gene ID: 500502, Refseq: NC_005104.4, Ankar: Gene ID: 501138, Refseq: NC_005108.4, Cntnap5b: Gene ID: 301650, Refseq: NC_005112.4, Lad1: Gene ID: 313325, Refseq: NC_005112.4, Asb15: Gene ID: 500050, Refseq: NC_005103.4, Tsga10ip: Gene ID: 361707, Refseq: NC_005100.4, Ltk: Gene ID: 311337, Refseq: NC_005102.4, Impad1: Gene ID: 312952, Refseq: NC_005116.4, Loxhd1: Gene ID: 291427, Refseq: NC_005117.4, Crygc: Gene ID: 24277, Refseq: NC_005108.4, Crygd: Gene ID: 24278, Refseq: NC_005108.4, Piwil1: Gene ID: 363912, Refseq: NC_005111.4, RT1‐Bb: Gene ID: 309622, Refseq: NC_005119.4, H3f3c: Gene ID: 100360868, Refseq: NC_005106.4, Col3a1: Gene ID: 84032, Refseq: NC_005108.4, Pmel: Gene ID: 362818, Refseq: NC_005106.4, Lgsn: Gene ID: 316304, Refseq: NC_005108.4, RT1‐Ba: Gene ID: 309621, Refseq: NC_005119.4, Crygf, Gene ID: 689947, Refseq: NC_005108.4, Cryga: Gene ID: 684028, Refseq: NC_005108.4, Crygb: Gene ID: 301468, Refseq: NC_005108.4, Fam111a, Gene ID: 499322, Refseq: NC_005100.4

Xindan Xing, et al. Mol Genet Genomic Med. 2020 Mar;8(3):e1115.
2.
Fig. S2.

Fig. S2. From: Noncanonical DNA-binding mode of repressor and its disassembly by antirepressor.

Sequence alignments of Rep (A) and Ant (B). Multialignment of S. Typhimurium Rep (UniProtKB accession no. T1S9Z0) against Rep from Salmonella phage epsilon15 (UniProtKB accession no. Q858D7), Escherichia phage phiV10 (UniProtKB accession no. Q286X7), Escherichia phage TL-2011b (UniProtKB accession no. G9L6A6), Salmonella phage SPN1S (UniProtKB accession no. H2D0H9), Salmonella phage SPN9TCW (UniProtKB accession no. M1F232), E. coli O118:H16 str. 2009C-4446 (UniProtKB accession no. A0A028E2H1), E. coli KTE235 (Ensembl Bacteria accession no. ELD77068), E. coli UMEA 3290-1 (Ensembl Bacteria accession no. ESK17367), E. coli MS 196-1 (UniProtKB accession no. D8BV09), E. coli O7:K1 str. CE10 (RefSeq accession no. YP_006144959), E. coli NA114 (RefSeq accession no. YP_006141898), E. coli MS 185-1 (RefSeq accession no. WP_000836295), E. coli 908519 (UniProtKB accession no. V0VMF3), Salmonella enterica subsp. enterica serovar Bareilly str. CFSAN000179 (EMBL-WGS accession no. KDQ92900), S. enterica subsp. enterica serovar Agona str. 632182-2 (Ensembl Bacteria accession no. ESH97837), Citrobacter koseri (UniProtKB accession no. A0A0A5IUU2), Enterobacter sp. MGH 15 (RefSeq accession no. WP_032636571), Enterobacter cloacae (RefSeq accession no. WP_032671854), Escherichia vulneris NBRC 102420 (UniProtKB accession no. A0A090UX00), Pantoea sp. GM01 (UniProtKB accession no. J2L6Z7), Serratia grimesii (UniProtKB accession no. A0A084YWL3), Serratia marcescens BIDMC 80 (Ensembl Bacteria accession no. EZQ69436), Candidatus Sodalis pierantonio str. SOPE (UniProtKB accession no. W0HLH1), Yersinia kristensenii (UniProtKB accession no. A0A088L5A4), S. enterica subsp. enterica serovar Newport str. CVM 19470 (Ensembl Bacteria accession no. EJA85693), S. enterica subsp. enterica serovar Enteritidis str. 3402 (UniProtKB accession no. V7Y6W8), Citrobacter werkmanii NBRC 105721 (UniProtKB accession no. A0A090TWG5), Citrobacter freundii GTC 09629 (Ensembl Bacteria accession no. EOD57340), S. 
enterica subsp. enterica serovar Newport str. CVM 19443 (Ensembl Bacteria accession no. EJA65811), E. coli O127:H6 (strain E2348/69/EPEC) (UniProtKB accession no. B7UGT4), Klebsiella oxytoca (RefSeq accession no. WP_032745644), Enterobacter aerogenes (RefSeq accession no. WP_032715140), Klebsiella pneumoniae (EMBL CDS accession no. AHM80771), Cronobacter sakazakii CMCC 45402 (UniProtKB accession no. V5TYG0), Cronobacter malonaticus (RefSeq accession no. WP_032982558), Cronobacter turicensis 564 (UniProtKB accession no. K8BK39), and C. malonaticus (RefSeq accession no. WP_032983317). Multialignment of Ant (UniProtKB accession no. T1SA45) against Ant from Salmonella phage epsilon15 (UniProtKB accession no. Q858F6), Escherichia phage phiV10 (UniProtKB accession no. Q286Z4), Escherichia phage TL-2011b (UniProtKB accession no. G9L6E3), Salmonella phage SPN1S (UniProtKB accession no. H2D0F7), Salmonella phage SPN9TCW (UniProtKB accession no. M1EZ64), S. enterica subsp. enterica serovar Enteritidis str. 3402 (UniProtKB accession no. V7Y2G6), S. enterica subsp. enterica serovar Newport str. CVM 19470 (Ensembl Bacteria accession no. EJA85653), S. enterica subsp. enterica serovar Agona str. 632182-2 (Ensembl Bacteria accession no. ESH97805), C. freundii GTC 09479 (EMBL-WGS accession no. EMF20587), E. coli O127:H6 (UniProtKB accession no. B7UGQ3), C. malonaticus (RefSeq accession no. WP_032986301), E. vulneris NBRC 102420 (UniProtKB accession no. A0A090V1M3), K. pneumoniae subsp. pneumoniae KPNIH10 (UniProtKB accession no. A0A0E1DF28), C. sakazakii CMCC 45402 (UniProtKB accession no. V5U0K5), Enterobacter sp. MGH 15 (Ensembl Bacteria accession no. EUM54292), K. oxytoca (RefSeq accession no. WP_032745680), S. enterica subsp. enterica serovar Newport str. CVM 19443 (Ensembl Bacteria accession no. EJA65772), E. cloacae (RefSeq accession no. WP_032671830), C. koseri (UniProtKB accession no. A0A0A5IVX8), S. enterica subsp. enterica serovar Bareilly str. 
CFSAN000179 (Ensembl Bacteria accession no. KDQ92930), E. coli NA114 (RefSeq accession no. YP_006141911), E. coli 908519 (UniProtKB accession no. V0V944), E. coli UMEA 3290-1 (EMBL-WGS accession no. ESK17333), E. coli KTE235 (EMBL-WGS accession no. ELD77031), Y. kristensenii (UniProtKB accession no. A0A088L842), E. coli O7:K1 str. CE10 (RefSeq accession no. YP_006144928), E. aerogenes UCI 15 (RefSeq accession no. WP_032723151), C. malonaticus (RefSeq accession no. WP_032983289), C. turicensis 564 (UniProtKB accession no. K8BRJ5), E. coli O7:K1 (strain IAI39/ExPEC) (RefSeq accession no. YP_002408599), E. coli O1:K1/APEC (RefSeq accession no. YP_853609), C. werkmanii NBRC 105721 (UniProtKB accession no. A0A090TUF1), Pantoea sp. GM01 (UniProtKB accession no. J2LY07), S. marcescens BIDMC 80 (Ensembl Bacteria accession no. EZQ69401), and E. coli FCP1 (RefSeq accession no. WP_025651152). Secondary structure elements were assigned by PyMOL (The PyMOL Molecular Graphics System, www.pymol.org). Every tenth residue is marked by a black dot. Strictly 100% conserved residues are highlighted in red. Cylinders above the sequences denote α-helices. A dotted line denotes disordered regions.

Minsik Kim, et al. Proc Natl Acad Sci U S A. 2016 May 3;113(18):E2480-E2488.
3.
Supplementary Fig. 3

Supplementary Fig. 3. From: The Lepidopteran endoribonuclease-U domain protein P102 displays dramatically reduced enzymatic activity and forms functional amyloids.

Partial multiple alignment of P102 sequence (residues 72-349) with homologs from insect orders except Lepidoptera. The name of the organism and the number of the first amino acid of the aligned region are indicated. The accession numbers for sequences retrieved from NCBI are the following: Aedes aegypti (RefSeq ID: XP_001660282.1), Anopheles darlingi (GenBank ID: EFR21399.1), A. gambiae (RefSeq ID: XP_311978.5), Apis mellifera (RefSeq ID: XP_003251941.1), Camponotus floridanus (GenBank ID: EFN64513.1), Culex quinquefasciatus (RefSeq ID: XP_001864562.1), C. tarsalis (GenBank ID: ACJ64345.1), Drosophila ananassae (RefSeq ID: XP_001963598.1), D. erecta (RefSeq ID: XP_001977186.1), D. melanogaster (RefSeq ID: NP_572668.1), D. mojavensis (RefSeq ID: XP_002010982.1), D. persimilis (RefSeq ID: XP_002025808.1), D. pseudoobscura pseudoobscura (RefSeq ID: XP_001354551.2), D. simulans (RefSeq ID: XP_002106640.1), D. virilis (RefSeq ID: XP_002056809.1), Glossina morsitans morsitans (GenBank ID: ADD19839.1), Harpegnathos saltator (GenBank ID: EFN80024.1), Nasonia vitripennis (RefSeq ID: XP_001603664.1), Solenopsis invicta (GenBank ID: EFZ16938.1), Tribolium castaneum (RefSeq ID: XP_968069.1). Three sequences of Coleoptera (Brassicogethes aeneus, Nicrophorus vespilloides and Phaedon cochleariae) were retrieved from in-house transcriptome databases. Identical and conserved amino acids are shaded in light gray and very light gray, respectively. Residues involved in the enzymatic activity are indicated by arrows.

Mariarosa Pascale, et al. Dev Comp Immunol. 2014 Nov;47(1):129-139.
4.
Figure 1

Figure 1. From: The Most Frequently Used Sequencing Technologies and Assembly Methods in Different Time Segments of the Bacterial Surveillance and RefSeq Genome Databases.

(A) The growth of the RefSeq microbial genomic databases and the database of bacterial genomes excluded from RefSeq for the reason “derived from surveillance project.” Dotted lines represent number of “complete genomes.” The data for 2020 includes genomes submitted before April 17. (B) Relative proportions between the different assembly levels in the bacterial RefSeq genome database. (C) The most frequently used sequencing techniques in the bacterial RefSeq database. (D) Relative proportions between the different Illumina platforms in the bacterial RefSeq genome database. (E) Relative proportions between sequencing techniques used in bacterial RefSeq divided by years. (F) Frequencies of pseudogenes in bacterial RefSeq genomes reported to be produced by one technique alone. (G) Relative proportions between genomes produced by a single sequencing technique and combinations of techniques. (H) Relative proportions between the most frequently used combinations of sequencing techniques in the bacterial RefSeq genome database. (I) Histogram of the reported sequence depth (coverage) used in the bacterial RefSeq genome database and in the bacterial surveillance project genome database.

Bo Segerman. Front Cell Infect Microbiol. 2020;10:527102.
5.
FIG 1

FIG 1. From: Conserved Outer Tegument Component UL11 from Herpes Simplex Virus 1 Is an Intrinsically Disordered, RNA-Binding Protein.

Alignment and characteristics of UL11 homolog sequences. (A) Sequences of UL11 homologs from herpesviruses aligned to HSV-1 UL11 with HSV-1 residue numbers marked. Human virus sequences used include HSV-1 strain 17 UL11 (RefSeq accession no. YP_009137085.1), HSV-2 strain HG52 UL11 (RefSeq YP_009137162.1), VZV strain Dumas ORF49 (RefSeq NP_040171.1), EBV strain B95-8 BBLF1 (RefSeq YP_401686.1), CMV strain AD169 UL99 (RefSeq P13200.3), HHV-6A strain Uganda-1102 U71 (RefSeq NP_042964.1), HHV-6B strain Z29 U71 (RefSeq NP_050250.1), HHV-7 strain JI U71 (RefSeq YP_073811.1), and KSHV strain GK18 ORF38 (RefSeq YP_001129391.1). Other representative animal virus sequences used include Marek’s disease virus (MDV) strain Md5 UL11 (RefSeq YP_001033939.1), pseudorabies virus (PRV) composite strain UL11 (RefSeq YP_068364.1), murine cytomegalovirus (MCMV) strain Smith UL99 (RefSeq YP_214100.1), saimiriine herpesvirus 2 (SaHV-2) ORF38 (RefSeq NP_040240.1), and equine herpesvirus 2 (EHV-2) strain 86/67 myristoylated tegument protein (RefSeq NP_042635.1). All sequences show the NCBI reference sequence accession number or code in parentheses. Conserved residues are marked with an asterisk. Myristoylated glycines and palmitoylated cysteines, experimentally determined or predicted, are shown in italicized type in cyan text. Acidic clusters, experimentally defined or predicted, are boxed in red. Groups of basic residues are boxed in blue. Beta strands and alpha helices predicted by PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/) are highlighted in yellow and light teal, respectively. Disorder-promoting residues (A/G/S/P/Q/E/R/K) are colored in magenta. Residues predicted to be disordered by DISOPRED3 (http://bioinf.cs.ucl.ac.uk/psipred/) are underlined in gray. (B) Representation of disorder by residue in HSV-1 UL11 as predicted by FoldUnfold (http://bioinfo.protres.ru/ogu/). 
Residues predicted to be in natively folded regions are shown in blue, and residues predicted to be in unfolded regions are shown in red. Residues scoring below the threshold (black line) but surrounded by folded residues are to be treated as folded and are shown in cyan.

Claire M. Metrick, et al. mBio. 2020 May-Jun;11(3):e00810-20.
6.
FIGURE 1.

FIGURE 1. From: Cus2 enforces the first ATP-dependent step of splicing by binding to yeast SF3b1 through a UHM–ULM interaction.

A putative UHM–ULM interaction between the U2 snRNP proteins Cus2 and Hsh155. (A,B) Domain organizations of (A) Tat-SF1/Cus2 and (B) SF3b1/Hsh155. (C) The sequences of the UHM region of Cus2 (Saccharomyces cerevisiae, NCBI RefSeq NP_014113) aligned with homologs including human Tat-SF1 (NCBI RefSeq NP_001156752), Mus musculus (NCBI RefSeq NP_083647), Xenopus laevis (NCBI RefSeq NP_001083090), Drosophila melanogaster (NCBI RefSeq NP_649313), Arabidopsis thaliana (NCBI RefSeq NP_197130), Caenorhabditis elegans (NCBI RefSeq NP_490765), and Dictyostelium discoideum (NCBI RefSeq XP_636172). The residues are colored by identity: <50%, white; 50%, yellow; 63%, chartreuse; 75%, lime; 88% green; 100%, forest. The Cus2 D204 and F253 residues mutated in this study are marked by asterisks. The Tat-SF1 UHM secondary structure elements assigned using the Kabsch and Sander algorithm are indicated schematically above the alignment (). Characteristic sequences of the UHM domain are underlined and bold. The ribonucleoprotein consensus motifs (RNP1 and RNP2) are underlined, and characteristic UHM residues that diverge from the RRM consensus are bold. (D) The sequences of the Hsh155 ULM (S. cerevisiae, NCBI RefSeq NP_014015) aligned with the ULMs of human SF3b1 (NCBI RefSeq NP_036565) that bind Tat-SF1 (ULM1, ULM4, and ULM5) and homologs including M. musculus (NCBI RefSeq NP_112456), X. laevis (NCBI RefSeq NP_001084150), D. melanogaster (NCBI RefSeq NP_608534), A. thaliana (NCBI RefSeq NP_201232), C. elegans (NCBI RefSeq NP_497853), and D. discoideum (NCBI RefSeq XP_643385). Matching ULMs were identified by BLAST (). The D. discoideum homolog appears to contain two rather than five ULMs, and Hsh155 contains a single ULM. Residues are colored by sequence identity as in C. Characteristic sequences of the ULM are underlined and bold. 
For the Hsh155 homolog, residues outside the boundary of the peptide used in this study are italicized, and the bound residues lack regular secondary structure elements. The Hsh155 R100 and W101 residues mutated in this study are marked by asterisks. (E) Yeast two-hybrid screen showing interaction between the UHM of Cus2 or Tat-SF1 and the ULM of Hsh155. Sequences encoding wild-type and mutant Cus2 proteins were fused to the DNA-binding domain of the GAL4 transcription factor and tested against constructs expressing either wild-type Hsh155 or mutant Hsh155 proteins fused to the activation domain of GAL4. Growth on SCD-His medium supplemented with 25 mM 3-AT indicates interaction. A black vertical line indicates that part of the plate was removed for clarity. (F) Cus2 directly and specifically binds the Hsh155 ULM region in GST pull-down assays. The retained proteins were resolved by SDS-PAGE and stained with Coomassie Brilliant Blue. The sizes of molecular weight standards (STD) and subunits are indicated. GST is a control for nonspecific binding. GB1Cus2 includes an N-terminal GB1-tag to improve solubility and reduce nonspecific binding to the resin. Hsh155ULM includes residues 86–129. The lanes are labeled: GB1Cus2, input Cus2; GSTHsh155ULM or GST, GST or GST-fusion protein used for bait; W, final wash of binding reaction; E, elution.

Jason Talkish, et al. RNA. 2019 Aug;25(8):1020-1037.
7.
Figure 2.

Figure 2. Incomplete 3' UTRs annotations contribute to discrepancies in RNA-seq analysis. From: An improved zebrafish transcriptome annotation for sensitive and comprehensive detection of cell type-specific genes.

(A, B) Log10 average expression as quantified using indicated annotation for (A) kdrlpos- or (B) pdgfrbpos-enriched genes identified as such only in RefSeq and lacking an Ens95 3' UTR annotation. Expression levels for genes from each annotation with matched NCBI ID are shown in each case. Data are normally distributed (Shapiro-Wilk test), paired t-test, p values are indicated; n = 3 (i.e. each point represents an average value from three separate RNA-seq replicates). (C) UCSC browser image of slc7a5 locus on the minus strand showing 3' UTR annotations from Ens95 and RefSeq. Mapped read depth from kdrlpos cells on the genome, or assigned to each annotation are indicated, as is a 3P-seq feature. The GSE32900 track is consolidated RNA-seq reads from all stages indicated in . The location of a putative missing 3' UTR is indicated. (D) Pie chart showing numbers of reference genes with the same or longer 3' UTRs in each indicated annotation. (E) Pie charts showing the proportion of reference genes selectively identified as kdrlpos- or pdgfrbpos-enriched by Ens95 and RefSeq with indicated relative 3' UTR length. (F, G) Correlation plots showing log10 average expression from kdrlpos RNA-seq (n = 3) quantified with each annotation for matched reference genes with (F) longer Ens95 (maroon) or RefSeq (light blue) 3' UTR, or (G) same 3' UTR length. Data are not normally distributed, Spearman correlation, r values are indicated. (H, I) UCSC browser images of (H) sox17 and (I) cspg4 loci, both on the minus strand, showing 3' UTR annotations from Ens95 and RefSeq. Mapped read depth of RNA-seq from (H) kdrlpos or (I) pdgfrbpos cells captured for each annotation is indicated. Consolidated reads from GSE32900 and location of 3P-Seq features are indicated, as is putative missing 3' UTR in cspg4.
Figure 2—source data 1. Missing 3' UTR annotations in RefSeq and Ens95. This file includes lists of Ens95 (worksheet 1) and RefSeq (worksheet 2) genes indicating annotation as coding sequence (CDS) and whether there is an annotated stop codon and 3' UTR. Data from RNA-seq-based quantification for Ens95 genes missing a 3' UTR that is present in RefSeq is included for kdrlpos (worksheet 3), pdgfrbpos (worksheet 4), and Nr2f2pos (worksheet 5) cells. These data were used to generate and graphs in ; .
Figure 2—source data 2. Reference gene set for 3' UTR comparisons. IDs for representative Ens95, RefSeq, and V4.3 transcript ID, along with V4.3 gene symbols are shown with respective 3' UTR lengths (worksheet 1). Average median ratio normalized expression and log2 fold change (pos/neg) values quantified with Ens95, RefSeq, and V4.3 annotations from kdrlpos (worksheet 2), pdgfrbpos (worksheet 3), and Nr2f2pos (worksheet 4) RNA-seq for reference genes are included. Data directly used to generate , , and incorporated into source data as indicated below.
Figure 2—source data 3. RNA-seq analysis of Nr2f2pos and Nr2f2neg cells. Output from DESeq2 analysis comparing Nr2f2pos and Nr2f2neg RNA-seq from gene expression levels quantified using RSEM with Ens95 (worksheet 1) or RefSeq (worksheet 2). Median ratio normalized expression values are shown for each sample, along with adjusted p-value, p-value, log2 fold change, fold change, and log10 adjusted p-value. Intersection of genesets identified as significantly enriched in Nr2f2pos cells using Ens95 or RefSeq (worksheet 3).
Figure 2—source data 4. Transcript-based comparison of RefSeq and Ensembl annotations. Worksheet one is a list of Ens95 genes missing from RefSeq with Ensembl gene ID, matching ZFIN ID and biotype annotation. Worksheet two is a list of RefSeq genes missing from Ensembl with NCBI gene ID, matching ZFIN ID, and coding sequence annotation. Transcript level matching output from gffcompare is included using Ens95 (worksheet 3) or RefSeq (worksheet 4) as a reference. Worksheet five is a transcript level comparison of Ens95 and Ens99. In this case, all transcripts exhibit a complete intron/exon chain match (designated by a ‘=’ in class code). Data used to generate .

Nathan D Lawson, et al. eLife. 2020;9:e55792.
8.
Figure 2

Figure 2. From: A Novel Missense Mutation of the DDHD1 Gene Associated with Juvenile Amyotrophic Lateral Sclerosis.

Schematic representation of DDHD1 on chromosome 14 and DDHD1 protein. (A) Schematic representation of DDHD1 on chromosome 14 (RefSeq NM_001160147.1). The black boxes represent coding exons. The upper row shows the novel missense mutation c.1483A>G (p.Met495Val) in our report, and the lower row shows the mutations in previous reports. c.1249C>T, c.1766G>A, c.1874delT, c.2438-1G>T (RefSeq NM_030637.2) (Tesson et al., ). c.1422_1423insA, c.2279delT (RefSeq NM_030637.2) (Liguori et al., ). c.1429C>T (RefSeq NM_001160148) (Mignarri et al., ). c.914_917delGTAA (RefSeq NM_030637.2) (Miura et al., ). (B) Schematic representation of DDHD1 protein (RefSeq NP_001153619.1). The red box represents DDHD domain in the C-terminus. The upper row shows the novel p.M495V mutation identified in this study. The known mutations are indicated below. p.R417*, p.R589Q, p.L625*, c.2438-1G>T (p.?) (RefSeq NP_085140.2) (Tesson et al., ). p.V476Sfs*20, p.M760Sfs*37 (RefSeq NP_085140.2) (Liguori et al., ). p.R477* (RefSeq NP_001153620.1) (Mignarri et al., ). p.S305Ifs*2 (RefSeq NP_085140.2) (Miura et al., ).

Chujun Wu, et al. Front Aging Neurosci. 2016;8:291.
9.
FIG 3

FIG 3. From: Comparative Analysis and Data Provenance for 1,113 Bacterial Genome Assemblies.

Comparative metrics for 1,113 ASRGs versus RefSeq Assemblies. (A) Intersection of ASRGs versus RefSeq for strains labeled as being from ATCC. In parentheses are the total numbers of RefSeq assemblies, allowing for strain redundancy. (B) N50 variability of RefSeq versus ASRGs by sequencing technology. Note that the scale is 1E6. (C) Differences in contig counts for ASRG versus RefSeq assemblies. Positive values indicate that the RefSeq assembly had more contigs. (D) Ratios of ASRG N50 values (y axis) to RefSeq N50 values (“public,” x axis). Density along the diagonal indicates that many assemblies are similar, while density along the y axis indicates ASRGs with higher N50 values. (E) GC content for ASRGs (y axis) versus RefSeq (x axis). Nearly all assemblies have less than 0.1% difference in GC content. (F) Pairwise GC content differences between ASRGs and comparable RefSeq assemblies for the same strain.

David A. Yarmosh, et al. mSphere. 2022 May-Jun;7(3):e00077-22.
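
The assembly metrics named in the FIG 3 caption above (N50, contig-count difference, GC content) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the toy contig lengths are hypothetical, and N50 is computed under its usual definition (the contig length at which the longest contigs together reach half of the total assembly length).

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def gc_percent(sequences):
    """Return GC content as a percentage of all A/C/G/T bases."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


# Hypothetical toy assemblies of one strain (not data from the paper).
asrg_contigs = [50, 30, 20]
refseq_contigs = [40, 25, 15, 10, 10]

# Ratio of N50 values, in the spirit of panel D.
n50_ratio = n50(asrg_contigs) / n50(refseq_contigs)

# Panel C convention: positive means the RefSeq assembly had more contigs.
contig_diff = len(refseq_contigs) - len(asrg_contigs)
```

A ratio above 1 here would correspond to an ASRG with a higher N50 than its RefSeq counterpart.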
10.
Figure 2

Figure 2. From: Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction.

Common and unique annotated features of GENCODE and RefSeq protein-coding genes. Venn diagram to show intersection between A) transcripts annotated at GENCODE Comprehensive and RefSeq NXR protein-coding loci B) unique (non-redundant) translations annotated at GENCODE Comprehensive and RefSeq NXR protein-coding loci C) unique (non-redundant) exons annotated at GENCODE Comprehensive and RefSeq NXR protein-coding loci

Adam Frankish, et al. BMC Genomics. 2015;16(Suppl 8):S2-S2.
11.
Fig. 4

Fig. 4. From: RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification.

The fraction of reads classified among Bacillus species varied depending on which RefSeq version was used. a Classifying B. cereus VD118 reads with Kraken (left) and Bracken (right) against different versions of RefSeq. Species-level classifications varied, and the fraction of unclassified reads decreased with Kraken, as the database grew. Once B. cereus VD118 appeared in the database (ver. 60), Bracken correctly classified every read. b Species-level classifications decrease with Kraken as RefSeq grows using real reads from an environmental Bacillus cereus not in RefSeq. Fraction of B. cereus ISSFR-23F reads classified using Kraken ver. 1.0 (left) and Bracken ver. 1.0.0 (right) against different versions of bacterial RefSeq. Bracken classification pushed all reads to a species-level call, though these classifications were often for other Bacillus species

Daniel J. Nasko, et al. Genome Biol. 2018;19:165.
12.
Figure 5.

Figure 5. From: Leveraging the Mouse Genome for Gene Prediction in Human: From Whole-Genome Shotgun Reads to a Global Synteny Map.

Relationships among the genes and exons annotated by TWINSCAN, GENSCAN, and aligned RefSeq transcripts. (A) Number of genes annotated by RefSeq, TWINSCAN, and GENSCAN, and number of exact matches among them. RefSeq and TWINSCAN contain 1,791 identical genes, RefSeq and GENSCAN contain 1,115, TWINSCAN and GENSCAN contain 2,809, and the intersection of all three sets contains 670. (B) Number of unique coding exons annotated by RefSeq, TWINSCAN, and GENSCAN, and number of exact matches among them. RefSeq and TWINSCAN contain 80,530 identical exons, RefSeq and GENSCAN contain 77,442, TWINSCAN and GENSCAN contain 134,507, and the intersection of all three sets contains 67,320.

Paul Flicek, et al. Genome Res. 2003 Jan 1;13(1):46-54.
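
The overlap counts quoted in the Figure 5 caption above can be broken down with simple arithmetic. The sketch below assumes (this is an interpretation, not something the caption states) that each pairwise count includes the matches shared by all three sets, so subtracting the three-way intersection gives the matches shared by exactly two annotations. The function name is hypothetical.

```python
def shared_by_exactly_two(pairwise, triple):
    """Subtract the three-way intersection from each pairwise intersection.

    Assumes every pairwise count already includes the three-way matches.
    """
    return {pair: count - triple for pair, count in pairwise.items()}


# Counts as quoted in the caption.
gene_pairs = {
    ("RefSeq", "TWINSCAN"): 1791,
    ("RefSeq", "GENSCAN"): 1115,
    ("TWINSCAN", "GENSCAN"): 2809,
}
exon_pairs = {
    ("RefSeq", "TWINSCAN"): 80530,
    ("RefSeq", "GENSCAN"): 77442,
    ("TWINSCAN", "GENSCAN"): 134507,
}

genes_two_only = shared_by_exactly_two(gene_pairs, triple=670)
exons_two_only = shared_by_exactly_two(exon_pairs, triple=67320)

for pair, count in genes_two_only.items():
    print(pair, count)
```

Under that assumption, for example, 1,791 - 670 = 1,121 genes would be identical in RefSeq and TWINSCAN but not in GENSCAN.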
13.
Figure 4

Figure 4. From: Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction.

Non-concordance of variant functional annotation. Percentage non-concordant annotation i.e. variants with annotation in only one dataset (unique) or different annotation between datasets (discordant). The variants are represented in four broad classes; CDS, other, splice and LoF with comparisons between GENCODE Comprehensive and RefSeq NXR using 1KG data (Blue), GENCODE Basic and RefSeq NXR using 1KG data (Red), GENCODE Comprehensive and RefSeq NXR using ESP data (Green), and GENCODE Basic and RefSeq NXR using 1KG data (Purple).

Adam Frankish, et al. BMC Genomics. 2015;16(Suppl 8):S2-S2.
14.
Figure 4

Figure 4. From: DNA methylation profiles in diffuse large B-cell lymphoma and their relationship to gene expression status.

Relationships among DNA methylation and gene expression status. MethyLight (ML) PMR values (X-axis) were plotted against expression value (Y-axis) for all genes showing statistically significant trends (Benjamini-Hochberg corrected P<0.05) for decreasing expression with increasing levels of methylation. The location of each ML reaction relative to the start of the corresponding RefSeq is provided in parentheses. Panel (a) BNIP3 (32-bp upstream of RefSeq NM_004052), (b) MGMT (exon 48-bp downstream of RefSeq NM_002412), (c) RBP1 (exon 43-bp downstream of RefSeq NM_002899), (d) GATA4 (intron 394-bp downstream of RefSeq NM_002052), (e) IGSF4 (exon 37-bp downstream of RefSeq NM_014333), (f) CRABP1 (exon 37-bp downstream of RefSeq NM_004378), and (g) FLJ21062 (21-bp upstream of RefSeq NM_001039706). The ML reaction ID, Affymetrix probe tiling ID, equation for a linear fit of the data, and r-squared value are provided.

Brian L. Pike, et al. Leukemia. ;22(5):1035-1043.
15.
Figure 1—Figure supplement 1.

Figure 1—Figure supplement 1. Heatmap of RefSeq genes. From: A mammalian pseudogene lncRNA at the interface of inflammation and anti-inflammatory therapeutics.

Mean centered heatmap of RefSeq protein coding genes and RefSeq lncRNAs.
DOI: http://dx.doi.org/10.7554/eLife.00762.004

Nicole A Rapicavoli, et al. eLife. 2013;2:e00762.
16.
Figure 2:

Figure 2: Number of novel draft proteins verified by draft-only peptides in different categories. From: Proteogenomic annotation of the Chinese hamster reveals extensive novel translation events and endogenous retroviral elements.

The draft annotation predicted thousands of novel protein sequences. (A) Of these, 3,389 had peptides mapping to proteins uniquely supporting the novel protein sequences. (B) Only 140 did not have extra peptide support from peptides that map to proteins uniquely, and thousands provided peptide support. RefSeq perfect short: RefSeq proteins map perfectly but are shorter than draft proteins; High quality: high quality mapping proteins between draft and RefSeq; Draft high quality: draft proteins map to RefSeq with high quality, but the reverse doesn’t hold; RefSeq high quality: RefSeq proteins map to draft with high quality, but the reverse doesn’t hold; Low quality: low quality mapping between draft and RefSeq.

Shangzhong Li, et al. J Proteome Res. ;18(6):2433-2445.
17.
Figure 1

Figure 1. From: The development of a comparison approach for Illumina bead chips unravels unexpected challenges applying newest generation microarrays.

Dynamics of RefSeq database. Release statistics retrieved from shows the development of the RefSeq database, including (A) all RefSeq IDs, (B) human RefSeq IDs, and (C) human RefSeq IDs termed "transcript variant". (D) For human RefSeq IDs, consecutive releases were compared to each other to determine changes in the database over time.

Daniela Eggle, et al. BMC Bioinformatics. 2009;10:186-186.
18.
Figure 2.

Figure 2. From: Construction of mate pair full-length cDNAs libraries and characterization of transcriptional start sites and termination sites.

Evaluation of the TSS/PAS libraries. (A) Positions of the TSS/PAS mate pair tags relative to RefSeq genes. The frequencies of the TSS/PAS tags were calculated depending on their location within or outside RefSeq gene regions (left panels). Among the tags associated with RefSeq genes, their distributions were further separated depending on the internal positions of the RefSeq transcript models (right panels). The top panels show the TSS tags and the bottom panels show the PAS tags. The right panels represent the breakdowns of the population ‘inside RefSeq’ in the left panels. (B) The distribution of the locations of TSS tags and PAS tags relative to the RefSeq NM transcript model. The top panel shows the TSS tags and the bottom panel shows the PAS tags (see Supplementary Figure S2J for further breakdowns of the longer populations). Additionally, note that in many cases, the RefSeq model included a long transcript with a distal 5′-exon. At the same time, RefSeq annotates another 5′-exon, which overlaps with our TSC, downstream from that distal 5′-exon (right margin; Supplementary Figure S2J). For further details on the overlap between the TSCs or our data and the RefSeq data, see Supplementary Figure S2A–C. (C) Statistical significance of the biased distribution of the TSS tags and PAS tags to the TSC or PAC (left and right panels, respectively) calculated against the random distribution on the mRNA assuming a Poisson distribution. The numbers of TSCs or PACs giving the indicated P values (x-axis) are shown. The percentages in the plots show the proportions of the indicated populations (P < 1e−10).

Kyoko Matsumoto, et al. Nucleic Acids Res. 2014 Sep 15;42(16):e125-e125.
19.
Fig. 4

Fig. 4. From: Mouse genome annotation by the RefSeq project.

The UCSC Genome Browser does not accurately represent RefSeq data. a NCBI Sequence Viewer. Coordinates on mouse chromosome 17 (NC_000083.6 from 25,957,500 to 25,988,800) and a graphical display of the neighboring loci, 1700022N22Rik (GeneID: 69431) and Capn15 (GeneID: 50817), were screen captured from NCBI sequence viewer in the Gene resource and labels were edited. b UCSC Genome Browser. Coordinates on mouse chromosome 17 (NC_000083.6) and the RefSeq Genes track were screen captured from the UCSC Genome Browser and labels were edited. No RefSeq models are displayed in the RefSeq Genes track.

Kelly M. McGarvey, et al. Mamm Genome. 2015;26(9-10):379-390.
20.
Figure 1—figure supplement 1.

Figure 1—figure supplement 1. Comparison of Ens95 and RefSeq zebrafish transcriptome annotations for bulk RNA-seq analysis. From: An improved zebrafish transcriptome annotation for sensitive and comprehensive detection of cell type-specific genes.

(A) Log10 average expression (n = 3) for kdrlpos-enriched genes as quantified by each indicated annotation. Each separate plot shows genes identified as kdrlpos-enriched only using Ens95 or RefSeq. Data are not normally distributed, Wilcoxon matched-pairs signed-rank test, p values are indicated. (B) Venn diagram of intersection for genes with a common NCBI ID in Ens95 and RefSeq found significantly enriched in pdgfrbpos cells (log2 fold change [pdgfrbpos/pdgfrbneg] > 1, adjp < 0.05) using each annotation. (C, D) Volcano plots showing differentially expressed genes from TgBAC(pdgfrb:citrine)s1010-positive and negative cells identified using RNA-seq reads quantified using (C) RefSeq or (D) Ens95 annotations. Genes with significant differences (padj < 0.05) are shown as red (log2 fold change > 1) or blue (log2 fold change < -1). Grey dots are genes that fall below these statistical cutoffs; n = 3. Green dots indicate selected genes previously identified as expressed in vascular smooth muscle cells or pericytes (see main text). (E) Plots of commonly annotated genes identified as pdgfrbpos-enriched only by Ens95 with indicated values from Ens95 or RefSeq. (F) Correlation of expression levels from indicated annotation for pdgfrbpos-enriched genes identified selectively as such by Ens95 (maroon) or RefSeq (grey) only (left plot) or in both annotations (right plot). Data are not normally distributed, Spearman correlation, r values are indicated. (G) Log10 average expression (n = 3) for pdgfrbpos-enriched genes as quantified by each indicated annotation. Each separate plot shows genes only identified as pdgfrbpos-enriched using Ens95 or RefSeq. Data are not normally distributed, Wilcoxon matched-pairs signed-rank test, p values are indicated.
Figure 1—figure supplement 1—source data 1. DESeq2 output for pdgfrbpos and pdgfrbneg RNA-seq quantified with RefSeq (GCF_000002035.6_GRCz11) or Ensembl, v95. Gene expression levels were quantified using RSEM. Median ratio normalized expression values are shown for each replicate, along with adjusted p-value, and log2 fold change. Data used to generate plots in , and incorporated into source data tables indicated below.
Figure 1—figure supplement 1—source data 2. Intersection of pdgfrbpos-enriched genes from RefSeq and Ens95 commonly annotated by NCBI ID. Gene symbol, along with matching Ensembl gene ID and NCBI ID, as well as differential annotation (i.e., identified as differentially expressed only in RefSeq, Ens95, or both) are indicated. Expression data are derived from . Data used to generate plots in .

Nathan D Lawson, et al. eLife. 2020;9:e55792.
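
The volcano-plot cutoffs described in panels (C, D) of the caption above (adjusted p below 0.05; log2 fold change above 1 or below -1) can be expressed as a small classifier. This is only an illustrative sketch: the function name, default parameter names, and the example gene values are hypothetical and are not taken from the paper's data or code.

```python
def volcano_class(log2_fc, padj, fc_cutoff=1.0, padj_cutoff=0.05):
    """Assign the plot colour described in the caption.

    red  : padj < cutoff and log2 fold change >  fc_cutoff
    blue : padj < cutoff and log2 fold change < -fc_cutoff
    grey : everything else (fails either cutoff)
    """
    if padj < padj_cutoff and log2_fc > fc_cutoff:
        return "red"
    if padj < padj_cutoff and log2_fc < -fc_cutoff:
        return "blue"
    return "grey"


# Hypothetical (gene, log2 fold change, adjusted p) rows for illustration.
rows = [
    ("geneA", 2.3, 0.001),   # strongly enriched, significant
    ("geneB", -1.8, 0.01),   # strongly depleted, significant
    ("geneC", 0.4, 0.0001),  # significant but small change
    ("geneD", 3.0, 0.2),     # large change but not significant
]

classes = {gene: volcano_class(fc, padj) for gene, fc, padj in rows}
```

Note that a gene must pass both cutoffs to be coloured; a large fold change with a non-significant adjusted p-value stays grey.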
