NCBI Nerophis ophidion Annotation Release GCF_033978795.1-RS_2023_12

The genome sequence records for Nerophis ophidion RefSeq assembly GCF_033978795.1 (RoL_Noph_v1.0) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_033978795.1-RS_2023_12".

Date of Entrez queries for transcripts and proteins: Dec 20 2023
Date of submission of annotation to the public databases: Dec 26 2023
Software version: 10.2

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
RoL_Noph_v1.0	GCF_033978795.1	University of Idaho at Moscow	12-01-2023	Reference	29 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	RoL_Noph_v1.0
Genes and pseudogenes	35,195
protein-coding	23,280
non-coding	8,896
Transcribed pseudogenes	0
Non-transcribed pseudogenes	2,986
genes with variants	11,428
Immunoglobulin/T-cell receptor gene segments	24
other	9
mRNAs	48,545
fully-supported	47,466
with > 5% ab initio	440
partial	113
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	48,545
non-coding RNAs	11,949
fully-supported	6,684
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	7,953
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	48,569
fully-supported	47,466
with > 5% ab initio	550
partial	114
with major correction(s)	455
known RefSeq (NP_)	0
model RefSeq (XP_)	48,545
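
As a quick consistency check, the gene categories in the table above can be summed and compared with the reported "Genes and pseudogenes" total. The following is a minimal Python sketch with counts copied from the table. The dictionary keys and variable names are illustrative. The "genes with variants" row is left out on the assumption that it describes a subset of genes rather than a separate category.

```python
# Gene categories copied from the Feature counts table (RoL_Noph_v1.0).
# "genes with variants" is omitted: assumed to overlap the categories below.
gene_categories = {
    "protein-coding": 23_280,
    "non-coding": 8_896,
    "transcribed pseudogenes": 0,
    "non-transcribed pseudogenes": 2_986,
    "Ig/TCR gene segments": 24,
    "other": 9,
}
reported_total = 35_195  # "Genes and pseudogenes" row

category_sum = sum(gene_categories.values())
print(category_sum, category_sum == reported_total)
```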

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	32,185	37,831	15,068	55	1,231,520
All transcripts	60,494	3,377	2,612	55	101,520
mRNA	48,545	3,869	3,037	209	101,520
misc_RNA	1,970	3,897	3,038	195	17,707
tRNA	3,996	74	73	68	93
lncRNA	4,714	1,692	1,082	106	21,468
snoRNA	253	131	128	63	316
snRNA	490	148	141	55	193
rRNA	517	846	154	119	3,990
Single-exon transcripts	851	2,182	1,699	209	13,136
coding transcripts (NM_/XM_ )	851	2,182	1,699	209	13,136
CDSs	48,545	2,139	1,536	96	100,356
Exons	294,658	342	141	1	23,389
in coding transcripts (NM_/XM_ )	277,628	328	140	1	23,389
in non-coding transcripts (NR_/XR_ )	28,610	414	145	10	20,877
Introns	262,471	5,229	1,386	30	1,109,931
in coding transcripts (NM_/XM_ )	250,588	5,089	1,388	30	1,004,869
in non-coding transcripts (NR_/XR_ )	23,216	6,305	1,312	30	1,109,931

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2	1	1	50
Number of exons per transcript	12.33	9	1	271

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode with the actinopterygii_odb10 lineage dataset on the annotated gene set, using the longest protein of each gene. Results are reported for the gene set from the primary assembly unit, and presented in BUSCO notation.
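
The "one longest protein per gene" selection can be sketched as follows. This is an illustration only, not the pipeline's actual code: the function name, the record layout (gene ID, protein ID, sequence) and the accessions are hypothetical, and keeping the first protein on a length tie is an assumption.

```python
from typing import Dict, Iterable, Tuple

def longest_protein_per_gene(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Map gene_id -> (protein_id, sequence), keeping the longest sequence.

    On a tie, the first protein seen is kept (an assumption of this sketch).
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, protein_id, seq in records:
        current = best.get(gene_id)
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (protein_id, seq)
    return best

# Hypothetical records: (gene_id, protein_id, protein sequence).
records = [
    ("geneA", "XP_hypothetical_1", "MKT"),
    ("geneA", "XP_hypothetical_2", "MKTAYIAK"),
    ("geneB", "XP_hypothetical_3", "MSD"),
]
selected = longest_protein_per_gene(records)
print(selected)
```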

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 23,280 coding genes, 21,305 genes had a protein with an alignment covering 50% or more of the query and 10,048 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
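
The two coverage definitions above can be illustrated with a short Python sketch. It assumes a single alignment interval per protein given in 1-based inclusive coordinates. The function name and the example coordinates are hypothetical, and a hit split into several segments would need its intervals merged first.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percent of a sequence included in one alignment interval.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length

# Hypothetical hit: residues 11-210 of a 400-residue annotated protein (query)
# aligned to residues 1-190 of a 200-residue curated protein (target).
query_coverage = coverage_percent(11, 210, 400)
target_coverage = coverage_percent(1, 190, 200)
print(query_coverage, target_coverage)
```

In this made-up example the query just reaches the 50% threshold, while the target is covered at 95%.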

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
RoL_Noph_v1.0	GCF_033978795.1	66.10%
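
A masked percentage like the one above can be approximated from a soft-masked FASTA file by counting lowercase bases. The sketch below is a hedged illustration, not how the pipeline computes its figure: it assumes masking is encoded as lowercase letters and that all bases, including any N, count toward the denominator. The function name and the toy sequence are hypothetical.

```python
from typing import Iterable

def masked_percent(fasta_lines: Iterable[str]) -> float:
    """Percent of lowercase (soft-masked) bases across all sequences."""
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return 100.0 * masked / total if total else 0.0

# Toy input, not taken from the assembly.
example = [">chr_hypothetical", "ACGTacgtac", "GGggCC"]
print(masked_percent(example))
```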

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	1	1 (100.00%)	1 (100.00%)	95.16%	35.94%

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	4,370,913,706	74%	45%	307,395
SAMN20842824	NA	brood pouch (Nerophis ophidion, male, SAMN20842824)	50,813,002	79%	50%	201,341
SAMN20842825	NA	brood pouch (Nerophis ophidion, male, SAMN20842825)	50,162,414	81%	50%	199,518
SAMN20842826	NA	brood pouch (Nerophis ophidion, male, SAMN20842826)	52,419,656	80%	51%	203,534
SAMN20842827	NA	brood pouch (Nerophis ophidion, male, SAMN20842827)	52,178,950	81%	56%	202,930
SAMN20842828	NA	brood pouch (Nerophis ophidion, male, SAMN20842828)	51,230,978	81%	56%	193,570
SAMN20842829	NA	brood pouch (Nerophis ophidion, male, SAMN20842829)	49,769,546	82%	51%	193,717
SAMN20842830	NA	brood pouch (Nerophis ophidion, male, SAMN20842830)	48,912,250	76%	53%	203,125
SAMN20842831	NA	brood pouch (Nerophis ophidion, male, SAMN20842831)	47,181,422	77%	52%	195,363
SAMN20842832	NA	brood pouch (Nerophis ophidion, male, SAMN20842832)	50,887,046	79%	48%	194,336
SAMN20842833	NA	brood pouch (Nerophis ophidion, male, SAMN20842833)	52,531,102	79%	54%	204,629
SAMN20842834	NA	brood pouch (Nerophis ophidion, male, SAMN20842834)	52,240,228	79%	54%	199,603
SAMN20842835	NA	brood pouch (Nerophis ophidion, male, SAMN20842835)	53,423,980	78%	50%	206,589
SAMN20842836	NA	brood pouch (Nerophis ophidion, male, SAMN20842836)	51,639,326	79%	50%	194,114
SAMN20842837	NA	brood pouch (Nerophis ophidion, male, SAMN20842837)	53,021,964	77%	50%	174,338
SAMN20842838	NA	brood pouch (Nerophis ophidion, male, SAMN20842838)	48,347,130	76%	48%	170,518
SAMN20842839	NA	brood pouch (Nerophis ophidion, male, SAMN20842839)	45,841,674	75%	51%	150,733
SAMN20842840	NA	brood pouch (Nerophis ophidion, male, SAMN20842840)	48,609,136	79%	45%	170,739
SAMN20842841	NA	brood pouch (Nerophis ophidion, male, SAMN20842841)	48,607,104	79%	49%	174,649
SAMN20842842	NA	brood pouch (Nerophis ophidion, male, SAMN20842842)	51,810,578	78%	44%	178,733
SAMN20842843	NA	brood pouch (Nerophis ophidion, male, SAMN20842843)	45,785,596	77%	42%	173,489
SAMN20842844	NA	brood pouch (Nerophis ophidion, male, SAMN20842844)	47,414,232	77%	44%	178,520
SAMN20842845	NA	brood pouch (Nerophis ophidion, male, SAMN20842845)	47,730,946	77%	43%	176,892
SAMN20842846	NA	brood pouch (Nerophis ophidion, male, SAMN20842846)	48,034,652	76%	42%	169,534
SAMN20842847	NA	brood pouch (Nerophis ophidion, male, SAMN20842847)	48,761,804	77%	42%	167,920
SAMN23848236	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848236)	67,157,488	73%	46%	203,486
SAMN23848237	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848237)	45,936,238	74%	46%	191,434
SAMN23848238	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848238)	78,487,870	73%	43%	218,546
SAMN23848239	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848239)	55,977,030	78%	43%	192,469
SAMN23848240	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848240)	43,727,306	75%	49%	177,458
SAMN23848241	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848241)	37,716,388	74%	42%	170,272
SAMN23848242	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848242)	30,970,472	78%	48%	180,048
SAMN23848243	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848243)	48,035,390	73%	49%	193,010
SAMN23848244	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848244)	59,428,734	72%	51%	230,438
SAMN23848245	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848245)	43,593,538	73%	51%	172,011
SAMN23848246	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848246)	49,936,706	74%	34%	184,615
SAMN23848247	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848247)	39,146,928	75%	51%	172,671
SAMN23848248	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848248)	17,705,978	75%	46%	134,400
SAMN23848249	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848249)	23,854,298	72%	41%	161,486
SAMN23848250	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848250)	32,756,744	1%	55%	2,827
SAMN23848251	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848251)	42,714,370	73%	41%	177,776
SAMN23848252	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848252)	54,469,838	75%	44%	189,089
SAMN23848253	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848253)	51,511,068	76%	48%	197,759
SAMN23848254	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848254)	23,477,416	77%	49%	141,539
SAMN23848255	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848255)	41,505,870	72%	44%	172,566
SAMN23848256	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848256)	45,472,206	75%	43%	172,735
SAMN23848257	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848257)	33,395,274	72%	45%	175,811
SAMN23848258	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848258)	40,573,714	72%	43%	170,007
SAMN23848259	NA	Surgical Area Tissue (Nerophis ophidion, female, SAMN23848259)	45,935,256	72%	43%	187,309
SAMN23848260	NA	Gill (Nerophis ophidion, female, SAMN23848260)	42,732,978	66%	42%	198,174
SAMN23848261	NA	Gill (Nerophis ophidion, female, SAMN23848261)	71,348,970	70%	40%	217,292
SAMN23848262	NA	Gill (Nerophis ophidion, female, SAMN23848262)	62,878,272	66%	43%	207,945
SAMN23848263	NA	Gill (Nerophis ophidion, female, SAMN23848263)	42,466,372	67%	42%	197,613
SAMN23848264	NA	Gill (Nerophis ophidion, female, SAMN23848264)	52,944,738	65%	43%	209,329
SAMN23848265	NA	Gill (Nerophis ophidion, female, SAMN23848265)	33,488,376	58%	42%	188,667
SAMN23848266	NA	Gill (Nerophis ophidion, female, SAMN23848266)	57,438,590	68%	40%	207,742
SAMN23848267	NA	Gill (Nerophis ophidion, female, SAMN23848267)	61,813,544	60%	42%	212,079
SAMN23848268	NA	Gill (Nerophis ophidion, female, SAMN23848268)	63,963,472	67%	44%	208,413
SAMN23848269	NA	Gill (Nerophis ophidion, female, SAMN23848269)	65,023,266	68%	41%	215,933
SAMN23848270	NA	Gill (Nerophis ophidion, female, SAMN23848270)	51,883,848	72%	41%	206,704
SAMN23848271	NA	Gill (Nerophis ophidion, female, SAMN23848271)	77,141,574	70%	42%	219,391
SAMN23848272	NA	Gill (Nerophis ophidion, female, SAMN23848272)	76,994,442	67%	43%	212,677
SAMN23848273	NA	Gill (Nerophis ophidion, female, SAMN23848273)	45,468,360	71%	43%	200,974
SAMN23848274	NA	Gill (Nerophis ophidion, female, SAMN23848274)	53,424,104	64%	44%	205,303
SAMN23848275	NA	Gill (Nerophis ophidion, female, SAMN23848275)	70,143,202	66%	42%	217,023
SAMN23848276	NA	Gill (Nerophis ophidion, female, SAMN23848276)	75,861,044	69%	43%	215,265
SAMN23848277	NA	Gill (Nerophis ophidion, female, SAMN23848277)	58,364,960	64%	45%	204,863
SAMN23848278	NA	Gill (Nerophis ophidion, female, SAMN23848278)	66,825,960	70%	43%	213,826
SAMN23848279	NA	Gill (Nerophis ophidion, female, SAMN23848279)	52,852,166	64%	43%	206,584
SAMN23848280	NA	Gill (Nerophis ophidion, female, SAMN23848280)	66,180,344	71%	43%	213,932
SAMN23848281	NA	Gill (Nerophis ophidion, female, SAMN23848281)	55,266,302	70%	42%	207,004
SAMN23848282	NA	Gill (Nerophis ophidion, female, SAMN23848282)	53,327,858	66%	44%	203,147
SAMN23848283	NA	Gill (Nerophis ophidion, female, SAMN23848283)	48,062,788	54%	45%	193,098
SAMN38562648	24680775	Brain (Nerophis ophidion, male, SAMN38562648)	104,136,788	83%	22%	220,416
SAMN38562649	24680775	Eye (Nerophis ophidion, male, SAMN38562649)	165,631,020	84%	30%	247,110
SAMN38562650	24680775	Gill (Nerophis ophidion, male, SAMN38562650)	97,411,782	87%	40%	218,137
SAMN38562651	24680775	Liver (Nerophis ophidion, male, SAMN38562651)	80,594,372	90%	58%	182,391
SAMN38562652	24680775	Ovary (Nerophis ophidion, male, SAMN38562652)	143,953,900	89%	51%	241,836
SAMN38562653	24680775	Testes (Nerophis ophidion, male, SAMN38562653)	122,449,478	88%	54%	204,455

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR15507395	SRX11806593	SRP332988	SAMN20842824	50,813,002	79%	50%
SRR15507394	SRX11806594	SRP332988	SAMN20842825	50,162,414	81%	50%
SRR15507392	SRX11806596	SRP332988	SAMN20842826	52,419,656	80%	51%
SRR15507391	SRX11806597	SRP332988	SAMN20842827	52,178,950	81%	56%
SRR15507390	SRX11806598	SRP332988	SAMN20842828	51,230,978	81%	56%
SRR15507389	SRX11806599	SRP332988	SAMN20842829	49,769,546	82%	51%
SRR15507388	SRX11806600	SRP332988	SAMN20842830	48,912,250	76%	53%
SRR15507387	SRX11806601	SRP332988	SAMN20842831	47,181,422	77%	52%
SRR15507386	SRX11806602	SRP332988	SAMN20842832	50,887,046	79%	48%
SRR15507385	SRX11806603	SRP332988	SAMN20842833	52,531,102	79%	54%
SRR15507384	SRX11806604	SRP332988	SAMN20842834	52,240,228	79%	54%
SRR15507383	SRX11806605	SRP332988	SAMN20842835	53,423,980	78%	50%
SRR15507381	SRX11806607	SRP332988	SAMN20842836	51,639,326	79%	50%
SRR15507380	SRX11806608	SRP332988	SAMN20842837	53,021,964	77%	50%
SRR15507379	SRX11806609	SRP332988	SAMN20842838	48,347,130	76%	48%
SRR15507378	SRX11806610	SRP332988	SAMN20842839	45,841,674	75%	51%
SRR15507377	SRX11806611	SRP332988	SAMN20842840	48,609,136	79%	45%
SRR15507376	SRX11806612	SRP332988	SAMN20842841	48,607,104	79%	49%
SRR15507375	SRX11806613	SRP332988	SAMN20842842	51,810,578	78%	44%
SRR15507374	SRX11806614	SRP332988	SAMN20842843	45,785,596	77%	42%
SRR15507373	SRX11806615	SRP332988	SAMN20842844	47,414,232	77%	44%
SRR15507372	SRX11806616	SRP332988	SAMN20842845	47,730,946	77%	43%
SRR15507370	SRX11806618	SRP332988	SAMN20842846	48,034,652	76%	42%
SRR15507369	SRX11806619	SRP332988	SAMN20842847	48,761,804	77%	42%
SRR17194140	SRX13374600	SRP350203	SAMN23848236	67,157,488	73%	46%
SRR17194139	SRX13374601	SRP350203	SAMN23848237	45,936,238	74%	46%
SRR17194137	SRX13374603	SRP350203	SAMN23848238	78,487,870	73%	43%
SRR17194136	SRX13374604	SRP350203	SAMN23848239	55,977,030	78%	43%
SRR17194135	SRX13374605	SRP350203	SAMN23848240	43,727,306	75%	49%
SRR17194134	SRX13374606	SRP350203	SAMN23848241	37,716,388	74%	42%
SRR17194133	SRX13374607	SRP350203	SAMN23848242	30,970,472	78%	48%
SRR17194132	SRX13374608	SRP350203	SAMN23848243	48,035,390	73%	49%
SRR17194131	SRX13374609	SRP350203	SAMN23848244	59,428,734	72%	51%
SRR17194130	SRX13374610	SRP350203	SAMN23848245	43,593,538	73%	51%
SRR17194129	SRX13374611	SRP350203	SAMN23848246	49,936,706	74%	34%
SRR17194128	SRX13374612	SRP350203	SAMN23848247	39,146,928	75%	51%
SRR17194126	SRX13374614	SRP350203	SAMN23848248	17,705,978	75%	46%
SRR17194125	SRX13374615	SRP350203	SAMN23848249	23,854,298	72%	41%
SRR17194124	SRX13374616	SRP350203	SAMN23848250	32,756,744	1%	55%
SRR17194123	SRX13374617	SRP350203	SAMN23848251	42,714,370	73%	41%
SRR17194122	SRX13374618	SRP350203	SAMN23848252	54,469,838	75%	44%
SRR17194121	SRX13374619	SRP350203	SAMN23848253	51,511,068	76%	48%
SRR17194120	SRX13374620	SRP350203	SAMN23848254	23,477,416	77%	49%
SRR17194119	SRX13374621	SRP350203	SAMN23848255	41,505,870	72%	44%
SRR17194118	SRX13374622	SRP350203	SAMN23848256	45,472,206	75%	43%
SRR17194117	SRX13374623	SRP350203	SAMN23848257	33,395,274	72%	45%
SRR17194115	SRX13374625	SRP350203	SAMN23848258	40,573,714	72%	43%
SRR17194114	SRX13374626	SRP350203	SAMN23848259	45,935,256	72%	43%
SRR17194113	SRX13374627	SRP350203	SAMN23848260	42,732,978	66%	42%
SRR17194112	SRX13374628	SRP350203	SAMN23848261	71,348,970	70%	40%
SRR17194111	SRX13374629	SRP350203	SAMN23848262	62,878,272	66%	43%
SRR17194110	SRX13374630	SRP350203	SAMN23848263	42,466,372	67%	42%
SRR17194109	SRX13374631	SRP350203	SAMN23848264	52,944,738	65%	43%
SRR17194108	SRX13374632	SRP350203	SAMN23848265	33,488,376	58%	42%
SRR17194107	SRX13374633	SRP350203	SAMN23848266	57,438,590	68%	40%
SRR17194106	SRX13374634	SRP350203	SAMN23848267	61,813,544	60%	42%
SRR17194104	SRX13374636	SRP350203	SAMN23848268	63,963,472	67%	44%
SRR17194103	SRX13374637	SRP350203	SAMN23848269	65,023,266	68%	41%
SRR17194102	SRX13374638	SRP350203	SAMN23848270	51,883,848	72%	41%
SRR17194101	SRX13374639	SRP350203	SAMN23848271	77,141,574	70%	42%
SRR17194100	SRX13374640	SRP350203	SAMN23848272	76,994,442	67%	43%
SRR17194099	SRX13374641	SRP350203	SAMN23848273	45,468,360	71%	43%
SRR17194098	SRX13374642	SRP350203	SAMN23848274	53,424,104	64%	44%
SRR17194097	SRX13374643	SRP350203	SAMN23848275	70,143,202	66%	42%
SRR17194096	SRX13374644	SRP350203	SAMN23848276	75,861,044	69%	43%
SRR17194095	SRX13374645	SRP350203	SAMN23848277	58,364,960	64%	45%
SRR17194093	SRX13374647	SRP350203	SAMN23848278	66,825,960	70%	43%
SRR17194092	SRX13374648	SRP350203	SAMN23848279	52,852,166	64%	43%
SRR17194091	SRX13374649	SRP350203	SAMN23848280	66,180,344	71%	43%
SRR17194090	SRX13374650	SRP350203	SAMN23848281	55,266,302	70%	42%
SRR17194089	SRX13374651	SRP350203	SAMN23848282	53,327,858	66%	44%
SRR17194088	SRX13374652	SRP350203	SAMN23848283	48,062,788	54%	45%
SRR27015481	SRX22708012	SRP383388	SAMN38562648	104,136,788	83%	22%
SRR27015480	SRX22708013	SRP383388	SAMN38562649	165,631,020	84%	30%
SRR27015479	SRX22708014	SRP383388	SAMN38562650	97,411,782	87%	40%
SRR27015478	SRX22708015	SRP383388	SAMN38562651	80,594,372	90%	58%
SRR27015477	SRX22708016	SRP383388	SAMN38562652	143,953,900	89%	51%
SRR27015476	SRX22708017	SRP383388	SAMN38562653	122,449,478	88%	54%
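
Runs with unusually low alignment rates, such as SRR17194124 at 1%, can be picked out of the table programmatically. The following Python sketch parses a few tab-separated rows copied from the table above. The 50% threshold and the function name are assumptions chosen for illustration, not pipeline settings.

```python
import csv
import io

# Header and three rows copied from the by-run table above.
TABLE = (
    "Run\tExperiment\tProject\tSample\tNumber of reads\t"
    "Percent aligned reads\tPercent of aligned reads with introns\n"
    "SRR17194125\tSRX13374615\tSRP350203\tSAMN23848249\t23,854,298\t72%\t41%\n"
    "SRR17194124\tSRX13374616\tSRP350203\tSAMN23848250\t32,756,744\t1%\t55%\n"
    "SRR17194088\tSRX13374652\tSRP350203\tSAMN23848283\t48,062,788\t54%\t45%\n"
)

def low_alignment_runs(table_text: str, threshold_percent: float = 50.0):
    """Return (run, percent aligned) for runs below the threshold."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    flagged = []
    for row in reader:
        aligned = float(row["Percent aligned reads"].rstrip("%"))
        if aligned < threshold_percent:
            flagged.append((row["Run"], aligned))
    return flagged

flagged = low_alignment_runs(TABLE)
print(flagged)
```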

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Hippocampus comes high-quality model RefSeq (XP_)	15,317	15,084 (98.48%)	15,084 (98.48%)	71.84%	80.31%
Betta splendens high-quality model RefSeq (XP_)	18,289	17,766 (97.14%)	17,766 (97.14%)	70.35%	78.38%
Actinopterygii GenBank	94,233	87,526 (92.88%)	87,526 (92.88%)	69.25%	79.83%
Actinopterygii known RefSeq (NP_)	25,752	23,853 (92.63%)	23,853 (92.63%)	68.83%	77.82%
Danio rerio high-quality model RefSeq (XP_)	7,594	7,096 (93.44%)	7,096 (93.44%)	68.00%	72.24%
Esox lucius high-quality model RefSeq (XP_)	18,508	17,687 (95.56%)	17,687 (95.56%)	68.42%	75.51%
Xiphophorus maculatus high-quality model RefSeq (XP_)	18,457	17,877 (96.86%)	17,877 (96.86%)	69.78%	77.60%
Homo sapiens known RefSeq (NP_)	67,607	55,805 (82.54%)	55,805 (82.54%)	67.32%	70.42%
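
The percentages in the table above are the aligned count divided by the retrieved count. As a worked check, the short Python sketch below recomputes two rows. The helper name is illustrative, and rounding to two decimals is assumed from the table's formatting.

```python
# (source, retrieved from Entrez, aligned by ProSplign), copied from the table.
rows = [
    ("Hippocampus comes high-quality model RefSeq (XP_)", 15_317, 15_084),
    ("Homo sapiens known RefSeq (NP_)", 67_607, 55_805),
]

def percent_aligned(retrieved: int, aligned: int) -> float:
    """Aligned sequences as a percentage of retrieved sequences."""
    return round(100.0 * aligned / retrieved, 2)

percentages = {source: percent_aligned(r, a) for source, r, a in rows}
for source, value in percentages.items():
    print(f"{source}: {value:.2f}%")
```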

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
