AGP Validation
File structure and content can be validated at two different levels:
I. File content and structure
program: agp_validate
parameters:
purpose: this program checks text formatting and consistency as defined by the AGP Specification. It also prints statistics for components, gaps, scaffolds and objects.
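The statistics step can be pictured with a minimal Python sketch. It is not agp_validate's own code. It assumes well-formed, tab-separated AGP lines, treats lines starting with `#` as comments, and assumes that a component_type of N or U marks a gap line. The helper name `agp_stats` is hypothetical. Scaffold counts are left out because they depend on gap and linkage rules not described here.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: these component_type values (column 5) mark gap lines.
GAP_COMPONENT_TYPES = {"N", "U"}


def agp_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count objects, component lines, gap lines and gap bases in AGP text.

    Hypothetical helper for illustration only; it assumes every non-comment
    line already has 9 tab-separated columns.
    """
    objects = set()
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        objects.add(cols[0])                      # column 1: object
        if cols[4] in GAP_COMPONENT_TYPES:        # column 5: component_type
            counts["gaps"] += 1
            counts["gap_bases"] += int(cols[5])   # column 6b: gap_length
        else:
            counts["components"] += 1
    return {
        "objects": len(objects),
        "components": counts["components"],
        "gaps": counts["gaps"],
        "gap_bases": counts["gap_bases"],
    }


sample = [
    "# a comment line",
    "chr1\t1\t100\t1\tW\tAC001\t1\t100\t+",
    "chr1\t101\t150\t2\tN\t50\tfragment\tyes\t",
    "chr1\t151\t250\t3\tW\tAC002\t1\t100\t-",
]
print(agp_stats(sample))
# {'objects': 1, 'components': 2, 'gaps': 1, 'gap_bases': 50}
```

Using a `Counter` keeps missing categories at zero, so an input with no gap lines still reports `'gaps': 0`.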
Error level violations:
- Incorrect number of columns (excluding comments): there should be 9 tab-separated columns, and the first 8 must not be empty.
- Non-positive integers in the following columns:
  - 2: object_beg
  - 3: object_end
  - 4: part_number
  - 6b: gap_length
  - 7a: component_beg
  - 8a: component_end
- Object ranges that are non-sequential and/or overlapping.
- object_beg greater than object_end.
- component_beg greater than component_end.
- The length of the span specified for the component (in column 7a and 8a) does not match the length of the span specified for the object (in column 2 and 3).
- An object whose first line does not have an object_beg of 1.
- An object whose first line does not have a part_number of 1.
- The value of column 4 (part_number) not equal to the previous part_number + 1.
- Invalid terms or symbols in the following columns:
  - 5: component_type
  - 7b: gap_type
  - 8b: linkage
  - 9a: orientation
- A component orientation of 0 or na used in a non-singleton scaffold.
- Objects with non-sequential lines and/or lines mixed with other objects.
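Several of the error-level checks listed above can be sketched in Python to show how they follow from a single pass over the lines. This is a hypothetical illustration, not agp_validate's implementation. The term sets are assumptions based on AGP v1.1 as far as we know and should be checked against the specification version in use; the function names are invented. The orientation rule for non-singleton scaffolds is omitted because it needs a scaffold definition not given here.

```python
from typing import List, Optional, Tuple

# Assumed term sets; verify against the AGP Specification version in use.
COMPONENT_TYPES = {"A", "D", "F", "G", "O", "P", "W", "N", "U"}
GAP_COMPONENT_TYPES = {"N", "U"}
GAP_TYPES = {"fragment", "clone", "contig", "centromere", "short_arm",
             "heterochromatin", "telomere", "repeat"}
LINKAGES = {"yes", "no"}
ORIENTATIONS = {"+", "-", "0", "na"}


def pos_int(value: str) -> Optional[int]:
    """Return value as an int if it is a positive integer, else None."""
    try:
        number = int(value)
    except ValueError:
        return None
    return number if number > 0 else None


def agp_errors(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for a subset of error-level checks."""
    errors: List[Tuple[int, str]] = []
    prev: Optional[Tuple[str, int, int]] = None  # (object, object_end, part_number)
    finished = set()  # objects whose block of lines has already ended

    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[9] == "":
            cols = cols[:9]  # a trailing tab is only a warning
        if len(cols) != 9 or "" in cols[:8]:
            errors.append((lineno, "need 9 columns, first 8 non-empty"))
            continue

        obj, ctype = cols[0], cols[4]
        beg, end, part = pos_int(cols[1]), pos_int(cols[2]), pos_int(cols[3])
        if beg is None or end is None or part is None:
            errors.append((lineno, "non-positive integer in columns 2-4"))
            continue
        if beg > end:
            errors.append((lineno, "object_beg > object_end"))

        if prev is None or prev[0] != obj:  # first line of an object
            if obj in finished:
                errors.append((lineno, "object lines not contiguous"))
            if prev is not None:
                finished.add(prev[0])
            if beg != 1:
                errors.append((lineno, "object_beg of first line is not 1"))
            if part != 1:
                errors.append((lineno, "part_number of first line is not 1"))
        else:
            if beg != prev[1] + 1:
                errors.append((lineno, "object range non-sequential or overlapping"))
            if part != prev[2] + 1:
                errors.append((lineno, "part_number is not previous + 1"))
        prev = (obj, end, part)

        if ctype not in COMPONENT_TYPES:
            errors.append((lineno, "invalid component_type"))
        elif ctype in GAP_COMPONENT_TYPES:
            if pos_int(cols[5]) is None:
                errors.append((lineno, "non-positive gap_length"))
            if cols[6] not in GAP_TYPES:
                errors.append((lineno, "invalid gap_type"))
            if cols[7] not in LINKAGES:
                errors.append((lineno, "invalid linkage"))
        else:
            cbeg, cend = pos_int(cols[6]), pos_int(cols[7])
            if cbeg is None or cend is None:
                errors.append((lineno, "non-positive component_beg/component_end"))
            elif cbeg > cend:
                errors.append((lineno, "component_beg > component_end"))
            elif cend - cbeg != end - beg:
                errors.append((lineno, "component span length != object span length"))
            if cols[8] not in ORIENTATIONS:
                errors.append((lineno, "invalid orientation"))
    return errors


bad = [
    "chr1\t1\t100\t1\tW\tAC001\t1\t90\t+",    # component span 90 bp, object span 100 bp
    "chr1\t102\t150\t3\tX\tAC002\t1\t49\t+",  # skips base 101, part_number jumps, bad type
]
for lineno, message in agp_errors(bad):
    print(lineno, message)
# 1 component span length != object span length
# 2 object range non-sequential or overlapping
# 2 part_number is not previous + 1
# 2 invalid component_type
```

Tracking only the previous line plus a set of finished objects is enough to detect both out-of-order ranges and objects whose lines are interleaved with other objects.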
Warning level violations:
- A gap at the beginning or the end of an object.
- Consecutive gap lines of the same type.
- Linkage=yes with a gap_type other than fragment, clone or repeat.
- Overlapping spans used for a given component_id.
- Non-draft component_id used more than once.
- Non-draft component spans out of order.
- Extra tab character at the end of the line.
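A few of the warning-level checks can be sketched the same way. Again this is a hypothetical Python illustration, not agp_validate's code. It assumes that N or U marks a gap line, that "same type" for consecutive gaps means the same gap_type (column 7b), and that each object's lines are contiguous, which the error-level checks enforce. The two non-draft checks are left out because they depend on which component types count as draft.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumptions: N/U mark gap lines; consecutive gaps compare gap_type (column 7b).
GAP_COMPONENT_TYPES = {"N", "U"}
LINKAGE_YES_GAP_TYPES = {"fragment", "clone", "repeat"}


def agp_warnings(lines: List[str]) -> List[Tuple[int, str]]:
    """Return sorted (line_number, message) pairs for some warning-level checks."""
    warnings: List[Tuple[int, str]] = []
    rows: List[Tuple[int, List[str]]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[9] == "":
            warnings.append((lineno, "extra tab at end of line"))
            cols = cols[:9]
        if len(cols) == 9:  # other column counts are error-level problems
            rows.append((lineno, cols))

    spans: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for i, (lineno, cols) in enumerate(rows):
        # Neighbouring lines count only if they belong to the same object.
        prev = rows[i - 1][1] if i > 0 and rows[i - 1][1][0] == cols[0] else None
        nxt = (rows[i + 1][1]
               if i + 1 < len(rows) and rows[i + 1][1][0] == cols[0] else None)
        if cols[4] in GAP_COMPONENT_TYPES:
            if prev is None or nxt is None:
                warnings.append((lineno, "gap at beginning or end of object"))
            if (prev is not None and prev[4] in GAP_COMPONENT_TYPES
                    and prev[6] == cols[6]):
                warnings.append((lineno, "consecutive gaps of the same type"))
            if cols[7] == "yes" and cols[6] not in LINKAGE_YES_GAP_TYPES:
                warnings.append((lineno, "linkage=yes with this gap_type"))
            continue
        try:
            cbeg, cend = int(cols[6]), int(cols[7])
        except ValueError:
            continue  # bad coordinates are reported as errors instead
        # Compare against every earlier span of the same component_id.
        for obeg, oend, other in spans[cols[5]]:
            if cbeg <= oend and obeg <= cend:
                warnings.append((lineno, f"component span overlaps line {other}"))
        spans[cols[5]].append((cbeg, cend, lineno))
    return sorted(warnings)


lines = [
    "chr1\t1\t50\t1\tN\t50\tclone\tno\t",       # gap at the start of the object
    "chr1\t51\t150\t2\tW\tAC001\t1\t100\t+\t",  # extra trailing tab
    "chr1\t151\t200\t3\tN\t50\tcontig\tyes\t",  # linkage=yes with gap_type contig
    "chr1\t201\t250\t4\tN\t50\tcontig\tno\t",   # same gap_type as the previous gap
    "chr1\t251\t300\t5\tW\tAC001\t51\t100\t+",  # reuses bases 51-100 of AC001
]
for lineno, message in agp_warnings(lines):
    print(lineno, message)
```

The overlap test `cbeg <= oend and obeg <= cend` is the usual closed-interval intersection check; comparing each span against all earlier spans of the same component_id is quadratic per component but keeps the sketch short.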
II. Overlap and switch point validation
This is currently unavailable. Support for this may be added in the future.
