Parser for (new) ACE files output by PHRAP.
version 1.3, 05/06/2004
Written by Frank Kauff (fkauff@duke.edu) and
Cymon J. Cox (cymon@duke.edu)
Uses the Biopython Parser interface: ParserSupport.py
Usage:
There are two ways of reading an ace file: The ACEParser() reads
the whole file at once and the RecordParser() reads contig after
contig.
1) Parse whole ace file at once:
from Bio.Sequencing import Ace
aceparser=Ace.ACEParser()
acefilerecord=aceparser.parse(open('my_ace_file.ace','r'))
This gives you:
acefilerecord.ncontigs (the number of contigs in the ace file)
acefilerecord.nreads (the number of reads in the ace file)
acefilerecord.contigs[] (one instance of the Contig class for each contig)
The Contig class holds the info of the CO tag, the CT and WA tags, and all the reads used
for this contig in a list of instances of the Reads class, e.g.:
contig3=acefilerecord.contigs[2]
read4=contig3.reads[3]
RD_of_read4=read4.rd
DS_of_read4=read4.ds
CT, WA, and RT tags, whether they come at the end of the file or appear elsewhere, are
automatically sorted into the right place.
See _RecordConsumer for details.
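The tag sorting described above can be illustrated with a small stand-alone sketch. It does not use Biopython: the plain dictionaries, the `sort_tags` function, the tag texts, and matching on a `name` key are all assumptions made for illustration, not the module's actual implementation.

```python
def sort_tags(contigs, footer_tags):
    """Attach footer tags to the contig or read they name (illustrative only).

    contigs: list of dicts {"name": str, "ct": [], "reads": [{"name": str, "rt": []}]}
    footer_tags: list of (tag_type, target_name, text) tuples
    Returns the WA texts, which describe the whole assembly rather than one contig.
    """
    contigs_by_name = {c["name"]: c for c in contigs}
    reads_by_name = {r["name"]: r for c in contigs for r in c["reads"]}
    whole_assembly = []
    for tag_type, target, text in footer_tags:
        if tag_type == "WA":
            # Whole-assembly tags belong to the file, not to a contig.
            whole_assembly.append(text)
        elif tag_type == "CT" and target in contigs_by_name:
            # Consensus tags are matched to a contig by name.
            contigs_by_name[target]["ct"].append(text)
        elif tag_type == "RT" and target in reads_by_name:
            # Read tags are matched to a read by name.
            reads_by_name[target]["rt"].append(text)
    return whole_assembly


# Hypothetical data: two contigs with one read each, plus three footer tags.
contigs = [
    {"name": "Contig1", "ct": [], "reads": [{"name": "readA", "rt": []}]},
    {"name": "Contig2", "ct": [], "reads": [{"name": "readB", "rt": []}]},
]
footer = [
    ("CT", "Contig2", "repeat"),
    ("RT", "readA", "matchElsewhereHighQual"),
    ("WA", None, "phrap_params"),
]
wa = sort_tags(contigs, footer)
```

Building the two name lookups first keeps the sorting step a single pass over the footer, which is only possible because the whole file is already in memory.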
2) Or you can iterate over the contigs of an ace file one by one in the usual way:
from Bio.Sequencing import Ace
recordparser=Ace.RecordParser()
iterator=Ace.Iterator(open('my_ace_file.ace','r'),recordparser)
for contig in iterator:
    print(contig.name)
    ...
Please note that, for memory efficiency, the iterator approach keeps only one contig in
memory at a time. However, an ACE file can have a footer containing WA, CT, RT or WR tags
that hold additional metadata on the contigs. Because the parser doesn't see this data
until the final record, it cannot be added to the appropriate records. Instead, these tags
are returned with the last contig record. Thus an ACE file does not entirely suit the
concept of iterating. If the WA, CT, RT or WR tags are needed, the ACEParser may be more
appropriate than the RecordParser.
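A toy generator can make the footer limitation concrete. The sketch below is not the RecordParser: the `iter_contigs` function and its two-keyword line format are hypothetical and far simpler than real ACE, chosen only to show why a tag read after its contig has been yielded ends up on the last record.

```python
def iter_contigs(lines):
    """Yield one contig dict at a time from simplified, ACE-like lines.

    Toy format (an assumption for this sketch):
      "CO <name>"           starts a new contig
      "CT <contig> <text>"  is a consensus tag that names its contig
    """
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CO":
            if current is not None:
                yield current  # the previous contig leaves the parser here
            current = {"name": fields[1], "tags": []}
        elif fields[0] == "CT" and current is not None:
            # The tag may name a contig that was already yielded, so it can
            # only be stored on the record currently being built.
            current["tags"].append((fields[1], " ".join(fields[2:])))
    if current is not None:
        yield current


lines = [
    "CO Contig1",
    "CO Contig2",
    "CT Contig1 repeat",  # footer tag that belongs to Contig1
]
records = list(iter_contigs(lines))
```

Here the tag naming Contig1 is attached to the Contig2 record, because Contig1 had already been handed to the caller when the footer was read.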
| Class | Description |
|---|---|
| rd | RD (reads), store a read with its name, sequence etc. |
| qa | QA (read quality), including which part if any was used as the consensus. |
| ds | DS lines, include file name of a read's chromatogram file. |
| af | AF lines, define the location of the read within the contig. |
| bs | BS (base segment), which read was chosen as the consensus at each position. |
| rt | RT (transient read tags), generated by crossmatch and phrap. |
| ct | CT (consensus tags). |
| wa | WA (whole assembly tag), holds the assembly program name, version, etc. |
| wr | WR lines. |
| Reads | Holds information about a read supporting an ACE contig. |
| Contig | Holds information about a contig from an ACE record. |
| Iterator | Iterates over an ACE-file with multiple contigs. |
| RecordParser | Parses ACE file data into a Record object. |
| ACEFileRecord | Holds data of an ACE file. |
| ACEParser | Parses full ACE file in list of contigs. |
| _Scanner | Scans an ACE-formatted file. |
| _RecordConsumer | Reads the ace tags into data records. |
StringTypes =
xml_support = 1
Generated by Epydoc 3.0.1 on Mon Sep 15 09:22:37 2008 (http://epydoc.sourceforge.net)