Class PDBFileParser


  • public class PDBFileParser
    extends java.lang.Object
    This class implements the actual PDB file parsing. Do not use it directly; access it through the PDBFileReader class.

    Parsing

    Several flags can be set to control how a PDB file is parsed:
    • setParseCAOnly(boolean) - parse only the ATOM records for C-alpha atoms.
    • setParseSecStruc(boolean) - whether the author's secondary structure assignment in the PDB file should be parsed. If true, the assignment can be accessed through AminoAcid.getSecStruc().
    • setAlignSeqRes(boolean) - whether the amino acid sequences from the SEQRES and ATOM records of a PDB file should be aligned (default: true).
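
    The effect of the C-alpha-only flag can be pictured with a minimal, hypothetical sketch; it is not BioJava code. It filters raw PDB lines using the fixed-column layout, where columns 1-6 hold the record name and columns 13-16 the atom name. A C-alpha atom is named " CA " (blank in column 13), which distinguishes it from a calcium ion named "CA  " in a HETATM record. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of C-alpha-only filtering; not BioJava's implementation.
 * Relies on the fixed-column PDB layout: record name in columns 1-6 and
 * atom name in columns 13-16 (Java substring 12..16).
 */
public class CAOnlyFilter {

    /** True if the line is an ATOM record whose atom name is exactly " CA ". */
    static boolean isCAlphaAtom(String line) {
        return line.startsWith("ATOM  ")
                && line.length() >= 16
                && line.substring(12, 16).equals(" CA ");
    }

    /** Keeps ATOM records; when parseCAOnly is true, keeps only the C-alpha ones. */
    static List<String> selectAtoms(List<String> lines, boolean parseCAOnly) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("ATOM  ")) {
                continue; // HETATM and all other records are skipped in this sketch
            }
            if (!parseCAOnly || isCAlphaAtom(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "ATOM      1  N   MET A   1      38.198  19.582  28.343  1.00 49.79           N",
                "ATOM      2  CA  MET A   1      37.114  20.561  28.398  1.00 50.19           C",
                "HETATM 1000 CA    CA A 501      10.000  10.000  10.000  1.00 20.00          CA");
        System.out.println(selectAtoms(lines, true).size());  // 1
        System.out.println(selectAtoms(lines, false).size()); // 2
    }
}
```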

    To prevent excessive memory usage with large PDB files, the parser applies ATOM_CA_THRESHOLD: if more atoms than this threshold are parsed from a PDB file, the parser automatically switches to a C-alpha-only representation.
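
    A hypothetical sketch of such a threshold switch follows. The demo threshold of 3 is made up (the real ATOM_CA_THRESHOLD value is listed on the Constant Field Values page), and discarding the non-C-alpha atoms stored before the switch is an assumption about what a "C-alpha-only representation" means, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a threshold-triggered switch to C-alpha-only storage.
 * THRESHOLD is a made-up demo value, not the real ATOM_CA_THRESHOLD.
 */
public class CaThresholdSketch {

    static final int THRESHOLD = 3; // hypothetical, deliberately tiny for the demo

    private final List<String> atomNames = new ArrayList<>();
    private int atomsSeen = 0;
    private boolean caOnly = false;

    /** Records one parsed atom, switching to C-alpha-only mode once THRESHOLD is exceeded. */
    void addAtom(String atomName) {
        atomsSeen++;
        if (!caOnly && atomsSeen > THRESHOLD) {
            caOnly = true;
            // Assumption: atoms stored before the switch are reduced to C-alpha only.
            atomNames.removeIf(name -> !name.equals("CA"));
        }
        if (!caOnly || atomName.equals("CA")) {
            atomNames.add(atomName);
        }
    }

    boolean isCaOnly() {
        return caOnly;
    }

    List<String> atoms() {
        return atomNames;
    }

    public static void main(String[] args) {
        CaThresholdSketch p = new CaThresholdSketch();
        for (String name : new String[] {"N", "CA", "C", "O", "N", "CA"}) {
            p.addAtom(name);
        }
        System.out.println(p.isCaOnly() + " " + p.atoms()); // true [CA, CA]
    }
}
```

    Feeding the backbone atoms N, CA, C, O, N, CA leaves the sketch in C-alpha-only mode holding two CA atoms; a structure that stays below the threshold keeps every atom.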

    The result of the parsing of the PDB file is a new Structure object.

    For more documentation on how to work with the Structure API please see http://biojava.org/wiki/BioJava:CookBook#Protein_Structure

    Example

    Q: How can I get a Structure object from a PDB file?

    A:

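     // Package names as in the BioJava 1.x (legacy) structure module; adjust for other releases.
     import java.io.IOException;

     import org.biojava.bio.structure.Structure;
     import org.biojava.bio.structure.io.PDBFileReader;

     public class LoadStructureExample {

         public Structure loadStructure(String pathToPDBFile) {
             // The PDBFileParser is wrapped by the PDBFileReader
             PDBFileReader pdbreader = new PDBFileReader();

             Structure structure = null;
             try {
                 structure = pdbreader.getStructure(pathToPDBFile);
                 System.out.println(structure);
             } catch (IOException e) {
                 e.printStackTrace();
             }
             return structure;
         }
     }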
     
    Since:
    1.4
    Author:
    Andreas Prlic, Jules Jacobsen
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int ATOM_CA_THRESHOLD
      the maximum number of atoms that will be parsed before the parser switches to a CA-only representation of the PDB file.
      static java.lang.String HELIX
      Helix secondary structure assignment.
      static int MAX_ATOMS
      the maximum number of atoms that will be added to a structure; this protects against memory overflows in the few very large protein structures.
      boolean parseCAOnly
      Set the flag to read in only C-alpha atoms; this is useful for parsing large structures such as 1htq.
      static java.lang.String PDB_AUTHOR_ASSIGNMENT
      Secondary structure assigned by the PDB author.
      static java.lang.String STRAND
      Strand secondary structure assignment.
      static java.lang.String TURN
      Turn secondary structure assignment.
    • Constructor Summary

      Constructors 
      Constructor Description
      PDBFileParser()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected java.lang.String getTimeStamp()
      Returns a time stamp.
      boolean isAlignSeqRes()
      Flag if the SEQRES amino acids should be aligned with the ATOM amino acids.
      boolean isParseCAOnly()
      flag indicating whether only the C-alpha atoms of the structure should be parsed.
      boolean isParseSecStruc()
      is the secondary structure assignment being parsed from the file? The default is false.
      void linkChains2Compound(Structure s)
      After the parsing of a PDB file the Chain and Compound objects need to be linked to each other.
      Structure parsePDBFile(java.io.BufferedReader buf)
      parse a PDB file and return a data structure implementing the Structure interface.
      Structure parsePDBFile(java.io.InputStream inStream)
      parse a PDB file and return a data structure implementing the Structure interface.
      void setAlignSeqRes(boolean alignSeqRes)
      define whether the SEQRES in the structure should be aligned with the ATOM records. If yes, the AminoAcids in structure.getSeqRes will have their coordinates set.
      void setParseCAOnly(boolean parseCAOnly)
      flag indicating whether only the C-alpha atoms of the structure should be parsed.
      void setParseSecStruc(boolean parseSecStruc)
      a flag to tell the parser to parse the author's secondary structure assignment from the file. The default is false, i.e. do NOT parse.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • PDB_AUTHOR_ASSIGNMENT

        public static final java.lang.String PDB_AUTHOR_ASSIGNMENT
        Secondary structure assigned by the PDB author.
        See Also:
        Constant Field Values
      • HELIX

        public static final java.lang.String HELIX
        Helix secondary structure assignment.
        See Also:
        Constant Field Values
      • STRAND

        public static final java.lang.String STRAND
        Strand secondary structure assignment.
        See Also:
        Constant Field Values
      • TURN

        public static final java.lang.String TURN
        Turn secondary structure assignment.
        See Also:
        Constant Field Values
      • ATOM_CA_THRESHOLD

        public static final int ATOM_CA_THRESHOLD
        the maximum number of atoms that will be parsed before the parser switches to a CA-only representation of the PDB file. If this limit is exceeded, the SEQRES groups will also be ignored.
        See Also:
        Constant Field Values
      • MAX_ATOMS

        public static final int MAX_ATOMS
        the maximum number of atoms that will be added to a structure; this protects against memory overflows in the few very large protein structures.
        See Also:
        Constant Field Values
      • parseCAOnly

        public boolean parseCAOnly
        Set the flag to read in only C-alpha atoms; this is useful for parsing large structures such as 1htq.
    • Constructor Detail

      • PDBFileParser

        public PDBFileParser()
    • Method Detail

      • isParseCAOnly

        public boolean isParseCAOnly()
        flag indicating whether only the C-alpha atoms of the structure should be parsed.
        Returns:
        the flag
      • setParseCAOnly

        public void setParseCAOnly(boolean parseCAOnly)
        flag indicating whether only the C-alpha atoms of the structure should be parsed.
        Parameters:
        parseCAOnly - boolean flag to enable or disable C-alpha only parsing
      • isAlignSeqRes

        public boolean isAlignSeqRes()
        Flag if the SEQRES amino acids should be aligned with the ATOM amino acids.
        Returns:
        flag if SEQRES - ATOM amino acids alignment is enabled
      • setAlignSeqRes

        public void setAlignSeqRes(boolean alignSeqRes)
        define whether the SEQRES in the structure should be aligned with the ATOM records. If yes, the AminoAcids in structure.getSeqRes will have their coordinates set.
        Parameters:
        alignSeqRes - true to align the SEQRES amino acids with the ATOM records
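
        This page does not say how the SEQRES and ATOM sequences are aligned. As an illustration only, the sketch below uses a standard global (Needleman-Wunsch) alignment on one-letter sequences with hypothetical scores (match +1, mismatch -1, gap -1) and maps each SEQRES residue to the ATOM residue it aligns with; -1 marks SEQRES residues that would receive no coordinates. The class name and scores are invented for the example.

```java
import java.util.Arrays;

/**
 * Sketch of a global (Needleman-Wunsch) alignment between a SEQRES sequence and
 * the sequence observed in ATOM records. Scores are hypothetical; this is not
 * claimed to be the method PDBFileParser uses.
 */
public class SeqResAlignSketch {

    static final int MATCH = 1, MISMATCH = -1, GAP = -1; // hypothetical scores

    private static int pairScore(char a, char b) {
        return a == b ? MATCH : MISMATCH;
    }

    /** For each SEQRES position, the aligned ATOM index, or -1 if unaligned. */
    static int[] align(String seqres, String atom) {
        int n = seqres.length(), m = atom.length();
        int[][] score = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) score[i][0] = i * GAP;
        for (int j = 1; j <= m; j++) score[0][j] = j * GAP;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = score[i - 1][j - 1] + pairScore(seqres.charAt(i - 1), atom.charAt(j - 1));
                score[i][j] = Math.max(diag, Math.max(score[i - 1][j] + GAP, score[i][j - 1] + GAP));
            }
        }
        // Trace back from the bottom-right cell to recover which residues were paired.
        int[] mapping = new int[n];
        Arrays.fill(mapping, -1);
        int i = n, j = m;
        while (i > 0 && j > 0) {
            int diag = score[i - 1][j - 1] + pairScore(seqres.charAt(i - 1), atom.charAt(j - 1));
            if (score[i][j] == diag) {
                mapping[i - 1] = j - 1; // residues paired
                i--;
                j--;
            } else if (score[i][j] == score[i - 1][j] + GAP) {
                i--; // SEQRES residue without an ATOM counterpart
            } else {
                j--; // ATOM residue without a SEQRES counterpart
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // The ATOM records lack the first two and the last SEQRES residue.
        System.out.println(Arrays.toString(align("MKTAYIAK", "TAYIA")));
        // [-1, -1, 0, 1, 2, 3, 4, -1]
    }
}
```

        A global rather than local alignment is used here because the SEQRES sequence is meant to cover the whole chain, so every ATOM residue should find a place inside it.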
      • isParseSecStruc

        public boolean isParseSecStruc()
        is the secondary structure assignment being parsed from the file? The default is false.
        Returns:
        true if the HELIX, STRAND and TURN fields are being parsed
      • setParseSecStruc

        public void setParseSecStruc(boolean parseSecStruc)
        a flag to tell the parser to parse the author's secondary structure assignment from the file. The default is false, i.e. do NOT parse.
        Parameters:
        parseSecStruc - true if the HELIX, STRAND and TURN fields should be parsed
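
        As a hypothetical illustration of what setParseSecStruc(true) enables, the sketch below recognises author-assigned secondary structure records by their record name (columns 1-6). In the PDB format, strands appear in SHEET records, so they are mapped to a STRAND category here. The enum and the counting are invented for the example and do not reproduce the HELIX, STRAND and TURN constants, whose values this page does not show.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: classifying secondary structure records by record name.
 * Not the PDBFileParser implementation or its constants.
 */
public class SecStrucRecords {

    enum SecStrucType { HELIX, STRAND, TURN }

    /** Maps a PDB line to a secondary structure type, or null if it is not one. */
    static SecStrucType classify(String line) {
        if (line.startsWith("HELIX ")) return SecStrucType.HELIX;
        if (line.startsWith("SHEET ")) return SecStrucType.STRAND; // strands live in SHEET records
        if (line.startsWith("TURN  ")) return SecStrucType.TURN;
        return null;
    }

    /** Counts secondary structure records; returns an empty map when parsing is disabled. */
    static Map<SecStrucType, Integer> count(List<String> lines, boolean parseSecStruc) {
        Map<SecStrucType, Integer> counts = new EnumMap<>(SecStrucType.class);
        if (!parseSecStruc) {
            return counts; // mirrors the documented default: do not parse
        }
        for (String line : lines) {
            SecStrucType type = classify(line);
            if (type != null) {
                counts.merge(type, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "HELIX    1   1 SER A    5  GLY A   15  1",
                "SHEET    1   A 2 VAL A  20  LYS A  24  0",
                "ATOM      1  N   MET A   1      38.198  19.582  28.343  1.00 49.79           N");
        System.out.println(count(lines, true));  // {HELIX=1, STRAND=1}
        System.out.println(count(lines, false)); // {}
    }
}
```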
      • getTimeStamp

        protected java.lang.String getTimeStamp()
        Returns a time stamp.
        Returns:
        a String representing the time stamp value
      • parsePDBFile

        public Structure parsePDBFile(java.io.InputStream inStream)
                               throws java.io.IOException
        parse a PDB file and return a data structure implementing the Structure interface.
        Parameters:
        inStream - an InputStream object
        Returns:
        a Structure object
        Throws:
        java.io.IOException
      • parsePDBFile

        public Structure parsePDBFile(java.io.BufferedReader buf)
                               throws java.io.IOException
        parse a PDB file and return a data structure implementing the Structure interface.
        Parameters:
        buf - a BufferedReader object
        Returns:
        the Structure object
        Throws:
        java.io.IOException - ...
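
        Offering an InputStream overload next to a BufferedReader overload is a common Java pattern in which the stream variant wraps its argument and delegates. Whether parsePDBFile(InputStream) is implemented exactly this way is not stated here. The hypothetical sketch below only counts ATOM and HETATM records in place of building a Structure, assumes ASCII input, and leaves closing the stream to the caller.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the two-overload pattern; counting records stands in
 * for building a Structure. Not the PDBFileParser implementation.
 */
public class OverloadSketch {

    /** Wraps the stream in a BufferedReader and delegates (ASCII is an assumption). */
    static int countAtoms(InputStream inStream) throws IOException {
        BufferedReader buf = new BufferedReader(
                new InputStreamReader(inStream, StandardCharsets.US_ASCII));
        return countAtoms(buf);
    }

    /** Reads line by line and counts ATOM and HETATM records. */
    static int countAtoms(BufferedReader buf) throws IOException {
        int count = 0;
        String line;
        while ((line = buf.readLine()) != null) {
            if (line.startsWith("ATOM  ") || line.startsWith("HETATM")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String pdb = "HEADER    TEST\n"
                + "ATOM      1  N   MET A   1      38.198  19.582  28.343  1.00 49.79           N\n"
                + "HETATM 1000 CA    CA A 501      10.000  10.000  10.000  1.00 20.00          CA\n"
                + "END\n";
        InputStream in = new ByteArrayInputStream(pdb.getBytes(StandardCharsets.US_ASCII));
        System.out.println(countAtoms(in)); // 2
    }
}
```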
      • linkChains2Compound

        public void linkChains2Compound(Structure s)
        After the parsing of a PDB file the Chain and Compound objects need to be linked to each other.
        Parameters:
        s - the structure
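
        The linking step can be sketched with two hypothetical stand-in classes, MiniChain and MiniCompound, which are not BioJava's Chain and Compound types. The sketch assumes each compound carries the chain identifiers listed in its COMPND "CHAIN:" token and links every chain to its compound in both directions; chains that no compound lists stay unlinked.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of linking chains and compounds after parsing.
 * MiniChain and MiniCompound are illustrative stand-ins only.
 */
public class LinkChainsSketch {

    static class MiniCompound {
        final String molName;
        final List<String> chainIds;               // e.g. from "CHAIN: A, C;"
        final List<MiniChain> chains = new ArrayList<>();

        MiniCompound(String molName, List<String> chainIds) {
            this.molName = molName;
            this.chainIds = chainIds;
        }
    }

    static class MiniChain {
        final String chainId;
        MiniCompound compound; // filled in by link()

        MiniChain(String chainId) {
            this.chainId = chainId;
        }
    }

    /** Links each chain to the compound that lists its chain ID, in both directions. */
    static void link(List<MiniChain> chains, List<MiniCompound> compounds) {
        Map<String, MiniCompound> byChainId = new HashMap<>();
        for (MiniCompound c : compounds) {
            for (String id : c.chainIds) {
                byChainId.put(id, c);
            }
        }
        for (MiniChain chain : chains) {
            MiniCompound c = byChainId.get(chain.chainId);
            if (c != null) {
                chain.compound = c;
                c.chains.add(chain);
            }
        }
    }

    public static void main(String[] args) {
        MiniCompound alpha = new MiniCompound("HEMOGLOBIN ALPHA", List.of("A", "C"));
        MiniCompound beta = new MiniCompound("HEMOGLOBIN BETA", List.of("B", "D"));
        List<MiniChain> chains = List.of(new MiniChain("A"), new MiniChain("B"),
                new MiniChain("C"), new MiniChain("D"));
        link(chains, List.of(alpha, beta));
        System.out.println(chains.get(2).compound.molName); // HEMOGLOBIN ALPHA
        System.out.println(beta.chains.size());             // 2
    }
}
```

        Building the chain-ID map first keeps the linking linear in the number of chains rather than scanning every compound for every chain.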