Experimental phasing with Phaser
Phaser is a program for phasing macromolecular crystal structures with maximum likelihood methods. It has been developed by Randy Read's group at the University of Cambridge and is available through the Phenix and CCP4 software suites. General information is available on the Phaser website. Questions that are not answered by this document may be answered in the Phaser FAQ section of that website. Tutorials for carrying out experimental phasing calculations (including links to the necessary data) can be found on the Phaser tutorials page of the same website.
This section describes the use of Phaser for experimental phasing by single-wavelength anomalous diffraction (SAD), which optionally can exploit information from a partial (molecular replacement) model. These capabilities are available in versions of Phaser above 2.1. Version 2.1.4 is part of CCP4 6.1. Molecular replacement with Phaser is described on a separate page.
Concepts
The general principles of experimental phasing are described elsewhere. Likelihood-based methods are based on a probabilistic understanding of the diffraction experiment. In this view, experimental phasing arises from the joint probability distributions of collections of structure factors, e.g. native plus derivative, or Friedel pairs for observations made at different wavelengths. By explaining part of the structure factors and part of the correlations between pairs of structure factors, we gain information about the possible phase angles.
SAD likelihood target
For SAD phasing, the likelihood target is based on the joint distribution of the Friedel-related pair of reflections, F+ and F-. These differ in intensity because Friedel's law breaks down for mixtures of real and anomalous scatterers. If there is a model for the anomalous scatterers, their contribution to F+ and F- can be computed. The likelihood target is derived by starting from the joint distribution of F+, F- and the structure factors computed from the anomalous scatterer model. Treating the calculated structure factors as fixed gives the joint distribution of F+ and F-, including their phases. A likelihood target for the measured structure factor amplitudes is then derived by integrating over all possible values of the two phases. This treatment accounts for measurement errors, for the effect of errors in the anomalous scatterer model and, importantly, for the correlation in the errors of the calculated pair of structure factors.
Log-likelihood-gradient (LLG) map
By computing the derivative of the likelihood target with respect to the calculated heavy atom structure factor, one can obtain a map showing where, by adding anomalous scatterers, the likelihood target would be improved. Such a map gives a clearer picture of the location of missing anomalous scatterers than the conventional anomalous difference Fourier. In Phaser, LLG maps are used to find new sites for anomalous scatterers and to detect anisotropy in existing anomalous scatterers (indicated by peaks very close to existing atomic positions).
Running Phaser for automated SAD phasing
Before carrying out SAD phasing in Phaser, you need an atomic model. Usually this will be a model of the anomalous scatterers, which can be obtained with SHELX-D or with other programs such as HySS and SnB. Alternatively, you can provide a partial model of the structure of the macromolecule, with or without a set of anomalous scatterers. The partial model is often obtained by molecular replacement, but may be obtained by an iterative process of model-building and SAD phasing using the model. The model of anomalous scatterers need not be complete, as Phaser will carry out completion through LLG maps. It is better to leave out questionable sites at this stage and leave them for Phaser to find.
You will also need an MTZ file with columns for F+, F- and their standard deviations. If you only have a file with Fmean, DANO and their standard deviations, it is possible to convert this with the CCP4 program mtzMADmod, but be aware that the quality of the results will suffer: this process loses information about the relative precision of the F+ and F- measurements and about which of F+ and F- was measured, if only one of the pair was measured.
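The information loss described above can be illustrated with a small Python sketch. It assumes a simple unweighted convention (Fmean is the plain average, DANO is F+ minus F-, and errors are propagated as for independent measurements); this convention and the function names are assumptions made for illustration, not a description of what mtzMADmod does internally.

```python
import math

def to_mean_and_dano(f_plus, sig_plus, f_minus, sig_minus):
    """Collapse a Friedel pair into Fmean/DANO (unweighted convention, assumed)."""
    f_mean = 0.5 * (f_plus + f_minus)
    dano = f_plus - f_minus
    sig_dano = math.hypot(sig_plus, sig_minus)   # sqrt(sig+^2 + sig-^2)
    sig_mean = 0.5 * sig_dano
    return f_mean, sig_mean, dano, sig_dano

def back_to_pair(f_mean, sig_mean, dano, sig_dano):
    """Rebuild F+ and F-. The individual sigmas cannot be recovered,
    so both members of the pair have to be given the same value."""
    f_plus = f_mean + 0.5 * dano
    f_minus = f_mean - 0.5 * dano
    sig_each = sig_dano / math.sqrt(2.0)
    return f_plus, sig_each, f_minus, sig_each

# F+ measured precisely, F- measured poorly (made-up numbers)
original = (105.0, 1.0, 95.0, 5.0)
rebuilt = back_to_pair(*to_mean_and_dano(*original))
print("original:", original)
print("rebuilt: ", rebuilt)
```

The amplitudes survive the round trip, but the rebuilt pair has identical standard deviations, so the fact that F+ was measured much more precisely than F- is no longer available to the likelihood target.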
Mode
If you have only a model of the anomalous scatterers, use the default mode: Single wavelength anomalous dispersion (SAD).
If you have a partial model of the real scatterers (with or without additional anomalous scatterers), choose the other mode: SAD with molecular replacement partial structure.
Define data folder
The "Define data" folder of the interface is used to select an MTZ file containing the SAD diffraction data. You should check that the columns automatically chosen for F+, F- and their standard deviations are correct. Usually it will be fine to use data over the full resolution range, but if desired you can adjust the high resolution limit.
It is assumed that the space group in the MTZ file will be correct (apart from the ambiguity for enantiomorphic pairs like P3121 and P3221), because you need to assign the space group to determine the anomalous scatterer substructure. Unless you have some reason to believe that the hand of the heavy atom substructure is correct, you should choose "Both enantiomorphs" under "Enantiomer choice". In this case, Phaser will carry out phasing for both hands and produce two MTZ files; for enantiomorphic space groups, the hand of the space group will also be changed. Note that, if there is only a single type of anomalous scatterer, the structure factors computed from the substructure will obey Friedel's law, so the substructure completion does not need to be repeated and only the phasing will be carried out for the other hand. However, if there is a mixture of anomalous scatterer types, Friedel's law is no longer obeyed, so even the substructure completion will be carried out again by Phaser for the other hand.
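The statement about Friedel's law can be checked numerically. The following Python sketch computes substructure structure factors for a reflection and its Friedel mate, once for a single atom type and once for a mixture. The scattering factors and fractional coordinates are made-up illustrative values, B-factors and symmetry are ignored, and the function names are hypothetical.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """Sum f * exp(2*pi*i*h.x) over atoms; each atom is (complex_f, (x, y, z))."""
    total = 0j
    for f, xyz in atoms:
        phase = 2.0 * math.pi * sum(h * x for h, x in zip(hkl, xyz))
        total += f * cmath.exp(1j * phase)
    return total

def friedel_difference(hkl, atoms):
    """Return |F(h)| - |F(-h)| for the given set of atoms."""
    minus = tuple(-h for h in hkl)
    return abs(structure_factor(hkl, atoms)) - abs(structure_factor(minus, atoms))

# Illustrative scattering factors, real part + i * imaginary part (assumed values)
heavy = complex(30.0, 4.0)
light = complex(16.0, 0.5)
sites = [(0.10, 0.20, 0.30), (0.45, 0.15, 0.70), (0.80, 0.60, 0.25)]

single_type = [(heavy, xyz) for xyz in sites]
mixture = [(heavy, sites[0]), (heavy, sites[1]), (light, sites[2])]

hkl = (1, 2, 3)
print("single type:", friedel_difference(hkl, single_type))  # zero within rounding
print("mixture:    ", friedel_difference(hkl, mixture))      # clearly non-zero
```

With a single atom type the complex scattering factor is a common multiplier, so the amplitudes of the pair are equal; with two types the ratio of imaginary to real scattering differs between atoms and the amplitudes diverge.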
For optimal results, Phaser needs to know the correct relative values of the real (f0+f', where f0 is the normal, wavelength-independent scattering factor) and imaginary (f") components of the scattering factors for all anomalous scatterers. For data collected away from an absorption edge, it is most convenient to use a table lookup according to the wavelength (select "CuKa" or "at other wavelength" from the Scattering pulldown). Near the absorption edge, it is best to provide f' and f" values obtained "by fluorescence scan", when there is a single type of anomalous scatterer, or "explicitly by atomtype".
Usually it is best to allow Phaser to complete the anomalous substructure (leave "LLG-map completion" on). The default of accepting new sites above 6 sigma in the LLG map generally works well, but in cases with marginal SAD signal you may wish to lower the threshold somewhat, e.g. to 5 sigma. You can tell whether this is a good idea by looking at the Z-scores for peaks and holes in the last LLG maps. The noise level in the LLG map can be estimated from the highest Z-score for holes in the LLG map; if this is significantly less than 6 (e.g. around 5 or lower), it might be worth reducing the sigma level for completion.
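The rule of thumb above can be written down as a small Python helper. This is only a sketch of the heuristic as described in the text: the function name is hypothetical, the Z-scores must be read from the log file by hand, and the cutoff values (6, 5 and "around 5 or lower") are taken from the examples in this section, not from Phaser itself.

```python
def suggest_completion_cutoff(hole_z_scores, default_cutoff=6.0,
                              lowered_cutoff=5.0, noise_limit=5.0):
    """Suggest a sigma cutoff for LLG-map completion.

    hole_z_scores: Z-scores of the holes in the last LLG map (sign is ignored).
    The deepest hole is taken as an estimate of the noise level in the map.
    """
    if not hole_z_scores:
        return default_cutoff
    noise_level = max(abs(z) for z in hole_z_scores)
    if noise_level <= noise_limit:
        return lowered_cutoff   # noise is low, so weaker peaks may be real sites
    return default_cutoff       # noise is close to the default cutoff, keep it

print(suggest_completion_cutoff([-3.9, -4.4, -4.8]))
print(suggest_completion_cutoff([-5.8, -4.1]))
```

Treat the suggestion as a prompt to inspect the map and the refined occupancies of any new sites, not as an automatic decision.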
It is important to specify which anomalous scatterers are expected in the substructure. If there is more than one type of anomalous scatterer, Phaser will compute a separate LLG map for each scatterer, and for each peak will assign the element that gave the largest Z-score (standard deviations above mean) in its LLG map.
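The element assignment described above amounts to picking, for each peak, the scatterer type whose LLG map gave the largest Z-score. A minimal Python sketch of that selection, with a hypothetical data layout and made-up Z-scores, might look like this:

```python
def assign_elements(peak_z_by_element):
    """Map each peak to the element with the largest Z-score.

    peak_z_by_element: {peak_id: {element: z_score}} (layout assumed for illustration)
    """
    return {
        peak: max(z_scores.items(), key=lambda item: item[1])[0]
        for peak, z_scores in peak_z_by_element.items()
    }

peaks = {
    "peak1": {"SE": 9.2, "S": 7.5},
    "peak2": {"SE": 5.1, "S": 6.3},
}
print(assign_elements(peaks))
```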
Define atoms folder
The "Define atoms" folder allows you to specify the prior structural information, most often as a PDB file containing the anomalous substructure determined by a program such as SHELX-D, HySS or SnB, but sometimes including a partial model from molecular replacement or an earlier cycle of phasing and model-building.
SAD case
The positions of the anomalous scatterers are usually entered with a PDB file, but there are options to enter the sites individually in input boxes, using a SOL (.sol) file from a previous run of Phaser, or using the HA file format. Unless the sites have already been refined in Phaser, it is generally best to tick the "Set B factors to Wilson B" box, as the B-factor refinement will tend to be more stable.
Note that, if the substructure has been determined with SHELX-D, all of the atoms will be assigned as S atoms in the PDB file. In this case, if they are not S atoms the PDB file should be edited to replace the atom name with the correct element type.
SAD+MR case
In this case, the partial structure is entered as a PDB file. To initialise the variances describing the effect of model errors, Phaser needs an estimate of model accuracy, which can be entered either as the sequence identity of the molecular replacement model or as an estimated RMS error. The variances are refined, so the exact value entered is not usually important.
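To get a feel for how a sequence identity translates into an expected RMS error, the following Python sketch uses a Chothia and Lesk style relation with a lower limit of 0.8 Å, in the form quoted in older Phaser documentation. The constants are an assumption here and may differ in the version you are running, so check them against your own documentation.

```python
import math

def rms_from_identity(identity, floor=0.8):
    """Estimate the RMS coordinate error (in Angstroms) from fractional sequence identity.

    Uses rms = max(floor, 0.4 * exp(1.87 * (1 - identity))); constants are assumed.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    return max(floor, 0.4 * math.exp(1.87 * (1.0 - identity)))

for identity in (1.0, 0.6, 0.3):
    print(f"identity {identity:.0%}: estimated RMS {rms_from_identity(identity):.2f} A")
```

Because the variances are refined, an estimate of this kind only needs to be roughly right.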
Optionally, the model of anomalous scatterers can also be entered, in the same formats as for the SAD case.
Composition folder
To put the data on absolute scale (which has the benefit that the refined occupancies are approximately correct), Phaser needs to know the content of the asymmetric unit. This is most conveniently defined by providing the sequences of any macromolecules in the crystal.
Define refinement parameters folder
This folder is closed by default, as the default refinement protocol has been found to work well in a variety of test cases. If desired, you can change the parameters that are refined after each cycle of LLG completion. More than one macrocycle can be specified; e.g. you may wish to refine only occupancies in one macrocycle, then refine a larger set of parameters in the next macrocycle.
Additional parameters folder
When opened, this folder gives access to a number of parameters that are not usually changed.
Output files
Log file
The most important details of the log file are marked up with summary tags. These parts appear in red in the ccp4i log file viewer, or can be viewed on their own by pressing the "Show Summary" button. The important results are summarized in the following sections.
Anisotropy correction
Phaser uses a likelihood target to refine an anisotropy correction. Once the anisotropy correction has been applied, the intensities should fall off equally in all directions in the diffraction pattern. (A better correction can be done once there is an atomic model, which is why it is better to let refinement programs carry out their own correction on the uncorrected data.) At the end of the analysis, Phaser reports the size of the correction along three principal axes of a thermal ellipsoid and reports an "anisotropic deltaB" ($\Delta B$), which is the difference between the biggest and smallest components of the thermal ellipsoid. You can get a feel for the size of the effect of the anisotropy correction at the resolution limit $d_{\min}$ by noting that intensities in the direction where the diffraction is weakest will be scaled up by a factor equal to $\exp\left(2\Delta B/(4 d_{\min}^2)\right) = \exp\left(\Delta B/(2 d_{\min}^2)\right)$, relative to the intensities from the strongest direction. For example, if the anisotropic deltaB is 30 Å² and your crystal diffracts to 3 Å resolution, the intensities in the weakest direction are scaled up by a factor of more than 5 ($\exp(30/18) \approx 5.3$) compared to the intensities in the strongest direction. This constitutes a significant level of anisotropy.
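The scale factor quoted above is easy to evaluate for your own data. The short Python sketch below implements only that formula; the function name is hypothetical, and the inputs are the anisotropic deltaB reported in the log file and the high resolution limit of the data.

```python
import math

def anisotropy_intensity_factor(delta_b, d_min):
    """Relative intensity scale-up in the weakest direction at the resolution limit.

    delta_b: anisotropic deltaB in square Angstroms
    d_min:   high resolution limit in Angstroms
    """
    return math.exp(delta_b / (2.0 * d_min ** 2))

print(round(anisotropy_intensity_factor(30.0, 3.0), 2))  # 5.29
```

The same deltaB has a much smaller effect at lower resolution: for data extending only to 4 Å, the exponent becomes 30/32 and the factor drops to about 2.6.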
Cell content analysis
Phaser compares the content you have specified for the asymmetric unit (typically by giving sequence files for the different components and saying how many copies of each component are present) with the average content determined from an analysis by Kantardjieff and Rupp. You should look at how your content compares to the frequency distribution of previously observed contents, to see whether you should consider other possibilities for the number of copies.
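When considering alternative numbers of copies, it can help to tabulate the Matthews coefficient and approximate solvent fraction for each possibility. The Python sketch below does only this simple calculation; it does not reproduce the resolution-dependent frequency distributions that Phaser reports. The function names and example numbers are made up, and the constant 1.23 is the usual approximation for protein.

```python
def matthews_coefficient(cell_volume, n_symops, mw_per_copy, n_copies):
    """Matthews coefficient in cubic Angstroms per Dalton.

    cell_volume: unit cell volume in cubic Angstroms
    n_symops:    number of asymmetric units in the unit cell
    mw_per_copy: molecular weight of one copy in Daltons
    n_copies:    number of copies in the asymmetric unit
    """
    return cell_volume / (n_symops * n_copies * mw_per_copy)

def solvent_fraction(vm, protein_constant=1.23):
    """Approximate solvent fraction for a protein crystal."""
    return 1.0 - protein_constant / vm

# Made-up example: 200000 A^3 cell, 4 asymmetric units, 20 kDa protein
for n_copies in (1, 2, 3):
    vm = matthews_coefficient(200000.0, 4, 20000.0, n_copies)
    print(f"{n_copies} copies: Vm = {vm:.2f} A^3/Da, solvent = {solvent_fraction(vm):.0%}")
```

Copy numbers that give a negative or very small solvent fraction can be ruled out immediately; the remaining candidates are the ones to compare against the observed frequency distribution.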
SAD phasing and refinement
Cycles of refinement are interspersed with LLG completion steps. If the completion step changes the model, by the addition or deletion of atoms or by converting an isotropic atom to anisotropic, another refinement cycle is carried out.
At the beginning of each cycle, potential outliers are detected and (for reasons of reliability and numerical stability) left out of the rest of that cycle. Note that, in the initial cycle, the estimates of variances used to detect outliers will not be reliable, so little attention should be paid to the initial assignment of outliers or the estimated figures of merit.
Each completion step analyses the LLG map(s). Peaks near isotropic atoms cause those atoms to become anisotropic, and other peaks are assigned as new anomalous scatterers. Atoms that refine to a low occupancy are deleted. However, if a deleted atom is reintroduced in a subsequent completion step, it will be flagged so that it is not repeatedly deleted and restored.
Once substructure completion has converged, phasing is carried out (if requested) for the other hand of the anomalous scatterer model, and of the space group in the case of an enantiomorphic space group. If the substructure contains a mixture of anomalous scatterer types, this restarts from the initial model; otherwise, only the phasing calculation is repeated.
MTZ file
This contains the columns that were present in the input MTZ file, plus the following:
- FWT/PHWT: map coefficients computed with weights and a correction for bias from the real scattering of the model
- PHIB/FOM: centroid phase and associated figure of merit
- FPFOM: pseudo-FOM that can be paired with the mean amplitude and PHIB to reproduce the FWT/PHWT map coefficients (used for Resolve)
- HLA/HLB/HLC/HLD: Hendrickson-Lattman coefficients encoding the phase probability information
- FLLG/PHLLG: coefficients for the LLG map for the first anomalous scatterer in the LLG completion list
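The relationship between the Hendrickson-Lattman coefficients and the centroid phase and figure of merit can be sketched in Python. The code assumes the standard form of the phase probability, proportional to exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ), and integrates it numerically. It illustrates the definitions only and is not Phaser's own code; the function name is hypothetical.

```python
import cmath
import math

def centroid_phase_and_fom(hla, hlb, hlc, hld, n_steps=360):
    """Return (centroid phase in degrees, figure of merit) from HL coefficients."""
    phis = [2.0 * math.pi * k / n_steps for k in range(n_steps)]
    log_p = [hla * math.cos(p) + hlb * math.sin(p)
             + hlc * math.cos(2.0 * p) + hld * math.sin(2.0 * p) for p in phis]
    shift = max(log_p)                     # avoid overflow for large coefficients
    weights = [math.exp(v - shift) for v in log_p]
    norm = sum(weights)
    # Probability-weighted average of exp(i*phi): its phase is the centroid
    # phase and its length is the figure of merit.
    centroid = sum(w * cmath.exp(1j * p) for w, p in zip(weights, phis)) / norm
    return math.degrees(cmath.phase(centroid)), abs(centroid)

# Unimodal distribution centred on 0 degrees
print(centroid_phase_and_fom(2.0, 0.0, 0.0, 0.0))
# Bimodal distribution with equal peaks at 0 and 180 degrees: FOM is close to zero
print(centroid_phase_and_fom(0.0, 0.0, 2.0, 0.0)[1])
```

The second case shows why the four coefficients carry more information than PHIB/FOM alone: a bimodal distribution, which is typical for SAD, collapses to a centroid with a low figure of merit, whereas the coefficients still record where the two peaks are.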
PDB file
This contains the atomic model of the anomalous substructure.
.sol file
This contains all refined parameters, including the atomic model of the anomalous substructure, estimated variances and f" values.
Following Phaser with DM
It is a good idea to provide DM with the detailed phase probability information encoded in the Hendrickson-Lattman coefficients (HLA/HLB/HLC/HLD). In addition, the normal figure-of-merit weighted map coefficients that DM would use by default can contain significant model bias, particularly if you have started Phaser from a partial molecular replacement model. So it is better to start DM off with a map computed from the bias-corrected map coefficients, FWT/PHWT. In CCP4 6.1, this can be achieved from the ccp4i interface to DM, but in earlier versions of ccp4i it is necessary to choose the "Run&View Com File" option and edit the LABIN command line to DM, adding "FDM=FWT PHIDM=PHWT".
In cases where there is significant anisotropy in the diffraction data, you might wish to try an additional DM run using the anisotropy-corrected amplitudes and sigmas that can be produced in a separate MR_ANO run (accessed from the molecular replacement with Phaser interface). If your original data were labelled F/SIGF, the corrected data will be labelled F_ANO/SIGF_ANO.
Following Phaser+DM with ARP/wARP
In general, the best results are obtained by using the MLHL target for the Refmac refinement part of ARP/wARP cycles. The phase information should be specified by choosing the Hendrickson-Lattman coefficients from Phaser (HLA/HLB/HLC/HLD); the coefficients produced by DM (HLADM/HLBDM/HLCDM/HLDDM) tend to be overly optimistic.
You should not use data corrected for anisotropy by Phaser in an ARP/wARP job, because Refmac will do a better job of correcting for anisotropy by comparing observed and calculated structure factor amplitudes.
