Data processing

The term "Data processing" is used here to mean the whole process from a series of X-ray diffraction images through to a list of structure amplitudes (F), one for each unique reflection. It comprises the series of steps described below.

Integration of diffraction images

This involves the following operations:

  • index the image correctly, i.e. assign Miller indices to the diffraction spots and a unit cell to the lattice;
  • refine all parameters of the experiment (cell, crystal orientation, detector position, etc.);
  • assign a spot profile, then estimate the intensity, its associated error estimate, and the fraction of the reflection recorded (its partiality) for each observation, indexed as HKL.
  • Integration may be carried out in the Mosflm system; see also the tutorial information included with it.
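The intensity estimate in the third step can be illustrated with the simplest possible scheme, summation integration with a local background. This is only a sketch under stated assumptions: it is not profile fitting and it is not Mosflm's algorithm. The function name `integrate_spot`, the boolean peak mask, and the pure Poisson error model are all hypothetical choices made for this example.

```python
import math

def integrate_spot(pixels, peak_mask):
    """Summation integration of one spot in a small pixel box.

    pixels    : 2D list of detector counts around the predicted spot position
    peak_mask : 2D list of bools, True where a pixel belongs to the peak;
                all other pixels in the box are treated as background
    Returns (intensity, sigma). Hypothetical helper, Poisson statistics assumed.
    """
    peak, background = [], []
    for row, mask_row in zip(pixels, peak_mask):
        for value, is_peak in zip(row, mask_row):
            (peak if is_peak else background).append(value)

    n_peak, n_bg = len(peak), len(background)
    bg_mean = sum(background) / n_bg

    # Background-subtracted intensity.
    intensity = sum(peak) - n_peak * bg_mean

    # Variance of the peak sum plus variance of the subtracted background.
    variance = sum(peak) + (n_peak ** 2) * sum(background) / (n_bg ** 2)
    return intensity, math.sqrt(variance)


# Usage: a 3x3 box with a flat background of 10 counts and one peak pixel.
box = [[10, 10, 10], [10, 110, 10], [10, 10, 10]]
mask = [[False, False, False], [False, True, False], [False, False, False]]
intensity, sigma = integrate_spot(box, mask)
```

Profile fitting replaces the plain sum over peak pixels with a fit of a learned spot shape, which generally improves the estimates for weak reflections.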

Determination of Laue group

  • Indexing the lattice of spots from the images provides a unit cell but no indication of the symmetry of the diffraction pattern. Once integrated intensities are available, they can be inspected to determine the point group symmetry of the diffraction pattern (the Laue group plus any lattice centering).
  • At the same time, it may be possible to detect screw axes from axial systematic absences, which in favourable cases may lead to an unambiguous indication of the space group.
  • In cases where the Laue group is known but there are alternative indexing schemes, the indexing must be made consistent with previously processed datasets, using a reference file.
  • These operations may be carried out with the program Pointless, which can also combine multiple input files, and will write a sorted file suitable for input to the scaling program Scala.
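The idea of detecting a screw axis from axial systematic absences can be sketched as follows. This is a deliberately crude illustration, not the scoring used by Pointless: it only compares the mean I/sigma of 00l reflections with l odd against those with l even, which is the pattern expected for a two-fold screw axis along c. The function name, the tuple layout, and the threshold of 3.0 are assumptions made for the example.

```python
def axial_absence_check(reflections, threshold=3.0):
    """Crude test for a 2(1) screw axis along c.

    reflections : iterable of (h, k, l, intensity, sigma)
    threshold   : hypothetical I/sigma cut-off separating "present" from "absent"
    Returns None if there are too few axial reflections to judge.
    """
    odd, even = [], []
    for h, k, l, intensity, sigma in reflections:
        # Only reflections on the c* axis are relevant.
        if h == 0 and k == 0 and l != 0:
            (odd if l % 2 else even).append(intensity / sigma)

    if not odd or not even:
        return None

    mean_odd = sum(odd) / len(odd)
    mean_even = sum(even) / len(even)
    return {
        "mean_i_over_sigma_odd": mean_odd,
        "mean_i_over_sigma_even": mean_even,
        # Odd l absent and even l present is what a 2(1) axis predicts.
        "consistent_with_screw": mean_odd < threshold <= mean_even,
    }


# Usage with made-up axial reflections.
data = [
    (0, 0, 1, 2.0, 10.0),
    (0, 0, 2, 500.0, 20.0),
    (0, 0, 3, -5.0, 10.0),
    (0, 0, 4, 300.0, 15.0),
    (1, 0, 0, 100.0, 5.0),  # not axial along c, ignored
]
result = axial_absence_check(data)
```

With only a handful of axial reflections such a test is weak evidence, which is why the text says an unambiguous space group follows only in favourable cases.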

Scaling & merging data

  • Intensities estimated by the integration program (e.g. Mosflm) are not all on the same scale, because of the combined physical factors of the experiment, so a number of "corrections" must be applied to obtain the best estimates of the true intensity and structure amplitude needed for structure determination. Some factors are purely geometric, e.g. the Lorentz and polarisation corrections, and are generally applied by the integration program. Other factors, such as the illuminated volume and absorption, cannot easily be calculated, so they are estimated by trying to make replicated, symmetry-related observations of a reflection intensity equal, using some model of the physical experiment. Note that this process makes the data internally consistent, but is not guaranteed to provide an absolute correction, since the same systematic error may apply to all symmetry-related observations.
  • Analysis of the scatter of the scaled observations allows a "correction" of the standard deviation estimates: they are adjusted so that, on average and as far as possible, the standard deviation equals the observed scatter in all intensity ranges.
  • "Outliers", i.e. observations which deviate too far from their symmetry mates, are rejected. This is most reliable for data with high multiplicity.
  • Given the scale factors, an average intensity is calculated for each unique HKL, since this is what most structure determination programs want. Note that there is a case for keeping the data unmerged, since this allows, for example, time-dependent phenomena such as radiation damage to be taken into account at a later stage. The merging stage also provides a large number of statistics on the data quality, mainly from the agreement between symmetry-related observations.
  • Scaling and merging may be carried out with the program Scala.
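The merging step and one of the agreement statistics it produces can be sketched as below. This is a minimal illustration, not Scala's scaling model: it assumes the scale factors are already known and are simply one number per batch, that indices have already been reduced to the unique HKL, and that merging uses an inverse-variance weighted mean. The function name and data layout are hypothetical. The statistic computed is the conventional R-merge, here taken about the weighted mean.

```python
import math
from collections import defaultdict

def scale_and_merge(observations, batch_scales):
    """Apply per-batch scales, then merge symmetry-equivalent observations.

    observations : iterable of (hkl, intensity, sigma, batch), hkl already unique
    batch_scales : dict mapping batch -> scale k; scaled I = I / k (assumed model)
    Returns (merged, r_merge) with merged[hkl] = (mean I, sigma of mean, multiplicity).
    """
    groups = defaultdict(list)
    for hkl, intensity, sigma, batch in observations:
        k = batch_scales[batch]
        groups[hkl].append((intensity / k, sigma / k))

    merged = {}
    numerator = denominator = 0.0
    for hkl, obs in groups.items():
        weights = [1.0 / s ** 2 for _, s in obs]
        sum_w = sum(weights)
        mean_i = sum(w * i for w, (i, _) in zip(weights, obs)) / sum_w
        merged[hkl] = (mean_i, math.sqrt(1.0 / sum_w), len(obs))

        # Only reflections measured more than once contribute to R-merge.
        if len(obs) > 1:
            numerator += sum(abs(i - mean_i) for i, _ in obs)
            denominator += sum(i for i, _ in obs)

    r_merge = numerator / denominator if denominator > 0 else float("nan")
    return merged, r_merge


# Usage: two observations of (1,0,0) from batches on different scales.
obs = [
    ((1, 0, 0), 100.0, 10.0, 1),
    ((1, 0, 0), 220.0, 20.0, 2),
    ((2, 0, 0), 50.0, 5.0, 1),
]
merged, r_merge = scale_and_merge(obs, {1: 1.0, 2: 2.0})
```

Keeping the per-observation list in `groups` rather than only the merged values is what would allow unmerged data to be revisited later, as noted above.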

Estimation of F from I

For perfect data, the amplitude |F| is just the square root of the intensity I. In the presence of errors, however, a measured intensity can be close to zero or even negative, and a better estimate of F, particularly for small values, comes from the mean of a probability distribution truncated at zero, since the true intensity cannot be negative. The program Truncate carries out this procedure, and also analyses the intensity statistics to detect pathologies such as twinning. Finally, the dataset should be completed, and a freeR column should be generated or copied from an earlier file.
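The effect of truncating the distribution at zero can be sketched numerically. The example below is an illustration under explicit assumptions, not the Truncate implementation: it assumes a Gaussian measurement error, an acentric Wilson prior p(J) proportional to exp(-J / mean_intensity) for true intensity J >= 0, and takes the estimate of F as the posterior mean of sqrt(J), evaluated with a simple midpoint rule. The function name and all parameters are hypothetical.

```python
import math

def posterior_mean_amplitude(i_obs, sigma, mean_intensity, n_steps=2000):
    """Posterior mean of sqrt(J) for true intensity J >= 0.

    i_obs          : measured intensity (may be negative)
    sigma          : its standard deviation
    mean_intensity : expected mean intensity in the resolution shell (assumed known)
    """
    # Gaussian likelihood times exponential prior is again a Gaussian in J,
    # with shifted mean mu and the same sigma, restricted to J >= 0.
    mu = i_obs - sigma ** 2 / mean_intensity
    peak = max(mu, 0.0)
    lower = max(0.0, mu - 8.0 * sigma)
    upper = peak + 8.0 * sigma
    step = (upper - lower) / n_steps

    # Subtract the exponent at the peak so the weights never all underflow.
    peak_exponent = -0.5 * ((peak - mu) / sigma) ** 2

    numerator = denominator = 0.0
    for n in range(n_steps):
        j = lower + (n + 0.5) * step  # midpoint of each integration cell
        weight = math.exp(-0.5 * ((j - mu) / sigma) ** 2 - peak_exponent)
        numerator += math.sqrt(j) * weight
        denominator += weight
    return numerator / denominator


# Usage: a strong reflection stays close to sqrt(I); a negative one still
# yields a small positive amplitude.
f_strong = posterior_mean_amplitude(10000.0, 10.0, 1000.0)
f_weak = posterior_mean_amplitude(-5.0, 10.0, 1000.0)
```

For strong reflections the result is essentially sqrt(I), so the treatment matters mainly for weak and negative measurements, which would otherwise have to be discarded or set to zero.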