The SNPMaP package has been designed to handle the processing of SNPMaP data from the CEL files generated by the Affymetrix GeneChip Command Console (AGCC) or GeneChip Operating Software (GCOS), through to the RAS (Relative Allele Scores: the pooling equivalent of a relative allele frequency) used in most analyses. This can be as simple as typing
ras <- snpmap()
at the R prompt. The package will identify and read in the CEL files from the current directory, extract the relevant probe intensities and calculate a mean RAS for each SNP on each chip, returning a SNPMaP S4 object containing the scores.
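As a rough illustration of what "a mean RAS for each SNP on each chip" involves, the base R sketch below computes a score per probe quartet and then averages. It assumes the commonly used definition RAS = A / (A + B), where A and B are the perfect-match intensities for the two alleles. The intensity values and variable names are made up for illustration, and this is not the package's own code.

```r
# Toy perfect-match intensities for one SNP on one array (hypothetical values).
# Each row stands for one probe quartet; columns are allele A and allele B.
intensities <- matrix(
  c(1200, 400,
     900, 300,
    1500, 500,
     600, 200),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("quartet", 1:4), c("A", "B"))
)

# Assumed definition (not taken from the package source): RAS = A / (A + B).
ras_per_quartet <- intensities[, "A"] / rowSums(intensities)

# Summary score for this SNP on this array: the mean across quartets.
mean_ras <- mean(ras_per_quartet)

print(ras_per_quartet)
print(mean_ras)
```

A score near 1 suggests the pool is dominated by allele A, and a score near 0 by allele B.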
Plotting functions such as boxplot() help with quality control.
Given the amount of data generated by current SNP arrays, even with the relatively modest numbers of arrays (tens) typical of SNPMaP experiments, we have provided the option of a memory-mapping approach (using the R.huge package). This allows analysis to be done on a PC with 2 GB of memory, at the cost of some speed. If memory limits are exceeded in the course of the analysis, SNPMaP attempts to switch automatically from storing objects in memory to storing objects on disk.
S4 methods for generic R functions make it easy to query the SNPMaP object and visualize the data it contains. Accessors provide convenient access to the data. All functions are documented through the R help system. For example, typing ? followed by a function name at the R prompt will bring up a page describing that function and its usage; help pages are also available for the SNPMaP package and the SNPMaP class.
Although the SNPMaP object is intended to be usable for further analyses, the data can also easily be extracted to a standard R matrix.
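The pattern described above (S4 methods on existing generics, an accessor, and conversion to a matrix) can be sketched with a toy class. The class ToyScores, its slot, and the accessor scores() are hypothetical names invented for this example; they are not the SNPMaP class or its accessors.

```r
library(methods)

# Hypothetical toy class; NOT the SNPMaP class from the package.
setClass("ToyScores", representation(scores = "matrix"))

# Accessor-style generic (illustrative name) returning the stored matrix.
setGeneric("scores", function(object) standardGeneric("scores"))
setMethod("scores", "ToyScores", function(object) object@scores)

# S4 method for the existing generic function summary().
setMethod("summary", "ToyScores", function(object, ...) {
  summary(as.vector(object@scores))
})

# S4 method so that as.matrix() extracts a plain matrix from the object.
setMethod("as.matrix", "ToyScores", function(x, ...) x@scores)

toy <- new("ToyScores", scores = matrix(
  c(0.2, 0.4, 0.6, 0.8), nrow = 2,
  dimnames = list(c("snp1", "snp2"), c("chip1", "chip2"))
))

m <- as.matrix(toy)   # plain matrix, usable with any matrix-based tool
print(summary(toy))
print(m)
```

Once the data are in a plain matrix, any function that accepts a matrix can be applied without knowledge of the class.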
A user who wants CEL files transformed into a spreadsheet of RAS in the simplest possible way need not use R interactively at all; example scripts that can be invoked from various shells are available, including a point-and-click front end for Windows. These steps comprise the simplest route from CEL files to the RAS used for association analysis.
On the other hand, a user who wants to examine all steps of the analysis and experiment with new methods has access to the data in a straightforward and convenient form. This flexibility is one of the major strengths of an implementation in R, which already offers an impressive array of cutting-edge statistical techniques.
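For the spreadsheet route mentioned above, the final step amounts to writing a matrix of scores to a file that a spreadsheet program can open. The sketch below uses only base R and a made-up matrix; the SNP names, file names and values are placeholders, and the example scripts distributed with the package may do this differently.

```r
# Toy matrix standing in for RAS extracted from an analysis (hypothetical values).
ras_matrix <- matrix(
  c(0.21, 0.48, 0.77,
    0.25, 0.51, 0.74),
  nrow = 3,
  dimnames = list(c("SNP_1", "SNP_2", "SNP_3"),
                  c("pool_A.CEL", "pool_B.CEL"))
)

# Write a comma-separated file: one row per SNP, one column per array.
out_file <- file.path(tempdir(), "ras.csv")
write.csv(ras_matrix, out_file)

# Read it back to confirm that the file holds the same table.
round_trip <- as.matrix(read.csv(out_file, row.names = 1, check.names = FALSE))
print(round_trip)
```

A script like this can be run non-interactively, for example with Rscript, so no R session needs to be opened by hand.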
The image() method uses the signal intensities from a "raw" format SNPMaP object to draw a pseudoimage of the array. This is useful for checking the surface for artifacts, such as the bubble here. A fastRender option allows rapid initial screening of all the arrays in the study, and the image can be coloured using custom palettes.
A more involved approach might begin by extracting the raw probe intensities from the CEL files, running the 'cel2raw' workflow rather than the default route straight to RAS:
raw <- snpmap(lowMemory = FALSE, RUN = 'cel2raw')
This allows the user to plot the raw probe intensities and generate pseudoimages of the processed chips using the image() method, so the user can check for scanning artifacts such as dust or fingerprints. The raw intensities can be further processed to RAS (this time one RAS per probe quartet, rather than a single mean RAS for each SNP on each array as before) by a workflow function:
ras <- raw2ras(raw)
Other options are also available: one quantile-normalizes the raw probe intensities across chips, another causes SNPMaP to use the natural logarithm of the probe intensities, and a third causes SNPMaP to subtract mismatch probe intensities (where mismatch probes are available).
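To make these three transformations concrete, here is a minimal base R sketch of quantile normalization across chips, a natural-log transform, and mismatch subtraction. The matrices pm and mm and the function quantile_normalize() are hypothetical names with made-up values; the sketch shows the standard techniques and does not reproduce the package's internals or the order in which it applies them.

```r
# Toy perfect-match (PM) and mismatch (MM) intensities: 4 probes x 3 chips.
pm <- matrix(c(  5,   2,   3,   4,
                40,  10,  30,  20,
               300, 400, 600, 800),
             nrow = 4,
             dimnames = list(paste0("probe", 1:4), paste0("chip", 1:3)))
mm <- pm / 10  # hypothetical mismatch intensities

# Quantile normalization: give every chip the same intensity distribution.
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)                        # sort each chip's values
  target <- rowMeans(sorted)                         # mean of each quantile
  ranks  <- apply(x, 2, rank, ties.method = "first") # rank within each chip
  out <- apply(ranks, 2, function(r) target[r])      # replace value by quantile mean
  dimnames(out) <- dimnames(x)
  out
}

pm_normalized <- quantile_normalize(pm)

# Natural logarithm of the intensities.
pm_log <- log(pm)

# Mismatch subtraction (only meaningful where mismatch probes exist).
pm_minus_mm <- pm - mm

print(pm_normalized)
```

After normalization each chip contains the same set of values, while the ordering of probes within each chip is unchanged.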
For testing purposes, here are some Affymetrix CEL files from a real SNPMaP experiment. Each download is a zip file containing two arrays, each with DNA from 30 individuals pooled on it.
|Array||Zip file size||Link|
|Mapping 250K Sty Array||54 MB||download|
|Mapping 250K Nsp Array||51 MB||download|
|Genome-Wide Human SNP Array 5.0||44 MB||download|
|Genome-Wide Human SNP Array 6.0||69 MB||download|