
Computational Biology Service Unit
BioHPC Web Computing Resources

Version 1 Rev 454
(2011/12/21 10:37:26)

NAM-GWAS @ BioHPC

NAM-GWAS performs a genome-wide association (GWA) analysis on user-supplied trait data for the maize Nested Association Mapping (NAM) population. The analysis is described in Feng Tian et al. 2011. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nature Genetics doi:10.1038/ng.746 (advance online publication 09 January 2011). Users must upload a set of residuals for each chromosome from a joint linkage model and set a few parameters. The analysis first projects 1.6 million SNPs and indels genotyped on the NAM founder lines onto approximately 5000 RILs, then tests each SNP for association with the trait using a fixed-effects linear model that includes a term for population.
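The single SNP test described above can be illustrated with a minimal sketch. It is not the NAM-GWAS implementation; it only shows how one SNP could be tested in a fixed-effects model with a population term. It assumes the population term is a separate intercept per population, which can be absorbed by centering both the residuals and the SNP scores within each population. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def center_within_groups(values, groups):
    """Subtract each group's mean from its members (absorbs the population term)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group] for value, group in zip(values, groups)]


def snp_f_statistic(residuals, snp_scores, populations):
    """Return (F, error df) for adding one SNP to a population-only model.

    Returns None when the SNP cannot be tested, for example when it does
    not vary within any population.
    """
    y = center_within_groups(residuals, populations)
    x = center_within_groups(snp_scores, populations)
    sxx = sum(v * v for v in x)
    if sxx == 0.0:
        return None
    sxy = sum(a * b for a, b in zip(x, y))
    syy = sum(v * v for v in y)
    ss_snp = sxy * sxy / sxx          # sum of squares explained by the SNP
    ss_error = syy - ss_snp           # what the full model leaves unexplained
    df_error = len(y) - len(set(populations)) - 1
    if df_error <= 0 or ss_error <= 0.0:
        return None
    return ss_snp / (ss_error / df_error), df_error


# Toy data: two populations of four RILs each (values are made up).
populations = ["Z001"] * 4 + ["Z002"] * 4
snp_scores = [0, 0, 1, 1, 0, 0, 1, 1]
residuals = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1, 4.2, 3.9]
result = snp_f_statistic(residuals, snp_scores, populations)
```

The F statistic has one numerator degree of freedom; converting it to a p-value requires the F distribution, which is left out here to keep the sketch dependency-free.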

Choices for type of analysis include a single SNP test, which tests each SNP and reports the result for each one; stepwise, which fits a single forward regression model to all of the data; bootstrap, which fits a series of forward regression models to subsamples of the data; and permutation, which runs the single SNP test on permuted data sets and identifies the smallest p-values for each run. The bootstrap regression option can either sample with replacement or create a subsample of 80% of the lines without replacement. A run of 100 bootstrap iterations usually finishes in less than 24 hours. The haplotype and bootstrap with permutation options are not explained here and should not be used unless you know what you are doing.
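The two bootstrap sampling modes can be sketched as follows. This is an illustration of the sampling step only, under the assumption that sampling with replacement draws as many lines as the data set contains; the function name and parameters are hypothetical and do not come from NAM-GWAS.

```python
import random


def draw_bootstrap_sample(lines, with_replacement, fraction=0.8, rng=None):
    """Draw one bootstrap sample of RIL names.

    with_replacement=True: classic bootstrap, same size as the input,
    so a line may appear more than once.
    with_replacement=False: a subsample holding `fraction` of the lines,
    each line at most once.
    """
    rng = rng or random.Random()
    if with_replacement:
        return [rng.choice(lines) for _ in lines]
    subsample_size = int(round(fraction * len(lines)))
    return rng.sample(lines, subsample_size)


# Ten made-up RIL names and a fixed seed so the draw is repeatable.
lines = [f"Z001E{i:04d}" for i in range(1, 11)]
subsample = draw_bootstrap_sample(lines, with_replacement=False, rng=random.Random(1))
resample = draw_bootstrap_sample(lines, with_replacement=True, rng=random.Random(1))
```

Each bootstrap iteration would then fit its forward regression model to the lines in one such sample.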

Calculations are carried out on the BioHPC compute cluster at CBSU. You will receive e-mail notifications when the job is submitted, when it starts, and when it finishes. Output is available via links embedded in the notification e-mails. For more information about this program and the BioHPC interface in general, please visit our Frequently Asked Questions page.


E-mail: (only guests need to use this field; registered users should log in)

Job name (please, no spaces or special characters; underscore is OK)


Trait file:
The trait file consists of residuals from the joint linkage model, calculated separately for each chromosome. To create this file, first fit a joint linkage model to the original trait data. Then, for each chromosome in turn, calculate residuals from that model, excluding the population terms and the terms for that chromosome. This produces a set of data that allows SNPs on that chromosome to be tested for association independently of QTL on other chromosomes. Unpublished results have shown that leaving the population terms in when calculating residuals produces very similar results.
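One possible reading of the residual calculation is sketched below: for a given chromosome, only the fitted QTL effects on the other chromosomes are subtracted from the observed trait value, so the population effect and any QTL on the chromosome being tested remain in the residual. This interpretation, the function name, and the data layout are assumptions made for illustration.

```python
def chromosome_residual(trait_value, qtl_effects, chromosome):
    """Residual for one RIL and one chromosome.

    trait_value: observed trait value for the RIL.
    qtl_effects: dict mapping chromosome number to the summed fitted QTL
                 effects for this RIL on that chromosome (hypothetical layout).
    chromosome:  the chromosome whose SNPs will be tested.
    """
    fitted_elsewhere = sum(
        effect for chrom, effect in qtl_effects.items() if chrom != chromosome
    )
    return trait_value - fitted_elsewhere


# Made-up fitted effects on three chromosomes for a single RIL.
effects = {1: 2.0, 2: -1.0, 3: 0.5}
residual_chr2 = chromosome_residual(10.0, effects, chromosome=2)
```

Repeating this for chromosomes 1 to 10 would give the ten residual columns of one row in the trait file.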

The user-supplied trait file is tab-delimited and has eleven columns. The first column contains the RIL line names. The following ten columns contain the residuals for each of the ten chromosomes, in order from 1 to 10. The first row is the header row, followed by one row for each RIL in the data set. Missing data are not allowed in the file, though not all lines in the NAM population need to be included. The RIL names for the original NAM populations must be “ZxxxEyyyy”, where “xxx” is the population number and “yyyy” is the entry number in the population. IBM RILs must have the “MOzzz” name.

Example:

sample     chr1           chr2            chr3           chr4  chr5  chr6  chr7  chr8  chr9  chr10
Z001E0001   43.361483553   49.2613121738   63.5630735    …
Z001E0002  -20.894001589  -54.902189830   -52.772837863  …
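Before uploading, the format rules above can be checked locally. The sketch below is a hypothetical pre-upload check, not part of NAM-GWAS. It assumes that “xxx”, “yyyy”, and “zzz” in the RIL names are digits, and it treats any value that does not parse as a number as missing.

```python
import math
import re

# Assumption: xxx, yyyy and zzz in the naming rules are digits.
RIL_NAME = re.compile(r"^(Z\d{3}E\d{4}|MO\d{3})$")
EXPECTED_COLUMNS = 11  # RIL name plus one residual per chromosome


def check_trait_text(text):
    """Return a list of problems found in tab-delimited trait data (empty if none)."""
    problems = []
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows:
        return ["no header row found"]
    if len(rows[0]) != EXPECTED_COLUMNS:
        problems.append(f"header has {len(rows[0])} columns, expected {EXPECTED_COLUMNS}")
    for row_number, fields in enumerate(rows[1:], start=1):
        if len(fields) != EXPECTED_COLUMNS:
            problems.append(
                f"data row {row_number}: {len(fields)} columns, expected {EXPECTED_COLUMNS}"
            )
            continue
        name = fields[0].strip()
        if not RIL_NAME.match(name):
            problems.append(f"data row {row_number}: unexpected RIL name {name!r}")
        for chromosome, value in enumerate(fields[1:], start=1):
            try:
                number = float(value)
            except ValueError:
                number = math.nan
            if math.isnan(number):
                problems.append(
                    f"data row {row_number}: missing or non-numeric value for chromosome {chromosome}"
                )
    return problems


# Small made-up file: a header and two complete rows.
header = "sample\t" + "\t".join(f"chr{i}" for i in range(1, 11))
good_text = "\n".join([
    header,
    "Z001E0001\t" + "\t".join(["1.5"] * 10),
    "MO001\t" + "\t".join(["-0.25"] * 10),
])
problems = check_trait_text(good_text)
```

To check a real file, read it as text and pass the contents to `check_trait_text`; an empty list means none of these particular rules were violated.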

Options:

List of chromosomes (comma-delimited, between 1 and 10)
Enterlimits (comma-delimited list of real numbers, one for each chromosome; if left blank, this option will not be used)
Enterlimit (single real number applied to all chromosomes; if not blank, it overrides the Enterlimits list above)
Type of analysis
Number of bootstrap iterations/permutations
Sample with replacement in bootstrap
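The way the chromosome list and the two limit fields interact can be sketched as below. This is an illustration of the precedence described above, not the form's actual code. It assumes that the Enterlimits list has one value per chromosome listed in the first field, in the same order; the function name is hypothetical.

```python
def resolve_enter_limits(chromosomes, enterlimits="", enterlimit=""):
    """Map each requested chromosome to its limit, or return None if unused.

    chromosomes: comma-delimited chromosome numbers, each between 1 and 10.
    enterlimits: comma-delimited real numbers, one per listed chromosome
                 (assumed to follow the order of the chromosome list).
    enterlimit:  single real number; if not blank it overrides enterlimits.
    """
    chroms = [int(c) for c in chromosomes.split(",") if c.strip()]
    if any(c < 1 or c > 10 for c in chroms):
        raise ValueError("chromosome numbers must be between 1 and 10")
    if enterlimit.strip():
        return {c: float(enterlimit) for c in chroms}
    if not enterlimits.strip():
        return None  # both fields blank: the option is not used
    limits = [float(v) for v in enterlimits.split(",")]
    if len(limits) != len(chroms):
        raise ValueError("expected one limit for each chromosome")
    return dict(zip(chroms, limits))


per_chromosome = resolve_enter_limits("1,5,10", enterlimits="0.001,0.002,0.003")
overridden = resolve_enter_limits("1,5", enterlimits="0.001,0.002", enterlimit="1e-6")
unused = resolve_enter_limits("3")
```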

Maximum number of processors to use:

The service is available only to Cornell students, faculty, and staff.
 