RNA Journal Club 6/25/09

Posted in RNA Journal Club, RNAJC w/ review by YPAA on June 25, 2009

Argonaute HITS-CLIP decodes microRNA–mRNA interaction maps

Sung Wook Chi, Julie B. Zang, Aldo Mele & Robert B. Darnell

Nature, Advance Online Publication, June 17 2009.
Nature 460 (7254): 479-486, July 23 2009.
doi:10.1038/nature08170

This week’s deep summary and analysis by Noah Spies:

Despite the best efforts of numerous labs over the last decade, studying microRNA–messenger RNA interactions is still a slow and error-prone process of computational predictions based around sequence conservation (and a host of other sequence elements), supported by luciferase reporter assays, microRNA transfection followed by microarray or mass spec analysis, knockouts of microRNAs and components of their pathway, and other methods. So, it is with great excitement and some frustration that we receive this HITS-CLIP paper from the Darnell lab.

In a late 2008 paper, Darnell and colleagues developed the “HITS-CLIP” method, short for high-throughput sequencing of RNA isolated by cross-linking immunoprecipitation. This mouthful-of-an-acronym method involves using ultraviolet light to cross-link proteins to nucleic acids, allowing stringent immunoprecipitation of direct protein–RNA complexes. In Chi et al. (2009), the Darnell lab has focused this method on identifying interactions between the workhorse of the microRNA-induced silencing complex, Argonaute (Ago), the microRNAs bound in Ago, and the mRNAs targeted by those microRNAs. The authors found that immunoprecipitation of Ago from cross-linked cells produced two populations of Ago–RNA complexes: (1) Ago–microRNA complexes, which run at about 110 kDa after partial RNase digestion, and (2) Ago–mRNA complexes, which run closer to 130 kDa. By isolating these RNA populations separately and sequencing them on the Illumina platform, the authors were able to globally identify both Ago–microRNA and Ago–mRNA interactions.

The difficulty in analyzing these data comes from the heterogeneous population of microRNAs: which Ago–mRNA sequence tag corresponds to an Ago with which microRNA loaded? The authors first use a clever approach they dub “in silico CLIP” to simulate distributing sequence tags across messenger RNAs based on mRNA expression. This simulation provides a background level for the number of tags that would be expected by chance to simultaneously overlap one another, forming clusters. The authors then identify significantly enriched mRNA-sequencing tag clusters, and show that tags in most of these clusters are tightly distributed around the center, giving a sharp peak. For each cluster, then, the authors can search for 6–8mer microRNA seed matches within the cluster, and suggest which microRNA bound which mRNA clusters.
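To make the idea of a simulated background concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the uniform placement of tags within a transcript, and the empirical p-value are all assumptions made for illustration. Tags are assigned to transcripts in proportion to expression, dropped at random positions, and the tallest pile-up (peak height) is recorded for each simulation.

```python
import random
from collections import Counter

def peak_height(tag_starts, tag_len):
    """Maximum number of tags overlapping any single position."""
    events = Counter()
    for s in tag_starts:
        events[s] += 1            # coverage rises where a tag starts
        events[s + tag_len] -= 1  # and falls just past its last base
    depth = best = 0
    for pos in sorted(events):
        depth += events[pos]
        best = max(best, depth)
    return best

def simulate_background(transcripts, n_tags, tag_len=36, n_sims=200, seed=0):
    """transcripts maps name -> (length, expression); hypothetical input format.

    Returns name -> list of simulated peak heights, one per simulation.
    Assumes every transcript is at least tag_len long.
    """
    rng = random.Random(seed)
    names = list(transcripts)
    weights = [transcripts[n][1] for n in names]
    heights = {n: [] for n in names}
    for _ in range(n_sims):
        starts = {n: [] for n in names}
        # Pick a transcript for each tag in proportion to its expression,
        # then place the tag uniformly along that transcript.
        for name in rng.choices(names, weights=weights, k=n_tags):
            length = transcripts[name][0]
            starts[name].append(rng.randrange(length - tag_len + 1))
        for n in names:
            heights[n].append(peak_height(starts[n], tag_len))
    return heights

def empirical_p(observed_height, simulated_heights):
    """Fraction of simulations reaching the observed peak height (add-one corrected)."""
    hits = sum(h >= observed_height for h in simulated_heights)
    return (hits + 1) / (len(simulated_heights) + 1)
```

An observed cluster would then count as enriched when its peak height is rarely matched in the simulations for the same transcript, for example `empirical_p(observed, heights["some_gene"])` falling below a chosen cutoff.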

There was a significant enrichment of clusters at both ends of 3′ untranslated regions (UTRs), as was expected given prior research showing that most functional microRNA targets are in these regions. This study also identified many Ago–mRNA clusters in coding sequence, although not above background, and in introns and intergenic sequence, though the authors did not explore the explanation that these may simply be the result of unannotated transcripts and retained introns.

This method has the advantage over computational predictions of identifying true Ago–mRNA interactions, but these interactions do not necessarily result in noticeable down-regulation of the messenger RNA. To begin to assess how often these Ago–mRNA interactions are productive, the authors transfected a brain-specific microRNA, miR-124, into the cervical cancer cell line HeLa, and then used HITS-CLIP to identify Ago–mRNA clusters. The authors found that those mRNAs apparently bound by miR-124, according to HITS-CLIP, were significantly down-regulated following transfection when compared to those with miR-124 sites computationally predicted by TargetScan. This was true at both the protein and the mRNA level across all transcripts, as well as when only looking at the brain-expressed messenger RNAs at the mRNA level, although brain-expressed genes did not show a convincing down-regulation at the protein level.
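Calling an mRNA "bound by miR-124" rests on finding a seed match inside the cluster. The sketch below shows one way to run that search in Python. The site definitions (8mer, 7mer-m8, 7mer-A1, 6mer) follow the usual TargetScan-style convention rather than anything confirmed from the paper's methods, the function names are hypothetical, and the mature miR-124 sequence is supplied as an assumed input.

```python
# Assumed mature miR-124 sequence (5'->3'); check against miRBase before real use.
MIR124 = "UAAGGCACGCGGUGAAUGCC"

COMPLEMENT = str.maketrans("ACGU", "UGCA")

def revcomp(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]

def seed_sites(mirna):
    """Site type -> mRNA-sense motif, listed from longest to shortest."""
    m6 = revcomp(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = revcomp(mirna[1:8])  # pairs with miRNA nucleotides 2-8
    return {
        "8mer": m8 + "A",     # 2-8 match followed by an A opposite position 1
        "7mer-m8": m8,
        "7mer-A1": m6 + "A",
        "6mer": m6,
    }

def find_sites(cluster_seq, mirna):
    """Return sorted (start, site_type) pairs, keeping only the longest site at each spot."""
    seq = cluster_seq.upper().replace("T", "U")
    hits, claimed = [], set()
    for kind, motif in seed_sites(mirna).items():
        start = seq.find(motif)
        while start != -1:
            span = set(range(start, start + len(motif)))
            if not span & claimed:  # skip matches nested in a longer site
                hits.append((start, kind))
                claimed |= span
            start = seq.find(motif, start + 1)
    return sorted(hits)
```

For example, `find_sites("CCCGTGCCTTACCC", MIR124)` reports a single 8mer at offset 3 rather than also listing the 7mer and 6mer matches nested inside it.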

The authors end with a faulty Gene Ontology–based analysis, comparing HITS-CLIP to previously published microRNA-target predictions. For the most highly expressed microRNAs in their study, the authors analyzed enrichment of various GO categories in HITS-CLIP mRNA clusters with the associated seed site. They found significant enrichment for several of these microRNAs in several neuronal GO categories. In comparing these results to microRNA-target predictions, the authors compared mRNAs with and without predicted microRNA target sites. However, this ignores the fact that many genes have very little conservation in their 3′ UTRs, and hence could not be predicted as targets of any microRNA. A better comparison might take transcripts targeted by non-expressed microRNAs as the background set, and compare these to those predicted to be targeted by the highly expressed microRNAs.
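The effect of the background choice is easy to see in a small sketch. The Python below uses a one-sided hypergeometric test, which is a common choice for GO enrichment but is an assumption here, not necessarily the statistic used in the paper. The function and argument names are hypothetical. The foreground would be targets of the highly expressed microRNAs, and the background would be targets of non-expressed microRNAs, as suggested above.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def go_enrichment(foreground, background, go_members):
    """Test whether a GO term is over-represented in foreground vs. the chosen universe.

    The universe is foreground plus background, so changing the background
    set changes the answer; that is the point of the comparison.
    Returns (k, n, K, N, p_value).
    """
    fg = set(foreground)
    universe = fg | set(background)
    annotated = set(go_members) & universe  # ignore genes outside the universe
    k = len(fg & annotated)
    n, K, N = len(fg), len(annotated), len(universe)
    return k, n, K, N, hypergeom_sf(k, N, K, n)
```

Running `go_enrichment` twice on the same foreground, once with all expressed genes as background and once with only genes that are predicted targets of some non-expressed microRNA, would show how much of an apparent enrichment comes from the background definition alone.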

In summary, this is an exciting and powerful new technique, which will quickly broaden our understanding of microRNA regulation. A few issues marred what could have been an exceptionally interesting paper. First, the authors seemingly randomly cherry-pick their data for each figure panel, sometimes choosing conserved microRNAs, sometimes non-conserved; sometimes those clusters present in all their replicates, sometimes only those in two or more replicates; sometimes the top 30 most-expressed microRNAs, sometimes only the top 20. These decisions may have been well founded, or the results may have been similar regardless of which data they chose, but without clear explanations of why they conducted their analyses the way they did, it is difficult to express confidence in the robustness of their results. Second, the HITS-CLIP method has a huge advantage over target prediction methods in being able to identify non-conserved target sites, and yet the authors restricted most of their analyses to only those conserved microRNA targets. Finally, the authors chose not to make the raw HITS-CLIP sequencing data readily available online (submission to the NCBI Short Read Archive is the standard for sequencing data, as GEO is the standard for microarray data), although one can hope that this will be rectified in the near future.

Update 7/24/09:

As Dr. Darnell calls attention to in his comment below, all of the raw data and UCSC links are now available. For them, visit the Darnell Lab Ago HITS-CLIP website (http://ago.rockefeller.edu).

4 Responses

Hawt RNA Blogs « You’d Prefer An Argonaute said, on July 6, 2009 at 6:41 pm

[…] the topic far more sensorial. My blog has recently spotlighted literature with titles like, “Argonaute HITS-CLIP decodes microRNA–mRNA interaction maps.” The other RNA blog, “Surrender to the Playboy Sheikh”, and “Disrobbed […]

Bob Darnell said, on July 23, 2009 at 6:20 am

Thanks for your great interest in the Ago HITS-CLIP map that we developed. All of our Ago HITS-CLIP raw data and UCSC links will be released on the formal publication date (today), July 23. We will continue to maintain updates on our project website.

I can understand that going through all the Supplementary data adds a big burden to the review–it is 70 pages, but this might have helped your review a bit. For example, re: the concern that we randomly cherry-picked data for our figure panel (I assume you mean Fig 3), what in fact we show are two examples where miR-124 has been rigorously shown by bioinformatic (as done so nicely by your boss Dave Bartel) and mutagenesis studies to regulate expression. There are not so many such examples. We illustrate the maps for all the ones we could find in the literature, but couldn’t fit it in Fig 3 for space reasons–all the others are in Supplementary Figure 10 (which contains 10 subfigures).

Noah said, on July 24, 2009 at 4:16 pm

Thanks for your response, Dr Darnell. I’m glad that the raw data are now available, and completely understand that getting such things together can take a little longer. I’ll have our blog editor post an update linking to the data. I’m sure my colleagues will be excited to have a look at the data for themselves.

My problem with cherry-picking data actually comes primarily from figures 1 & 2. For example, figures 1h and 2c display clusters only found in all 5 replicates, whereas some subsequent analyses involve all clusters found in at least two experiments (e.g. Fig 5) or at least 3 experiments (e.g. estimate of false negative rate; this one may just be a typo). Or for the calculation of false positive rate, the top & bottom 30 seeds were used for one analysis whereas the top & bottom 20 seeds were used for another analysis. Or the analysis that claims that a peak height cutoff of at least 20 is good, yet figure 2 uses a different peak height cutoff of at least 30. These individually are minor inconsistencies, but come without good explanation and at the very least make it difficult to compare analyses within the paper.

Chi said, on July 27, 2009 at 11:50 am

Thank you very much for the comment on our paper. We are continuously trying to update our project website to provide all information about the Ago HITS-CLIP map. Please keep an eye on our website (http://ago.rockefeller.edu) for updated information.

Regarding the issue raised by Noah above, I’d like to make it clear that we did not “cherry pick” our data. In the paper, we tried to clarify our criteria and motivations for choosing the biologic complexity (BC) and peak height thresholds, as follows:
“Relative to more stringent analyses (Fig. 2c), our analysis at this threshold (BC>=2) was more sensitive and sufficiently specific such that we used it for subsequent analyses (Supplementary Fig. 7)”

Part of the difficulty is the severe space constraint put upon us by the journal, such that we struggled to balance general descriptions of the work with sufficient detail necessary to make it rigorous (with a necessarily large amount of supplementary information). What this sentence is meant to clarify is our general strategy: to initially analyze the data using a stringent set of criteria, and then generalize it using a larger dataset. Hence what appeared as “cherry picking” to you was rationally based.

We originally developed an empirical approach to BC and peak height based on the validation experiments. In Figure 2, we set out to develop a new method ab initio. We begin with the most stringent and trustworthy data sets to work out how tags and cluster widths are distributed relative to peak position, in order to define a conservative Ago footprint region. To get an accurate, high-resolution peak position, which is interpolated by cubic spline, we need a fairly large number of tags in clusters with single peaks. As the tag length is 36 nt (the maximum read length of Solexa sequencing is 36), peak heights of more than 30 are needed to define the peak position at single-nucleotide resolution (in the extreme case, positioning 6mer seed-interacting sites within 36 nt tags needs 30 different unique tags with single-nucleotide resolution). So we used the clusters with BC5 and peak height > 30 (Fig. 2A; we thought this was quite intuitive without detailed explanation. We should have put it in the supplementary information for readers who could not catch it intuitively.)

The peak height cutoff (>20) in Supplementary Fig. 7 is the cutoff for accuracy of Ago binding, based on comparing cluster numbers at different peak height thresholds with different BC (because the distributions for different BC begin to merge around this threshold). So we used >30, which is more stringent than 20 (and likely to be more real; that’s why we refer to Supplementary Fig. 7) and could be used to accurately define peak position at high resolution, which is essential to define the Ago footprint region (Fig. 2A).

We then generalize from the more stringent dataset (BC5) to the more general dataset (BC2; Fig 2A->2B). A similar pattern was followed in the analysis of conserved seed sequences in Ago footprints: our initial discovery strategy used stringent conditions (Fig. 2C: BC5, threshold 30), and we then took a more general approach (2C->2D, where in Fig. 2D, BC > 2, no restriction on peak height). Given our promising results in Fig. 2B and 2D, we then chose to use these criteria (BC2, no peak height restriction) for the rest of the paper (we also used BC>=2 for estimation of false positives. We now realize that it is a typo written as BC>2; the equals sign disappeared somehow in editing.)

Regarding the analysis of the top 30 seeds vs the bottom 30 seeds: if you look at Supplementary Figure 13C, the false positive rate is quite good up to 20. That’s why we used the top 20 miRNA seeds for generating the Ago ternary map, giving high accuracy. But the purpose of Fig. 2C is to estimate how good the Ago footprint is for defining miRNA binding sites, by comparing not only the number of seeds but also their distribution relative to peaks (top 30 seeds vs bottom 30 seeds). To compare the two distributions fairly (by kurtosis), the control set (bottom 30) should contain a reasonable number of conserved seed sites. We found very few sites from the bottom 20, which is not enough to compare the distributions statistically. So we used top 30 vs bottom 30 for this comparison, although the result is not better than top 20 vs bottom 20 (Fig. 2C and Supplementary Figure 13C). For estimating false positive rates, we used top 20 vs bottom 20, which is our final criterion for the Ago ternary map, but we also explained the false positive rates that could be estimated from Fig. 2C to give further information for the reader (also in Supplementary Table 3 and Supplementary Figure 13).

Bob & Chi
