RNA Journal Club 9/2/10

Posted in RNA Journal Club, RNAJC w/ review by YPAA on September 2, 2010

Genome-wide measurement of RNA secondary structure in yeast

Michael Kertesz, Yue Wan, Elad Mazor, John L. Rinn, Robert C. Nutter, Howard Y. Chang & Eran Segal

Nature Vol 467, 2 September 2010.
doi:10.1038/nature09322

This week’s summary and analysis by David Garcia:

In contrast to experimental methods for probing RNA secondary structure such as footprinting or SHAPE, the novel method described in this paper, called PARS (Parallel Analysis of RNA Structure), offers a significant advancement: the ability to work on a grand scale. The authors applied PARS to thousands of mRNAs simultaneously from S. cerevisiae, but the technique could in theory be applied to any population of RNAs for which sequence is known, and which can be selected and folded in vitro.

That the technique analyzes RNAs folded in vitro is a valid concern: we might not want to get too excited about the fidelity of mRNA structure formed in a test tube versus how it actually forms in the cell, probably co-transcriptionally, especially on this scale. But to my knowledge, all other currently available methods for analyzing RNA secondary structure are in vitro too. And it doesn’t stop the authors from noting some interesting similarities to a published in vivo ribosome profiling dataset. When a full-blown genome-wide in vivo structure approach arrives, PARS data will be a useful comparison as well.

At the core of the method is detection of which nucleotides in RNAs are paired or unpaired, to reveal a picture (relatively low resolution in this iteration) of secondary structure on a genome-wide level. It relies on the different specificities of two nucleases: RNase V1, which preferentially cleaves phosphodiester bonds 3’ of double-stranded RNA, and S1 nuclease, which preferentially cleaves phosphodiester bonds 3’ of single-stranded RNA. The authors subjected a pool of poly-A selected yeast mRNA to either enzyme, followed by random fragmentation by base hydrolysis to generate smaller molecules amenable to cloning and sequencing by SOLiD. After aligning the reads, they produced a profile for each nuclease that, based on where and how frequently reads clustered along an mRNA in the V1 or S1 library, represents which portions of the RNA were double or single stranded. The ratio of signals from the two libraries is expressed as the PARS score (log₂ ratio of V1 over S1), such that a larger, positive score indicates a more double-stranded region and a smaller, negative score a more single-stranded one.
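To make the score concrete, here is a minimal Python sketch of a per-nucleotide log₂(V1/S1) calculation. It is an illustration only, not the authors' pipeline: the function name `pars_scores`, the pseudocount (added so that zero counts do not blow up the ratio), and the toy read counts are all assumptions, and any normalization for sequencing depth is left out.

```python
import math


def pars_scores(v1_counts, s1_counts, pseudocount=1.0):
    """Per-nucleotide log2(V1 / S1).

    The pseudocount is an assumption made for this sketch; it keeps
    positions with zero reads in one library from giving infinities.
    """
    if len(v1_counts) != len(s1_counts):
        raise ValueError("V1 and S1 profiles must cover the same positions")
    return [
        math.log2((v1 + pseudocount) / (s1 + pseudocount))
        for v1, s1 in zip(v1_counts, s1_counts)
    ]


# Made-up read counts for a 5-nt stretch of one transcript.
v1 = [7, 0, 3, 1, 0]
s1 = [0, 3, 3, 0, 7]
scores = pars_scores(v1, s1)
print(scores)  # positive = more double-stranded, negative = more single-stranded
```

With these toy counts the first position comes out strongly positive (V1-dominated, so likely paired) and the last strongly negative (S1-dominated, so likely unpaired).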

Now the first issue to be raised is that they did not perform a minus-nuclease negative control, as is standard in footprinting experiments. This would help reveal how much of their library comes from degradation products, generated endogenously or during cell lysis, that have 5’-phosphoryl ends and make it through selection. While this “contamination” is probably small, the control seems basic to me. On the plus side they did check for several other biases in their method, but I won’t go into detail here.

Next they compared PARS and traditional footprinting profiles for several endogenous mRNAs, as well as other RNAs they spiked into their library (domains from HOTAIR and the Tetrahymena group I ribozyme). They see strong overlap between the profiles. They also show strong agreement between PARS scores and known secondary structures for a few well-characterized domains of endogenous mRNAs. This data represents a convincing proof of principle, and now the task is, of course, to see if there’s a tangible way to assess PARS’s accuracy throughout a large dataset.

While they saw an overall strong correlation between PARS and Vienna scores (predicted double-stranded probability), even when they looked at only nucleotides with very strong PARS scores (high or low), a little less than half in each set could still fill out the entire distribution of Vienna scores, meaning a decent fraction were contradictory. It’s hard to conclude too much from these apples and oranges comparisons, but hopefully the two methods will be complementary in many cases, as the authors stress.

Using their PARS dataset, they highlight five global properties of yeast mRNAs. Number one: based on PARS scores, the CDS was more structured than the UTRs. I found this to be quite intriguing, and perhaps I haven’t appreciated how intrinsic structure is to the raw nucleotide sequence, in the absence of proteins. Unfortunately, what they did not address with this result is how much of the structural difference is explained by differences in sequence composition between the UTRs and the CDS. Since UTRs are more AU-rich, could this explain the result? Or what fraction does it not explain? I realize this gets into a kind of chicken-and-egg debate, because it has been shown by many that the CDS and UTRs differ in numerous ways, which are likely highly intercorrelated, and so one cannot really say what is controlling what. Still, I think this should have been checked.
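One rough way such a composition check could look is to compare CDS and UTR regions only within bins of similar GC content. The Python sketch below is hypothetical throughout: the function names, the bin width, the sequences, and the mean PARS values are invented for illustration, and nothing here comes from the paper.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G or C bases in an RNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def mean_score_by_gc_bin(regions, bin_width=0.25):
    """Average PARS score per (GC bin, region kind).

    regions: iterable of (kind, sequence, mean_pars) tuples.
    The binning scheme is an assumption made for this sketch.
    """
    n_bins = int(1 / bin_width)
    grouped = defaultdict(list)
    for kind, seq, mean_pars in regions:
        gc_bin = min(int(gc_fraction(seq) / bin_width), n_bins - 1)
        grouped[(gc_bin, kind)].append(mean_pars)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


# Made-up regions: (kind, sequence, mean PARS score).
toy_regions = [
    ("UTR", "AAUUGCAU", -0.4),
    ("CDS", "AUGGCAUU", 0.2),
    ("CDS", "GCGGCCAU", 0.8),
    ("UTR", "AUAUAUAA", -0.6),
]
by_bin = mean_score_by_gc_bin(toy_regions)
for (gc_bin, kind), mean in sorted(by_bin.items()):
    print(gc_bin, kind, mean)
```

If the CDS still scored higher than the UTRs inside the same GC bin, composition alone would not account for the difference. If the gap vanished, it would.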

Finding number two: when they looked at average PARS scores along the CDS (not in the UTRs), they saw the strongest periodic signal in 3-nt cycles, with the first position of each codon scoring the lowest average PARS score. They also saw a strong correlation between the amplitude of this 3-nt cycle and translational efficiency, as measured by average ribosome occupancy from Ingolia et al. Thus this cycle could in some way facilitate ribosome translocation, and messages that utilize it most effectively are rewarded with increased translation. It’s an interesting observation made by linking an in vivo and in vitro dataset. The system seems all so intelligently designed.
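The 3-nt cycle can be illustrated by averaging scores at each codon position, as in the Python sketch below. This is a simplified stand-in rather than the authors' analysis: the function name, the toy scores, and the use of max minus min as the "amplitude" are all assumptions, and the input is assumed to start at the first nucleotide of a codon.

```python
def mean_by_codon_position(cds_scores):
    """Average scores at codon positions 1, 2 and 3.

    Assumes cds_scores is in frame, i.e. index 0 is the first
    nucleotide of a codon.
    """
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i, score in enumerate(cds_scores):
        sums[i % 3] += score
        counts[i % 3] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Made-up PARS scores for three consecutive codons.
toy_cds = [-1.0, 0.5, 1.0, -0.5, 0.0, 1.5, -1.5, 1.0, 0.5]
means = mean_by_codon_position(toy_cds)

# A crude amplitude for the 3-nt cycle (an assumption, not the paper's measure).
amplitude = max(means) - min(means)
print(means, amplitude)
```

In this toy example the first codon position has the lowest mean, which mirrors the pattern described above. Computing the same amplitude per transcript would give one number per message to compare against ribosome occupancy.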

Finding number three: a small anti-correlation between mRNA structure around the translation start site and translation efficiency (again, via ribosome density from Ingolia et al.). It was clearest when the authors clustered subsets of messages into groups where the average PARS scores were distinct. Finding number four, which the authors describe as a “rich picture of biological coordination,” involved GO analysis and didn’t make much sense to me. Maybe it was too rich for me.

Their last finding was that transcripts that encode signal peptides had less structure in portions of the 5’ UTR and the first ~30 nucleotides in the CDS compared to non-signal peptide encoding transcripts. They might have checked to see whether this effect was due in part to the sequence/codon constraints in these regions required to code for the signal peptide itself.

PARS should be a highly useful method for probing RNA structure on a genome-wide scale. Although this study reports data at nucleotide resolution, the effective resolution is low, so it is better suited to systematic analysis than to molecule-by-molecule structure determination. More controls, testing conditions, and deeper sequencing will reveal more. In the absence of any directly comparable dataset, the authors present some intriguing similarities to the Ingolia et al. dataset, implying that a measurable fraction of RNA function in vivo is inherent to sequence itself. That is perhaps no big surprise, but cool to ponder nonetheless. The findings could have benefited from more computational rigor with respect to sequence constraints that may partly explain structural differences.

The in vivo main course could take a while, so snack judiciously on PARS in the meantime.

You'd Prefer An Argonaute