Do you happen to have a picture of the gel/TapeStation/Bioanalyzer trace of the fragment sizes? If the distribution is fairly wide, I wouldn't worry too much about it. You can use AMPure XP beads to remove the smaller fragments from the final library.

Good point. I'm using NEB's NEBNext Ultra II (don't you just love these names?) and doing the recommended 1:1 beads:sample purification after adapter addition. So, very little that's <200 bp should end up in the library at the end.

I've only run an agarose gel on the fragments. Not best practices, I know. However, my PI isn't interested in paying our U's core facility for a bioanalyzer run. They have a poor reputation on pricing, mixing up samples, and just generally screwing up their runs I guess (I have no experience with them, myself).

Between the low concentration recommended in the Bioruptor manual for DNA shearing (10 ng/uL), the volumes I have, and the volumes NEB recommends for the library prep itself, there's so little DNA loaded on the gel that it takes a bit of Photoshop work to see the smear. So I don't think it's terribly diagnostic, aside from being able to say "yeah, there's a smear there and maybe it's darker at ~250 bp."

I ran another gel comparing my input fragmented DNA to some completed libraries (post-index primer PCR) and saw that there was an amplified smear up around the 400-500 range. Which seems about right for them after adapters are added.

If you are having trouble seeing your fragments on a gel, consider trying SYBR Gold instead of ethidium bromide. I've had great luck with it when running gels with small quantities of DNA.

How are you getting from "If I do this experiment 100 times I would expect to see this result 5 times or less due to random chance" to "a p-value of < 0.05 means that there is less than a 5% chance that the alternative hypothesis is true"? Or to either of those from "'probability' that a particular assumption can be accepted at a particular significance level"?

The only changes that seem necessary to me:

If I do this experiment 100 times I would expect to see this result [or more extreme] 5 times ~~or less~~ due to random chance [alone].
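To make the "5 times out of 100" reading concrete, here is a rough simulation sketch. It assumes a made-up setup (normally distributed data, known standard deviation, a two-sided z-test) that is not from the thread, and it only illustrates what happens when the null hypothesis is actually true.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: population mean == mu0, assuming sigma is known."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fraction_significant(n_experiments=20_000, n=30, alpha=0.05, seed=1):
    """Repeat an experiment in which the null is true and count how often p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Data are drawn with true mean 0, so H0 (mean == 0) holds by construction.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    # Should land close to alpha, i.e. about 5 "significant" results per 100 runs.
    print(fraction_significant())
```

Note that the simulation conditions on the null being true. It says nothing about how likely the alternative is, which is the distinction the replies here are arguing about.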

There's a few problems with your statement.

First, the null hypothesis being tested against doesn't have to be random chance. Second, frequentist statistics does not test whether the alternative hypothesis is true, but rather whether the observed data are unlikely under the null hypothesis. This means any definition of a p-value needs to be framed in the context of a null hypothesis. If you want to measure the probability of the alternative hypothesis, you need to answer the question within a Bayesian framework.

It's not a phrasing I'd teach p-values with, but I don't see the problems.

First, the null hypothesis being tested against doesn't have to be random chance.

It sounds weird to me to say that "our *null* hypothesis is an effect of 1 standard deviation." I'm probably thinking of "null" in the wrong sense, but I like "studied" hypothesis, as in the ASA statement.

But even if you go that way, I think it's implicit in "result." That is, the character of the result (its extremeness) is defined with respect to the studied hypothesis (and perhaps the alternative hypothesis).

Second, frequentist statistics does not test whether the alternative hypothesis is true, but rather if the null hypothesis is unlikely. This means any definition of p-value needs to be framed around the context of a null hypothesis.

I think this was the subject of my first question to /u/FlatbeatGreattrack, but you seem to have only restated their comment.

If by "random chance" OP meant "random chance alone" (or if we change it to that), I think it implies conditioning on the null hypothesis. It seems very likely that they did mean it that way since the next sentence conditions on an alternative hypothesis: "if there is an effect, you'll see it more frequently."

But even if you go that way, I think it's implicit in "result." That is, the character of the result (its extremeness) is defined with respect to the studied hypothesis (and perhaps the alternative hypothesis)

I think your confusion lies in the fact that you think rejecting the null hypothesis means that your alternative hypothesis is true. Since you are only testing the null hypothesis, and not the alternative hypothesis, you can't make claims about the validity of the alternative hypothesis. It could be the case that the null hypothesis you have chosen is unlikely to explain the data, but another valid null hypothesis would.

This goes back to my reference to Bayesian statistics. You can make claims about the alternative hypothesis directly because that is what you are testing - what is the probability of the alternative hypothesis given the observed data, and what is the probability of the observed data given the alternative hypothesis is true.

What exact protocol was followed for library construction?

Double-digest RADseq with PstI as the first restriction enzyme and Sau3AI as the second, on paired-end reads. Other than that 4 barcodes were used, I'm not sure.

Whoever performed it should be able to provide the reference for the protocol they followed. It's difficult to give you answers when, for example, we don't know the adapter sequences.

This sounds like qPCR, and generally speaking if you have a lot of variability in the response, a 1.5 difference isn't enough. Generally I need 5x-10x difference to see an effect when the data is non-parametric.

Just a point of clarification. Data is never parametric or non-parametric. These terms refer to the statistical tests you apply to the data.

Did you generate a 95% CI of the difference between means, or of just the individual samples? If the latter, checking whether the individual intervals overlap is generally considered too stringent a test for most cases. You would be better served employing an actual statistical test, such as a t-test or regression.

Glycoblue will pellet even without DNA, so you will be fine.

Bioanalyzer for your nano concentrations of RNA and DNA!

I generally prefer the TapeStation over the Bioanalyzer. It's a lot cheaper per sample, and you can run as few samples as you want without having to worry about filling a chip.

Usually you state that the protocol was performed as previously described in Author et al., and then you mention any changes or clarifications to the protocol to make it reproducible. In this case I would just write the protocol in detail after citing the questionable one initially.

What is the response for each of the language tests? For example, are all of them a score from 0-100%?

Also, is each student taking all of the proficiency exams?

Not defending your PI's garbage personality, but statistical significance =/= biological significance.

Agreed. Going into an experiment you should not only have some statistical cutoff, but a magnitude of difference between the groups being compared that you would consider biologically relevant.

What question do you want to answer with your data?

DNA replication occurs before meiosis begins, so the cell starts off with four copies of each chromosome's DNA (effectively tetraploid in DNA content). The first division separates homologous chromosomes, so each daughter receives one homolog from each original pair, still made up of two sister chromatids (diploid in DNA content). In the second division the two sister chromatids are separated, and the daughters become haploid. In most animals meiosis only occurs in germline cells, so they don't become diploid again until a sperm and egg fuse.

Yep... businesses that never make enough and try to squeeze every ounce, every minute, everything... that's where the issue is... it's pretty frustrating to see new technology wasted like this.

What do you think about the fact that CRISPR can edit human embryos as well?

The current problem with CRISPR in human genomic editing is that it can cause unintended (off-target) mutations elsewhere in the genome. This is because its targeting method tolerates some mismatches in sequence complementarity with the DNA.

There is also the problem of our imprecise knowledge of some genomic elements. For example, genes themselves can have non-coding RNA transcribed both within their gene body and antisense to the main gene. We may know how editing a gene can affect its protein, but the subsequent effect on the non-coding RNAs and their downstream impact would likely be unknown.

So although CRISPR represents a promising direction for *in vivo* genomic editing in humans, there is still a large body of work necessary for its safe implementation.

Because you’re increasing your type I error with multiple tests.

Just for OP's knowledge, you can mitigate this by p-value correction if you so wish. Holm-Bonferroni or FDR are two examples.

If you really only care about particular pairwise comparisons, I would even skip the ANOVA and go straight to pairwise tests with correction.
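For anyone curious what a Holm-Bonferroni correction actually does to the p-values, here is a minimal sketch in plain Python. The function name and the example p-values are made up for illustration; in practice you would normally use your stats package's built-in correction instead.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must never decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical pairwise-test p-values
    print(holm_bonferroni(raw))
```

You then compare each adjusted p-value to your usual alpha, exactly as you would with the uncorrected ones.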

PRAW for Python is your best bet.

+1, I trust your judgment

There are a few ways to calculate GO enrichment, but they all roughly ask whether there are more genes with a certain GO term present in the list of genes it was given than one would expect by chance for a list of genes that size.

One common way of looking at this is the hypergeometric test. Imagine there are 1,000 genes total in our hypothetical organism, and that 100 of those genes are known to be involved in neuron development (10% of 1000). If you pick 100 genes at random from the 1,000 genes you would expect on average to get about 10 genes involved in neuronal development purely by chance (10% of 100).

Expanding on the example above, say you are interested in whether a transcription factor is important for the expression of genes involved in neuronal development in our hypothetical organism. You decide to delete this transcription factor and check whether the expression of genes involved in neural development is affected. You find that 100 genes in total out of 1,000 had their expression decreased. Out of those 100 genes, 50 are involved in neural development. The hypergeometric test will help you decide whether finding 50 genes out of 100 is likely by pure chance, given that we know there are 100 neural development genes out of 1,000 total genes in the genome. We know that by chance you would expect about 10 genes out of 100, so getting 50 out of 100 in our experiment would be very unlikely by chance alone. The hypergeometric test in this case will indeed confirm this and allow you to say there is enrichment of that GO term.
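The numbers in this example can be checked directly. Below is a small sketch of the upper-tail hypergeometric probability using only the standard library; the function name is my own, and real GO tools typically also correct for testing many terms at once, which this does not do.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated with the GO term of interest."""
    total = comb(N, n)
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total


if __name__ == "__main__":
    N, K, n, k = 1000, 100, 100, 50  # values from the example above
    print("expected by chance:", n * K / N)
    print("P(X >= 50):", hypergeom_upper_tail(N, K, n, k))
```

The expected count comes out to 10, and the probability of seeing 50 or more is vanishingly small, which is what "enriched" means here.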

If the comparison is between 2 groups, it would be a Student's t-test. If it is between 2 groups plus a control group, it would be a one-way ANOVA with a post-hoc test if they pass (this is true for 2 or more groups).

If you only care about the pairwise comparisons you can skip the ANOVA and do the pairwise tests directly, followed by a p-value correction such as Holm-Bonferroni or Benjamini-Hochberg. Alternatively, a linear regression does all the pairwise comparisons to your reference group by default, so it's a little cleaner.

I read in a statistics book that the error from each pairwise test carries over, so it is better to do one ANOVA rather than say 5 t-tests.

A p-value correction (such as the ones I stated above) takes care of the increased type I error rate with multiple pairwise comparisons.
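As a rough illustration of the Benjamini-Hochberg version of such a correction, here is a short sketch in plain Python. The function name and example p-values are hypothetical, and established stats packages already provide this, so treat it as a way to see the arithmetic rather than something to use in an analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Walk from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in ascending order
        candidate = p_values[idx] * m / rank
        # Adjusted values must never increase as the raw p-values decrease.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical pairwise-test p-values
    print(benjamini_hochberg(raw))
```

Unlike Holm-Bonferroni, this controls the false discovery rate rather than the family-wise error rate, so it is less conservative when you have many comparisons.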

One of the main reasons to avoid an ANOVA if you only care about pairwise comparisons is that an ANOVA tests not only all pairwise comparisons, but all possible contrasts as well. So it will test, for example, the combined mean of groups 1, 2, and 3 against the mean of group 4. This often results in an ANOVA p-value below your alpha even though none of the individual pairwise comparisons turns out to be significant. That's why it's better to just skip the ANOVA if you don't care about the non-pairwise contrasts.

The explanation of your experiment is rather confusing. Can you briefly summarize your response and explanatory variables?

If you are having trouble keeping track of your beads, consider switching to magnetic protein A beads. I find them much easier and quicker to work with.

As for your protocol in general, you appear to be doing your cell lysis and some of your washes with a buffer that contains SDS. Ionic detergents are generally pretty stringent, especially when you want to use your protein in downstream applications. I would consider re-optimizing your buffers by switching to a non-ionic detergent (such as NP-40 or Triton), and playing around with your salt concentrations.

Why are you converting it to a percentage anyway?

This is referred to as path analysis, and there are a few packages in R that deal with this.

The coefficient in multiple logistic regression is the change in the log odds of the modeled outcome (in most software, the non-reference level of your response variable) for every "unit" change in your predictor variable, while holding all other variables constant.

The null hypothesis for each variable is that the adjusted coefficient is 0 (signifying that the variable is no better a predictor than guessing). The p-value expresses the probability of getting a coefficient at least as extreme as yours, given that the null hypothesis is true.

Remember that this p-value is derived from a coefficient that was itself the result of holding all other variables constant. This means including or excluding other variables from your model can actually change the p-value. So although other variables may not be significant, they themselves could be helping another variable to be significant.
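To see what "change in log odds per unit, holding the others constant" means numerically, here is a tiny sketch. The intercept and coefficients are invented numbers standing in for a fitted model with two predictors; nothing here is estimated from data.

```python
import math


def log_odds(intercept, coefs, x):
    """Linear predictor of a logistic regression: intercept + sum(b_i * x_i)."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def probability(intercept, coefs, x):
    """Predicted probability of the modeled outcome."""
    return 1 / (1 + math.exp(-log_odds(intercept, coefs, x)))


# Hypothetical fitted values (assumptions for illustration only).
intercept = -1.0
coefs = [0.7, -0.2]

base = [2.0, 5.0]
bumped = [3.0, 5.0]  # first predictor raised by one unit, second held constant

delta = log_odds(intercept, coefs, bumped) - log_odds(intercept, coefs, base)
odds_ratio = math.exp(delta)

if __name__ == "__main__":
    print("change in log odds:", delta)  # equals the first coefficient
    print("odds ratio:", odds_ratio)     # equals exp(first coefficient)
    print("probabilities:", probability(intercept, coefs, base),
          probability(intercept, coefs, bumped))
```

The change in log odds is the same no matter where you start, but the change in predicted probability is not, which is why coefficients are usually reported as odds ratios.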

Do you expect some sort of monotonic relationship between the variables? Perhaps it would be better to make a scatter plot of two of the variables, and then include the third as a style on the points - such as using a color gradient for intensity or changing the size of the point based on the magnitude.

It's difficult to give more specific feedback without knowing anything about your experiment.
