6 points · 9 days ago

Technical replicate error does not factor into statistical analysis.

The technical error is simply a measure of confidence that your PCR result is accurate: you pipetted correctly, your reagents “behave” correctly, and your primers aren’t so poorly designed that they are erratic or inefficient.

As a general rule, you want your technical SE or SD to be below a certain threshold (usually 0.3, which essentially means all your Cqs for a given sample were within 1 Cq of each other); if two of the three agree very well and one is out of line, you can exclude the aberrant one and go with the average of the other two - most likely it was just a pipetting error.

Error between biological replicates, of course, is factored into your analysis. That’s the “real” error of the data.
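The replicate check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize_technical_replicates` is made up, the 0.3 SD cutoff is the rule of thumb quoted above, and dropping one aberrant well is only attempted for triplicates.

```python
from itertools import combinations
from statistics import mean, stdev


def summarize_technical_replicates(cqs, sd_threshold=0.3):
    """Return (mean_cq, values_used) for one sample's technical replicates.

    Hypothetical helper: returns (None, []) when the replicates do not
    agree well enough, which suggests re-running the sample.
    """
    if len(cqs) < 2:
        raise ValueError("need at least two technical replicates")

    # All replicates agree: use every well.
    if stdev(cqs) <= sd_threshold:
        return mean(cqs), list(cqs)

    # Triplicate with one aberrant well: keep the closest pair if it agrees.
    if len(cqs) == 3:
        pair = min(combinations(cqs, 2), key=lambda p: abs(p[0] - p[1]))
        if stdev(pair) <= sd_threshold:
            return mean(pair), list(pair)

    return None, []


mean_cq, used = summarize_technical_replicates([24.1, 24.2, 26.0])
print(mean_cq, used)
```

Only the per-sample mean from this step moves on to the statistics; the spread between biological replicates is what the downstream test sees.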


Since pipetting isn't perfect, technical replicates capture the variance introduced by the researcher into the data. Because of this, you should include them as a random effect in a regression model.

FYI noeatnosleep joined the Discord to more effectively talk about this. It'd be nice to have more mods there so we can talk in real-time, at least when special occasions like this call for it.


I joined, but any big decisions have to be made on this subreddit. Small things obviously can be worked out between mods that are on, but we survey the entire mod team for a reason for the bigger topics.

What's your discord name? Or PM me there so I can add you to the mod channels.


It should just be multi-mod


Original Poster · 1 point · 21 days ago · edited 20 days ago

Thanks, that's encouraging! I'm hesitant to use R, but all of the Bioconductor packages are for R, so here we go. I plan to study up on command line scripting, should I need it later, but I'm hoping my boss will pony up for the assembly & alignment. I'm not going to have that many samples.


R definitely is a true programming language. Not sure why you think otherwise.

Original Poster · 1 point · 20 days ago

Edited, thx. All I meant is that Python seems to have been developed more like a traditional, versatile scripting language, whereas R seems highly proprietary. That's just what I've gathered after surveying some bioinformatics & biostats folks. Python good, R bad, SAS bad.


R is open source, and many of the packages on bioconductor are peer reviewed. I would really recommend learning a bit of R first just for bioconductor. Picking up python later will be fairly easy since a lot of the syntax is the same.


It's in beta, so report back with any funky stuff it does



There are two components to this bot: a domain spam watcher, and a media channel spam watcher. The domain spam watcher will check if the submitted domain has been submitted by the user too many times, and the media spam watcher does the same for media channels. You can start or stop the bot by a simple message, and you can configure a whitelist for the domain spam watcher with a simple wiki page. The bot should obviously have access to view the wiki.

Quick Start

  1. Create domain whitelist: make a wiki called spam_watcher and add the yaml formatted line domain_whitelist: ['', '', ''] with whatever domains you want the domain spam bot to ignore. Make sure the bot has access to see the wiki!
  2. Send a message to /u/spam_watcher, with the subreddit name as the title, and either add or remove as the body.

Spam Conditions

  • Ignore if the user has less than 5 posts
  • Report if user has 5 to 9 posts with a percentage above 50%
  • Report if a user has 10 or more posts with a percentage above 25%

Users are free to change the default settings by adding the following lines to their spam_watcher wiki:

  • lower_percentage: 50 sets the report percentage when a user has 5-9 posts inclusive.
  • upper_percentage: 25 sets the report percentage when a user has >= 10 posts.
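The reporting rule above can be written out as a small function. This is a sketch of the logic as described here, not the bot's actual code: the function name `should_report` and its arguments are assumptions, and "above" is read as a strict greater-than.

```python
from collections import Counter


def should_report(user_domains, domain, whitelist=(),
                  lower_percentage=50, upper_percentage=25):
    """Decide whether a new submission from `domain` looks like spam.

    `user_domains` is the list of domains from the user's link posts
    (hypothetical input; the real bot pulls this from reddit).
    """
    if domain in whitelist:
        return False

    total = len(user_domains)
    if total < 5:
        # Too little history to judge.
        return False

    share = 100 * Counter(user_domains)[domain] / total
    # 5-9 posts use the lower_percentage setting, 10+ use upper_percentage.
    threshold = lower_percentage if total <= 9 else upper_percentage
    return share > threshold


history = ["a.com"] * 4 + ["b.com"] * 2
print(should_report(history, "a.com"))
```

The two keyword arguments mirror the `lower_percentage` and `upper_percentage` wiki settings, so changing the wiki values corresponds to passing different numbers here.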

Note on Whitelist

The whitelist currently only applies to the domain spam bot. It's recommended to whitelist general media hosting domains, as those are taken care of by the media spam bot.


  • The bot only checks link submissions. It currently does not check within the body of self posts or comments.
  • The bot gets the media channel info from the reddit API, so the accuracy of the media watcher bot depends on reddit.

Thank you for offering the bot. Giving it a try in /r/HealthyFood

One feature I'd love to see in an anti-spam bot is a search for the username within the url.

Original Poster · 1 point · 1 month ago

One feature I'd love to see in an anti-spam bot is a search for the username within the url.

That could potentially be doable, although anything but looking for an exact match in the url might be challenging.

No probs, and thank you for the tool - really handy for mobile mods who don't have toolbox/can't be arsed checking Snoopsnoo etc.

As for the wiki logs, this bot we use has them and when it hits the wiki page limit it just appends the newest and out with the oldest which is fine by me.

Any word on Play Store media authors? Pissing in the wind?

Original Poster · 2 points · 1 month ago

I added a special condition to look for the dev of the play app. This is under the development branch, so I'm going to test it a bit, and then hopefully roll it out live soon.

Also, I noticed you have the bot running on a subreddit that only allows self posts. The bot currently does not look within the body of a self post. It only looks at link posts. I may consider adding that if people want it.


Your question is a bit unclear. Would you mind providing a more specific example?

unmoderated lists all posts that haven't been approved. It's a built in reddit thing so it can't be changed.

I usually check if there are any rule breaking posts for the last two days, and just approve anything older than that since it's run its course at that point.


Hi all,

I've been doing a trial run of my spam removal bot, and it seems to be working fairly well so far.

Briefly, it removes posts if people have been posting the domain or channel too often.

The conditions seem to be fairly accurate in getting rid of the more obvious stuff.

  • ignore user if they have less than 5 total posts
  • remove if they have 5-9 posts with a domain or channel >= 50%
  • for 10 or more posts, remove if the domain or channel >= 25%



3 points · 1 month ago

Here's a pastebin of the sites from the last 1000 posts to the sub in case you want to expand your whitelist:

I removed the few spam sites that were added but it should be pretty clean. I can run another scrape in a few days when the next thousand roll through and then merge the lists.

Original Poster · 1 point · 1 month ago


I made it wiki configurable now in case you want to go through and add some stuff also

Good, glad to hear it. Yeah, that's my only concern, that we don't filter good websites just because people post them a lot. If we can catch more of the dodgy spam that'd be great.

Thanks for your hard work, by the way.

Original Poster · 2 points · 1 month ago


I've tended to have trouble with ribo-zero kits not sufficiently depleting rRNA in the past, even after going back and forth with illumina over the phone. I switched to terminator exonuclease and had a lot more luck with it (plus it's a lot cheaper too). However, this degrades all non-capped RNA, so only consider this if you only care about capped RNA.

Remember to run your before and after RNA on a gel/tapestation/bioanalyzer to ensure appropriate rRNA depletion too.

Do you happen to have a picture of the gel/tapestation/bioanalyzer of the fragment size? If it's fairly wide, I wouldn't worry too much about it. You can use AMPure XP beads to cut out the smaller stuff on the final library.

Original Poster · 1 point · 2 months ago

Good point. I'm using NEB's NEBnext Ultra II (don't you just love these names?) and doing the recommended 1:1 beads:sample purification after adapter addition. So, very little that's <200bp should end up in the library at the end.

I've only run an agarose gel on the fragments. Not best practices, I know. However, my PI isn't interested in paying our U's core facility for a bioanalyzer run. They have a poor reputation on pricing, mixing up samples, and just generally screwing up their runs I guess (I have no experience with them, myself).

Due to the low concentration that's recommended in the bioruptor manual for DNA shearing (10 ng/uL) and the volumes I have, combined with recommended volumes by NEB for the library prep itself there's so little DNA loaded on the gel it takes a bit of photoshop work to see the smear, so I don't think it's terribly diagnostic aside from being able to say "yeah, there's a smear there and maybe it's darker at ~250bp."

I ran another gel comparing my input fragmented DNA to some completed libraries (post-index primer PCR) and saw that there was an amplified smear up around the 400-500 range. Which seems about right for them after adapters are added.


If you are having trouble seeing your fragments on a gel, consider trying SYBR gold instead of ethidium bromide. I've had great luck with it when running gels with small quantities of DNA.

How are you getting from "If I do this experiment 100 times I would expect to see this result 5 times or less due to random chance" to "a p-value of < 0.05 means that there is less than a 5% chance that the alternative hypothesis is true"? Or to either of those from "'probability' that a particular assumption can be accepted at a particular significance level"?

The only changes that seem necessary to me:

If I do this experiment 100 times I would expect to see this result [or more extreme] 5 times or less due to random chance [alone].


There's a few problems with your statement.

First, the null hypothesis being tested against doesn't have to be random chance. Second, frequentist statistics does not test whether the alternative hypothesis is true, but rather whether the data are unlikely under the null hypothesis. This means any definition of p-value needs to be framed around the context of a null hypothesis. If you want to measure the probability of the alternative hypothesis, you need to answer the question with a Bayesian framework.
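To make the conditioning on the null concrete, here is a small simulation sketch. It assumes a two-sided z-test with known standard deviation and a null of "mean = 0"; the function names and the sample sizes are made up for illustration. Every simulated experiment is generated with the null true, so the fraction of p-values below 0.05 estimates how often you would "see this result or more extreme" when the null holds.

```python
import random
from statistics import NormalDist


def two_sided_z_pvalue(sample, null_mean=0.0, sigma=1.0):
    """P(|Z| >= observed |z|) computed under the null hypothesis."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - null_mean) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=5000, n=20, alpha=0.05, seed=1):
    """Fraction of null-true experiments with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Data are drawn with the null hypothesis true (mean 0, sd 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_z_pvalue(sample) < alpha:
            hits += 1
    return hits / n_experiments


print(false_positive_rate())
```

The printed fraction should land close to alpha. Nothing in the calculation refers to an alternative hypothesis, which is why the p-value cannot be read as the probability that the alternative is true.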

It's not a phrasing I'd teach p-values with, but I don't see the problems.

First, the null hypothesis being tested against doesn't have to be random chance.

It sounds weird to me to say that "our null hypothesis is an effect of 1 standard deviation." I'm probably thinking of "null" in the wrong sense, but I like "studied" hypothesis, as in the ASA statement.

But even if you go that way, I think it's implicit in "result." That is, the character of the result (its extremeness) is defined with respect to the studied hypothesis (and perhaps the alternative hypothesis).

Second, frequentist statistics does not test whether the alternative hypothesis is true, but rather if the null hypothesis is unlikely. This means any definition of p-value needs to be framed around the context of a null hypothesis.

I think this was the subject of my first question to /u/FlatbeatGreattrack, but you seem to have only restated their comment.

If by "random chance" OP meant "random chance alone" (or if we change it to that), I think it implies conditioning on the null hypothesis. It seems very likely that they did mean it that way since the next sentence conditions on an alternative hypothesis: "if there is an effect, you'll see it more frequently."


But even if you go that way, I think it's implicit in "result." That is, the character of the result (its extremeness) is defined with respect to the studied hypothesis (and perhaps the alternative hypothesis)

I think your confusion lies in the fact that you think rejecting the null hypothesis means that your alternative hypothesis is true. Since you are only testing the null hypothesis, and not the alternative hypothesis, you can't make claims about the validity of the alternative hypothesis. It could be the case that the null hypothesis you have chosen is unlikely to explain the data, but another valid null hypothesis would.

This goes back to my reference to Bayesian statistics. You can make claims about the alternative hypothesis directly because that is what you are testing - what is the probability of the alternative hypothesis given the observed data, and what is the probability of the observed data given the alternative hypothesis is true.
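As a toy illustration of that Bayesian calculation, the sketch below compares two point hypotheses about a coin with Bayes' rule. Everything specific is an assumption chosen for the example: the null (p = 0.5), the alternative (p = 0.7), the 50/50 prior, and the data (14 heads in 20 flips).

```python
from math import comb


def binomial_likelihood(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def posterior_h1(k, n, p_h0=0.5, p_h1=0.7, prior_h1=0.5):
    """P(H1 | data) for two point hypotheses via Bayes' rule."""
    like_h0 = binomial_likelihood(k, n, p_h0)  # P(data | H0)
    like_h1 = binomial_likelihood(k, n, p_h1)  # P(data | H1)
    evidence = prior_h1 * like_h1 + (1 - prior_h1) * like_h0
    return prior_h1 * like_h1 / evidence


print(posterior_h1(14, 20))
```

Unlike a p-value, the result is a direct statement about the alternative, but it depends on the prior and on spelling out the alternative explicitly.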

What exact protocol was followed for library construction?

Original Poster · 1 point · 2 months ago

Double digest RADseq with PstI as the first restriction enzyme and Sau3AI as the second, on paired end reads. Other than that, 4 barcodes were used; beyond that I'm not sure.


Whoever performed it should be able to provide the reference for the protocol they followed. It's difficult to give you answers when, for example, we don't know the adapter sequences.

Glycoblue will pellet even without DNA, so you will be fine.

Bioanalyzer for your nano concentrations of RNA and DNA!


I generally prefer the tapestation over the bioanalyzer. A lot cheaper per sample, and you can run as few samples as you want without having to worry about filling a chip.

8 points · 2 months ago · edited 2 months ago

Usually you state that the protocol was performed as previously described in Author et al., and then you mention any changes or clarifications to the protocol to make it reproducible. In this case I would just write the protocol in detail after citing the questionable one initially.

What is the response for each of the language tests? For example, are all of them a score from 0-100%?

Also, is each student taking all of the proficiency exams?

27 points · 3 months ago

Not defending your PI's garbage personality, but statistical significance =/= biological significance.


Agreed. Going into an experiment you should not only have some statistical cutoff, but a magnitude of difference between the groups being compared that you would consider biologically relevant.
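A quick numeric sketch of why both cutoffs matter, assuming a two-group z-test with a known, shared standard deviation. The function names, the 0.5 minimum effect, and the example numbers are placeholders, not recommendations.

```python
from statistics import NormalDist


def two_group_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (known, equal sd)."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_relevant(mean_diff, p_value, alpha=0.05, min_effect=0.5):
    """Require both statistical significance and a pre-set effect size."""
    return p_value < alpha and abs(mean_diff) >= min_effect


# A tiny difference becomes "significant" with a huge sample...
p_tiny = two_group_z_pvalue(mean_diff=0.01, sd=1.0, n_per_group=1_000_000)
print(p_tiny < 0.05, is_relevant(0.01, p_tiny))

# ...while a large difference passes both checks with a modest sample.
p_large = two_group_z_pvalue(mean_diff=1.0, sd=1.0, n_per_group=50)
print(p_large < 0.05, is_relevant(1.0, p_large))
```

The point of fixing `min_effect` before the experiment is that it cannot then be tuned afterwards to match whatever difference turned up.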

  1. PCR Cleanup
  2. UV/Vis spec
  3. Done

UV won't give you an accurate reading because ssDNA absorbs more light than dsDNA (the hyperchromic effect). Furthermore, the RNA in the RT reaction will also absorb light.

What question do you want to answer with your data?

Comment deleted · 3 months ago

DNA replication occurs before meiosis begins, so the cell starts off with a tetraploid amount of DNA (4C). The first division separates homologous chromosomes, so each daughter receives one member of each original homologous pair, leaving it with a diploid amount of DNA (2C). In the second division the two sister chromatids are separated, and the daughters become haploid (1C). In most animals meiosis only occurs in germline cells, so they don't become diploid again until a sperm and egg fuse.

Comment deleted · 3 months ago

The current problem with CRISPR in human genomic editing is that CRISPR can cause off-target mutations in the genome. This is because its targeting method tolerates some mismatches in sequence complementarity with the DNA.

There is also the problem of our imprecise knowledge of some genomic elements. For example, genes themselves can have non-coding RNA transcribed both within their gene body and antisense to the main gene. We may have knowledge of how editing a gene can affect its protein, but the subsequent effect on the non-coding RNAs and their downstream impact would likely be unknown.

So although CRISPR represents a promising direction for in vivo genomic editing in humans, there is still a large body of work necessary for its safe implementation.

Because you’re increasing your type I error with multiple tests.

9 points · 4 months ago · edited 4 months ago

Just for OP's knowledge, you can mitigate this by p-value correction if you so wish. Holm-Bonferroni or FDR are two examples.

If you really only care about particular pairwise comparisons, I would even skip the ANOVA and go straight to pairwise tests with correction.
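For reference, the Holm-Bonferroni adjustment is short enough to write by hand. The sketch below is a plain-Python version (the function name `holm_adjust` is made up); in practice a stats package would normally do this for you.

```python
def holm_adjust(pvalues):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Adjusted values may never decrease as the raw p-values grow.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))
```

Each adjusted value is compared against the usual alpha, so the pairwise tests can be read off directly without a preceding ANOVA.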

PRAW for Python is your best bet.

+1, I trust your judgment

There are a few ways to calculate GO enrichment, but they all roughly ask whether there are more genes with a certain GO term present in the list of genes it was given than one would expect by chance for a list of genes that size.

One common way of looking at this is the hypergeometric test. Imagine there are 1,000 genes total in our hypothetical organism, and that 100 of those genes are known to be involved in neuron development (10% of 1000). If you pick 100 genes at random from the 1,000 genes you would expect on average to get about 10 genes involved in neuronal development purely by chance (10% of 100).

Expanding on this information from above, you are interested in whether a transcription factor is important for the expression of genes involved in neuronal development in our hypothetical organism. You decide to delete this transcription factor and check to see whether the expression of genes involved in neural development is affected. You find that there are 100 genes in total out of 1,000 that had their expression decreased. Out of those 100 genes, 50 of them are involved in neural development. The hypergeometric test will help you decide whether finding 50 genes out of 100 is likely by pure chance, given that we know there are 100 neural development genes out of 1,000 total genes in the genome. We know that by chance you would expect about 10 genes out of 100, so getting 50 genes out of 100 in our experiment is very unlikely by chance. The hypergeometric test in this case will indeed confirm this and allow you to say there is enrichment of that GO term.
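The numbers from this hypothetical example can be plugged into the hypergeometric tail directly. The sketch below uses only the standard library; the function name `hypergeom_upper_tail` is an assumption, and real GO tools add details (background gene set choice, multiple-testing correction) that are left out here.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement.

    population: total genes (1,000 in the example)
    successes:  genes annotated with the GO term (100)
    draws:      genes in the differentially expressed list (100)
    """
    total_ways = comb(population, draws)
    max_hits = min(successes, draws)
    tail_ways = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, max_hits + 1)
    )
    return Fraction(tail_ways, total_ways)


expected_hits = 100 * 100 / 1000  # 10 genes expected by chance
p_value = float(hypergeom_upper_tail(50, 1000, 100, 100))
print(expected_hits, p_value)
```

The tail is "50 or more" rather than "exactly 50" because the test asks how surprising a result at least this extreme would be.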

If the comparison is between 2 groups, it would be a Student's t-test. If it is between 2 groups plus a control group, it would be a one-way ANOVA with a post-hoc test if they pass (this is true for 2 or more groups).


If you only care about the pairwise comparisons you can skip the ANOVA and do the pairwise tests directly, followed by a p-value correction such as Holm-Bonferroni or Benjamini-Hochberg. Alternatively, a linear regression does all the pairwise comparisons to your reference group by default, so it's a little cleaner.

I read in a statistics book that the error from each pairwise test carries over, so it is better to do one ANOVA rather than say 5 t-tests.


A p-value correction (such as the ones I stated above) takes care of the increased type I error rate with multiple pairwise comparisons.

One of the main reasons to avoid an ANOVA if you only care about pairwise comparisons is that an ANOVA tests not only all pairwise comparisons, but all possible contrasts as well. So it will test, for example, the combined mean of groups 1, 2, and 3 against the mean of group 4. This often results in an ANOVA p-value below your alpha even though none of the individual pairwise comparisons is significant. That's why it's better to just skip the ANOVA if you don't care about the non-pairwise contrasts.
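The Benjamini-Hochberg correction mentioned above controls the false discovery rate rather than the family-wise error rate. A plain-Python sketch follows (the function name is made up, and a stats package would normally be used instead):

```python
def benjamini_hochberg_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over itself and all larger p-values.
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        candidate = pvalues[idx] * m / (rank + 1)
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg_adjust([0.01, 0.04, 0.03]))
```

Feeding the raw p-values from the pairwise tests through a function like this, and comparing the output to alpha, replaces the ANOVA-then-post-hoc route when only the pairwise differences matter.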

