Original Poster · 1 point · 3 days ago

When can I start???

Cool job, but 85K in the Bay Area is, quite literally, below the poverty line.

Every time I arrive at a data set too late, I am often only there to put it down.

Didn't Fisher say something about this? About conducting a post mortem on the data?

T-shirt and jeans yo

Hi everyone,

I recently found this sub and I'm excited to learn from your experiences. I got a B.S. in electrical engineering and have been working in that field for 5 years now. I got bored and started an M.S. in Data Science, which is far more interesting to me than what I'm doing at work. I have 2 semesters left and I feel like I've barely brushed the surface of all of the fascinating stuff I could learn in this field.

I'm trying to figure out if I should get a PhD. What would that even look like? Would I be able to have a full-time job at the same time, like I have with my M.S., or is a PhD your whole life? I think it would be so cool to dig deeper on some of these topics and help advance AI, but naturally I enjoy having a steady income. Can anyone enlighten me on the pros and cons of an AI/Deep Learning PhD in the U.S.?

I'm not trying to get a job; I'm trying to get a job that I like. A job that requires (and respects) fundamental math knowledge. A job that doesn't think that a KPI dashboard is what Data Science is all about. A job where I have to formulate, critique, and customize models to optimally fit a solution. To get a job like that, would I need a PhD?

To get a job like that, would I need a PhD?

If that is what you want, you should worry about the company you are working for and not your qualifications. If a company needs a dashboard, then that is what you are getting paid to build. Find companies that need what you want to offer. Having a PhD won't guarantee you will find those jobs, let alone qualify for them.

I'm not trying to get a job; I'm trying to get a job that I like

A PhD is incredibly difficult. Having a job on the side like you have mentioned would almost ensure your work or your research, probably both, would suffer.

Decide what it is you want. A PhD offers you 4 years of uninhibited learning, practice, and discovery. It comes at the expense of income and opportunity.

My institution pays for my tuition and provides a small living wage.

Not much, but it is something.

Original Poster · 1 point · 3 days ago

You mean an assistantship or something?

I don't know man, I just cash the cheques.

UBC's data science program is the best in Canada, and maybe North America according to Hadley Wickham.

The plan is to take every single existing and newly created article currently and run 50% of the traffic to the existing rec widget and 50% to a custom widget

If 50% of everyone who visits the site is given one or the other, you are subjecting yourself to the possibility of a huge loss if the B version performs significantly worse. I don't think this is a problem for this application in particular, but splitting all of the traffic this way is almost never done in practice. You should perform a sample size calculation and determine the minimum number of people needed to obtain a desired statistical power for the experiment.
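For anyone who wants to see what that sample size calculation can look like, here is a rough sketch using the usual normal approximation for a two-sided test of two proportions. The click-through rates (2.0% vs. 2.5%), the 5% significance level, and the 80% power are assumptions made up for illustration, and `sample_size_per_arm` is a hypothetical helper, not something from the original question.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    p_bar = (p_control + p_treatment) / 2          # pooled rate under the null
    effect = abs(p_treatment - p_control)          # smallest difference worth detecting
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    ) ** 2
    return ceil(numerator / effect ** 2)


# Assumed (made-up) rates: 2.0% for the existing widget, 2.5% for the custom one.
n_per_arm = sample_size_per_arm(0.020, 0.025)
print(f"Visitors needed per arm: {n_per_arm}")
```

Once each arm has reached that many visitors you can stop the test, rather than leaving half of all traffic on an unproven widget indefinitely.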

I would actually model it using the binomial distribution. Each order is either returned or not. Construct a confidence interval using something like the Agresti-Coull interval or the Wilson interval and determine if 1.8% is in the interval.

The only reason I suggest those intervals is because the Wald interval (the CI you learn about in stats 101) has bad coverage properties near 0 or 1.
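A minimal sketch of both intervals, using only the standard library. The counts (30 returns out of 1,000 orders) are made up for illustration; only the 1.8% figure comes from the question.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def agresti_coull_interval(successes, n, confidence=0.95):
    """Agresti-Coull interval: a Wald-style interval around an adjusted proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_tilde = n + z ** 2
    p_tilde = (successes + z ** 2 / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    # Clip to [0, 1], since the unclipped interval can spill outside that range.
    return max(0.0, p_tilde - half), min(1.0, p_tilde + half)


# Hypothetical data: 30 returned orders out of 1,000.
returned, orders = 30, 1000
benchmark = 0.018

w_lo, w_hi = wilson_interval(returned, orders)
ac_lo, ac_hi = agresti_coull_interval(returned, orders)
benchmark_in_wilson = w_lo <= benchmark <= w_hi

print(f"Wilson:        ({w_lo:.4f}, {w_hi:.4f})")
print(f"Agresti-Coull: ({ac_lo:.4f}, {ac_hi:.4f})")
print(f"1.8% inside the Wilson interval: {benchmark_in_wilson}")
```

If 1.8% falls outside the interval, the data are not consistent with that return rate at the chosen confidence level.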

They hire a bunch of people before September to help with the textbook rush. IIRC you get 10% off merchandise, 0% off textbooks because profits are already razor thin. Good job, nice people. It gets really boring after September.

This sounds like a shitty tip, but here goes. Eat less.

Three square meals a day add up. Eat smaller portions and you'll find food lasts longer and that your trips to the grocer are less frequent.

If you aren't super into fitness, think about your energy expenditure. I am usually sitting writing/coding. That is a fairly sedentary life. I don't need that much energy to do the things I do, so I eat accordingly.

Second year of my PhD.


  • Jupyter notebooks for quick prototyping
  • RStudio and R Markdown for reproducibility and for turning analyses into documents
  • Git and GitHub because I am a fucking idiot and cannot be trusted with anything

If not for those three tools, science would be incredibly tough.

Is Jupyter free to download? I'm at work right now and only have access to my phone. I've heard great things about it though.

Everything I use is free because my department has no money.

Ok, I'll play devil's advocate. I'll argue that data science is more like engineering than it is like science.

  • It is undoubtedly an applied discipline, much like engineering.
  • Experiments are run, not to elucidate characteristics of some physical phenomenon, but to optimize existing processes or guide decisions.
  • The use of mathematics and computing is ubiquitous in other disciplines where the name "scientist" is not conferred (e.g. Economics)

Oh boy, I gotta dig deep in my memory to answer this one.

The inverse of the Fisher information is the (asymptotic) covariance matrix of the maximum likelihood estimator. I don't think anyone would ever compute the variance by inverting the Fisher Information, but it is a nice theoretical tool.

In fact, aside from Fisher Scoring (a way of maximizing the likelihood, if my memory serves me), I don't think of the Fisher Information as an applied tool, but rather a theoretical convenience. You derive the Cramér-Rao bound using the Fisher Information, for instance. The proof is quite elegant actually.
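For reference, here is the scalar version of the bound. It assumes the usual regularity conditions and an unbiased estimator $\hat\theta$ of $\theta$, with $f(X;\theta)$ the density of the whole sample:

```latex
I(\theta)
  = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right]
  = -\mathbb{E}\left[\frac{\partial^{2}}{\partial\theta^{2}}\log f(X;\theta)\right],
\qquad
\operatorname{Var}(\hat\theta) \;\ge\; \frac{1}{I(\theta)}
```

In the multi-parameter case, $1/I(\theta)$ becomes the inverse of the Fisher information matrix, which is where the covariance matrix statement comes from.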

I don't think anyone would ever compute the variance by inverting the Fisher Information, but it is a nice theoretical tool.

You do this all the time in Design of Experiments. The experimental design gives you the information matrix, which you then compute the inverse of to get the variance matrix. The variance matrix is used in calculating many different properties of your design.
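A small sketch of that idea for the simplest case, a straight-line model y = b0 + b1 * x with constant error variance. The two candidate designs and the helper names are made up for illustration. The information matrix is X^T X (up to the error variance), and its inverse gives the variances of the estimated coefficients.

```python
def information_matrix(xs):
    """X^T X for the model y = b0 + b1 * x, where each design row is (1, x)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return [[n, sx], [sx, sxx]]


def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("design is singular: the slope cannot be estimated")
    return [[d / det, -b / det], [-c / det, a / det]]


def slope_variance(xs, sigma2=1.0):
    """Variance of the estimated slope: sigma^2 times the (1, 1) entry of (X^T X)^-1."""
    return sigma2 * invert_2x2(information_matrix(xs))[1][1]


# Two hypothetical four-run designs on the interval [-1, 1].
endpoints = [-1.0, -1.0, 1.0, 1.0]   # all runs at the extremes
spread = [-1.0, -0.5, 0.5, 1.0]      # runs spread across the range

print("slope variance, endpoints design:", slope_variance(endpoints))
print("slope variance, spread design:   ", slope_variance(spread))
```

Comparing the two variances is the kind of design property the inverse is used for: here, putting the runs at the extremes estimates the slope more precisely.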

Shit, well TIL

The best manager made the working environment fun. He guided me through tough problems, and didn't just give me the answer. He understood that sometimes, an analysis comes up with nothing, and that wasn't my fault.

A good manager I had gave me free rein. He understood I was good at what I do and he let me do it. We rarely communicated though, so I wish he had checked up on me now and again to course correct. I could have reached out, but one never knows when one is doing something wrong.

The worst manager I had managed as if they had only read about management from a textbook. She used a lot of business school approaches, which work for some but not for me (and apparently not my coworkers either).

Could you specify a bit about the business school approaches?

You'll know when people use them. This particular manager's approach to management seemed too rigid. I feel like I could detect motifs in their actions, language, even posture. It was weird and robotic.

This question is about statistics, but isn't very statistical, and so this isn't really the best place to post it. The problem you are facing is one every student has. Nothing you can do but practice.

6 points · 10 days ago
  1. Your data doesn't have a distribution.

  2. Whatever distribution the underlying population has, it can't be normal because negative girth values aren't valid.

What you want to do is say if you can tell from your data whether approximating the underlying distribution with a normal could be reasonable or not.
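One informal way to do that check is to compare the sorted data against normal quantiles, which is the number behind a Q-Q plot. The sketch below computes that correlation with the standard library. The girth values are made up for illustration, `qq_correlation` is a hypothetical helper, and this is just one possible diagnostic rather than the method the comment above had in mind.

```python
from statistics import NormalDist, mean


def qq_correlation(sample):
    """Correlation between the sorted sample and standard-normal quantiles.

    Values close to 1 mean the points of a normal Q-Q plot lie near a straight line.
    """
    xs = sorted(sample)
    n = len(xs)
    # Theoretical quantiles at plotting positions (i - 0.5) / n.
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = mean(xs), mean(qs)
    cov = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_q = sum((q - mq) ** 2 for q in qs)
    return cov / (var_x * var_q) ** 0.5


# Hypothetical girth measurements (made-up numbers).
girths = [8.1, 9.4, 10.2, 10.9, 11.3, 11.8, 12.4, 13.0, 13.9, 15.2, 17.6, 20.3]
r = qq_correlation(girths)
print(f"Q-Q correlation: {r:.3f}")
```

A value well below 1 suggests the normal approximation is a poor fit. A value near 1 only says the approximation is not obviously unreasonable for this sample.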

If I wasn't a poor grad student, I would give you gold for this.

Does Atom count? Love the Git integration.

Can you tell me a little bit more about the data? What is being represented?

Original Poster · 1 point · 10 days ago

Just Twitter activity over time periods, so it can be volatile.

So you're probably comparing tweets per month/year between users, right?

-1 points · 11 days ago · edited 11 days ago

Oh honey..... They are trolling you.... Edit: or maybe they are not. If not, that's just terrible advice. Math 1225 is not a calculus class, take Calc 1000. And mass emailing institutions or ppl with such basic questions is never a good idea.

Original Poster · 1 point · 11 days ago

Math 1225 and Calc 1000 are antirequisites though? Why is 1225 not considered a calc class?

Probably rigour. I suggest you just go through Calc 1000. I don't know if you are scared, or just looking for an easy ride, but the calculus sequence is a lot easier than you would expect.

And mass emailing institutions or ppl with such basic questions is never a good idea.

OP is likely going into first year and has, literally, nothing to lose by asking people who would know best.


Here is what the theme looks like.

Link to my GitHub. There is an R Markdown file in the Presentations folder with a minimal working example.

Alternatively, if you like using Beamer instead of RStudio to make presentations, the themes are available as .tex files.

Have people totally forgotten about experimentation? You can't just feed data into a machine learning algorithm and magically get what you want.

I've been looking for a reason to get into deep learning. I think I found it.

but that is quite literally lying

If the difference is one course, then it’s an acceptable lie.

How's that acceptable at all? If someone did 11 out of 12 courses for a CS major and decided to skip the last course cus it's too hard, it's still fine by ur books to list CS as a major?

I do agree that it's not worth it to take the course. It's not like "Predictive Analytics" is a bad name in the first place. Why lie and risk it?

It isn’t as if op didn’t graduate. Predictive analytics and data science are essentially synonyms anyway.
