Stickied post, Moderator of r/datasets

Show off, complain, and generally have a chat here.
Discuss whatever you've been playing with lately (datasets, visualisations, mining projects, etc.).
Also feel free to share or ask for tips and suggestions, and in general talk about services, tools, or sites you find interesting.

P.S.: Suggestions for this subreddit are always welcome.


Looking for a clean-jokes-only dataset/database in CSV, MySQL, SQLite, or another format to download.


Hi,

I'm looking for a dataset that includes neutral tweets, to be used for training a naive Bayes classifier. So far I have found the Sentiment140 dataset, which includes 1.6 million tweets (800,000 positive and 800,000 negative). I would like to have a third sentiment class for neutral tweets.

Does anyone know where I can find such a dataset? I found the T4SA dataset, but unfortunately you need a password to download it.

Much appreciated,

Nowarez


Otherwise, any data related to RFID would be very appreciated. Thanks.


I'm looking for a dataset that asks racially loaded questions, like some of the GSS, but that also includes information on whether the respondent is an immigrant and how long they have been in the US. Any thoughts?


I'm having a difficult time finding raw datasets on sexual assault. I'm mostly finding PDFs of finished reports. I know the material is sensitive, but any idea where to find some datasets? I hope to apply statistics to profiles of either victims or perpetrators, as a data-science-style study. Thanks for any thoughts.


Just want to make people aware of this dataset:

On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.

https://openpolicing.stanford.edu/


These are all the comments made by users in /r/rateme (a subreddit where people post a selfie asking for feedback on their looks) that include a rating out of 10. The first column is the comment and the second column is the rating in that comment. Comments were scraped using the Python Pushshift.io API Wrapper. This could be used for sentiment analysis.

https://www.kaggle.com/milesh1/rrateme-comments

https://www.kaggle.com/milesh1/rrateme-comments/downloads/rrateme-comments.zip/1
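As a rough illustration of how a rating out of 10 might be pulled from a comment, here is a minimal sketch. It assumes ratings are written like `7/10` or `7.5 / 10`; the pattern and the `extract_rating` helper are hypothetical and are not the method used to build the dataset above.

```python
import re
from typing import Optional

# Matches "7/10", "7.5 / 10", etc. The trailing \b keeps "1/100" from matching.
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*10\b")


def extract_rating(comment: str) -> Optional[float]:
    """Return the first 'N/10' rating found in a comment, or None.

    Values above 10 (for example a date like 2018/10) are treated as
    not being ratings.
    """
    match = RATING_PATTERN.search(comment)
    if match is None:
        return None
    rating = float(match.group(1))
    return rating if rating <= 10 else None
```

Comments that phrase a rating differently ("seven out of ten") would be missed by this pattern, so it is only a starting point.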

Crossposted by pushshift.io 1 day ago

It took a long time to catch up and work through the Reddit API changes, but the files are now available at https://files.pushshift.io/reddit/submissions

The files are named by month, for example RS_2018-08.xz
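A minimal sketch of reading one of these monthly files, assuming each file is xz-compressed text with one JSON object per line. That layout is an assumption, not something stated in this post, and the `iter_submissions` name is hypothetical.

```python
import json
import lzma
from typing import Iterator


def iter_submissions(path: str) -> Iterator[dict]:
    """Yield one submission dict per non-empty line of an .xz file.

    Assumes newline-delimited JSON; adjust if the files are laid out
    differently.
    """
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because the function is a generator, something like `for submission in iter_submissions("RS_2018-08.xz"): ...` would stream a downloaded file without loading it all into memory.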

New fields:

There are some new fields available and a field that I have added to the data through a lot of data mining. Here is an example of a JSON object from one of the files

Reddit now includes the author id under the author_fullname field. This starts with "t2_" and is a base 36 representation of an integer, just like the other id fields.
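To make the base 36 point concrete, here is a small sketch that converts such an id to an integer. It assumes the value looks like `t2_` followed by the base 36 digits; the `decode_fullname` helper and the sample value are made up for illustration.

```python
from typing import Tuple


def decode_fullname(fullname: str) -> Tuple[str, int]:
    """Split a value like 't2_1a' into its type prefix and integer id."""
    prefix, _, base36_id = fullname.partition("_")
    if not base36_id:
        raise ValueError(f"expected '<prefix>_<base36 id>', got {fullname!r}")
    # int() accepts bases up to 36, using digits 0-9 and letters a-z.
    return prefix, int(base36_id, 36)


# Made-up example value, not taken from the dump.
print(decode_fullname("t2_1a"))  # ('t2', 46)
```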

I have added a field called "author_created_utc" -- this field is the epoch time of when that account was created on Reddit. Some data scientists may find this very useful with certain types of analysis.
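One possible use of this field, sketched under the assumption that both "created_utc" and "author_created_utc" are epoch seconds: computing how old an account was when it posted. The `account_age_days` helper and the sample values are hypothetical.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def account_age_days(submission: dict) -> float:
    """Age of the author's account, in days, when the submission was created."""
    # float() guards against timestamps stored as strings.
    created = float(submission["created_utc"])
    author_created = float(submission["author_created_utc"])
    return (created - author_created) / SECONDS_PER_DAY


# Made-up values for illustration only: the two timestamps are 2 days apart.
example = {"created_utc": 1533081600, "author_created_utc": 1532908800}
print(datetime.fromtimestamp(example["author_created_utc"], tz=timezone.utc))
print(account_age_days(example))  # 2.0
```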

If you have any questions, please let me know. July and August comments are near completion and should be available in a few days.

subreddit_subscribers -- Every submission object includes the number of accounts subscribed to its subreddit. This number is accurate as of the "retrieved_on" value (when I ingested the object), not the "created_utc" value. If you are analyzing subreddit growth, always pair the subscriber count with the retrieved_on time.
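A short sketch of that advice: building a growth series keyed on retrieved_on rather than created_utc. It assumes each submission is a dict that also carries a "subreddit" name field; the `subscriber_series` helper and the sample rows are made up.

```python
from typing import Iterable, List, Tuple


def subscriber_series(submissions: Iterable[dict], subreddit: str) -> List[Tuple[int, int]]:
    """Return (retrieved_on, subreddit_subscribers) pairs sorted by retrieval time."""
    points = [
        (s["retrieved_on"], s["subreddit_subscribers"])
        for s in submissions
        if s.get("subreddit") == subreddit
        and s.get("retrieved_on") is not None
        and s.get("subreddit_subscribers") is not None
    ]
    return sorted(points)


# Made-up rows for illustration only.
rows = [
    {"subreddit": "datasets", "created_utc": 100, "retrieved_on": 900, "subreddit_subscribers": 41400},
    {"subreddit": "datasets", "created_utc": 200, "retrieved_on": 500, "subreddit_subscribers": 41350},
    {"subreddit": "other", "created_utc": 150, "retrieved_on": 700, "subreddit_subscribers": 10},
]
print(subscriber_series(rows, "datasets"))  # [(500, 41350), (900, 41400)]
```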

Thank you!


I should really stop doing homework at the last minute, but can you please comment with your height (preferably in inches), age, and how many pets you have :)


Hi,

I am a graduate student in electronics engineering. For my thesis, I am working on rail inspection using image processing.

I'm looking for a rail crack dataset for the project. I've found many concrete and bridge crack datasets, but I couldn't find a rail crack dataset. I would be very happy if you could help.

Thanks in advance.


Where can I find a dataset which just contains a large number of unlabeled sentences describing how a person is feeling?

E.g.:

He is almost in tears.

She is filled with regret for what she did.


I need the following weather data for major cities in Europe (if possible, all cities with over 50,000 inhabitants) on an hourly (or at least 6-hourly) basis for the last 10 years:
- wind speed
- temperature
- precipitation
...


I'm looking for a dataset that measures how culturally similar China is to various nations. Past measures have used the number of native Chinese in each nation as a proxy for this, but I cannot find that dataset. I am open to any approach that seeks to measure cultural proximity.

Thank you!


Hello! I'm starting to work on my Master's thesis, which is about machine learning algorithms and data mining, and at the moment I can't access the UCI Dataset Repository. Does anyone know if it's currently unavailable, or if it can only be accessed from the university's eduroam Wi-Fi?

Thank you!


I'm looking for a dataset about hardware failures in data centers. Do you guys know of any? Thanks!


Hello, I am new to data mining and I would like to mine data from a freemium, free-to-play game (like Candy Crush or Fortnite). I would like to see the relationship between the level a player is at and the number and amount of purchases made. I am also looking for other variables that may contribute to an increase in the number of transactions and the amount per transaction.

I have been searching online for game datasets with transactions, but so far I am not finding much that fits my needs. Where would be a good place to start, and what tools and languages (Python, R) should I use for data mining? Thank you!


Hello everyone,

I am looking for CO2, NOx, and GHG emission data for publicly listed companies. These data are published in their ESG reports. Paid or free datasets are both okay with me. I really need this for my current project. Thanks.


Hey all, I'm trying to teach abstract data modeling and would appreciate a CSV or XLS of some sort of firearm catalog with manufacturer and specs.

Or

Firearms / tanks / aircraft / military equipment, plus some specs, plus appearance in war X.

TL;DR: some sort of military or military history dataset with several dimensions.

Thanks in advance.


I'm working on a little project, and I need a database of French words and their definitions.

Thank you!!


Hi,

I am looking for a dataset with a lot of pictures of old people to do face recognition on. The subjects should be aged 60+, the older the better.

I have looked through a lot of datasets already, but most pictures seem to be of students or middle-aged people. Any help would be greatly appreciated.

Thanks.

Community Details

r/datasets: A place to share, find, and discuss Datasets. 41.4k subscribers, 548 online.

r/datasets Rules
1. Self-promotion / no disclosure
2. Not original source
3. Inappropriate Survey
4. Low Effort