Show off, complain, and generally have a chat here.
Discuss whatever you've been playing with lately (datasets, visualisations, mining projects, etc.).
Also feel free to share or ask for tips and suggestions, and in general talk about services, tools, and sites you find interesting.
P.S.: Suggestions for this subreddit are always welcome.
I need hotel data for a majority of the top 1000 cities in the US. All I need is the city and average hotel cost, although it'd be great if it were also broken down by star rating. Any ideas? If there's a website I could scrape or crawl, that would be acceptable too. Thank you!
Hi, I was wondering if anyone knows where I can find longitudinal data on the size of New Orleans (in terms of area). Ideally this would be annual data from about 2000 to 2015. I have only been able to find data for the most recent year.
The purpose of this is to track shoreline changes over the past 15 years.
I found an old timer in my shed, and it's turning on during the day rather than at night. I'm curious to see if my theory is right. Any help is appreciated.
Maybe I should be looking for the difference in sunset/sunrise times over the years?
Any time lost in between?
I'm honestly not sure how to even start on this.
I have a crawler that was working great before the last patch.
My search builds URLs and feeds them specific NAICS/funding agency combinations. However, as of last night, it just loops through the first combination indefinitely.
FPDS-NG search results for <![CDATA[ : PRINCIPAL_NAICS_CODE:541990 FUNDING_AGENCY_ID:3600 SIGNED_DATE:[2018/07/01,2018/07/22] ]]> </title> <link rel="alternate" type="text/html" href="https://www.fpds.gov/ezsearch/search.do?s=FPDS&indexName=awardfull&templateName=1.5.1&q=PRINCIPAL_NAICS_CODE%3A541990+FUNDING_AGENCY_ID%3A3600+SIGNED_DATE%3A%5B2018%2F07%2F01%2C2018%2F07%2F22%5D&start=5000"/> <link rel="first" type="text/html" href="https://www.fpds.gov/ezsearch/FEEDS/ATOM?s=FPDS&FEEDNAME=PUBLIC&VERSION=1.5.1&q=PRINCIPAL_NAICS_CODE%3A541990+FUNDING_AGENCY_ID%3A3600+SIGNED_DATE%3A%5B2018%2F07%2F01%2C2018%2F07%2F22%5D&start=0"/> <link rel="last" type="text/html" href="https://www.fpds.gov/ezsearch/FEEDS/ATOM?s=FPDS&FEEDNAME=PUBLIC&VERSION=1.5.1&q=PRINCIPAL_NAICS_CODE%3A541990+FUNDING_AGENCY_ID%3A3600+SIGNED_DATE%3A%5B2018%2F07%2F01%2C2018%2F07%2F22%5D&start=40"/> <link rel="previous" type="text/html" href="https://www.fpds.gov/ezsearch/FEEDS/ATOM?s=FPDS&FEEDNAME=PUBLIC&VERSION=1.5.1&q=PRINCIPAL_NAICS_CODE%3A541990+FUNDING_AGENCY_ID%3A3600+SIGNED_DATE%3A%5B2018%2F07%2F01%2C2018%2F07%2F22%5D&start=4990"/> <modified/>
As you can see, start=0 is the first page and start=40 is the last page. If you look this query up using the fpds.gov front end, there are 43 entries, returned 10 per page, so the pages start at 0, 10, 20, 30, and 40. My crawler should stop at start=40 and move on to the next NAICS/funding combination. This has been working perfectly for over a year and a half, but since yesterday's update it isn't working, and I was wondering if anyone else has seen something similar.
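For anyone hitting the same thing, here is a minimal sketch of a crawl loop that rebuilds the feed URLs in the excerpt above and bounds pagination by the `start` value in the `rel="last"` link, so the loop can't run forever no matter what a page returns. It assumes the ATOM URL format shown in the excerpt and 10 entries per page; `build_feed_url`, `start_offset`, `page_starts`, and `crawl_combo` are hypothetical names (not part of any FPDS client), and `fetch_last_href`/`fetch_page` stand in for whatever HTTP and XML-parsing code your crawler already has.

```python
from typing import Callable, Iterator
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL taken from the rel="first"/"last" links in the feed excerpt.
FEED_BASE = "https://www.fpds.gov/ezsearch/FEEDS/ATOM"
PAGE_SIZE = 10  # assumption: 10 entries per feed page, as observed for this query


def build_feed_url(naics: str, agency: str, date_from: str, date_to: str,
                   start: int = 0) -> str:
    """Build one ATOM feed URL for a NAICS / funding agency / date-range combination."""
    q = (f"PRINCIPAL_NAICS_CODE:{naics} FUNDING_AGENCY_ID:{agency} "
         f"SIGNED_DATE:[{date_from},{date_to}]")
    params = {"s": "FPDS", "FEEDNAME": "PUBLIC", "VERSION": "1.5.1",
              "q": q, "start": start}
    # urlencode uses quote_plus: ' ' -> '+', ':' -> %3A, '/' -> %2F, '[' -> %5B, ...
    return f"{FEED_BASE}?{urlencode(params)}"


def start_offset(href: str) -> int:
    """Return the integer `start` query parameter of a feed link (0 if absent)."""
    values = parse_qs(urlparse(href).query).get("start", ["0"])
    return int(values[0])


def page_starts(last_href: str, page_size: int = PAGE_SIZE) -> list[int]:
    """Every `start` offset from the first page up to the rel="last" page, inclusive."""
    return list(range(0, start_offset(last_href) + 1, page_size))


def crawl_combo(combo: tuple[str, str, str, str],
                fetch_last_href: Callable[[str], str],
                fetch_page: Callable[[str], list]) -> Iterator:
    """Yield entries for one combination; the loop is bounded by the rel="last" link."""
    first_url = build_feed_url(*combo, start=0)
    # The page offsets are computed once, up front, so the loop always terminates.
    for start in page_starts(fetch_last_href(first_url)):
        yield from fetch_page(build_feed_url(*combo, start=start))
```

For the query in the excerpt, `page_starts` on the `rel="last"` href gives `[0, 10, 20, 30, 40]`, and the crawler then moves on to the next combination instead of following whatever links the server hands back.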
I have searched the internet and can't find anything. I am not looking for CT or MRI scans, etc., just images of cancer cells under microscopes.
I could use cancer types like
squamous cell carcinoma (skin cancer)
and things like that!
Thanks in advance!
Sorry, this should be simple, but I've struck out after 2 hours of looking. I've found charts galore but not the raw data.
I'm looking for a simple chart of the global income or wealth distribution by percentile. So: 1st percentile 0.01%, 2nd percentile 0.02%, etc.
Or if I could just get the income or wealth by percentile, I could work out the distribution myself.
I found this for the UK, which is brilliant. I just want the global version of it.
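If you do find raw figures per percentile, working out the distribution is a short calculation: each percentile's share is its value divided by the total, and the running sum of shares gives the cumulative (Lorenz) curve. A minimal sketch, assuming you have the average income (or wealth) of each percentile group, not the cutoff value, ordered from poorest to richest; `percentile_shares` is a hypothetical helper name.

```python
from itertools import accumulate


def percentile_shares(values: list[float]) -> tuple[list[float], list[float]]:
    """Given the mean income (or wealth) of each percentile group, ordered from
    poorest to richest, return each group's share of the total and the
    cumulative share (the Lorenz curve), both as percentages."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must be non-empty with a positive total")
    # Percentile groups hold equal numbers of people, so a group's share of the
    # total is simply its mean divided by the sum of all group means.
    shares = [100 * v / total for v in values]
    cumulative = list(accumulate(shares))
    return shares, cumulative
```

With 100 values, `shares[0]` is the bottom percentile's share (the "1st percentile 0.01%" figure in the question) and `cumulative[-1]` should come out at 100.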
I'm gathering info about botnet traffic detection for my computer engineering undergraduate thesis.
I have already read some articles on this subject, but almost all of them use private datasets, usually collected from private networks.
So I'd like to find a public dataset I could use to work on botnet traffic detection, and maybe botnet classification.
Thanks in advance, see ya!
The more info the better. If it includes APR info and the length of the no-interest introductory period, that would be great. Basically every credit card company or financial institution. XML, CSV, JSON, or any other format is fine. Thanks
Scraped the fifa.com statistics page for each game into CSV.
Columns are: `Game,Group,Team,Opponent,Home/Away,Score,WDL,Pens?,Goals For,Goals Against,Pen Shootout For,Pen Shootout Against,Attempts,On-Target,Off-Target,Blocked,Woodwork,Corners,Offsides,Ball possession %,Pass Accuracy %,Passes,Passes Completed,Distance Covered km,Balls recovered,Tackles,Blocks,Clearances,Yellow cards,Red Cards,Second Yellow Card leading to Red Card,Fouls Committed`
There's a little Ruby scraper in that repo along with the source pages; the CSV above is what it produces.
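A minimal sketch for loading the CSV with Python's standard `csv` module and totalling one stat column per team. It assumes the header row matches the column list above exactly and that the chosen column holds plain integers; the `total_by_team` helper is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def total_by_team(lines: Iterable[str], column: str = "Goals For") -> Counter:
    """Sum an integer stat column per team from the scraped match CSV.

    The columns (Team, Opponent, Goals For/Against) suggest one row per team
    per game, so summing over rows gives per-team tournament totals.
    """
    totals: Counter = Counter()
    for row in csv.DictReader(lines):
        totals[row["Team"]] += int(row[column])
    return totals
```

Usage: open the file with `open(path, newline="", encoding="utf-8")`, pass the file object in, and call `.most_common(5)` on the result for a top five; any other integer column name from the header (e.g. `"Corners"`) works the same way.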
I'm doing a research project and I'm looking for more (or better) resources. I'm contributing to a publication on expat compensation rates around the world and the hassle factor. I'm taking a data science approach, looking at each country and what expatriates get paid annually. A bit on what expatriates are: they are people who leave their countries to pursue education or work, and then return to their original country. What I'm looking for is databases to access (e.g. government websites, university databases) that contain survey data, pay rate analytics, and anything else that could be of use to my research. I'll be glad to explain more if you have any questions, as most people don't know what expatriates, self-initiated expatriates, and returnees are. Thank you :)
I have received a lot of great advice so far and have created a new Patreon page for Pushshift. This will help keep track of the donations Pushshift receives (which I feel should be transparent to the community). My first goal is $1,500 per month, which would be sufficient to pay the bills and cover the daily maintenance needed to keep things running smoothly.
The Patreon page is located here: https://www.patreon.com/pushshift
Hello! I am not always the best when it comes to fundraising and finding the best avenues for donations, so I am reaching out to you guys for ideas on how to raise money to keep these services alive and healthy (and also to continue improving the API and adding more features).
The Pushshift.io API and the data dumps I provide (for Reddit, Twitter, and other data sources) require a significant time investment from me as well as a significant amount of funding. On hardware maintenance and new hardware to keep up with the volume of data I ingest alone, I have spent over $25,000. There are also recurring monthly expenses for power, bandwidth, etc.
Unfortunately, donations have been sporadic lately. Over the previous 4 weeks, I've received less than $100 in donations, which isn't enough to cover even the monthly ISP bill.
To give some insight into my commitment to this project (whose original primary aim was to help academic institutions and researchers interested in studying social media discourse, etc.), I left my full-time job with the National Democratic Institute around August of last year to focus on this project full-time. I simply love data and helping out the academic community, and I wanted to spend more time on open-source projects and get involved in other projects that focus on making our world a better place. I spent some time late last year and earlier this year working with the CivilServant project. A family emergency earlier this year forced me to leave that project (quick note -- CivilServant, run by Nathan Matias, is an amazing project and I highly suggest checking it out!).
My goal is to raise $3-5k monthly, both to maintain the current services that Pushshift.io offers and to improve the existing services and add new ones. I am currently not averaging even 1/10th of that amount. The largest donation I have received was from the Pineapple Fund, which generously contributed $10,000 towards the project (that was a huge help -- thank you, whoever you are!). A bare minimum of $1.5k per month would be enough to keep the present project alive, though.
If I cannot find some means to increase funding for this project, I will sadly have to shut it down at some point (if it comes to that, I will do my best to give advance notice so that others who depend on this service can transition off of it). I am reaching out to the community for ideas on how to get more serious about raising funds for this project and would greatly appreciate any suggestions you have.
I'm after a dataset of around 1-2k records for the #CROvENG match. I don't have the technical skills to use the APIs, so I'm hoping someone here does.
Mostly I'm after just the username, tweet text, date, and number of likes and retweets. Just a general
I am interested in data from companies like Ticketmaster and Pollstar that track ticket sales, pricing, attendance, etc. for live shows. I am pretty sure the financial data for any publicly traded company is available online, but I am more curious about the datasets that are used internally.
Hey guys, I'm looking for a dataset of anywhere from 500 to 1,000 images of faces, each paired with a description of the face (1-2 sentences, or even simple keywords). For example, an image like this one:
Could be paired with a description such as: "Black male, short goatee, short moustache, wide face, left eye blue, brown right eye, bushy eyebrows, black hair", or something along the lines of that. Does anyone know of a dataset like this? Thanks.
I crawled the match results published by the soccer magazine Kicker and compiled the data, described here
Edit: I updated the file. Team names are now coded as unique IDs, and the same goes for the day of the week. Date and time were converted to integer formats. As a result, all the data is in numeric form and can be used directly in machine learning algorithms.
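The edit doesn't say exactly which integer formats were used, so here is one plausible scheme as a sketch: team names and weekdays get IDs in order of first appearance, dates become `YYYYMMDD` integers and kick-off times `HHMM` integers. The function names and formats are assumptions, not necessarily what the updated file uses.

```python
from datetime import date, time


def encode_ids(values: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each distinct string to a unique integer ID (in order of first
    appearance) and return the encoded list plus the lookup table."""
    ids: dict[str, int] = {}
    # setdefault only inserts len(ids), the next free ID, for unseen values.
    encoded = [ids.setdefault(v, len(ids)) for v in values]
    return encoded, ids


def date_to_int(d: date) -> int:
    """2018-07-15 -> 20180715 (keeps chronological ordering)."""
    return d.year * 10000 + d.month * 100 + d.day


def time_to_int(t: time) -> int:
    """18:30 -> 1830."""
    return t.hour * 100 + t.minute
```

`HHMM` values sort correctly but are not evenly spaced (18:59 -> 1859, 19:00 -> 1900); minutes since midnight (`t.hour * 60 + t.minute`) avoids that. Likewise, integer team IDs imply an ordering that doesn't exist, so some algorithms will do better with one-hot encoding of the same IDs.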