
What is a data visualization?

For the purposes of this subreddit a visualization is:

  • Automatically generated, and not manually drawn or a photograph. Legacy data visualizations (e.g. charts from the 1800s) are an exception. (See also: Infographic vs Visualization)
  • Based on non-visual data. It can't be a form of image effect or pixel shader.
  • Based on real or simulated data. If the image represents one number (pi), sequence (primes), or equation (sin(x)), then /r/mathpics is a more appropriate place.
  • Not a joke. Fake data, goofball statistics, and trivial analysis made for the sole purpose of making a joke are not permitted. Go to /r/data_irl for your humor needs; our standards are a bit higher than that.
  • A mapping of information to a visual property. Text in a table is not sufficient. A data variable must be transformed and mapped onto a visual property such as color, size, or position.
  • For maps: If the visualization features spatial data, geographic position alone is not sufficient. It must be more than dots on a map. In addition, binary TRUE/FALSE maps don't communicate anything that a table couldn't. If your map doesn't qualify, you can always post to /r/mapporn.
  • Made with the intent to communicate data. A music visualization from a media player, while pretty and mesmerizing, doesn't convey information. You can't differentiate songs just by looking at the images.

The definition was adapted from Eager Eyes and this InfoVis paper.
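
To make the "mapping of information to a visual property" criterion concrete, here is a minimal sketch in Python (standard library only). It is not part of the rules and not how any particular tool works; the function names `linear_scale` and `bars_svg` are made up for illustration, and it assumes a non-empty list of non-negative numbers. Each data value is mapped onto two visual properties: bar height (size) and gray level (color).

```python
def linear_scale(value, domain, out_range):
    """Map a value from the data domain onto a visual range (e.g. pixels)."""
    d0, d1 = domain
    r0, r1 = out_range
    if d1 == d0:
        return r0  # degenerate domain: every value maps to the range start
    t = (value - d0) / (d1 - d0)
    return r0 + t * (r1 - r0)


def bars_svg(values, width=200, height=100):
    """Return a minimal SVG bar chart for non-negative values (illustrative)."""
    lo, hi = min(values), max(values)
    bar_w = width / len(values)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, v in enumerate(values):
        # Size encodes the value: larger value, taller bar.
        h = linear_scale(v, (0, hi), (0, height))
        # Color encodes the value too: larger value, darker gray.
        gray = round(linear_scale(v, (lo, hi), (200, 40)))
        parts.append(
            f'<rect x="{i * bar_w:.1f}" y="{height - h:.1f}" '
            f'width="{bar_w - 2:.1f}" height="{h:.1f}" '
            f'fill="rgb({gray},{gray},{gray})"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Calling `bars_svg([1, 2, 4])` returns SVG markup with one rectangle per value. The same numbers printed as a table would not meet the definition, because nothing has been mapped onto position, size, or color.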

Original sources

All posts need to link to the original author's article that introduced the visualization. It doesn't matter where YOU first saw it. You need to find the actual source.

If you found the image somewhere:

Please note that we require you to post the complete original source article, so that the readers can get context, the author can get credit, and third parties don't get hits from stealing another person's work. Posts must directly link to the visualization where it was originally introduced by the author (not an image on the site, but the actual full web page article).

The source is rarely Gawker, Tumblr, Imgur, etc. Figure out where they got it. Usually a link to the original source is in the first couple of paragraphs of where you found it. You may wish to try TinEye or Google reverse image search. If you need assistance, click here.

If you are trying to submit a Wikipedia link:

Try submitting the Wikipedia or Wikimedia Commons page link instead of the raw file. A quick look at the URL should confirm which one you have. Compare the following links:

Take a look at Link A. That is just the image, and it has the same problem as hotlinking an article image from a website: there is no context. Now look at Link B. There is a lot of enriching information, e.g. copyright license, author, source, and even a link to the revision history. The same is the case with Link C. Links like B or C are the acceptable ones.

If the original source is a PDF:

If you posted a link to an image within a PDF, but your image was autoremoved, leave your post as-is and message the mods. We'll make an exception for you. In your modmail message, be sure to include the original PDF.

Your post will be manually approved by the mod team, and we'll sticky the original PDF to the top of the comment thread so our readers can get context.

This is because PDFs tend to be cumbersome for mobile browsers and data plans. We try to stay mindful of our mobile users while running the subreddit.

What counts as Original Content (OC)?

Original Content is a post where the person who posted the /r/DataIsBeautiful submission is also the author of the visual displayed. This means that they went through the steps of working with the data, performing the analysis, and finally designing the visual.

If that author's role in creating the visualization was little more than taking a screenshot, animating, or filtering, it's not OC. Authors must have designed the visualization somehow. A program as simple as Excel is fine because the user at least chooses the chart type. Google Ngram or mouse trackers are not OC unless you are a creator of that software.

Put simply, you need to do more than image manipulation (such as cropping or animating) or filtering (such as panning, zooming, and subset selection). You need to control not just which data is visualized, but how it's visualized.

You must substantially change the visualization if your OC is based on an existing visualization. This means you cannot simply change the colors or positioning and claim OC. We consider such weak edits plagiarism (see below), and we have a two-strike policy.

Tips for making a great OC post.

What is Plagiarism?

It actually pains me to say this, but there have been several instances of blatant plagiarism we've caught here in a reasonably short period of time. A lot of these instances are actual spammers who are trying to pull a quick one on us to promote their websites, but still other instances are redditors who have been getting warning after warning about why it's unethical to steal hard work from another viz practitioner and claim it as your own.

It's really sad that we have to spell it out, but here's a brief case study on what doesn't constitute OC:

  • Taking someone else's data viz and hosting it on another site.
  • Taking someone else's data viz and hosting it on another site, after making insignificant touchups in photoshop.
  • Taking someone else's data viz and hosting it on another site, after relabeling the axes to make it look funny.
  • Taking a screenshot of someone else's mobile app or website, and hosting it on another site.
  • Generating a replica of someone else's viz using someone else's code, without proper attribution.
  • Generating a copy of someone else's viz without changing the data, changing the way it's presented, or adding anything unique.

Rule 2 is very clear, telling you to link to the original source. If it's not your viz, don't claim it as OC. In a perfect world, we shouldn't have to post a reminder about what plagiarized content is, since that's something that's taught in elementary school, but here we are. Original Content (or "OC" for short) often takes redditors hours to complete. A lot of professional data practitioners take many workdays to complete their viz. Please respect their time by linking directly to the original material they created. If you are basing your work off of theirs (aka remixing), then take the time to give them credit. If it's not your OC, then don't claim it as OC. Period.

If you come across an instance of plagiarized content, or you need clarification about something, please send us a modmail and we'll respond as soon as possible.

Remixing

Remixing someone else's content, as opposed to plagiarizing it, is perfectly allowable. However, there must be a significant transformation to the visualization. The remix should have one or more of the following qualities:

  • Using a different source dataset, or "updating" someone else's work to apply to a more recent set.
  • Displaying a dataset quantifiably differently (e.g. changing a staggered bar into a stacked area).
  • Performing an analysis on the same dataset, but in a way that's different from the original post.

All other minor touchups should be posted as a comment in the original thread (not a separate post), to avoid violating our plagiarism rule. In all cases, you should give the original author credit. Again, if you're unsure, please send us a modmail and we'll respond as soon as possible.

Infographic vs Visualization

An infographic is made manually (e.g., via Illustrator), whereas a visualization is automatically generated from data. Here (1 2 3) are some example infographics.

Notice that while infographics are based on data, they are not generated systematically from data. A good test is that swapping out a dataset (e.g., to a different year or different location) should require little to no manual intervention. A visualization can just be regenerated, whereas an infographic has to be remade manually.
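
The "swap the dataset" test can be sketched in code. The following Python snippet is only an illustration under stated assumptions: `text_bar_chart` is a made-up helper, and both datasets are placeholder numbers, not real data. The point is that the second chart is produced by the same call with different input, with no manual redrawing.

```python
def text_bar_chart(rows, width=30):
    """Render (label, value) pairs as a text bar chart.

    Bars are scaled so the largest value spans `width` characters.
    Assumes a non-empty list of non-negative values.
    """
    largest = max(value for _, value in rows)
    label_w = max(len(label) for label, _ in rows)
    lines = []
    for label, value in rows:
        n = round(width * value / largest) if largest else 0
        lines.append(f"{label:<{label_w}} | {'#' * n} {value}")
    return "\n".join(lines)


# Placeholder datasets, for illustration only.
data_first = [("A", 10), ("B", 20)]
data_second = [("A", 30), ("B", 15), ("C", 5)]

# Regenerating for a different dataset is a single function call.
chart_first = text_bar_chart(data_first)
chart_second = text_bar_chart(data_second)
```

An infographic fails this test: moving from the first dataset to the second would mean reopening the design tool and redrawing every element by hand.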

Sometimes a visualization is embedded in an infographic, which is sometimes acceptable but makes the boundary a bit fuzzy. If you're unsure, please contact the moderators before posting it.

If a post is clearly an infographic, please report it.

Note: We have nothing against infographics. They're just harder to regulate and require a categorically different skill to create. See /r/infographics for more infographics.

Compilations

We believe authors should be credited for their work. To achieve that, we ask that all visualizations posted in /r/DataIsBeautiful link directly to the original source of the work.

Blog posts and other articles that sidestep this by rehosting multiple visualizations from several authors will not be allowed.

Reposts

Popular posts (that have 100+ upvotes) posted in this subreddit within the past 2 months cannot be resubmitted. Please check /new or search for a couple keywords before posting.

Describing the data plainly

This rule was crafted due to the high amounts of sensationalism we started seeing in this subreddit.

The title of your post should avoid:

  • ALL CAPS, which is BASICALLY SHOUTING.
  • Same thing with exclamation marks!!!
  • Phrases like "you won't believe" or "in one simple chart", and other clickbait headlines.
  • Sweeping generalizations, impending doom, wild conclusions or postulations, especially if they have nothing to do with the visual or are based on outliers
  • Keywords like "Amazing", "Incredible", "Shocking", "Stunning".
  • Superlatives, when they don't apply.

It doesn't matter if you copied and pasted the title from the article; sensationalism is sensationalism. Let the visual's "amazingness" speak for itself; good design should be its own headline. If the graph is showing "Baseball player batting averages by season", then that should be your title.

While not required, it's good practice to avoid putting conclusions in your title, and also to avoid pointing out the outliers. Instead of titling your post "Tribbles are wigglier!" or "This Tribble is falling behind", simply post "Tribble wiggliness over time" and let the readers come to their own conclusions in the comments.

Politics

We have a restriction on political posts, especially given the popularity of this sub and the massive flareup in political content. American Politics are only permitted on Thursdays (ET). It's still our most popular rule to date, and our loyal readers had been asking for it for ages.

A lot of great content gets posted in this sub, but it is often overlooked because of political bandwagoning: voters upvote submissions they didn't read at all because those submissions reaffirm their political bias at the time. This phenomenon had been choking out the high-quality submissions that actually belong in this subreddit and that made it a powerhouse of awesome content before it became a default subreddit.

What qualifies as Politics?

A quick rule of thumb: if you can find a flamewar about the United States people or policy on your Facebook feed, it's likely going to be removed if you post it here. Topics in the news, culture wars, outrage porn, federal policy; essentially, stuff that's about America and guaranteed to generate a good amount of heat in the comments section.

In case that's not clear enough, here are some common examples (the list is not exhaustive).

Topics that are not gonna fly unless you post on Thursday:

  • The Federal budget, especially when there are budget talks in Congress
  • Gun policy, gun deaths, praising/complaining about guns. (An exception: if you went to a shooting range and made a heatmap of your targets, that's OK. We're concerned more specifically with guns as they relate to policy in the US and the flamewars they create in the comments section.)
  • Trump, Hillary, Jeb, Bernie, Obama, Bush, and other knuckleheads, their Twitters, their personalities, whatever.
  • X statistic by Party, by Presidential Administration, by control over House/Senate, etc.
  • What politicians think about Global Warming policy in the US, Paris Climate Deal visuals, etc.
  • US healthcare stats, US healthcare policy, US healthcare bankruptcies, how US healthcare is awful, US compared to other countries, US healthcare anything.
  • Why it should be legal to smoke weed in the US.
  • Information about a recent protest.

Here are some topics that are likely OK to post (and why):

  • The EU budget (Remember: the rule restricts American politics, it doesn't restrict European, Canadian, Australian, etc. politics)
  • A heatmap of last week's target practice with your AR-15 (see above)
  • UK election results (Remember: the rule restricts American politics, it doesn't restrict European, Canadian, Australian, etc. politics)
  • Global Warming (As long as it doesn't relate directly to policy in the US, or knuckleheads who said a thing, it should be fine)
  • Marijuana sales (Again, as long as it isn't directly preaching about policy in the US, or knuckleheads who said a thing, it should be fine)
  • The history of Alcohol Prohibition in the US (Policy that's 100 years old is probably going to be fine, as long as it's not preachy about modern policy)
  • Information about a protest that happened in 1932 (again, as long as it's not preaching about modern protesting)

All we're asking is for you to hold on to the political visuals in your browser tab until Thursday, Eastern Time. If you need a reminder, try sending a PM to this little fella, or using a scheduler.

Why Thursday?

A lot of political events in the US occur on Tuesdays. This includes primary votes, elections, gubernatorial events, and so on. This kind of timeline gives viz designers (professionals as well as amateurs) roughly 36 hours to work with the data, perform the analysis, and finally design the visual.

Why all the rules? Why not let the votes decide?

The official Reddit FAQ answers this exact question:

The reason there are separate subreddits is to allow niche communities to form, instead of having one monolithic overall community. These communities distinguish themselves with a unique focus, look and policies: what's on- and off-topic there, whether people are expected to behave civilly or can feel free to be brutal, etc.

One issue that arises is that casual, new, or transient visitors to a particular community don't always know the rules that tie it together.

As an example, imagine a /r/swimming and a /r/scuba. People can read about one topic or the other (or subscribe to both). But since scuba divers like to swim, a casual user might start submitting swimming links on /r/scuba. And these stories will probably get upvoted, especially by people who see the links on the reddit front page and don't look closely at where they're posted. If left alone, /r/scuba will just become another /r/swimming and there won't be a place to go to find an uncluttered listing of scuba news.

The fix is for the /r/scuba moderators to remove the offtopic links, and ideally to teach the submitters about the more appropriate /r/swimming subreddit.

Self-promotion, spam, and you

Dataisbeautiful thrives off of the visualizations and OC our submitters make, as well as visualizations users have found. We welcome original content by our users, but please try to contribute to our community instead of just using DiB to promote your site/blog/whatever it is. Users should make 9 non-personal submissions (posts and/or discussion-focused comments) for every post containing content they own or from which they benefit in some way. If you post a link to your visualization on your blog to /r/dataisbeautiful and to /r/infographics, you should make at least 18 comments. If more than 2-3 items on the first page of your posting history are self-promoting, it's very likely you would be considered a spammer.

  • Any post on reddit, whether it is a link, text, or comment submission, that links to content that is an attempt by its author and/or submitter to gain traffic is considered self-promotion.
  • Posting or linking content that does not benefit you is not considered self-promotion. Content that “benefits you” includes content you own, content you helped create, content featuring you, and content released by a company you work for.
  • For comments to be considered in the ratio, they must be part of a discussion, answer questions from other users, not be short one or two-word comments, and they must not include your content. “Shotgunning” comments on random posts in order to post your content is greatly frowned upon.
  • If you wish to advertise, you can do so through reddit.
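
The 9-to-1 guideline above is simple arithmetic. As a rough sketch in Python (the names `SELF_PROMO_RATIO`, `required_contributions`, and `meets_guideline` are invented here; this is not a tool the moderators are known to use):

```python
# Non-personal submissions expected per self-promotional post,
# taken from the 9-to-1 guideline in the text above.
SELF_PROMO_RATIO = 9


def required_contributions(self_promo_posts):
    """Number of non-personal posts/comments the guideline asks for."""
    return SELF_PROMO_RATIO * self_promo_posts


def meets_guideline(self_promo_posts, other_contributions):
    """True if the posting history satisfies the 9-to-1 ratio."""
    return other_contributions >= required_contributions(self_promo_posts)
```

For the example in the text, posting your own link to two subreddits counts as two self-promotional posts, so `required_contributions(2)` gives 18. Only comments that meet the discussion criteria listed above count toward that total.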

Common Questions

Where can I get ____ dataset?

Try searching or asking in /r/datasets or /r/SampleSize.

How do I make a visualization?

  1. Don't underestimate the power of Excel, Google Docs, or Plot. A "simple" line chart (or a collection of them) can be a powerful tool for conveying interesting information.
  2. Tableau Public and StatTrends are free tools that can make some powerful visualizations.
  3. If you're comfortable with programming, try d3.js, R with RStudio, or matplotlib with IPython Notebook.
  4. If you're really comfortable with programming and have a sufficiently complex dataset, hacking away with a graphics library (opengl, webgl, directx, canvas, etc.) is time consuming but can yield some powerful results.

Can I post a question?

Yes, but you have to include a visualization with the question. You may want to add [Critique] to the title of your post. This community can be very helpful if you want to improve an existing visualization. However, you have to show a willingness to take the first step. You can request a critique and suggestions, but you can't just ask "How do I get data and make a visualization about topic X?" See the above two questions.

Also consider /r/design_critiques for design critiques or /r/DataVizRequests if you want to request an entirely new visualization.

Can I post my own visualization?

Please do! Be sure to add [OC] to the post title.

  • Please review this guide to a great post.
  • Consider submitting your work to the Wikimedia Commons for use in Wikipedia.
  • Please share your data or source code. It's not required, but it can be useful. Giving everyone an opportunity to improve your visualization helps everyone learn. Also, it's awesome to see different takes on a common dataset. And remember to give credit if you use someone else's code or dataset.

We prefer that DataIsBeautiful is composed of original content. The more we build off of each other's work (modifying and redesigning other submissions), the more useful DataIsBeautiful becomes.

Does appearance matter?

Yes! But pretty pictures are not the aim of this subreddit. Posts should strive to present information as effectively as possible. Part of that process is visual design. Default output from Excel, R, mapping programs, etc. can be overly cluttered and hard to understand. Try looking at font sizes, extraneous grid lines, alignment, and aliasing. A lack of good design ultimately limits the ability of a visualization to convey information.

However, don't downvote because you think a post is ugly. If you have some design experience, please add some constructive criticism, so people know how to improve.

Shouldn't it be "data ARE beautiful"?

In modern English, "data" is primarily treated as a mass noun. If we were discussing the beauty of an individual "datum", and we had many of these, then it would be plural.

Here, we refer to "data" as a whole, akin to water, fire, or information. "The water ARE cold" is not correct.

Oxford English Dictionary:

In modern non-scientific use, however, it is generally not treated as a plural. Instead, it is treated as a mass noun, similar to a word like information, which takes a singular verb. Sentences such as data was collected over a number of years are now widely accepted in standard English.

Guardian style guide:

takes a singular verb (like agenda), though strictly a plural; no one ever uses "agendum" or "datum"

"Data" has become a synonym for "dataset" or "information". And the word "datum" is of little practicality in the context of visualization design, where it could refer to a row, a cell, or a bit.

TL;DR: "Data is beautiful" is a grammatically (and semantically) correct statement.

Here's some data on the use of "data" by /u/philshem

http://i.imgur.com/1TFYFnE.png

Haha, Data from Star Trek IS beautiful

Seriously, it was funny the first fifty times. Now it's annoying. Go bother these guys instead.


revision by zonination