
[–]Reckon1ng 0 points1 point  (0 children)

What software do you all use, and how hard is it to learn?

[–]suvl 0 points1 point  (0 children)

I'm comparing speech-to-text providers to assess which one understands European Portuguese best. I have the results from about 3k people reading a script and classifying the resulting text.

Now I wanna plot it visually: how many times all providers correctly transcribed the voice to the corresponding text, how many times none did, and everything in between (providers a and b but not c, a and c but not b, ...).

Funny thing, a month into this and I still don't have a clue about how to visualise the data other than a table. Any help? Thanks.
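One possible starting point is to tally every exact combination of providers that got a reading right, since that table is what a Venn diagram or an UpSet-style plot is drawn from. A minimal sketch, assuming each reading is stored as the set of providers whose transcript matched the script (the provider names "a", "b", "c" and the sample rows are made up):

```python
from collections import Counter

# Hypothetical data: for each reading, the set of providers whose
# transcript matched the script. Provider names are placeholders.
results = [
    {"a", "b", "c"},
    {"a", "b"},
    set(),
    {"a", "c"},
    {"a", "b", "c"},
    {"b"},
]

def combination_counts(results):
    """Count how often each exact combination of providers was correct."""
    return Counter(frozenset(r) for r in results)

counts = combination_counts(results)
# Print the most frequent combinations first; an empty set means "none".
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    label = " & ".join(sorted(combo)) or "none"
    print(f"{label:10s} {n}")
```

With three providers there are only eight possible combinations, so a sorted bar chart of these counts may already be readable without anything fancier.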

[–]captmomo 0 points1 point  (0 children)

Hi, for practice I'm trying to visualize movie ratings vs production budgets, worldwide gross and domestic gross.
This is an example of what I have so far.
How do I improve on this?
How do I decide which should be the x or y axis?
Should I plot all 3 datasets on the same scatterplot?
Should I annotate each point?
Thanks.

Edit: Here's my d3 chart.

[–]GLHFKA 0 points1 point  (0 children)

Hey all. Sort of a low-level random question, but, I figured this would be the best place to ask. I'm trying to map out a mildly complicated schematic depicting a phone forwarding scheme using regular phones and Google Voice numbers for my workplace. What would be the best tool to make a visualization for this forwarding scheme? Thanks!

[–]FFL_SoMA 0 points1 point  (2 children)

Hey there,

New to this sub, but am a general Reddit lurker. I just started my career as a Research and Data Manager at a luxury real estate company. Some of these graphs and whatnot I’ve seen on the sub are absolutely stunning. In an effort to impress my superiors, can anyone provide me some tips or links to programs to help visualize all this data? I don’t know what applications/programs the people on this sub use, but they are incredibly stunning and I would like to take a stab at creating something like that for my company. Thank you for your time.

[–]zoninationOC: 42 2 points3 points  (1 child)

Personally I do R/ggplot2, but there are others.

Common /r/dataisbeautiful tools used:

  • Excel/LibreOffice/Google Sheets/Numbers - Typical spreadsheet software with basic plotting functions. Easy to learn, but often gets called out for being corny or low-effort. It's also very "canned" and lacks a lot of basic functionality for quality statistical representations (e.g. boxplots, heatmaps, faceting, histograms, etc.).
  • Tableau - Simple learning curve that offers more than a few basic plotting functions, and also allows interactive plots. Software is proprietary and "canned" and will cost you some. Maybe some more folks can elaborate what it's like to use, but this is my impression after hearing basic information from other users and witnessing lots of Tableau OC.
  • R (and by extension ggplot2) - R is my personal favorite, but it's one of the more advanced FOSS packages. R (with ggplot2) is hugely capable as a statistical engine and is used in a lot of parts of industry. This comes with a steep learning curve, however. It can generate beautiful visuals, but it takes time to learn.
  • Python/matplotlib - FOSS. This is when you get into the raw code aspect of dataviz. Python is popular among software and FOSS fans, including but not limited to xkcd; and matplotlib is one of the packages that allows for plotting.
  • Gnuplot - Worth mentioning since some OC here is gnuplot based. Medium learning curve. However this software is not really well-supported, and the visuals don't come out too hot.
  • d3.js - FOSS, I think. Good for delivering high quality interactive plots. However the learning curve is steep. As is the case with R, it's capable of generating very high quality interactives.

As always, see if you can browse some of your favorite OC to see if there is a common thread among visuals that you like. All OC threads must state the tool they used (and OC-Bot will likely have a sticky to it), so if there's a lot of viz you like that's made with (say) Tableau or R, then that software is probably the right one for you.

[–]FFL_SoMA 0 points1 point  (0 children)

You just made my day and were incredibly thorough and helpful. Thank you so very much!

[–]SpookyWagons 2 points3 points  (1 child)

I'm looking into the possibility of making a mod for the game Civilization 6, one that would create a turn-by-turn network graph (or social network graph?) that would visualize the groupings and quality of the relationships between civilizations.

In the game, these are usually dictated by "diplomacy points". For example, if I were to agree to open up borders with Japan, Japan and I would improve diplomatic relations between each other by, say, 3 points. If I were to get caught spying on them, the total would go down by 9 points.

Needless to say, there are relationships between many civs happening at once, with many factors determining the point totals between them at any given turn. I thought a network graph would be the best way to show which civs are in similar families (e.g. an Axis Powers, a NATO, etc.).

My question, then, is what is the best tool to use to make these sorts of network graphs? I'm hoping to find a way to have the game spit out data, populate an Excel sheet, and then have the program create the graph and send it back to the game. Thanks!

[–]zoninationOC: 42 0 points1 point  (0 children)

I hear that Gephi is a nice platform nowadays.
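As far as I know, Gephi can import a plain CSV edge list, so one option is to have the mod dump one per turn. A rough sketch, assuming a hypothetical event log of (turn, civ, civ, points); the +3 and -9 values are the ones from the question, and the civ names, function names, and column headers are placeholders:

```python
import csv
import sys
from collections import defaultdict

# Hypothetical event log: (turn, civ_a, civ_b, points).
events = [
    (1, "Japan", "Rome", 3),    # open borders
    (2, "Japan", "Rome", -9),   # caught spying
    (2, "Rome", "Egypt", 3),
    (3, "Japan", "Egypt", 3),
]

def edge_weights(events, up_to_turn):
    """Sum diplomacy points per unordered pair of civs up to a given turn."""
    totals = defaultdict(int)
    for turn, a, b, points in events:
        if turn <= up_to_turn:
            totals[tuple(sorted((a, b)))] += points
    return dict(totals)

def write_edge_list(weights, out):
    """Write one Source,Target,Weight row per pair to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, w])

weights = edge_weights(events, up_to_turn=2)
write_edge_list(weights, sys.stdout)
```

Totals can go negative here, and many force-directed layouts assume non-negative weights, so the numbers may need shifting or splitting into "friendly" and "hostile" edges before laying out the graph.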

[–]vibrate 1 point2 points  (1 child)

How would one visualise the movement of cash between the various cryptocurrencies over the last 6 months?

I would particularly like to see how money moves out of one coin and into another.

I'm imagining a moving bar chart, with coins across the x axis and market cap along the y axis.

It would have to be animated, with pauses for various big announcements or IPOs, and a ticker for total market cap of all coins included.

Any ideas, or anyone willing to take a stab?

[–]brandonaaskov 0 points1 point  (0 children)

This is interesting, and would be really cool to see. I'm curious about the data source though: how are you going to determine that one coin was converted to another? That's typical of converting Bitcoin to some other alt-coin, but I don't know how you'd track something like NEO -> Ethereum since it would have to go through a Bitcoin exchange first.

[–]Toni_Chu 1 point2 points  (1 child)

Planning to track my running progress against my training plan. I also have resolved to run 1000 miles in 2018. I'd like any constructive criticism on this visualisation.

Here it is.

[–]zoninationOC: 42 0 points1 point  (0 children)

Hey. Looks good, however I have a couple of critiques to unpack:

  1. The date format. I assume you're non-American? Regardless, Unitedstatesians and Non-Unitedstatesians alike should all use ISO 8601 format (relevant xkcd). It's standard in a lot of programming for obvious reasons, and is starting to take root in a lot of other industries as well. Probably a better thing to do: You can start the X axis as "Day number" and start ordering them sequentially (1, 2, 3, ...) instead of requiring everybody turn their heads to look at the date.
  2. Dual axes. They are disabled in other software for a very good reason, but Excel hasn't gotten with the times, it seems. Your plot can be split into two plots and that would quadruple the beauty. Stephen Few has a good argument about it here. Dual scales really only make sense if you're converting one unit to another (e.g. Fahrenheit to Celsius, Miles to Kilometers, etc.).
    • Honestly, I would either split the graph into two plots, or simply get rid of the bars altogether. Line graphs are intuitive enough that you don't have to include additional spatial comparisons.
    • Another possibility (and this might only be available in different software) is to compare the difference between your training plan and your actual miles run, by displaying a ribbon colored red/blue based on behind/ahead of your plan. But that's just a thought.
  3. Direct labels for the lines. Instead of needing to mentally note "red is this, blue is that", you can actually see the lines and labels with direct labelling without having to look back and forth at the key.
  4. Font size on Y axis - Shrink that down. Y-label "Cumulative Distance (miles)" should be the way to go since most standards have you say "Quantity (unit)"... maybe a secondary axis for "Cumulative Distance (kilometers)" for ease of conversion. Add a title as well.
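To make points 1, 2 and 4 concrete, here is a small sketch of how the data behind such a chart could be prepared: ISO 8601 dates, a sequential day number, cumulative miles with a kilometer conversion, and the plan-versus-actual gap that a red/blue ribbon would encode. The log entries and the function name are made up for illustration.

```python
from datetime import date

KM_PER_MILE = 1.609344

# Hypothetical training log: (date, planned miles, actual miles) per run.
log = [
    (date(2018, 1, 1), 3.0, 3.0),
    (date(2018, 1, 3), 4.0, 5.0),
    (date(2018, 1, 6), 5.0, 3.5),
]

def progress_rows(log):
    """Return ISO date, day number, cumulative plan/actual, and the gap."""
    start = log[0][0]
    plan_total = actual_total = 0.0
    rows = []
    for day, planned, actual in log:
        plan_total += planned
        actual_total += actual
        rows.append({
            "date": day.isoformat(),                # ISO 8601, e.g. 2018-01-03
            "day_number": (day - start).days + 1,   # first run is day 1
            "plan_miles": plan_total,
            "actual_miles": actual_total,
            "actual_km": round(actual_total * KM_PER_MILE, 2),
            "gap": actual_total - plan_total,       # > 0 ahead, < 0 behind
        })
    return rows

rows = progress_rows(log)
for row in rows:
    status = "ahead" if row["gap"] > 0 else "behind" if row["gap"] < 0 else "on plan"
    print(row["date"], row["day_number"], row["gap"], status)
```

The sign of the gap is all the ribbon needs: one color where it's positive, the other where it's negative.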

You can always do more with less. See if you can go through your plot and start removing different elements to see if they're really necessary. Some people do this too much, but more often than not people do this too little and end up with noisy charts. Less is more; remove to improve.

[–]styler2goOC: 1 1 point2 points  (3 children)

I would love to get some feedback on how I can improve the data or which cool visualizations I could do with my data!

http://goedhartvoordieren.nl/?page=r/dataisbeautiful/comments/7p6otm/meassuring_my_computers_energy_consumption_over/

[–]zoninationOC: 42 1 point2 points  (2 children)

Okay. A lot to unpack here.

  1. The first glaring issue is units. What are your units? Watts? Joules? BTU/hr? horsepower? kW-h? Kiloergs? Milliamps at 120V RMS? Chickens? Please display them loud and proud.
  2. Copypasting from above: The date format. I assume you're non-American? Regardless, Unitedstatesians and Non-Unitedstatesians alike should all use ISO 8601 format (relevant xkcd). It's standard in a lot of programming for obvious reasons, and is starting to take root in a lot of other industries as well. Probably a better thing to do: You can start the X axis as "Day number" and start ordering them sequentially (1, 2, 3, ...) instead of requiring everybody turn their heads to look at the date.
  3. Too much information is in this image, and I think there's a way to combine one or two graphs the next time you do it.
    1. Power consumption over 3 months might be a good place to start. Log the power consumption as a function of what you're doing each day. (e.g. how many joules did you spend gaming? how many joules browsing?)
    2. With a stacked bar graph, plot the power consumption; but this time use your gaming, surfing, idling etc. values.
  4. !pies, below:
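For point 3, the per-activity breakdown might be prepared along these lines. This is only a sketch: it assumes you can log average watts and hours per activity, which may not match what your meter actually records, and the sample values are invented.

```python
from collections import defaultdict

# Hypothetical samples: (ISO date, activity, average watts, hours).
samples = [
    ("2018-01-08", "gaming", 250.0, 2.0),
    ("2018-01-08", "browsing", 90.0, 3.0),
    ("2018-01-08", "idle", 40.0, 5.0),
    ("2018-01-09", "gaming", 250.0, 1.0),
    ("2018-01-09", "browsing", 90.0, 4.0),
]

def energy_by_day_and_activity(samples):
    """Return {date: {activity: kWh}}, i.e. one stacked bar per day."""
    totals = defaultdict(lambda: defaultdict(float))
    for day, activity, watts, hours in samples:
        totals[day][activity] += watts * hours / 1000.0  # W * h -> kWh
    return {day: dict(parts) for day, parts in totals.items()}

stacks = energy_by_day_and_activity(samples)
for day in sorted(stacks):
    print(day, stacks[day])
```

Each inner dictionary is one stacked bar, with the unit (kWh here) stated once on the axis.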

[–]styler2goOC: 1 0 points1 point  (0 children)

Thank you so much! I will work on that tomorrow.

Is there any additional information I can get out of my data? Like, I could read on which weekday the computer was on the most time etc.

[–]AutoModerator[S,M] 0 points1 point  (0 children)

You've summoned the advice page on !pies. There are issues with Pie/Doughnut charts that are frequently overlooked, especially among Excel users and beginners. Here's what some experts have to say about the subject:

  • In Save the Pies for Dessert, Stephen Few argues that, with a single rare exception, the data is better represented with a bar chart. In addition to this, humans are terrible at perceiving circular area.
  • ExcelCharts argues that the pie chart is simply a single stacked bar in polar coordinates, and that there are many pitfalls to using this type of visualization. In addition, the author also argues that pie charts are better displayed as bar charts instead.
  • Edward Tufte, data viz thought leader, states about pie charts "A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between charts [...]. Given their low density and failure to order numbers along a visual dimension, pie charts should never be used." (excerpt from The Visual Display of Quantitative Information).
  • Cole Knaflic in this article rants about her hate of pie charts, and boldly states they should not be used.
  • Joey Cherdarchuk in this article shows how easily pies can be replaced by bar charts.

If you absolutely must use a pie, please consider the following:


I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]JehovahsBestWitness 1 point2 points  (6 children)

I'm building a graph tracking my roommate's time at home (since he never is) and was wondering how, in Excel, to make a graph that is a timeline for the month, where it's one colour when he was here and another when he wasn't.

Was hoping to also have it change as I altered the input data

[–]gash_dits_wafu 1 point2 points  (0 children)

Have two columns. The first is amount of time in hours that your roommate is in. The second column is 24-(the first column's number). This will give you the time that he's in and when he's out.

Label the columns 'Time in' and 'Time out'. Label the rows 'mon, tue, wed' etc.

Then choose a '100% stacked area' graph to display it.
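The same two columns, written out as a quick sketch in case it helps to see the arithmetic (the hours are invented, and the function name is just a placeholder):

```python
# Hypothetical hours at home per day; 24 minus that is the time away.
hours_in = {"Mon": 9.5, "Tue": 0.0, "Wed": 12.0, "Thu": 7.25}

def in_out_table(hours_in):
    """Return rows of (day, time_in, time_out, share_in) for a 100% stacked chart."""
    rows = []
    for day, time_in in hours_in.items():
        if not 0 <= time_in <= 24:
            raise ValueError(f"{day}: hours must be between 0 and 24")
        time_out = 24 - time_in
        rows.append((day, time_in, time_out, time_in / 24))
    return rows

table = in_out_table(hours_in)
for day, time_in, time_out, share in table:
    print(f"{day}: in {time_in}h, out {time_out}h ({share:.0%} at home)")
```

Because the second column is derived from the first, the chart updates whenever the 'Time in' values change.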

[–]Irascible_92 1 point2 points  (4 children)

I'm currently tracking my sleep each night using a horizontal bar diagram. The design of it may be similar to what you are looking for. I built this in Excel; if you want to know how, I can help.

[–]JehovahsBestWitness 0 points1 point  (3 children)

This is perfect, how did you build it in Excel?

[–]Irascible_92 1 point2 points  (2 children)

I have 4 values for this graph

  1. Start of day = 12:00AM

  2. Woke up at = Time recorded by app

  3. Go to sleep at = Time recorded by app

  4. End of day = 11:59 PM

The values need to be changed to something the graph can read. The values for the graph are as follows.

  1. Start of day

  2. Woke up at - Start of day

  3. Go to sleep at - Woke up at

  4. End of day - Go to sleep at

When those values are used to create a stacked bar diagram, the bars show the times that I was actually asleep. What Excel is doing is measuring the hours between things (which is why the graphical values look weird). I would recommend looking up how to create a waterfall chart in Excel, as the technique used to create the floating bars is the same one used in my chart.
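Outside of Excel, the same subtraction can be sketched like this. It assumes times are "HH:MM" strings on one calendar day and treats the end of day as 24:00 rather than 11:59 PM; the function name and labels are only for illustration.

```python
from datetime import datetime

def sleep_segments(woke_up, went_to_sleep):
    """Split one day into stacked-bar segments, in hours.

    Assumes waking happens before going to sleep on the same day.
    """
    def hours(t):
        parsed = datetime.strptime(t, "%H:%M")
        return parsed.hour + parsed.minute / 60

    start, end = 0.0, 24.0  # assumption: end of day is 24:00, not 11:59 PM
    woke, slept = hours(woke_up), hours(went_to_sleep)
    if not start <= woke <= slept <= end:
        raise ValueError("expected wake time before bedtime on the same day")
    return [
        ("asleep (morning)", woke - start, True),   # filled bar
        ("awake", slept - woke, False),             # no fill, no border
        ("asleep (night)", end - slept, True),      # filled bar
    ]

segments = sleep_segments("07:30", "23:00")
for label, length, filled in segments:
    print(f"{label:18s} {length:5.2f} h  {'filled' if filled else 'invisible'}")
```

The three lengths always add up to 24, which is what keeps each day's bar the same total width.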

If there is a place where I could upload my Excel file for you to see, I'm happy to do that as well. Good luck!

[–]JehovahsBestWitness 0 points1 point  (1 child)

What were your rise, fall and base values then? For me I need time here and time not here, do you have formula examples?

[–]Irascible_92 0 points1 point  (0 children)

From the Waterfall chart you just want to take the idea of using a stacked bar diagram with "invisible" bars.

My bars without fill/border are the time of day that I am awake, which is calculated from "Go to sleep at - Woke up at" (number 3 in the above). And then my filled bars, time spent asleep, would be the Rise/Fall values.

[–]ElectedTulip462 4 points5 points  (1 child)

How would someone with little experience in this kind of field go about learning and creating graphs etc.?

[–]PelusterianoViz Practitioner 2 points3 points  (0 children)

Here's a copy+paste for a similar question. Original here.


Which of the following are you looking for?

a. Learning how to use a software to process and visualize data.

b. Learning the principles of data visualization (which chart should you use given the nature of your data)

c. Learning statistics to have a better idea of what the data means.

d. All of the above.

For (c), check the courses offered at Coursera, at edx, and the Khan Academy crash course.

You can say you've got a basic understanding of statistics when you know about: randomness, classic probability, Bayesian probability, samples, data distribution, average/mean, mode, median, parametric statistics (based on a normal distribution) like t-test, Z-test, Pearson's correlation, one-way ANOVA, two-way ANOVA, and statistical inference. Then it moves to non-parametric statistics (non-normal distributions).
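As a tiny taste of the first few items on that list, here is how mean, median, mode and standard deviation come out for a made-up sample, using only Python's built-in `statistics` module. It's just an illustration, not a substitute for a course.

```python
import statistics

# Made-up sample for illustration.
sample = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(sample)      # arithmetic average
median = statistics.median(sample)  # middle value (average of the two middle ones here)
mode = statistics.mode(sample)      # most frequent value
stdev = statistics.stdev(sample)    # sample standard deviation

print(mean, median, mode, round(stdev, 2))
```

Note how the mean (5) sits above the median (4) because the single large value pulls it up; spotting that kind of thing is a big part of the "statistical mind".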

The most important part here is having a "statistical mind". Besides a regular textbook, I recommend "How to lie with statistics".

For (b), check the books by Edward Tufte, especially "The Visual Display of Quantitative Information", and learn about good graphic design principles. We also have some info at our wiki.

For (a), I recommend looking for courses on MS Excel (mainly to process data, not to display it), R (to process and display), d3.js (if you want to make dynamic and interactive displays), Python (to process and display), Tableau (it's getting quite popular), etc.

Finally, I recommend you familiarize yourself with different types of data visualizations; for that I recommend this article and this site. Also visit dataviz sites for inspiration and ideas: Dark Horse Analytics, Five Thirty Eight, Minimaxir, and several github.io profiles like Colin Morris or Zonination.