MANET


Missings Are Now Equally Treated

A brief summary of MANET features



 

Plots in MANET

MANET incorporates missing values in nearly all known statistical plots including:

  • Missing value chart

    The missing value chart is the easiest way to visualize the multivariate structure of missing values. It draws one horizontal bar per variable: the left part of the bar represents the proportion of values that are not missing, and the right part the proportion of missing values in that variable. With highlighting, it is very easy to access either kind of data. The right-hand figure shows a missing value chart for the Crash dataset. This is a version of the missing value chart originally implemented in the REGARD software.
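The proportions behind such a chart can be sketched in a few lines of Python. This is only an illustration, not MANET's implementation: the function names, the use of `None` for a missing value, and the sample records are all assumptions made for this example.

```python
# Hypothetical sketch of the numbers behind a missing value chart.
# None marks a missing value; the records below are invented.

def missing_proportions(records, variables):
    """Return {variable: (share_recorded, share_missing)}."""
    n = len(records)
    result = {}
    for var in variables:
        missing = sum(1 for row in records if row.get(var) is None)
        result[var] = ((n - missing) / n, missing / n)
    return result


def text_bar(share_recorded, width=20):
    """Render one bar: '#' for recorded values (left), '.' for missing (right)."""
    filled = round(share_recorded * width)
    return "#" * filled + "." * (width - filled)


records = [
    {"speed": 50, "age": 34, "belt": "yes"},
    {"speed": None, "age": 41, "belt": "no"},
    {"speed": 65, "age": None, "belt": None},
    {"speed": 30, "age": 27, "belt": None},
]

proportions = missing_proportions(records, ["speed", "age", "belt"])
for var, (recorded, missing) in proportions.items():
    print(f"{var:<6} {text_bar(recorded)} {missing:.0%} missing")
```

Each printed line plays the role of one bar in the chart, with the recorded share on the left and the missing share on the right.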

  • Histograms

Missings can be incorporated very naturally into histograms and barcharts by adding a bar that corresponds to the amount of missing data. The left-hand figure shows two examples, in which the number of missing values is represented by an additional white bar at the right (barchart) or left (histogram) of the plot, separated by a small gap.

 

  • Spinograms

Spinograms are the equivalent of a spineplot for continuous data. The width of a bar, not its height, is proportional to the number of cases in it. Because all bars then have the same height, the highlighted part of each bar shows the highlighted proportion in that bin, and these proportions can easily be compared across bins. The pictures on the right show a histogram and the corresponding spinogram. Only the spinogram makes the trend of increasing highlighting proportions obvious.

Since the horizontal axis is no longer continuous, the bars are drawn separated by a small gap.
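As a rough sketch of the geometry, each bin of a spinogram needs a width (its share of all cases) and a highlighted share (the fraction of its cases that are selected). The function name, its parameters and the sample data below are assumptions for illustration and are not taken from MANET.

```python
# Hypothetical sketch: bar geometry of a spinogram.

def spinogram_bars(values, highlighted, edges, total_width=100.0):
    """Return one (width, highlighted_share) pair per bin.

    values      -- observed numbers (missing values already removed)
    highlighted -- parallel list of booleans, True if the case is selected
    edges       -- ascending bin boundaries; bin i is [edges[i], edges[i+1])
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    selected = [0] * n_bins
    for value, flag in zip(values, highlighted):
        for i in range(n_bins):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                selected[i] += flag  # True counts as 1
                break
    total = sum(counts)
    bars = []
    for count, sel in zip(counts, selected):
        width = total_width * count / total if total else 0.0
        share = sel / count if count else 0.0
        bars.append((width, share))
    return bars


values = [1, 2, 3, 4, 11, 12, 13, 14, 15, 16, 21, 22]
highlighted = [False, False, False, True,
               False, False, False, True, True, True,
               True, True]
bars = spinogram_bars(values, highlighted, edges=[0, 10, 20, 30])
for width, share in bars:
    print(f"width {width:5.1f}  highlighted {share:.0%}")
```

In this invented data the highlighted share rises from bin to bin, which is the kind of trend a spinogram shows directly and a histogram hides behind differing bar heights.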

 
  • Barcharts & Spineplots

The barcharts to the right and at the bottom are shown both as standard barcharts and as spineplots. A spineplot shows its counts not by bars of proportional height but by bars of proportional width, which is the first step towards a mosaic plot.

 
  • Boxplots & Dotplots

    Boxplots and dotplots can handle missing values only in a more rudimentary way. For each boxplot or dotplot of a variable that includes missing values, a missing value chart is drawn. The figure shows boxplots and dotplots with their corresponding missing value charts. Plotting several boxplots simultaneously yields one combined missing values window.


Note that overlapping points add their brightness. This is a very effective way of avoiding the loss of information caused by overplotting in areas of high density. The size of the points can be altered interactively, giving the user control over the amount of smoothing.
  • Scatterplots

    A scatterplot shows two variables, so each observation is in one of four states:

  1. Both values were recorded.
  2. The x-value was recorded, but the y-value was not.
  3. The x-value was not recorded, but the y-value was.
  4. Neither value was recorded.

    Values belonging to the first case are plotted in the classical way. Values belonging to case two or three can be drawn as projections onto the x- or y-axis. Only case four cannot be incorporated in the scatterplot itself. For this reason, each scatterplot has three additional boxes at the bottom. The leftmost box represents the proportion of missing x-values, the middle box the proportion of cases where both the x- and the y-value are missing, and the rightmost box the proportion of missing y-values. The user can thus easily select all observations belonging to case four from the middle box.
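The four states and the three boxes can be sketched as follows. The function names and sample points are hypothetical, `None` stands for a missing value, and the sketch assumes that the outer boxes count only cases where exactly one value is missing; the text does not say whether MANET counts them this way.

```python
# Hypothetical sketch: the four missingness states of a scatterplot point.

def classify(x, y):
    """Return the state of one observation; None means 'not recorded'."""
    if x is not None and y is not None:
        return "both"          # case 1: plotted as an ordinary point
    if x is not None:
        return "y_missing"     # case 2: drawn as a projection
    if y is not None:
        return "x_missing"     # case 3: drawn as a projection
    return "both_missing"      # case 4: only visible in the middle box


def missing_boxes(points):
    """Return the proportions for the (left, middle, right) boxes.

    Assumption: left = only x missing, middle = both missing,
    right = only y missing.
    """
    n = len(points)
    counts = {"both": 0, "y_missing": 0, "x_missing": 0, "both_missing": 0}
    for x, y in points:
        counts[classify(x, y)] += 1
    return (counts["x_missing"] / n,
            counts["both_missing"] / n,
            counts["y_missing"] / n)


points = [(1, 2), (3, None), (None, 4), (None, None),
          (5, 6), (7, None), (8, 9), (10, 11)]
print(missing_boxes(points))
```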

  • Mosaic Plots

    Mosaic plots (see e.g. the articles of Michael Friendly) can handle many categorical variables at once. MANET offers mosaic plots with an arbitrary number of variables, but at most 1000 cells per plot. Missing values are incorporated in the same way as in a barchart, by adding an extra category that contains all missing values of the variable. The lower right figure shows a mosaic plot for two variables with 5 (year) and 3+1 (country) categories, leading to 5*4=20 boxes inside the plot. The implementation of the mosaic plot is fully interactive and extends the definition of Hartigan & Kleiner in 1983. The highlighting information is added to the plot just like an extra binary variable, but without a gap between its two categories (highlighted, not highlighted).

    Practical use of mosaic plots with non-trivial data shows that many combinations of categories often contain no values and are thus empty. To distinguish these boxes from boxes that contain only very few values, and are therefore very small (e.g. one or two pixels wide or high), truly empty boxes are plotted with a "0" in the middle to indicate that they are of size zero.
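The cell counts of a two-variable mosaic plot with an extra missing category can be sketched like this. The names, the `"NA"` label and the sample rows are invented for illustration. Following the 5 and 3+1 example above, the sketch assumes that the extra category is added only to a variable that actually has missing values.

```python
# Hypothetical sketch: cell counts for a two-variable mosaic plot.
from itertools import product

MISSING = "NA"  # label of the extra category; an assumption of this sketch


def categories(values, levels):
    """Known levels, plus a missing category if any value is missing."""
    has_missing = any(v is None for v in values)
    return list(levels) + ([MISSING] if has_missing else [])


def mosaic_cells(rows, levels_a, levels_b):
    """Count cases per cell; every combination is present, even if empty.

    Values outside the given levels raise a KeyError in this sketch.
    """
    cats_a = categories([a for a, _ in rows], levels_a)
    cats_b = categories([b for _, b in rows], levels_b)
    counts = {cell: 0 for cell in product(cats_a, cats_b)}
    for a, b in rows:
        key = (MISSING if a is None else a, MISSING if b is None else b)
        counts[key] += 1
    return counts


rows = [(1990, "A"), (1990, "B"), (1991, "A"), (1991, None),
        (1992, "C"), (1993, "B"), (1993, "B"), (1994, None)]
counts = mosaic_cells(rows, range(1990, 1995), ["A", "B", "C"])

# Truly empty cells get the "0" marker; small but non-empty cells do not.
labels = {cell: "0" if n == 0 else "" for cell, n in counts.items()}
print(len(counts), "cells,", sum(1 for n in counts.values() if n == 0), "empty")
```

With five years and 3+1 countries the dictionary holds 5*4=20 cells, matching the arithmetic in the text.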

    Display Options

    Visual perception depends not only on the sizes of the boxes, but also on the size of the gaps between them. To obtain an alternative view of a mosaic plot, the user can choose the "mondrian" display, in which all gaps are left out. The name of this option becomes clear when looking at the paintings of Mondrian. To understand the distribution of the empty cells more easily, it can be useful to look at a plot in which all bins have the same size.

    To find out more about the distribution of the bin sizes inside a mosaic plot, an additional histogram of the bin sizes can be plotted. This histogram can show either the actual number of values that belong to bins of a certain size or the number of bins that have a certain size. It is fully linked with all other plots, too.
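The two variants of the bin-size histogram differ only in what each bin contributes. The sketch below is an illustration under assumed names and invented cell counts, not MANET's code.

```python
# Hypothetical sketch: the two variants of a bin-size histogram.
from collections import Counter


def bin_size_histogram(cell_counts, weight_by_cases=False):
    """Map each bin size to the number of bins of that size or, with
    weight_by_cases=True, to the number of values lying in such bins."""
    hist = Counter()
    for size in cell_counts:
        hist[size] += size if weight_by_cases else 1
    return dict(sorted(hist.items()))


cell_counts = [0, 0, 1, 1, 1, 2, 5, 5]  # invented mosaic cell sizes
print(bin_size_histogram(cell_counts))
print(bin_size_histogram(cell_counts, weight_by_cases=True))
```

The first call counts bins per size, the second counts values, so two bins of size 5 contribute 2 to the first histogram and 10 to the second.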



 

 

  • Biplots

Classical methods of graphical data visualization start from low-dimensional projections of the data and try to explore and reveal higher-dimensional structures by means of various linking techniques.

This raises the question of which projections of a data set are 'interesting'.

Multivariate statistics offers several methods for this purpose, such as principal component analysis or correspondence analysis. By reducing the dimension according to different optimization criteria, these techniques make it possible to concentrate on only a few factors. Their ability to locate structural anomalies is limited, however, because which structures count as 'special' depends on the data set and its context.

Graphical methods, however, are extremely useful for drawing such conclusions.

The basic idea of interactive biplots is to combine the two approaches.

First of all, graphics are a tool for visualizing the theoretical methods. They give an impression of how well a method works for the underlying data, and they reveal the structural particularities mentioned above.

The implementation of biplots in MANET shows that additional interactive methods such as querying, linking and highlighting make it possible to work productively with this kind of graphic.

  • Polygon Plots

    To handle spatial data, MANET offers polygon plots as well. Once the map has been read in, it can be used in the same interactive manner as all other plots. Selection, highlighting and interrogation are defined for polygons, too.

    The left-hand figure shows a map of Bavaria. Here there is no highlighting; instead, the map is shaded by the variable population density. Shading by a variable is done simply by dragging and dropping the variable onto the map. The definition of the 32 steps of grey can be modified by a simple power transfer function. Both the exponent of the transfer function and the sign of the base can be altered. In the example on the left, high values correspond to high brightness, which can be reversed by changing the sign.

    Categorical data can be used for the shading of maps, too!
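One possible reading of the power transfer function for grey shading is sketched below. The exact formula MANET uses is not given here, so the normalisation to [0, 1], the `invert` flag standing in for "changing the sign", and all names are assumptions of this sketch.

```python
# Hypothetical sketch: mapping a variable to one of 32 grey levels.

def grey_level(value, lo, hi, exponent=1.0, invert=False, steps=32):
    """Map value in [lo, hi] to a grey level 0 (dark) .. steps-1 (bright).

    exponent -- power applied to the normalised value (1.0 = linear)
    invert   -- if True, high values become dark instead of bright
    """
    t = (value - lo) / (hi - lo) if hi > lo else 0.0
    t = min(max(t, 0.0), 1.0)       # clamp values outside [lo, hi]
    if invert:
        t = 1.0 - t
    return min(int(t ** exponent * steps), steps - 1)


for density in (0, 25, 50, 100):    # invented population densities
    print(density, grey_level(density, 0, 100), grey_level(density, 0, 100, exponent=0.5))
```

An exponent below 1 spreads out the low values over more grey steps, while an exponent above 1 does the same for the high values.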



Martin Theus, July 1997