- Missing value chart
The missing value chart is the easiest
way to visualize the multivariate structure of missing values.
It draws one horizontal bar for each variable.
The left part of the bar represents the proportion of values that
are not missing, whereas the right part represents the proportion
of missing values in the selected variable. With highlighting,
it is very easy to access the different kinds of data. The
figure on the right shows an example of this missing value chart
for the Crash dataset. This is a version of the missing value
chart originally implemented in the REGARD software.
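The proportions behind each bar can be sketched in a few lines of Python. This is only an illustration, not MANET's implementation: the function name `missing_proportions`, the use of `None` as the missing marker, and the toy records are assumptions made for the example.

```python
def missing_proportions(rows, variables):
    """Return {variable: (observed_share, missing_share)}.

    `rows` is a list of dicts; None marks a missing value
    (an assumption made for this sketch).
    """
    n = len(rows)
    if n == 0:
        return {}
    result = {}
    for var in variables:
        missing = sum(1 for row in rows if row.get(var) is None)
        result[var] = ((n - missing) / n, missing / n)
    return result


# Hypothetical toy records, not the Crash dataset.
rows = [
    {"speed": 50, "belt": "yes"},
    {"speed": None, "belt": "no"},
    {"speed": 70, "belt": None},
    {"speed": None, "belt": "yes"},
]
bars = missing_proportions(rows, ["speed", "belt"])

# Text rendering: '#' is the observed (left) part, '.' the missing (right) part.
for var, (observed, missing) in bars.items():
    width = 20
    filled = round(observed * width)
    print(f"{var:>6} |{'#' * filled}{'.' * (width - filled)}|")
```

Each printed line corresponds to one bar of the chart, with the split point given by the observed share.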

Missing values can be incorporated in a very natural way into histograms
and barcharts by adding a bar corresponding to the amount of
missing data. The figure on the left shows two examples, in which
the number of missing values is represented by an additional
white bar at the right (barchart) or left (histogram) of the
plot, separated by a small gap.
Spinograms are the equivalent of a spineplot
for continuous data. The width of a bar, not its height, is
proportional to the count in it. Because all bars have the same height,
the highlighted proportions can easily be compared across bars. The pictures on the right
show a histogram and the corresponding spinogram. The trend of
increasing highlighting proportions becomes obvious only in the
spinogram.
Since the horizontal axis is no longer
continuous, the bars are drawn separated by a small gap.
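To make the construction concrete, here is a minimal sketch of the numbers a spinogram needs: for each bin, a relative width (share of all observations) and the share of highlighted observations, drawn as the filled height. The function name `spinogram_bars`, the bin edges, and the data are hypothetical.

```python
def spinogram_bars(values, highlighted, edges):
    """Return one (relative_width, highlighted_share) pair per bin."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    hits = [0] * n_bins
    for value, flag in zip(values, highlighted):
        for i in range(n_bins):
            # Half-open bins [lo, hi); the last bin also includes its upper edge.
            upper_ok = value < edges[i + 1] or (i == n_bins - 1 and value == edges[-1])
            if edges[i] <= value and upper_ok:
                counts[i] += 1
                hits[i] += int(flag)
                break
    total = sum(counts)
    if total == 0:
        return []
    return [(c / total, h / c if c else 0.0) for c, h in zip(counts, hits)]


# Hypothetical data: ten values in three bins, with a highlight flag each.
values = [1, 2, 3, 11, 12, 21, 22, 23, 24, 25]
highlighted = [False, False, True, False, True, True, True, True, False, True]
bars = spinogram_bars(values, highlighted, edges=[0, 10, 20, 30])
```

In this toy example the highlighted share rises from one third to 0.8 across the bins, which is the kind of trend the spinogram exposes directly.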
The barcharts to the right and at the
bottom are shown as standard barcharts as well as spineplots. A spineplot shows its counts not by bars
of proportional height but by bars of proportional width,
which is the first step towards a mosaic
plot.

Note that overlapping points add their brightness. This is
a very good means of avoiding the loss of information caused by
overplotting in areas of high density. The size of the points
can be altered interactively, giving the user control over the
amount of this smoothing.
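The additive idea can be sketched on a small pixel grid. This is an illustration under assumptions, not MANET's rendering code: the function name `render_additive`, the per-point `gain`, the square point shape, and the clipping at a maximum level are all choices made for the example.

```python
def render_additive(points, width, height, radius=0, gain=64, max_level=255):
    """Accumulate brightness per pixel; overlapping points get brighter.

    Each point is drawn as a square of side 2 * radius + 1 (assumed shape).
    Brightness is clipped at max_level (assumed behaviour).
    """
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        for py in range(y - radius, y + radius + 1):
            for px in range(x - radius, x + radius + 1):
                if 0 <= px < width and 0 <= py < height:
                    grid[py][px] = min(max_level, grid[py][px] + gain)
    return grid


# Three coincident points and one isolated point (hypothetical data).
points = [(1, 1), (1, 1), (1, 1), (3, 2)]
image = render_additive(points, width=5, height=4)
smooth = render_additive(points, width=5, height=4, radius=1)
```

With `radius=0` the three coincident points stay distinguishable from the single point by their brightness; a larger radius spreads each point over more pixels, which plays the role of the smoothing mentioned above.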
- Both values were recorded.
- The x-value was recorded, but the y-value was not.
- The x-value was not recorded, but the y-value was.
- Neither of the values was recorded.
Values belonging to the first case are plotted in the classical
way. Values belonging to case two or three can be drawn as projections
onto the x- or y-axis. Only case four cannot be shown
inside the scatterplot itself. For this reason, each scatterplot has three additional
boxes at the bottom. The leftmost box represents the proportion
of missing x-values, the middle box the proportion of missing
x- and y-values, and the rightmost box the proportion of missing
y-values. Thus the user can easily select all observations belonging
to case four from this middle box.
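The four cases and the three boxes can be sketched as follows. The names `classify_pairs` and `box_proportions` are hypothetical, `None` stands for a missing value, and the sketch assumes that the outer boxes count observations where only x or only y is missing; the text does not say whether doubly missing observations are also counted there.

```python
def classify_pairs(pairs):
    """Sort (x, y) pairs into the four missingness cases."""
    groups = {"both": [], "y_missing": [], "x_missing": [], "both_missing": 0}
    for x, y in pairs:
        if x is not None and y is not None:
            groups["both"].append((x, y))      # case 1: ordinary point
        elif x is not None:
            groups["y_missing"].append(x)      # case 2: drawn on the x-axis
        elif y is not None:
            groups["x_missing"].append(y)      # case 3: drawn on the y-axis
        else:
            groups["both_missing"] += 1        # case 4: only in the middle box
    return groups


def box_proportions(groups, n):
    """Return (left, middle, right) box shares: x missing, both, y missing."""
    return (
        len(groups["x_missing"]) / n,
        groups["both_missing"] / n,
        len(groups["y_missing"]) / n,
    )


# Hypothetical observations.
pairs = [(1.0, 2.0), (3.0, None), (None, 4.0), (None, None), (5.0, 6.0)]
groups = classify_pairs(pairs)
boxes = box_proportions(groups, len(pairs))
```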
- Mosaic Plots
Mosaic plots (see e.g. the articles
of Michael Friendly) can handle many categorical
variables at once. MANET offers mosaic plots with an arbitrary
number of variables, but at most 1000 cells per plot. Missing values
are incorporated in the same way as in a barchart, by
adding an extra category that contains all missing values of
the variable. The lower right figure shows a mosaic plot for
two variables with 5 (year) and 3+1 (country) categories, leading
to 5*4=20 boxes inside the plot. The implementation of the mosaic
plot is fully interactive and extends the definition
of Hartigan & Kleiner in 1983. The highlighting information
is added to the plot just like an extra binary variable, but
without inserting a gap between the two categories (highlighted,
not highlighted).
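The geometry of a two-variable mosaic plot can be sketched as a recursive split of the unit square: first by the marginal shares of one variable, then within each column by the conditional shares of the other. The sketch below ignores the gaps between boxes; the function name `mosaic_cells`, the category labels, and the counts are hypothetical.

```python
def mosaic_cells(counts, col_levels, row_levels):
    """Return {(col, row): (x, y, width, height)} within the unit square.

    `counts` maps (col, row) to a frequency; absent keys count as zero.
    """
    total = sum(counts.get((c, r), 0) for c in col_levels for r in row_levels)
    cells = {}
    x = 0.0
    for c in col_levels:
        col_total = sum(counts.get((c, r), 0) for r in row_levels)
        w = col_total / total if total else 0.0
        y = 0.0
        for r in row_levels:
            n = counts.get((c, r), 0)
            h = n / col_total if col_total else 0.0
            cells[(c, r)] = (x, y, w, h)
            y += h
        x += w
    return cells


# 5 year categories and 3 countries plus one category "NA" for missing values.
years = ["Y1", "Y2", "Y3", "Y4", "Y5"]
countries = ["A", "B", "C", "NA"]
counts = {}
for year in years:
    counts[(year, "A")] = 2
    counts[(year, "B")] = 1
    counts[(year, "C")] = 1   # the "NA" cells are left empty on purpose

cells = mosaic_cells(counts, years, countries)
```

The result has 5*4=20 cells, and the empty "NA" cells come out with height zero, which is the situation the next paragraph deals with.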

Practical use of mosaic plots with
nontrivial data shows that many combinations of categories often
contain no values and are thus empty. To distinguish those boxes
from boxes that contain only very few values and are therefore
very small (e.g. one or two pixels wide or high), truly
empty boxes are plotted with a "0" in the middle to
indicate that they are of size zero.
Display Options
Visual perception depends not only
on the sizes of the boxes, but also on the size of the gaps between
the boxes. For an alternative view of a mosaic plot, the
user can choose the "mondrian" display, in which all gaps are left out.
The name of this option can be understood by looking at paintings
of Mondrian. To understand the distribution of the empty cells
more easily, it can be useful to look at a plot in which all bins
have the same size.
To find out more about the distribution
of the different bin sizes inside a mosaic plot, an additional
histogram of the bin sizes of the corresponding mosaic plot can
be plotted. This histogram can show either the
number of values that belong to bins of a certain size or
the number of bins of that size. This histogram is fully
linked with all other plots, too.
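The two display modes of this histogram can be sketched as below. For simplicity the sketch tabulates exact bin sizes rather than grouping them into intervals; the function name `bin_size_histogram` and the cell counts are hypothetical.

```python
from collections import Counter


def bin_size_histogram(cell_counts, weight_by_values=False):
    """Tabulate mosaic cells by their size.

    weight_by_values=False: number of bins having each size.
    weight_by_values=True:  number of values lying in bins of each size.
    """
    hist = Counter()
    for n in cell_counts:
        hist[n] += n if weight_by_values else 1
    return dict(sorted(hist.items()))


# Hypothetical cell counts of a mosaic plot with eight cells.
cell_counts = [0, 0, 1, 1, 1, 4, 4, 10]
bins_per_size = bin_size_histogram(cell_counts)
values_per_size = bin_size_histogram(cell_counts, weight_by_values=True)
```

The first mode makes the many empty or nearly empty cells visible, while the second shows where most of the observations actually sit.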
For more details see:

Classical methods of graphical data visualization start from
low-dimensional projections of the data and try to explore and
reveal higher-dimensional structures by various linking techniques.
This raises the question of which projections of a data set
are 'interesting'.
Multivariate statistics offers several methods
for this purpose, such as principal component analysis or correspondence
analysis. By reducing the dimension according to different optimization
criteria, these techniques allow the analyst to concentrate on only
a few factors. But their applicability for locating structural anomalies
is limited, because which structures are 'special' depends on the
data set and its context.
Graphical methods, however, are extremely useful for drawing
such conclusions.
The basic idea of interactive biplots is to combine those
approaches.
First of all, graphics are a tool to visualize the theoretical
methods. They give an impression of how well the method works
for the underlying data, and they reveal the structural
particularities mentioned above.
The implementation of biplots in MANET shows
that additional interactive methods such as querying, linking
and highlighting make a useful working process with this kind of
graphic possible.

- Polygon Plots
To handle spatial data, MANET offers
polygon plots as well. Once the map is read in, it can be used
in the same interactive manner as all other plots. Selecting, highlighting
and interrogation are defined for polygons, too.
On the left, a map of Bavaria is
shown. Here we do not find any highlighting, but a shading by
the variable population density. Shading by a variable
is done simply by dragging and dropping the variable onto the map.
The definition of the 32 steps of grey can be modified by a simple
power transfer function. In addition to the exponent of the transfer function,
the sign of the base can be altered. In the example on the left, high
values correspond to high brightness, which can be reversed
by changing the sign.
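One possible reading of such a transfer function is sketched below. It is an assumption, not MANET's actual formula: the value is rescaled to [0, 1], optionally reversed (standing in for the change of sign), raised to the chosen exponent, and cut into 32 grey steps. The name `grey_level` and its parameters are hypothetical, and `vmax` must differ from `vmin`.

```python
def grey_level(value, vmin, vmax, exponent=1.0, invert=False, steps=32):
    """Map a value to a grey step 0..steps-1 (0 = darkest, assumed)."""
    t = (value - vmin) / (vmax - vmin)   # rescale to [0, 1]
    if invert:
        t = 1.0 - t                      # high values become dark
    return min(steps - 1, int(t ** exponent * steps))


# Hypothetical population densities between 0 and 100.
linear = [grey_level(v, 0, 100) for v in (0, 50, 100)]
squared = grey_level(50, 0, 100, exponent=2.0)
root = grey_level(25, 0, 100, exponent=0.5)
reversed_top = grey_level(100, 0, 100, invert=True)
```

An exponent above 1 reserves more grey steps for the high values, an exponent below 1 for the low values.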
Categorical data can be used for the
shading of maps, too!