Tutorial for vaex as a library

Introduction

This tutorial briefly introduces how to use vaex from the IPython notebook. It assumes you have vaex installed as a library; you can run python -c 'import vaex' to check this. Although this document is not an IPython notebook itself, it is generated from one, and you should be able to reproduce all examples.
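
The same check can be done from a notebook cell; a minimal sketch (printing __file__ simply shows from where the package was loaded):

import vaex
print(vaex.__file__)  # path of the installed library; if the import fails, vaex is not installed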

Run IPython notebook

From the IPython notebook website:

    The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media.

To start it, run $ ipython notebook in your shell, and it should automatically open the main webpage. Start a new notebook by clicking new.

Starting

Start your notebook by importing the relevant packages. For this tutorial we will be using vaex itself, numpy, and matplotlib for plotting. We also configure matplotlib to show the plots in the notebook itself.

import vaex
import numpy as np
import matplotlib.pylab as plt # simpler interface for matplotlib
# next line configures matplotlib to show the plots in the notebook, other option is qt to open a dialog
%matplotlib inline

Open a dataset

To open a dataset, we can call vaex.open to open local files. See the documentation of vaex.open for the arguments: hit shift-tab (1 or 2 times) or run vaex.open? in the notebook for direct help. For this tutorial we use vaex.example(), which opens a dataset provided with vaex. (Note that ds is short for dataset.)

ds = vaex.example()
# ds = vaex.open('yourfile.hdf5') # in case you want to load a different dataset
# x = np.arange(10)
# y = x**2
# ds = vaex.from_arrays(x=x, y=y)  # if you have your own data in numpy arrays
# ds = vaex.from_csv('mydata.csv') # or from a comma separated file

Read here about other ways of getting your data into vaex.

You can get information about the dataset, such as its columns, by simply typing ds as the last command in a cell. It will show the types, units and descriptions, when available, as well as the first and last part of the data. Alternatively, use ds.cat/head/tail to display parts of the data.

ds
helmi-dezeeuw-2000-10p 330000 rows
path: /net/jansky/data/users/breddels/vaex/data/helmi-dezeeuw-2000-10p.hdf5

Columns:

column        type            unit                        description  expression
E             float64         $\mathrm{km^{2}\,s^{-2}}$
FeH           float64         $\mathrm{dex}$
L             float64         $\mathrm{km\,kpc\,s^{-1}}$
Lz            float64         $\mathrm{km\,kpc\,s^{-1}}$
random_index  int64
vx            float64         $\mathrm{km\,s^{-1}}$
vy            float64         $\mathrm{km\,s^{-1}}$
vz            float64         $\mathrm{km\,s^{-1}}$
x             float64         $\mathrm{kpc}$
y             float64         $\mathrm{kpc}$
z             float64         $\mathrm{kpc}$
r             virtual column  $\mathrm{kpc}$                           sqrt(x**2+y**2+z**2)
v             virtual column  $\mathrm{km\,s^{-1}}$                    sqrt(vx**2+vy**2+vz**2)

Variables:

variable          type     unit  description  expression
pi                float64                     3.141592653589793
e                 float64                     2.718281828459045
km_in_au          float64                     149597870.7
seconds_per_year  float64                     31557600

Data:

#  E  FeH  L  Lz  random_index  vx  vy  vz  x  y  z  r  v
[preview of the first and last 10 of the 330000 rows omitted]
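
For a quicker look at just a part of the data, the cat/head/tail methods mentioned above can be called directly; a minimal sketch (the exact signatures are not shown here, check the docstrings with shift-tab):

ds.head()  # display the first rows of the dataset
ds.tail()  # display the last rows of the dataset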

To get a list with all column names, use the Dataset's get_column_names method. Note that tab completion should work: typing ds.get_c and then pressing tab should help you complete it.

ds.get_column_names()
['E', 'FeH', 'L', 'Lz', 'random_index', 'vx', 'vy', 'vz', 'x', 'y', 'z']

Calculating statistics

Vaex can calculate statistics for columns, but also for an expression built from columns.

ds.mean("x"), ds.std("x"), ds.correlation("vx**2+vy**2+vz**2", "E")
(-0.067131491264005971, 7.3174597654824751, array(0.00676355917633636))

Since column names can sometimes be difficult to remember, and to take advantage of the autocomplete features of the notebook, column names can also be accessed using the .col property, for instance:

print(ds.col.x)
x
ds.mean(ds.col.x)
-0.067131491264005971

Dataset contains many methods for computing statistics, as well as plotting routines; see the API documentation for more details.
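
The statistics that appear later as what options (sum, std, var, correlation, covar, min, max) are also available as methods taking an expression, in the same way as mean above; a short sketch (the availability and exact signatures of these methods are an assumption based on the what options):

ds.std("E")                 # standard deviation of the energy
ds.min("x"), ds.max("x")    # extremes of a column
ds.correlation("x", "vx")   # correlation coefficient between two expressions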

Most of the statistics can also be calculated on a grid, which can then be visualized using, for instance, matplotlib.

ds.mean("E", binby=["x", "y"], shape=(2,2), limits=[[-10,10], [-10, 10]])
array([[-119166.43858099, -118291.18402363],
       [-117650.31604966, -119542.86139539]])
mean_energy = ds.mean("E", binby=["x", "y"], shape=(128,128), limits=[[-10,10], [-10, 10]])
plt.imshow(mean_energy)
<matplotlib.image.AxesImage at 0x7fc504174a58>
_images/tutorial_ipython_notebook_15_1.png

Plotting

Instead of plotting with "bare" matplotlib, the .plot method is more convenient: it sets axes limits and labels (with units when known), and adds a colorbar. Learn more from the docstring by typing ds.plot? or using shift-tab, or by opening Dataset.plot.

ds.plot("x", "y", limits=[[-10,10], [-10, 10]]);
_images/tutorial_ipython_notebook_17_0.png

Instead of plotting the counts, the mean of an expression can be plotted. (Other options are sum, std, var, correlation, covar, min, max)

ds.plot("x", "y", what="mean(vx)", limits=[[-10,10], [-10, 10]], vmin=-200, vmax=200, shape=128);
_images/tutorial_ipython_notebook_19_0.png

More panels can be plotted by giving a list of pairs of expressions (which we call subspaces) as the first argument.

ds.plot([["x", "y"], ["x", "z"]], limits=[[-10, 10], [-10, 10]], figsize=(10,5), shape=128);
_images/tutorial_ipython_notebook_21_0.png

And the same can be done for the what argument. Note that the f argument is the transformation that will be applied to the values, for instance "log", "log10", "abs", or None for no transformation. If given as a single argument, it will apply to all plots; otherwise it should be a list of the same length as the what argument.

ds.plot("x", "y", what=["count(*)", "mean(vx)"], f=["log", None],
        limits=[[-10, 10], [-10, 10]], figsize=(10,5), shape=128, vmin=[0, -200], vmax=[4, 200]);
_images/tutorial_ipython_notebook_23_0.png

When they are combined, the what arguments form the columns of the subplot grid, while the rows correspond to the different subspaces.

ds.plot([["x", "y"], ["x", "z"]],  f=["log", None, None, None],
        what=["count(*)", "mean(vx)", "mean(vy)", "correlation(vx,vy)"],
        colormap=["afmhot", "afmhot", "afmhot", "bwr"],
        limits=[[-10, 10], [-10, 10]], figsize=(14,8), shape=128);
_images/tutorial_ipython_notebook_25_0.png

Selections

For working with a part of the data, we use what we call selections. When a selection is applied to a dataset, it keeps a boolean in memory for each row indicating whether it is selected or not. All statistical methods take a selection argument, which can be None or False for no selection, True or "default" for the default selection, or a string referring to the selection (corresponding to the name argument of the Dataset.select method). It is also possible to pass an expression as a selection, but such selections are not cached and will be recomputed every time they are needed.

# the following plots are all identical
ds.select("y > x")
ds.plot("x", "y", selection=True, show=True)
ds.plot("x", "y", selection="default", show=True) # same as the previous
ds.plot("x", "y", selection="y > x", show=True); # similar, but selection will be recomputed every time
_images/tutorial_ipython_notebook_27_0.png _images/tutorial_ipython_notebook_27_1.png _images/tutorial_ipython_notebook_27_2.png

Multiple selections can be overplotted, where None means no selection, and True is an alias for the default selection name "default". The selections are overplotted with the background faded. (Note that because the log of zero is taken, this results in NaN, which is shown as transparent pixels.)

ds.plot("x", "y", selection=[None, True], f="log");
_images/tutorial_ipython_notebook_29_0.png

Selections can be made more complicated, or logically combined using a boolean operator. The default mode is to replace the current selection; the possibilities are "replace", "and", "or", "xor" and "subtract".

ds.select("y > x")
ds.select("y > -x", mode="or")
# the commented line below has the same effect as the two selects above
# ds.select("(y > x) | (x > -y)")
# |, & and ^ are used for 'or', 'and' and 'xor'
ds.select("x > 5", mode="subtract")
ds.plot("x", "y", selection=[None, True], f="log");
_images/tutorial_ipython_notebook_31_0.png

Using the visual argument, it is possible to show the selections as columns instead, see Dataset.plot for more details.

ds.select("x - 5> y", name="other")
ds.plot("x", "y", selection=[None, True, "other", "other | default"],
        f="log", visual=dict(column="selection"), figsize=(12,4));
_images/tutorial_ipython_notebook_33_0.png

Besides making plots, statistics can also be computed for selections:

ds.max("x", selection=True)
array(4.99998713)
ds.max("x", selection=[None, True])
array([ 271.365997  ,    4.99998713])
ds.max(["x", "y"], selection=[None, True])
array([[ 271.365997  ,    4.99998713],
       [ 146.465836  ,  146.465836  ]])
ds.mean(["x", "y"], selection=[None, True, "other", "x > y"])
array([[-0.06713149, -2.98854513,  5.90555941,  3.59256693],
       [-0.05358987,  2.99097581, -6.92724312, -4.19886827]])

Virtual columns

If a particular expression occurs often, it may be convenient to create a virtual column. It behaves exactly like a normal column, but is calculated on the fly, without taking up the memory of a full column, since the calculation is done in chunks.

ds.add_virtual_column("r", "sqrt(x**2+y**2+z**2)")
ds.add_virtual_column("v", "sqrt(vx**2+vy**2+vz**2)")
ds.plot("log(r)", "log(v)", f="log10");
_images/tutorial_ipython_notebook_40_0.png

Extra methods for creating common virtual columns are listed in the API documentation. Don't be afraid to look at the source (click the green [source] link).

More about the dataset

Vaex works best with hdf5 and fits files, but can import from other sources as well. File formats are recognized by their extension: for .vot a VOTable is assumed and astropy is used to read it; for .asc astropy's ascii reader is used. However, these formats require the dataset to fit into memory, and exporting them to hdf5 or fits format may lead to better performance and faster read times. Datasets can also be made from numpy arrays using vaex.from_arrays, or, for convenience, imported from pandas using vaex.from_pandas.

In the next example we create a dataset from arrays, and export it to disk.

# Create a 6d gaussian clump
q = np.random.normal(10, 2, (6, 10000))
dataset_clump_arrays = vaex.from_arrays(x=q[0], y=q[1], z=q[2], vx=q[3], vy=q[4], vz=q[5])
dataset_clump_arrays.add_virtual_column("r", "sqrt(x**2+y**2+z**2)")
dataset_clump_arrays.add_virtual_column("v", "sqrt(vx**2+vy**2+vz**2)")

# create a temporary file
import tempfile
filename = tempfile.mktemp(suffix=".hdf5")

# when exporting takes long, progress=True will give a progress bar
# here, we don't want to export virtual columns, which is the default
dataset_clump_arrays.export_hdf5(filename, progress=True, virtual=False)
print("Exported to: %s" % filename)
Exported to: /tmp/tmpzn_gtp6m.hdf5
exporting: 100% |####################################################################################################################################################################################| Time: 0:00:00 CPU Usage: ---%
ds_clump = vaex.open(filename)
print("Columns: %r" % ds_clump.get_column_names())
Columns: ['x', 'vx', 'vz', 'z', 'y', 'vy']
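
The vaex.from_pandas route mentioned above works in a similar way; a minimal sketch (assumes pandas is installed; the column names are arbitrary):

import pandas as pd
df = pd.DataFrame({"x": np.arange(5.0), "y": np.arange(5.0)**2})
ds_pandas = vaex.from_pandas(df)  # wrap the DataFrame columns as a vaex dataset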

Concatenating tables

Using the .concat method, datasets can be concatenated to form one big dataset (without copying the data).

ds2 = ds.concat(ds_clump)
ds2.plot("x", "y", f="log1p", limits=[[-20, 20], [-20, 20]]);
_images/tutorial_ipython_notebook_46_0.png

Shuffling

TODO

Efficient use of multiple calculations

Imagine you want to calculate the correlation coefficient for a few subspaces. First we calculate it for E and Lz.

ds.correlation("E", "Lz")
array(-0.09404020895356191)

In the process, all the data for the columns E and Lz was processed. If we now calculate the correlation coefficient for E and L, we go over the data for column E again. Especially if the data does not fit into memory, this is quite inefficient.

ds.correlation("E", "L")
array(0.6890619164898808)

If instead we call the correlation method with a list of subspaces, there is only one pass over the data, which can be much more efficient.

ds.correlation([["E", "Lz"], ["E", "L"]])
array([-0.09404021,  0.68906192])

This is especially true if many subspaces are used, as in the following example.

subspaces = ds.combinations()
correlations = ds.correlation(subspaces)
mutual_informations = ds.mutual_information(subspaces)
from astropy.io import ascii
import sys
names = ["_".join(subspace) for subspace in subspaces]
ascii.write([names, correlations, mutual_informations], sys.stdout,
            names=["names", "correlation", "mutual_information"])
# replace sys.stdout by a filename such as "example.asc"
filename_asc = tempfile.mktemp(suffix=".asc")
ascii.write([names, correlations, mutual_informations], filename_asc,
            names=["names", "correlation", "mutual_information"])

print("--------")
# or write it as a latex table
ascii.write([names, correlations, mutual_informations],
            sys.stdout, names=["names", "correlation", "mutual information"], Writer=ascii.Latex)
names correlation mutual_information
E_FeH -0.014068223053940808 0.45424186792187254
E_L 0.6890619164898808 0.7404061337881687
E_Lz -0.09404020895356191 1.0706737929496781
E_random_index -0.1294438804260704 1.7132853532556342
E_vx -0.006280672311820793 0.1469942705483035
E_vy 0.01786299906409399 0.16308541508821273
E_vz 0.01921099148010763 0.14587270420692017
E_x -0.012435764665535712 0.37575458684670193
E_y -0.006099113309545572 0.43174233264307804
E_z 0.01244518987212551 0.39227851480701037
FeH_L -0.08144257446458827 0.3087854784476569
FeH_Lz 0.4653258482938841 0.67903425399271
FeH_random_index 0.2150690111880752 1.435098879037007
FeH_vx 0.010488839427892981 0.10342994503122623
FeH_vy -0.011055183715440823 0.1026181270505932
FeH_vz 0.00375742600931867 0.11145188427951819
FeH_x 0.005261856198512363 0.12492044422245742
FeH_y 0.015003295277717208 0.13276137709114438
FeH_z -0.024137983404556557 0.1438360275619597
L_Lz -0.1289411770767984 1.0311950571530903
L_random_index -0.01195286572778252 0.9863646953020592
L_vx -0.007520910755859279 0.11295633234855082
L_vy 0.023488436893893464 0.11208221353943382
L_vz 0.031076571360133275 0.13220733133586987
L_x -0.02566286245858961 0.19727462018198172
L_y -0.00838158892350927 0.21606618030384148
L_z 0.003231034609821104 0.2107242660410347
Lz_random_index -0.22159810290993964 1.8807524566430187
Lz_vx 0.02359219180674478 0.12115009113844397
Lz_vy -0.02312324972293643 0.11578395258810392
Lz_vz 0.03296464183711297 0.12410177399196126
Lz_x -0.00030294055309957344 0.2150162106163776
Lz_y 0.027260049760350104 0.23708979843321445
Lz_z -0.06334896485239119 0.24951081671985295
random_index_vx -0.00522915568828874 0.13209120343292605
random_index_vy -0.0007787184679206849 0.1308160819902086
random_index_vz -0.011321762177823203 0.18483519919555708
random_index_x 0.002157476491340153 0.32861028043895546
random_index_y -0.002740986556550819 0.36853287435535315
random_index_z 0.028387450432431596 0.5119671679312903
vx_vy -0.03524604328853534 0.11105372656186498
vx_vz 0.005550990948008108 0.12708618558232798
vx_x -0.0077917898183534 0.10435586691547903
vx_y 0.01804910998078914 0.1701339925382788
vx_z -0.021753308878140573 0.11626543575085835
vy_vz 0.009916570683825747 0.1316782544117304
vy_x 0.0001401879823959935 0.15943598551987362
vy_y -0.004114980900371909 0.10979192849816789
vy_z 0.029883551266368533 0.11377910176974865
vz_x 0.020449779578494472 0.10991350641870239
vz_y -0.028477638600608927 0.1153895233928019
vz_z -0.009658004899831468 0.10902906677577343
x_y -0.066913086088751 0.1511814526380327
x_z -0.026563129089248065 0.18439180585071951
y_z 0.030838572698652564 0.21418760688854802
--------
\begin{table}
\begin{tabular}{ccc}
names & correlation & mutual information \
E_FeH & -0.0140682230539 & 0.454241867922 \
E_L & 0.68906191649 & 0.740406133788 \
E_Lz & -0.0940402089536 & 1.07067379295 \
E_random_index & -0.129443880426 & 1.71328535326 \
E_vx & -0.00628067231182 & 0.146994270548 \
E_vy & 0.0178629990641 & 0.163085415088 \
E_vz & 0.0192109914801 & 0.145872704207 \
E_x & -0.0124357646655 & 0.375754586847 \
E_y & -0.00609911330955 & 0.431742332643 \
E_z & 0.0124451898721 & 0.392278514807 \
FeH_L & -0.0814425744646 & 0.308785478448 \
FeH_Lz & 0.465325848294 & 0.679034253993 \
FeH_random_index & 0.215069011188 & 1.43509887904 \
FeH_vx & 0.0104888394279 & 0.103429945031 \
FeH_vy & -0.0110551837154 & 0.102618127051 \
FeH_vz & 0.00375742600932 & 0.11145188428 \
FeH_x & 0.00526185619851 & 0.124920444222 \
FeH_y & 0.0150032952777 & 0.132761377091 \
FeH_z & -0.0241379834046 & 0.143836027562 \
L_Lz & -0.128941177077 & 1.03119505715 \
L_random_index & -0.0119528657278 & 0.986364695302 \
L_vx & -0.00752091075586 & 0.112956332349 \
L_vy & 0.0234884368939 & 0.112082213539 \
L_vz & 0.0310765713601 & 0.132207331336 \
L_x & -0.0256628624586 & 0.197274620182 \
L_y & -0.00838158892351 & 0.216066180304 \
L_z & 0.00323103460982 & 0.210724266041 \
Lz_random_index & -0.22159810291 & 1.88075245664 \
Lz_vx & 0.0235921918067 & 0.121150091138 \
Lz_vy & -0.0231232497229 & 0.115783952588 \
Lz_vz & 0.0329646418371 & 0.124101773992 \
Lz_x & -0.0003029405531 & 0.215016210616 \
Lz_y & 0.0272600497604 & 0.237089798433 \
Lz_z & -0.0633489648524 & 0.24951081672 \
random_index_vx & -0.00522915568829 & 0.132091203433 \
random_index_vy & -0.000778718467921 & 0.13081608199 \
random_index_vz & -0.0113217621778 & 0.184835199196 \
random_index_x & 0.00215747649134 & 0.328610280439 \
random_index_y & -0.00274098655655 & 0.368532874355 \
random_index_z & 0.0283874504324 & 0.511967167931 \
vx_vy & -0.0352460432885 & 0.111053726562 \
vx_vz & 0.00555099094801 & 0.127086185582 \
vx_x & -0.00779178981835 & 0.104355866915 \
vx_y & 0.0180491099808 & 0.170133992538 \
vx_z & -0.0217533088781 & 0.116265435751 \
vy_vz & 0.00991657068383 & 0.131678254412 \
vy_x & 0.000140187982396 & 0.15943598552 \
vy_y & -0.00411498090037 & 0.109791928498 \
vy_z & 0.0298835512664 & 0.11377910177 \
vz_x & 0.0204497795785 & 0.109913506419 \
vz_y & -0.0284776386006 & 0.115389523393 \
vz_z & -0.00965800489983 & 0.109029066776 \
x_y & -0.0669130860888 & 0.151181452638 \
x_z & -0.0265631290892 & 0.184391805851 \
y_z & 0.0308385726987 & 0.214187606889 \
\end{tabular}
\end{table}
# reading it back in
table = ascii.read(filename_asc)
print("this is an astropy table:\n", table)
correlations = table["correlation"]
print()
print("this is an astropy column:\n", correlations)
print()
print("this is the numpy data:\n", correlations.data)
# short: table["correlation"].data
this is an astropy table:
     names         correlation    mutual_information
-------------- ----------------- ------------------
         E_FeH  -0.0140682230539     0.454241867922
           E_L     0.68906191649     0.740406133788
          E_Lz  -0.0940402089536      1.07067379295
E_random_index   -0.129443880426      1.71328535326
          E_vx -0.00628067231182     0.146994270548
          E_vy   0.0178629990641     0.163085415088
          E_vz   0.0192109914801     0.145872704207
           E_x  -0.0124357646655     0.375754586847
           E_y -0.00609911330955     0.431742332643
           E_z   0.0124451898721     0.392278514807
           ...               ...                ...
          vx_z  -0.0217533088781     0.116265435751
         vy_vz  0.00991657068383     0.131678254412
          vy_x 0.000140187982396      0.15943598552
          vy_y -0.00411498090037     0.109791928498
          vy_z   0.0298835512664      0.11377910177
          vz_x   0.0204497795785     0.109913506419
          vz_y  -0.0284776386006     0.115389523393
          vz_z -0.00965800489983     0.109029066776
           x_y  -0.0669130860888     0.151181452638
           x_z  -0.0265631290892     0.184391805851
           y_z   0.0308385726987     0.214187606889
Length = 55 rows
this is an astropy column:
    correlation
-----------------
 -0.0140682230539
    0.68906191649
 -0.0940402089536
  -0.129443880426
-0.00628067231182
  0.0178629990641
  0.0192109914801
 -0.0124357646655
-0.00609911330955
  0.0124451898721
              ...
 -0.0217533088781
 0.00991657068383
0.000140187982396
-0.00411498090037
  0.0298835512664
  0.0204497795785
 -0.0284776386006
-0.00965800489983
 -0.0669130860888
 -0.0265631290892
  0.0308385726987
Length = 55 rows
this is the numpy data:
 [ -1.40682231e-02   6.89061916e-01  -9.40402090e-02  -1.29443880e-01
  -6.28067231e-03   1.78629991e-02   1.92109915e-02  -1.24357647e-02
  -6.09911331e-03   1.24451899e-02  -8.14425745e-02   4.65325848e-01
   2.15069011e-01   1.04888394e-02  -1.10551837e-02   3.75742601e-03
   5.26185620e-03   1.50032953e-02  -2.41379834e-02  -1.28941177e-01
  -1.19528657e-02  -7.52091076e-03   2.34884369e-02   3.10765714e-02
  -2.56628625e-02  -8.38158892e-03   3.23103461e-03  -2.21598103e-01
   2.35921918e-02  -2.31232497e-02   3.29646418e-02  -3.02940553e-04
   2.72600498e-02  -6.33489649e-02  -5.22915569e-03  -7.78718468e-04
  -1.13217622e-02   2.15747649e-03  -2.74098656e-03   2.83874504e-02
  -3.52460433e-02   5.55099095e-03  -7.79178982e-03   1.80491100e-02
  -2.17533089e-02   9.91657068e-03   1.40187982e-04  -4.11498090e-03
   2.98835513e-02   2.04497796e-02  -2.84776386e-02  -9.65800490e-03
  -6.69130861e-02  -2.65631291e-02   3.08385727e-02]

Where to go from here?

This tutorial covers the basics; more can be learned by reading the API documentation. Note that every docstring can be read from the notebook using shift-tab, or by typing for instance ds.plot?.

If you think a particular topic should be addressed here, please open an issue on GitHub.