Datasets to download

Here we list a few datasets that might be interesting to explore with vaex.

New York taxi dataset

See, for instance, "Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance" for some ideas.

In [12]:
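import vaex
# Path to a local copy of the NYC taxi data; adjust it to where you stored the file
ds = vaex.open("/Users/users/breddels/.vaex/data/nyc_taxi/nyc_taxi2015.hdf5")
# Log-scaled density of pickup positions; limits="96%" restricts each axis to a range holding 96% of the data
ds.plot(ds.col.pickup_longitude, ds.col.pickup_latitude, f="log1p", show=True, limits="96%");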
[Figure: log1p-scaled density of taxi pickups, pickup_longitude vs. pickup_latitude]
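The taxi plot combines two options, `f="log1p"` and `limits="96%"`. Below is a minimal NumPy sketch of what they do, assuming that `"96%"` means each axis is cut to the range holding the central 96% of the values (2% trimmed from each side) and that `log1p` means log(1 + count) per bin. The helpers `percent_limits` and `log1p_density` are hypothetical, and the coordinates are random toy data, not taxi trips.

```python
import numpy as np

def percent_limits(values, percent=96.0):
    """Return (low, high) bounds enclosing the central `percent` % of `values`."""
    tail = (100.0 - percent) / 2.0  # e.g. 2% trimmed from each side for 96%
    low, high = np.percentile(values, [tail, 100.0 - tail])
    return low, high

def log1p_density(x, y, percent=96.0, bins=256):
    """2D histogram of (x, y) over percent-based limits, scaled with log1p."""
    xlim = percent_limits(x, percent)
    ylim = percent_limits(y, percent)
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=[xlim, ylim])
    # log1p(0) == 0, so empty bins stay finite (a plain log would give -inf)
    return np.log1p(counts), xlim, ylim

# Toy stand-in for pickup coordinates (random numbers, not real taxi data)
rng = np.random.default_rng(0)
lon = rng.normal(-73.98, 0.05, size=100_000)
lat = rng.normal(40.75, 0.05, size=100_000)
image, xlim, ylim = log1p_density(lon, lat)
```

Trimming by percentiles keeps a few far-away outliers (for example bad GPS fixes) from stretching the axes, and `log1p` keeps both dense and sparse regions visible without producing `-inf` in empty bins.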

SDSS - dereddened

Only the columns ra, dec, g, r and g_r are included (dereddened using the Schlegel maps).
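Dereddening subtracts the extinction in each band, A_band = R_band * E(B-V), where E(B-V) is read from the Schlegel maps at the object's position. A minimal sketch follows; the coefficients `R_G` and `R_R` are assumed values (the ones commonly quoted for SDSS g and r with these maps), and we also assume that g_r is the dereddened g - r color. The helper `deredden` is hypothetical.

```python
import numpy as np

# Assumed extinction coefficients A_band / E(B-V) for SDSS g and r
# with the Schlegel, Finkbeiner & Davis (1998) maps.
R_G = 3.793
R_R = 2.751

def deredden(g, r, ebv):
    """Return dereddened g, r and the g - r color, given reddening E(B-V)."""
    g0 = g - R_G * ebv
    r0 = r - R_R * ebv
    return g0, r0, g0 - r0

# Toy magnitudes and reddening (illustrative values, not SDSS data)
g0, r0, g_r = deredden(np.array([18.0]), np.array([17.5]), np.array([0.1]))
```

Because R_G > R_R, dust makes objects look redder; dereddening therefore lowers g - r by (R_G - R_R) * E(B-V).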

The original query on the SDSS archive was (although it was run in smaller parts):

SELECT ra, dec, g, r FROM PhotoObjAll WHERE type = 6 AND clean = 1 AND r >= 10.0 AND r < 23.5;
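The WHERE clause keeps objects classified as stars (type = 6) with clean photometry (clean = 1) and 10 <= r < 23.5. The same cut can be written as a boolean mask. This NumPy sketch uses a hypothetical helper, `select_clean_stars`, on a few toy rows; since the cut is applied row by row, running it on smaller parts and concatenating the results gives the same selection as one big query.

```python
import numpy as np

def select_clean_stars(obj_type, clean, r, r_min=10.0, r_max=23.5):
    """Boolean mask mirroring the WHERE clause of the SDSS query."""
    return (obj_type == 6) & (clean == 1) & (r >= r_min) & (r < r_max)

# Toy rows (not real SDSS data): only the first one passes every condition
obj_type = np.array([6, 3, 6, 6])
clean = np.array([1, 1, 0, 1])
r = np.array([15.0, 15.0, 15.0, 23.5])
mask = select_clean_stars(obj_type, clean, r)

# Applying the cut in parts of two rows and concatenating gives the same mask
parts = [
    select_clean_stars(obj_type[i:i + 2], clean[i:i + 2], r[i:i + 2])
    for i in range(0, len(r), 2)
]
combined = np.concatenate(parts)
```

Note that the upper bound is strict, so a star with r = 23.5 exactly is excluded.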
In [22]:
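import vaex
# Path to a local copy of the SDSS data; adjust it to where you stored the file
sdss = vaex.open("/Users/maartenbreddels/vaex/data/sdss/sdss_dereddened.hdf5")
# Log-scaled sky density on a HEALPix grid (level 9), shown in galactic coordinates
sdss.healpix_plot(sdss.col.healpix, show=True, f="log", healpix_max_level=9, healpix_level=9,
                  healpix_input='galactic', healpix_output='galactic', rotation=(0, 45))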
[Figure: log-scaled HEALPix sky map of SDSS source counts, galactic coordinates]
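The options healpix_max_level=9 and healpix_level=9 describe how the pixel indices in the healpix column are binned. Below is a minimal sketch of that binning, assuming the column holds NESTED HEALPix indices at healpix_max_level; it is not vaex's implementation, and the helpers `npix`, `degrade_nested` and `healpix_counts` are hypothetical. In the NESTED scheme each pixel has four children at the next level, so going from max_level to a coarser level is a right shift by 2 bits per level. It also shows why f="log" can emit a divide-by-zero RuntimeWarning: empty pixels have count 0, and log(0) is -inf.

```python
import numpy as np

def npix(level):
    """Number of HEALPix pixels at a given level (order): 12 * 4**level."""
    return 12 * 4 ** level

def degrade_nested(ipix, max_level, level):
    """Map NESTED pixel indices at max_level to their parent at a coarser level."""
    return ipix >> (2 * (max_level - level))

def healpix_counts(ipix, max_level, level):
    """Count objects per pixel at `level`, including empty pixels."""
    parent = degrade_nested(np.asarray(ipix), max_level, level)
    return np.bincount(parent, minlength=npix(level))

# Toy indices at level 2, counted at level 1 (48 pixels)
counts = healpix_counts([0, 1, 2, 3, 4, 40], max_level=2, level=1)

# Empty pixels: np.log gives -inf (the warning is silenced here), np.log1p gives 0
with np.errstate(divide="ignore"):
    log_counts = np.log(counts)
log1p_counts = np.log1p(counts)
```

With healpix_level equal to healpix_max_level, as in the cell above, no degrading happens and the map has 12 * 4**9 = 3145728 pixels.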

Gaia

See the Gaia Science Homepage for details; for ADQL (SQL-like) queries, you may want to try the Gaia Archive.

In [9]:
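# Path to a local copy of Gaia DR1; adjust it to where you stored the file
gaia = vaex.open("/Users/users/breddels/gaia/gaia-dr1.hdf5")
# RA limits run from 360 to 0, so right ascension increases to the left, as on the sky
gaia.plot("ra", "dec", f="log", limits=[[360, 0], [-90, 90]], show=True);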
[Figure: log-scaled density of Gaia DR1 sources, ra (360 to 0) vs. dec]
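The reversed RA limits [360, 0] follow the sky convention of RA increasing to the left. A minimal NumPy sketch of the same effect, assuming that vaex treats reversed limits as a flipped axis: bin over the normal increasing range, then reverse the RA axis. The helper `sky_histogram` is hypothetical and the two positions are toy values.

```python
import numpy as np

def sky_histogram(ra, dec, bins=(360, 180)):
    """Counts on an RA/Dec grid with RA running from 360 down to 0 deg."""
    counts, ra_edges, dec_edges = np.histogram2d(
        ra, dec, bins=bins, range=[[0.0, 360.0], [-90.0, 90.0]]
    )
    # Reverse the RA axis so RA decreases along it (assumed meaning of limits [360, 0])
    return counts[::-1, :], ra_edges[::-1], dec_edges

# Two toy sources near RA 0 and RA 360, both at dec 0.5
counts, ra_edges, dec_edges = sky_histogram(np.array([0.5, 359.5]), np.array([0.5, 0.5]))
```

After the flip, the source at RA 359.5 lands in the first RA bin and the one at RA 0.5 in the last.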

Helmi & de Zeeuw 2000

Result of an N-body simulation of the accretion of 33 satellite galaxies into a Milky Way dark matter halo.

* 3 million rows, 252 MB

In [26]:
hdz = vaex.datasets.helmi_de_zeeuw.fetch()  # downloads the dataset on the fly
# Two panels: positions (x, y) and angular momentum Lz vs. energy E
hdz.plot([["x", "y"], ["Lz", "E"]], f="log", figsize=(12, 5), show=True);
[Figure: log-scaled densities of the Helmi & de Zeeuw simulation, x vs. y and Lz vs. E]
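The second panel plots the z-component of the angular momentum, Lz, against the energy E. As a reminder of what Lz measures, here is a short NumPy sketch of the standard definition L_z = x v_y - y v_x; we assume the dataset's Lz column uses this definition and sign convention, and `angular_momentum_z` is a hypothetical helper. Lz stays constant along an orbit in an axisymmetric potential, which is why stars stripped from the same satellite tend to remain grouped in the (Lz, E) panel even after they have spread out in (x, y).

```python
import numpy as np

def angular_momentum_z(x, y, vx, vy):
    """z-component of the specific angular momentum: L_z = x * v_y - y * v_x."""
    return x * vy - y * vx

# Illustrative circular orbit in the x-y plane (radius 8, speed 220, arbitrary units)
t = np.linspace(0.0, 2.0 * np.pi, 50)
x, y = 8.0 * np.cos(t), 8.0 * np.sin(t)
vx, vy = -220.0 * np.sin(t), 220.0 * np.cos(t)
lz = angular_momentum_z(x, y, vx, vy)  # 8 * 220 = 1760 at every point
```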