Miso Dataset 0.3 Release

This post was written in collaboration with Alex Graul, Miso Project’s co-creator.

Today we are excited to release version 0.3.0 of Miso Dataset, which is full of new features. For the gory details, you can take a look at the closed issues, but this post covers the major enhancements to the Dataset library.

Miso Dataset has been on quite a world tour! It has helped visualize the Australian Census and Bosnian Media, and explore Electoral College votes for the US election. Thanks to all your valuable feedback, we are making improvements all the time, and we want to share a few major ones in this release.

Miso Dataset now supports:

  • Computed columns
  • User-defined ID columns
  • A revamped update method, and
  • A much faster sort method

Computed Columns

Until now, if you wanted to add columns to your dataset that were derived from your existing columns, you had to manually create a column, compute the data and update the rows. This was both computationally expensive and somewhat cumbersome. In this release, we’ve added the ability to add a computed column: a column that is based on the existing rows and that updates its values as data is added or updated.

Here is an example of creating a computed column:

computed.columns.js

var data = [
  { age : 23, weight : 140, height : 65 },
  { age : 40, weight : 290, height : 72 },
  { age : 13, weight : 110, height : 60 }
];

var healthData = new Miso.Dataset({
  data: data
});

healthData.fetch().then(function() {
  // Let's add a BMI Column
  healthData.addComputedColumn("BMI", "number", function(row) {
    return (row.weight / Math.pow(row.height, 2)) * 703;
  });

  console.log(healthData.column("BMI").data);
  // => [23.294674556213018, 39.326774691358025, 21.480555555555554]

  // If we add a row:
  healthData.add({
    age : 30,
    weight : 180,
    height: 68
  });

  // the computed column will add the correct value at the
  // correct place:
  console.log(healthData.column("BMI").data);
  // => [23.294674556213018, 39.326774691358025, 21.480555555555554, 27.3659169550173]

  // if we update a row:
  var firstRow = healthData.rowByPosition(0);
  healthData.update({
    _id : firstRow._id,
    weight: 120
  });

  // Our computed column will recompute the appropriate
  // cell:
  console.log(healthData.column("BMI").data);
  // => [19.966863905325443, 39.326774691358025, 21.480555555555554, 27.3659169550173]
});
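
To make the behaviour above more concrete, here is a minimal plain-JavaScript sketch of the idea behind a computed column: a derived array that is recalculated whenever a row is added or changed. It does not use or reproduce Miso's internals. The names makeTable, addComputed, add, update and column are hypothetical and exist only for this illustration.

```javascript
// Hypothetical sketch, NOT Miso's implementation: a tiny row store whose
// computed columns stay in sync as rows are added or updated.
function makeTable(rows) {
  // Copy the rows so the caller's objects are never mutated.
  var data = rows.map(function(row) { return Object.assign({}, row); });
  var computed = {}; // column name -> { fn: function(row), values: [] }

  return {
    addComputed: function(name, fn) {
      // Compute the initial value for every existing row.
      computed[name] = {
        fn: fn,
        values: data.map(function(row) { return fn(row); })
      };
    },
    add: function(row) {
      var copy = Object.assign({}, row);
      data.push(copy);
      // Append one new value to every computed column.
      Object.keys(computed).forEach(function(name) {
        computed[name].values.push(computed[name].fn(copy));
      });
    },
    update: function(position, changes) {
      Object.assign(data[position], changes);
      // Recompute only the cell that belongs to the changed row.
      Object.keys(computed).forEach(function(name) {
        computed[name].values[position] = computed[name].fn(data[position]);
      });
    },
    column: function(name) {
      return computed[name].values;
    }
  };
}

var table = makeTable([
  { weight : 140, height : 65 },
  { weight : 290, height : 72 }
]);

table.addComputed("BMI", function(row) {
  return (row.weight / Math.pow(row.height, 2)) * 703;
});

table.add({ weight : 180, height : 68 });
table.update(0, { weight : 120 });

console.log(table.column("BMI").length);
// => 3
```

The point of the sketch is that the derived values are appended or recomputed one cell at a time, rather than rebuilding the whole column by hand after every change.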


Custom ID Column

Prior to this release, creating a Dataset automatically added a column called _id as a unique identifier for your rows. Most of the time, however, our data already contains unique identifiers that we would much rather use. Dataset now supports this: set the idAttribute property when you create the dataset. This also makes it much simpler to access a specific row by its custom identifier. For example, if your dataset uses its ISO3 column as IDs, you can now simply write dataset.rowById('AUS').population.

Here is an example:

idAttribute.js

var data = [
  { userId : 1, age : 23, weight : 140, height : 65 },
  { userId : 2, age : 40, weight : 290, height : 72 },
  { userId : 3, age : 13, weight : 110, height : 60 }
];

var healthData = new Miso.Dataset({
  data: data,
  idAttribute : 'userId'
});

healthData.fetch().then(function() {

  // Will return our first row
  // => {"userId":1,"age":23,"weight":140,"height":65}
  console.log(JSON.stringify(healthData.rowById(1)));

});
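
One way to picture why a custom ID column makes lookups simple is an index keyed by the chosen attribute. The following plain-JavaScript sketch works under that assumption and is not Miso's actual implementation. buildIdIndex is a hypothetical helper introduced only for this example.

```javascript
// Hypothetical sketch, NOT Miso's implementation: index rows by a
// caller-chosen attribute so a row can be fetched directly by its ID.
function buildIdIndex(rows, idAttribute) {
  var index = Object.create(null); // no inherited keys to collide with
  rows.forEach(function(row) {
    var id = row[idAttribute];
    if (id in index) {
      // An ID column is only useful if its values are unique.
      throw new Error("Duplicate id: " + id);
    }
    index[id] = row;
  });
  return index;
}

var rows = [
  { userId : 1, age : 23, weight : 140, height : 65 },
  { userId : 2, age : 40, weight : 290, height : 72 },
  { userId : 3, age : 13, weight : 110, height : 60 }
];

var byUserId = buildIdIndex(rows, "userId");

console.log(JSON.stringify(byUserId[2]));
// => {"userId":2,"age":40,"weight":290,"height":72}
```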

Improved API for ‘update’ Method

Our update method was one of our trickier APIs to remember. It supported updating a single row, updating sets of rows and function-based updates, but each of those required a slightly different signature. In this release we are changing the signature of the update function while keeping the functionality intact.

Here is an example of all the ways to update your dataset:

update.js

var data = [
  { userId : 1, age : 23, weight : 140, height : 65 },
  { userId : 2, age : 40, weight : 290, height : 72 },
  { userId : 3, age : 13, weight : 110, height : 60 }
];

var healthData = new Miso.Dataset({
  data : data,
  idAttribute : 'userId'
});

healthData.fetch().then(function() {

  // let's update a single record:
  healthData.update({
    userId : 1,
    age : 24
  });

  // The age is now 24:
  // => {"userId":1,"age":24,"weight":140,"height":65}
  console.log(JSON.stringify(healthData.rowById(1)));

  // let's update two records
  healthData.update([
    { userId : 1, age : 25, weight : 140, height : 65 },
    { userId : 2, age : 42, weight : 140, height : 65 }
  ]);

  // Our age column should now look like this:
  // => [25, 42, 13]
  console.log(healthData.column("age").data);

  // Let's update all the ages at once
  healthData.update(function(row) {
    row.age += 2;
    return row;
  });

  // Our age column should now look like this:
  // => [27, 44, 15]
  console.log(healthData.column("age").data);

});

This makes it much easier to update a set of arbitrary rows with individual changes in one go and only generate a single event.
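
The three call shapes shown above can all be reduced to a single list of row changes before anything is applied, which is one way a library could emit just one event per call. The sketch below illustrates that idea in plain JavaScript. It is an assumption about the general approach, not Miso's code, and normalizeUpdate and applyUpdate are hypothetical helpers.

```javascript
// Hypothetical sketch, NOT Miso's implementation: accept an object, an
// array of objects, or a function, and turn each into a list of changes.
function normalizeUpdate(rows, arg) {
  if (typeof arg === "function") {
    // Call the function with a copy of every row and collect the results.
    return rows.map(function(row) { return arg(Object.assign({}, row)); });
  }
  return Array.isArray(arg) ? arg : [arg];
}

function applyUpdate(rows, idAttribute, arg, onChange) {
  var changes = normalizeUpdate(rows, arg);
  changes.forEach(function(change) {
    rows.forEach(function(row) {
      // Merge the change into the row with the matching ID.
      if (row[idAttribute] === change[idAttribute]) {
        Object.assign(row, change);
      }
    });
  });
  onChange(changes); // a single notification for the whole batch
}

var rows = [
  { userId : 1, age : 25 },
  { userId : 2, age : 42 },
  { userId : 3, age : 13 }
];

var events = 0;
function countEvent() { events += 1; }

// One record, two records, then every record via a function.
applyUpdate(rows, "userId", { userId : 1, age : 26 }, countEvent);
applyUpdate(rows, "userId", [
  { userId : 2, age : 43 },
  { userId : 3, age : 14 }
], countEvent);
applyUpdate(rows, "userId", function(row) {
  row.age += 2;
  return row;
}, countEvent);

console.log(rows.map(function(row) { return row.age; }));
// => [28, 45, 16]
console.log(events);
// => 3
```

Each call produces exactly one notification, however many rows it touches.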

‘sort’ Method Performance Improvements

Last but not least, we have rewritten our sort routine to increase its performance substantially. You should now see an improvement of about 8x in your routines that utilize the sort method.

Thank you all for the invaluable feedback and keep telling us what you want to see Miso Dataset do!

— Irene and Alex
