Miso Dataset 0.3 Release
Posted by Irene Ros
This post was written in collaboration with Alex Graul, Miso Project’s co-creator.
We are excited to release version 0.3.0 of Miso Dataset today, and it is full of new features. For the gory details, you can take a look at the closed issues, but this post covers the major enhancements to the Dataset library.
Miso Dataset has been on quite a world tour! It has helped visualize the Australian Census and Bosnian Media, and explore Electoral College votes for the US election. Thanks to all your valuable feedback, we are making improvements all the time, and we want to share a few major ones in this release.
Miso Dataset now supports:
- Computed columns
- User-defined ID column
- A revamped update method, and
- A much faster sort method
Computed Columns
Until now, if you wanted to add columns to your dataset that were derived from your existing columns, you had to manually create a column, compute the data and update the rows. This was both computationally expensive and somewhat cumbersome. In this release, we’ve added computed columns: a computed column is derived from the existing rows and updates its values as data is added or updated.
Here is an example of creating a computed column:
computed.columns.js
var data = [
  { age : 23, weight : 140, height : 65 },
  { age : 40, weight : 290, height : 72 },
  { age : 13, weight : 110, height : 60 }
];
var healthData = new Miso.Dataset({
  data: data
});
healthData.fetch().then(function() {
  // Let's add a BMI Column
  healthData.addComputedColumn("BMI", "number", function(row) {
    return (row.weight / Math.pow(row.height, 2)) * 703;
  });
  console.log(healthData.column("BMI").data);
  // => [23.294674556213018, 39.326774691358025, 21.480555555555554]
  // If we add a row:
  healthData.add({
    age : 30,
    weight : 180,
    height: 68
  });
  // the computed column will add the correct value at the
  // correct place:
  console.log(healthData.column("BMI").data);
  // => [23.294674556213018, 39.326774691358025, 21.480555555555554, 27.3659169550173]
  // if we update a row:
  var firstRow = healthData.rowByPosition(0);
  healthData.update({
    _id : firstRow._id,
    weight: 120
  });
  // Our computed column will recompute the appropriate
  // cell:
  console.log(healthData.column("BMI").data);
  // => [19.966863905325443, 39.326774691358025, 21.480555555555554, 27.3659169550173]
});
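To make the idea concrete, here is a minimal plain-JavaScript sketch of how a computed column can stay in sync with its rows. It does not use Miso and is not a description of how Dataset implements the feature internally; TinyTable, addComputed, add, update and column are hypothetical names invented for this illustration.

```javascript
// Hypothetical illustration only: this is NOT Miso's implementation.
function TinyTable(rows) {
  // Copy each row and give it a simple sequential _id.
  this.rows = rows.map(function(row, i) {
    return Object.assign({ _id: i + 1 }, row);
  });
  this.computed = {}; // column name -> function(row)
}

// Recalculate every computed cell for a single row.
TinyTable.prototype._recompute = function(row) {
  Object.keys(this.computed).forEach(function(name) {
    row[name] = this.computed[name](row);
  }, this);
};

// Register a computed column and fill it in for the existing rows.
TinyTable.prototype.addComputed = function(name, fn) {
  this.computed[name] = fn;
  this.rows.forEach(function(row) { this._recompute(row); }, this);
};

// Adding a row computes its derived cells before storing it.
TinyTable.prototype.add = function(row) {
  var stored = Object.assign({ _id: this.rows.length + 1 }, row);
  this._recompute(stored);
  this.rows.push(stored);
  return stored;
};

// Updating a row merges the changes, then recomputes only that row.
TinyTable.prototype.update = function(changes) {
  var row = this.rows.filter(function(r) {
    return r._id === changes._id;
  })[0];
  if (!row) { return; }
  Object.assign(row, changes);
  this._recompute(row);
};

TinyTable.prototype.column = function(name) {
  return this.rows.map(function(r) { return r[name]; });
};

var table = new TinyTable([
  { weight: 140, height: 65 },
  { weight: 290, height: 72 }
]);
table.addComputed("BMI", function(row) {
  return (row.weight / Math.pow(row.height, 2)) * 703;
});
table.add({ weight: 180, height: 68 });   // new row gets a BMI cell
table.update({ _id: 1, weight: 120 });    // only row 1 is recomputed
console.log(table.column("BMI"));
```

The point of the sketch is that the derivation function is stored once and re-run per affected row, so callers never touch the derived cells themselves.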
Custom ID Column
Prior to this release, creating a Dataset also created a column called _id to serve as a unique identifier for your rows. Most of the time, however, our data already contains unique identifiers that we would much rather use. Dataset now supports this: set the idAttribute property when creating the dataset. This also makes it much simpler to access a specific row by its custom identifier. For example, if your dataset uses its ISO3 column as IDs, you can now simply write dataset.rowById('AUS').population.
Here is an example:
idAttribute.js
var data = [
  { userId : 1, age : 23, weight : 140, height : 65 },
  { userId : 2, age : 40, weight : 290, height : 72 },
  { userId : 3, age : 13, weight : 110, height : 60 }
];
var healthData = new Miso.Dataset({
  data: data,
  idAttribute : 'userId'
});
healthData.fetch().then(function() {
  // Will return our first row
  // => {"userId":1,"age":23,"weight":140,"height":65}
  console.log(JSON.stringify(healthData.rowById(1)));
});
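If you are curious why lookups by a custom identifier can be cheap, one common approach is to keep an index from ID value to row position. The sketch below is plain JavaScript written for this post's example data; buildIdIndex and lookupRow are hypothetical helpers, not part of Miso Dataset, and we are not claiming Dataset works this way internally.

```javascript
// Hypothetical helpers for illustration: NOT part of Miso Dataset.
// Build a map from each row's id value to its position in the array.
function buildIdIndex(rows, idAttribute) {
  var index = {};
  rows.forEach(function(row, position) {
    var id = row[idAttribute];
    // An id column is only useful if every value is unique.
    if (Object.prototype.hasOwnProperty.call(index, id)) {
      throw new Error("Duplicate id: " + id);
    }
    index[id] = position;
  });
  return index;
}

var users = [
  { userId : 1, age : 23, weight : 140, height : 65 },
  { userId : 2, age : 40, weight : 290, height : 72 },
  { userId : 3, age : 13, weight : 110, height : 60 }
];
var byUserId = buildIdIndex(users, "userId");

// Look a row up through the index instead of scanning every row.
function lookupRow(id) {
  return users[byUserId[id]];
}

console.log(JSON.stringify(lookupRow(2)));
// => {"userId":2,"age":40,"weight":290,"height":72}
```

Rejecting duplicate values up front is a design choice in this sketch: a column with repeated values cannot identify a single row.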
Improved API for ‘update’ Method
Our update method was one of our trickier APIs to remember. Not only did it allow for updating a single row, sets of rows or function-based updating, but each one of those updates required a slightly different signature. In this release we are changing how the update function looks but keeping the functionality intact.
Here is an example of all the ways to update your dataset:
update.js
var data = [
  { userId : 1, age : 23, weight : 140, height : 65 },
  { userId : 2, age : 40, weight : 290, height : 72 },
  { userId : 3, age : 13, weight : 110, height : 60 }
];
var healthData = new Miso.Dataset({
  data : data,
  idAttribute : 'userId'
});
healthData.fetch().then(function() {
  // let's update a single record:
  healthData.update({
    userId : 1,
    age : 24
  });
  // The age is now 24:
  // => {"userId":1,"age":24,"weight":140,"height":65}
  console.log(JSON.stringify(healthData.rowById(1)));
  // let's update two records
  healthData.update([
    { userId : 1, age : 25, weight : 140, height : 65 },
    { userId : 2, age : 42, weight : 140, height : 65 }
  ]);
  // Our age column should now look like this:
  // => [25, 42, 13]
  console.log(healthData.column("age").data);
  // Let's update all the ages at once
  healthData.update(function(row) {
    row.age += 2;
    return row;
  });
  // Our age column should now look like this:
  // => [27, 44, 15]
  console.log(healthData.column("age").data);
});
This makes it much easier to update a set of arbitrary rows with individual changes in one go while generating only a single event.
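For readers who want to see how one entry point can accept an object, an array, or a function, here is a rough plain-JavaScript sketch. It is not Miso's code: applyUpdate and the summary object it returns are hypothetical, and the returned summary merely stands in for the single event mentioned above.

```javascript
// Hypothetical sketch: NOT Miso's implementation.
// Accepts one row object, an array of row objects, or a function.
function applyUpdate(rows, idAttribute, change) {
  var changed = [];

  // Copy the new values onto the row and remember which row changed.
  function merge(row, values) {
    Object.keys(values).forEach(function(key) {
      row[key] = values[key];
    });
    changed.push(row[idAttribute]);
  }

  if (typeof change === "function") {
    // Function form: run against a copy of every row, merge the result.
    rows.forEach(function(row) {
      merge(row, change(Object.assign({}, row)));
    });
  } else {
    // Object or array form: [].concat turns a single object into a
    // one-element array, then each entry is matched to a row by id.
    [].concat(change).forEach(function(values) {
      var row = rows.filter(function(r) {
        return r[idAttribute] === values[idAttribute];
      })[0];
      if (row) { merge(row, values); }
    });
  }

  // One summary for the whole batch, however many rows were touched.
  return { type: "update", ids: changed };
}

var people = [
  { userId : 1, age : 23 },
  { userId : 2, age : 40 },
  { userId : 3, age : 13 }
];
applyUpdate(people, "userId", { userId : 1, age : 24 });
applyUpdate(people, "userId", [
  { userId : 1, age : 25 },
  { userId : 2, age : 42 }
]);
var summary = applyUpdate(people, "userId", function(row) {
  row.age += 2;
  return row;
});
var ages = people.map(function(p) { return p.age; });
console.log(ages.join(", "));
// => 27, 44, 15
```

Normalizing the single-object case into an array keeps the matching logic in one place, which is one way to avoid three different signatures.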
‘sort’ Method Performance Improvements
Last but not least, we have rewritten our sort routine to increase its performance substantially. You should now see an improvement of about 8x in your routines that utilize the sort method.
Thank you all for the invaluable feedback and keep telling us what you want to see Miso Dataset do!
— Irene and Alex