A glance at unified FHR/Telemetry

Lots is changing in Telemetry land. If you do occasionally run data analyses with our Spark infrastructure you might want to keep reading.

Background

The Telemetry and FHR collection systems on desktop are in the process of being unified. Both systems will be sending their data through a common data pipeline which has some features of both the current Telemetry pipeline as well the Cloud Services one that we use to ingest server logs.

The goals of the unification are to:

  • avoid measuring the same metric in multiple systems on the client side;
  • reduce the latency from the time a measurement occurs until it can be analyzed on the server;
  • increase the accuracy of measurements so that they can be better correlated with factors in the user environment such as the specific build, enabled add-ons, and other hardware or software characteristics;
  • use a common data pipeline for client telemetry and service log data.

The unified pipeline is currently sending data for Nightly, Aurora and Beta. Classic FHR and Telemetry pipelines are going to keep sending data to the very least until the new unified pipeline has not been fully validated. The plan is to land this feature in 40 Release. We’ll also continue to respect existing user preferences. If the user has opted out of FHR or Telemetry, we’ll continue to respect that for the equivalent data sets. Similarly, the opt-out and opt-in defaults will remain the same for equivalent data sets.

Data format

A Telemetry ping, stored as JSON object on the client, encapsulates the data sent to our backend. The main differences between the new unified Telemetry ping format (v4) and the classic Telemetry one (v2) are that:

  • multiple ping types are supported beyond the classic saved-session ping, like the main ping;
  • pings have a common top-level which contains basic information shared between types, like build-id and channel;
  • pings have an optional environment field which consists of data that is expected to be characteristic for performance and other behavior.

From an analysis point of view, the most important addition is the main ping which includes the very same histograms and other performance and diagnostic data as the v2 saved-session pings. Unlike in “classic” Telemetry though, there can be multiple main pings during a single session. A main ping is triggered by different scenarios, which are documented by the reason field:

  • aborted-session: periodically saved to disk and deleted at shutdown – if a previous aborted session ping is found at startup it gets sent to our backend;
  • environment-change: generated when the environment changes;
  • shutdown: triggered when the browser session ends;
  • daily: a session split triggered in 24h hour intervals at local midnight; this is needed to make sure we keep receiving data also from clients that have very long sessions.

Data access through Spark

Once you connect to a Spark enabled IPython notebook launched from our self-service dashboard, you will be prompted with a new tutorial based on the v4 dataset. The v4 data is fetched through the get_pings function by passing “v4” as the schema parameter. The following parameters are valid for the new data format:

  • app: an application name, e.g.: “Firefox”;
  • channel: a channel name, e.g.: “nightly”;
  • version: the application version, e.g.: “40.0a1”;
  • build_id: a build id or a range of build ids, e.g.:”20150601000000″ or (“20150601000000”, “20150610999999”)
  • submission_date: a submission date or a range of submission dates, e.g: “20150601” or (“20150601”, “20150610”)
  • doc_type: ping type, e.g: “main”, set to “saved_session” by default
  • fraction: the fraction of pings to return, set to 1.0 by default

Once you have a RDD, you can further filter the pings down by reason. There is also a new experimental API that returns the history of submissions for a subset of profiles, which can be used for longitudinal analyses.

Dashboards made simple with Spark and Plotly

In my previous about our new Spark infrastructure, I went into the details on how to launch a Spark cluster on AWS to perform custom analyses on Telemetry data. Sometimes though one has the need to rerun an analysis recurrently over a certain timeframe, usually to feed data into dashboards of various kinds. We are going to roll out a new feature that allows users to upload an IPython notebook to the self-serve data analysis dashboard and run it on a scheduled basis. The notebook will be executed periodically with the chosen frequency and the result will be made available as an updated IPython notebook.

To schedule a Spark job:

  1. Visit the analysis provisioning dashboard at telemetry-dash.mozilla.org and sign in using Persona with an @mozilla.com email address.
  2. Click “Schedule a Spark Job”.
  3. Enter some details:
    • The “Job Name” field should be a short descriptive name, like “chromehangs analysis”.
    • Upload your IPython notebook containt the analysis.
    • Set the number of workers of the cluster in the “Cluster Size” field.
    • Set a schedule frequency using the remaining fields.

Once a new scheduled job is created it will appear in the top listing of the scheduling dashboard. When the job is run its result will be made available as an IPython notebook visible by clicking on the “View Data” entry of your job.

As I briefly mentioned at the beginning, periodic jobs are typically used to feed data to dashboards. Writing dashboards for a custom job isn’t very pleasant and I wrote in the past some simple tool to help with that. It turns out though that thanks to IPython one doesn’t need necessarily to write a dashboard from scratch but can simple re-use the notebook as the dashboard itself! I mean, why not? That might not be good enough for management facing dashboards but acceptable for ones aimed at engineers.

In fact with IPython we are not limited at all to matplotlib’s static charts. Thanks to Plotly, it’s easy enough to generate interactive plots which allow to:

  • Check the x and y coordinates of every point on the plot by hovering with the cursor.
  • Zoom in on the plot and resize lines, points and axes by clicking and dragging the cursor over a region.
  • Pan by holding the shift key while clicking and dragging.
  • Zooms back out to the original version by double clicking on the plot.

Plotly comes with its own API but if you have already a matplotlib based chart then it’s trivial to convert it to an interactive plot. As a concrete example, I updated my Spark Hello World example with a plotly chart.

fig = plt.figure(figsize=(18, 7))
frame["WINNT"].plot(kind="hist", bins=50)
plt.title("startup distribution for Windows")
plt.ylabel("count")
plt.xlabel("log(firstPaint)")
py.iplot_mpl(fig, strip_style=True)

As you can see, just a single extra line of code is needed for the conversion.

startup_distribution_for_windows

As WordPress doesn’t support iframes, you are going to have to click on the image and follow the link to see the interactive plot in action.

Next-gen Data Analysis Framework for Telemetry

The easier it is to get answers, the more questions will be asked

In that spirit me and Mark Reid have been working for a while now on a new analysis infrastracture to make it as easy as possible for engineers to get answers to data related questions.

Our shiny new analysis infrastructure is based primarily on IPython and Spark. I blogged about Spark before, I even gave a short tutorial on it at our last workweek in Portland (slides and tutorial); IPython might be something you are not familiar with unless you have a background in science. In a nutshell it’s a browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media.

IPython
An IPython notebook in all its glory

The combination of IPython and Spark allows to write data analyses interactively from a browser and seemingly parallelize them over multiple machines thanks to a rich API with over 80 distributed operators! It’s a huge leap forward in terms of productivity compared to traditional batch oriented map-reduce frameworks. An IPython notebook contains both the code and the product of the execution of that code, like plots. Once executed, a notebook can simply be serialized and uploaded to Github. Then, thanks to nbviewer, it can be visualized and shared among colleagues.

In fact, the issue with sharing just the end product of an analysis is that it’s all too easy for bugs to creep in or to make wrong assumptions. If your end result is a plot, how do you test it? How do you know that what you are looking at does actually reflect the truth? Having the code side by side with its evaluation allows more people to  inspect it and streamlines the review process.

This is what you need to do to start your IPython backed Spark cluster with access to Telemetry data:

  1. Visit the analysis provisioning dashboard at telemetry-dash.mozilla.org and sign in using Persona with an @mozilla.com email address.
  2. Click “Launch an ad-hoc Spark cluster”.
  3. Enter some details:
    • The “Cluster Name” field should be a short descriptive name, like “chromehangs analysis”.
    • Set the number of workers for the cluster. Please keep in mind to use resources sparingly; use a single worker to write and debug your job.
    • Upload your SSH public key.
  4. Click “Submit”.
  5. A cluster will be launched on AWS preconfigured with Spark, IPython and some handy data analysis libraries like pandas and matplotlib.

Once the cluster is ready, you can tunnel IPython through SSH by following the instructions on the dashboard, e.g.:

ssh -i my-private-key -L 8888:localhost:8888 hadoop@ec2-54-70-129-221.us-west-2.compute.amazonaws.com

Finally, you can launch IPython in Firefox by visiting http://localhost:8888.

Now what? Glad you asked. In your notebook listing you will see a Hello World notebook. It’s a very simple analysis that produces the distribution of startup times faceted by operating system for a small fraction of Telemetry submissions; let’s quickly review it here.

We start by importing a telemetry utility to fetch pings and some commonly needed libraries for analysis: a json parser, pandas and matplotlib.

import ujson as json
import matplotlib.pyplot as plt
import pandas as pd
from moztelemetry.spark import get_pings

To execute a block of code in IPython, aka cell, press Shift-Enter. While a cell is being executed, a gray circle will appear in the upper right border of the notebook. When the circle is full, your code is being executed by the IPython kernel; when only the borders of the circle are visible then the kernel is idle and waiting for commands.

Spark exploits parallelism across all cores of your cluster. To see the degree of parallelism you have at your disposal simply yield:


sc.defaultParallelism

Now, let’s fetch a set of telemetry submissions and load it in a RDD using the get_pings utility function from the moztelemetry library:

pings = get_pings(sc,
                  appName="Firefox",
                  channel="nightly",
                  version="*",
                  buildid="*",
                  submission_date="20141208",
                  fraction=0.1)

That’s pretty much self documenting. The fraction parameter, which defaults to 1, selects a random subset of the selected submissions. This comes in handy when you first write your analysis and don’t need to load lots of data to test and debug it.

Note that both the buildid and submission_date parameters accept also a tuple specifying, inclusively, a range of dates, e.g.:


pings = get_pings(sc,
                  appName="Firefox",
                  channel="nightly",
                  version="*",
                  buildid=("20141201", "20141202"),
                  submission_date=("20141202", ""20141208"))

Let’s do something with those pings. Since we are interested in the distribution of the startup time of Firefox faceted by operating system, let’s extract the needed fields from our submissions:

def extract(ping):
    ping = json.loads(ping)
    os = ping["info"]["OS"]
    startup = ping["simpleMeasurements"].get("firstPaint", -1)
    return (os, startup)

cached = pings.map(lambda ping: extract(ping)).filter(lambda p: p[1] > 0).cache()

As the Python API matches closely the one used from Scala, I suggest to have a look at my older Spark tutorial if you are not familiar with Spark. Another good resource are the hands-on exercises from AMP Camp 4.

Now, let’s collect the results back and stuff it into a pandas DataFrame. This is a very common pattern, once you reduce your dataset to a manageable size with Spark you collect it back on your driver (aka the master machine) and finalize your analysis with statistical tests, plots and whatnot.

grouped = cached.groupByKey().collectAsMap()

frame = pd.DataFrame({x: log(pd.Series(list(y))) for x, y in grouped.items()})
frame.boxplot()
plt.ylabel("log(firstPaint)")
plt.show()

startup_hello
Startup distribution by OS

Finally, you can save the notebook, upload it to Github or Bugzilla and visualize it on nbviewer, it’s that simple. Here is the nbviewer powered Hello World notebook. I warmly suggest that you open a bug report on Bugzilla for your custom Telemetry analysis and ask me or Vladan Djeric to review it. Mozilla has been doing code reviews for years and with good reasons, why should data analyses be different?

Congrats, you just completed your first Spark analysis with IPython! If you need any help with your custom job feel free to drop me a line in #telemetry.

A/B test for Telemetry histograms

A/B tests are a simple way to determine the effect caused by a change in a software product against a baseline, i.e. version A against version B. An A/B test is essentially an experiment that indiscriminately assigns a control or experiment condition to each user. It’s an extremely effective method to ascertain causality which is hard, at best, to infer with statistical methods alone. Telemetry comes with its own A/B test implementation, Telemetry Experiments.

Depending on the type of data collected and the question asked, different statistical techniques are used to verify if there is a difference between the experiment and control version:

  1. Does the rate of success of X differ between the two versions?
  2. Does the average value of  Y differ between the two versions?
  3. Does the average time to event Z differ between the two versions?

Those are just the most commonly used methods.

The frequentist statistical hypothesis testing framework is based on a conceptually simple idea: assuming that we live in a world where a certain baseline hypothesis (null hypothesis) is valid, what’s the probability of obtaining the results we observed? If the probability is very low, i.e. under a certain threshold, we gain confidence that the effect we are seeing is genuine.

To give you a concrete example, say I have reason to believe that the average battery duration of my new phone is 5 hours but the manufacturer claims it’s 5.5 hours. If we assume the average battery has indeed a duration of 5.5 hours (null hypothesis), what’s the probability of measuring an average duration that is 30 minutes lower? If the probability is small enough, say under 5%, we “reject” the null hypothesis. Note that there are many things that can go wrong with this framework and one has to be careful in interpreting the results.

Telemetry histograms are a different beast though. Each user submits its own histogram for a certain metric, the histograms are then aggregated across all users for version A and version B. How do you determine if there is a real difference or if what you are looking at is just due to noise? A chi-squared test would seem the most natural choice but on second thought its assumptions are not met as entries in the aggregated histograms are not independent from each other. Luckily we can avoid to sit down and come up with a new mathematically sound statistical test. Meet the permutation test.

Say you have a sample of metric M for users of version A and a sample of metric M for users of version B. You measure a difference of d between the means of the samples. Now you assume there is no difference between A and B and randomly shuffle entries between the two samples and compute again the difference of the means. You do this again, and again, and again… What you end up with is a distribution D of the differences of the means for the all the reshuffled samples. Now, you compute the probability of getting the original difference d, or a more extreme value, by chance and welcome our newborn hypothesis test!

Going back to our original problem of comparing aggregated histograms for the experiment and control group, instead of having means we have aggregated histograms and instead of computing the difference we are considering the distance; everything else remains the same as in the previous example:


def mc_permutation_test(xs, ys, num):
    n, k = len(xs), 0
    h1 = xs.sum()
    h2 = ys.sum()

    diff = histogram_distance(h1, h2)
    zs = pd.concat([xs, ys])
    zs.index = np.arange(0, len(zs))

    for j in range(num):
        zs = zs.reindex(np.random.permutation(zs.index))
        h1 = zs[:n].sum()
        h2 = zs[n:].sum()
        k += diff < histogram_distance(h1, h2)

    return k / num

Most statistical tests were created in a time where there were no [fast] computers around, but nowadays churning a Monte-Carlo permutation test is not a big deal and one can easily run such a test in a reasonable time.

Recommending Firefox add-ons with Spark

We are currently evaluating possible replacements for our Telemetry map-reduce infrastructure. As our current data munging machinery isn’t distributed, analyzing days worth of data can be quite a pain. Also, many algorithms can’t easily be expressed with a simple map/reduce interface.

So I decided to give Spark another try. “Another” because I have played with it in the past but I didn’t feel it was mature enough to be run in production. And I wasn’t the only one to think that apparently. I feel like things have changed though with the latest 1.1 release and I want to share my joy with you.

What is Spark?

In a nutshell, “Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.”

Spark primary abstraction is the Resilient Distributed Dataset (RDD), which one can imagine as a distributed pandas or R data frame. The RDD API comes with all kinds of distributed operations, among which also our dear map and reduce. Many RDD operations accept user-defined Scala or Python functions as input which allow average Joe to write distributed applications like a pro.

A RDD can also be converted to a local Scala/Python data structure, assuming the dataset is small enough to fit in memory. The idea is that once you chopped the data you are not interested in, what you are left with fits comfortably on a single machine. Oh and did I mention that you can issue Spark queries directly from a Scala REPL? That’s great for performing exploratory data analyses.

The greatest strength of Spark though is the ability to cache RDDs in memory. This allows you to run iterative algorithms up to 100x faster than using the typical Hadoop based map-reduce framework! It has to be remarked though that this feature is purely optional. Spark works flawlessly without caching, albeit slower. In fact in a recent benchmark Spark was able to sort 1PB of data 3X faster using 10X fewer machines than Hadoop, without using the in-memory cache.

Setup

A Spark cluster can run in standalone mode or on top of YARN or Mesos. To the very least for a cluster you will need some sort of distributed filesystem, e.g. HDFS or NFS. But the easiest way to play with it though is to run Spark locally, i.e. on OSX:


brew install spark
spark-shell --master "local[*]"

The above commands start a Scala shell with a local Spark context. If you are more inclined to run a real cluster, the easiest way to get you going is to launch an EMR cluster on AWS:


aws emr create-cluster --name SparkCluster --ami-version 3.3 --instance-type m3.xlarge \
  --instance-count 5 --ec2-attributes KeyName=vitillo --applications Name=Hive \
  --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark

Then, once connected to the master node, launch Spark on YARN:


yarn-client /home/hadoop/spark/bin/spark-shell --num-executors 4 --executor-cores 8 \
  --executor-memory 8g —driver-memory 8g

The parameters of the executors (aka worker nodes) should obviously be tailored to the kind of instances you launched. It’s imperative to spend some time understanding and tuning the configuration options as Spark doesn’t automagically do it for you.

Now what?

Time for some real code. Since Spark makes it so easy to write distributed analyses, the bar for a Hello World application should be consequently be much higher. Let’s write then a simple, albeit functional, Recommender Engine for Firefox add-ons.

In order to do that, let’s first go over quickly the math involved. It turns out that given a matrix of the rankings of each user for each add-on, the problem of finding a good recommendation can be reduced to matrix factorization problem:

factorization

The model maps both users and add-ons to a joint latent factor space of dimensionality F. Both users and add-ons are thus seen as vectors in that space. The factors express latent characteristics of add-ons, e.g. if a an add-on is related to security or to UI customization. The ratings are then modeled as inner products in that space, which is proportional to the angle of the two vectors. The closer the characteristics of an add-on align to the preferences of the user in the latent factor space, the higher the rating.

But wait, Firefox users don’t really rate add-ons. In fact the only information we have in Telemetry is binary: either a user has a certain add-on installed or he hasn’t. Let’s assume that if someone has a certain add-on installed, he probably likes that add-on. That’s not true in all cases and a more significant metric like “usage time” or similar should be used.

I am not going to delve into the details, but having binary ratings changes the underlying model slightly from the conceptual one we have just seen. The interested reader should read this paper. Mllib, a machine learning library for Spark, comes out of the box with a distributed implementation of ALS which implements the factorization.

Implementation

Now that we have an idea of the theory, let’s have a look at how the implementation looks like in practice. Let’s start by initializing Spark:


val conf = new SparkConf().setAppName("AddonRecommender")
val sc = new SparkContext(conf)

As the ALS algorithm requires tuples of (user, addon, rating), let’s munge the data into place:


val ratings = sc.textFile("s3://mreid-test-src/split/").map(raw => {
  val parsedPing = parse(raw.substring(37))
  (parsedPing \ "clientID", parsedPing \ "addonDetails" \ "XPI")
}).filter{
  // Remove sessions with missing id or add-on list
  case (JNothing, _) => false
  case (_, JNothing) => false
  case (_, JObject(List())) => false
  case _ => true
}.map{ case (id, xpi) => {
  val addonList = xpi.children.
    map(addon => addon \ "name").
    filter(addon => addon != JNothing && addon != JString("Default"))
  (id, addonList)
}}.filter{ case (id, addonList) => {
  // Remove sessions with empty add-on lists
  !addonList.isEmpty
}}.flatMap{ case (id, addonList) => {
  // Create add-on ratings for each user
  addonList.map(addon => (id.extract[String], addon.extract[String], 1.0))
}}

Here we extract the add-on related data from our json Telemetry pings and filter out missing or invalid data. The ratings variable is a RDD and as you can see we used the distributed map, filter and flatMap operations on it. In fact it’s hard to tell apart vanilla Scala code from the distributed one.

As the current ALS implementation doesn’t accept strings for the user and add-on representations, we will have to convert them to numeric ones. A quick and dirty way of doing that is to hash the strings:


// Positive hash function
def hash(x: String) = x.hashCode & 0x7FFFFF

val hashedRatings = ratings.map{ case(u, a, r) => (hash(u), hash(a), r) }.cache
val addonIDs = ratings.map(_._2).distinct.map(addon => (hash(addon), addon)).cache

We are nearly there. To avoid overfitting, ALS uses regularization, the strength of which is determined by a parameter \lambda . As we don’t know beforehand the optimal value of the parameter, we can try to find it by minimizing the mean squared error over a pre-defined grid of \lambda values using k-fold cross-validation.


// Use cross validation to find the optimal number of latent factors
val folds = MLUtils.kFold(hashedRatings, 10, 42)
val lambdas = List(0.1, 0.2, 0.3, 0.4, 0.5)
val iterations = 10
val factors = 100 // use as many factors as computationally possible

val factorErrors = lambdas.flatMap(lambda => {
  folds.map{ case(train, test) =>
    val model = ALS.trainImplicit(train.map{ case(u, a, r) => Rating(u, a, r) }, factors, iterations, lambda, 1.0)
    val usersAddons = test.map{ case (u, a, r) => (u, a) }
    val predictions = model.predict(usersAddons).map{ case Rating(u, a, r) => ((u, a), r) }
    val ratesAndPreds = test.map{ case (u, a, r) => ((u, a), r) }.join(predictions)
    val rmse = sqrt(ratesAndPreds.map { case ((u, a), (r1, r2)) =>
      val err = (r1 - r2)
      err * err
    }.mean)

    (model, lambda, rmse)
  }
}).groupBy(_._2)
  .map{ case(k, v) => (k, v.map(_._3).reduce(_ + _) / v.length) }

Finally, it’s just a matter of training ALS on the whole dataset with the optimal \lambda value and we are good to go to use the recommender:


// Train model with optimal number of factors on all available data
val model = ALS.trainImplicit(hashedRatings.map{case(u, a, r) => Rating(u, a, r)}, factors, iterations, optimalLambda._1, 1.0)

def recommend(userID: Int) = {
  val predictions = model.predict(addonIDs.map(addonID => (userID, addonID._1)))
  val top = predictions.top(10)(Ordering.by[Rating,Double](_.rating))
  top.map(r => (addonIDs.lookup(r.product)(0), r.rating))
}

recommend(hash("UUID..."))

I omitted some details but you can find the complete source on my github repository.

To submit the packaged job to YARN run:


spark-submit --class AddonRecommender --master yarn-client --num-executors 4 \
  --executor-cores 8 --executor-memory 8g addon-recommender_2.10-1.0.jar

So what?

Question is, how well does it perform? The mean squared error isn’t really telling us much so let’s take some fictional user session and see what the recommender spits out.

For user A that has only the add-on Ghostery installed, the top recommendations are, in order:

  • NoScript
  • Web of Trust
  • Symantec Vulnerability Protection
  • Better Privacy
  • LastPass
  • DuckDuckGo Plus
  • HTTPS-Everywhere
  • Lightbeam
  • Google Translator for Firefox

One could argue that 1 out of 10 recommendations isn’t appropriate for a security aficionado. Now it’s the turn of user B who has only the Firebug add-on installed:

  • Web Developer
  • FiddlerHook
  • Greasemonkey
  • ColorZilla
  • User Agent Switcher
  • McAfee
  • RealPlayer Browser Record Plugin
  • FirePHP
  • Session Manager

There are just a couple of add-ons that don’t look that great but the rest could fit the profile of a developer. Now, considering that the recommender was trained only on a couple of days of data for Nightly, I feel like the result could easily be improved with more data and tuning, like filtering out known Antivirus, malware and bloatware.

Dashboard generator for custom Telemetry jobs

So you wrote your custom analysis for Telemetry, your map-reduce job is finally giving you the desired data and you want to set it up so that it runs periodically. You will need some sort of dashboard to monitor the weekly runs but since you don’t really care how it’s done what do you do? You copy paste the code of one of our current dashboards, a little tweak here and there and off you go.

That basically describes all of the recent dashboards, like the one for main-thread IO (mea culpa). Writing dashboards is painful when the only thing you care about is data. Once you finally have what you were looking for, the way you present is often considered an afterthought at best. But maintaining N dashboards becomes quickly unpleasant.

But what makes writing and maintaining dashboards so painful exactly? It’s simply that the more controls you have, the more different kind events you have to handle and the easier things get out of hand quickly. You start with something small and beautiful that just displays some csv and presto you end up with what should have been properly described as a state machine but instead is a mess of intertwined event handlers.

What I was looking for was something on the line of Shiny for R, but in javascript and with the option to have a client-only based interface. It turns out that React does more or less what I want. It’s not necessary meant for data analysis so there aren’t any plotting facilities but everything is there to roll your own. What makes exactly Shiny and React so useful is that they embrace reactive programming. Once you define a state and a set of dependencies, i.e. a data flow graph in practical terms, changes that affect the state end up being automatically propagated to the right components. Even though this can be seen as overkill for small dashboards, it makes it extremely easy to extend them when the set of possible states expands, which is almost always what happens.

To make things easier for developers I wrote a dashboard generator, iacumus, for use-cases similar to the ones we currently have. It can be used in simple scenarios when:

  • the data is collected in csv files on a weekly basis, usually using build-ids;
  • the dashboard should compare the current week against the previous one and mark differences in rankings;
  • it should be possible to go back back and forward in time;
  • the dashboard should provide some filtering and sorting criterias.

Iacumus is customizable through a configuration file that is specified through a GET parameter. Since it’s hosted on github, it means you just have to provide the data and don’t even have to spend time deploying the dashboard somewhere, assuming the machine serving the configuration file supports CORS.

My next immediate goal is to simplify writing map-reduce jobs for the above mentioned use cases or to the very least write down some guidelines. For instance, some of our dashboards are based on Firefox’s version numbers and not on build-ids, which is really what you want when you desire to make comparisons of Nightly on a weekly basis.

Another interesting thought would be to automatically detect differences in the dashboards and send alerts. That might be not as easy with the current data, since a quick look at the dashboards makes it clear that the rankings fluctuate quite a bit. We would have to collect daily reports and account for the variance of the ranking in those as just using a few weekly datapoints is not reliable enough to account for the deviation.