Distributed Computing Working Group
Goals and Purpose
The Distributed Computing Working Group will endorse the design of a common abstraction for distributed data structures in R. We aim to have at least one open-source implementation, as well as a SQL implementation, released within a year of forming the group.
- Michael Lawrence (Genentech)
- Indrajit Roy (HP Enterprise)
- Joe Rickert (Microsoft)
- Bernd Bischl (LMU)
- Matt Dowle (H2O)
- Mario Inchiosa (Microsoft)
- Michael Kane (Yale)
- Javier Luraschi (RStudio)
- Edward Ma (HP Enterprise)
- Luke Tierney (University of Iowa)
- Simon Urbanek (AT&T)
- Round table introduction
- (Michael) Goals for the group:
* Make a common abstraction/interface to make it easier to work with distributed data in R.
* Unify the interface.
* The working group will run for a year. Get an API defined and get at least one open-source reference implementation.
* Not everyone needs to work hands-on. We will create smaller groups to focus on those aspects.
* We tried to get a diverse group of participants.
- Logistics: meet monthly, focus groups may meet more often
- R Consortium may be able to find ways to fund smaller projects that come out of the working group
- Michael Kane: Should we start with an inventory of what is available and what people are using?
* Michael Lawrence: Yes, we should find the collection of tools as well as the use cases that are common.
* Joe: I will figure out a wiki space.
- Javier: Who are the end users?
* Simon: A common layer is needed to get algorithms working. We started from the algorithms and tried to find the minimal common API. One of the goals is to make sure everyone is on the same page and not trying to create his/her own custom interface.
- Javier: Should we try to get people with more algo expertise?
- Joe: Simon, do you have a stack diagram?
- Simon: Can we get R Consortium to help write things up and draw things?
- Next meeting: Javier will present SparkR.