<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="https://wiki.r-consortium.org/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>https://wiki.r-consortium.org/index.php?feed=atom&amp;namespace=0&amp;title=Special%3ANewPages</id>
		<title>R Consortium Wiki - New pages [en]</title>
		<link rel="self" type="application/atom+xml" href="https://wiki.r-consortium.org/index.php?feed=atom&amp;namespace=0&amp;title=Special%3ANewPages"/>
		<link rel="alternate" type="text/html" href="https://wiki.r-consortium.org/view/Special:NewPages"/>
		<updated>2026-05-11T05:02:27Z</updated>
		<subtitle>From R Consortium Wiki</subtitle>
		<generator>MediaWiki 1.23.15</generator>

	<entry>
		<id>https://wiki.r-consortium.org/view/Top_Level_Projects</id>
		<title>Top Level Projects</title>
		<link rel="alternate" type="text/html" href="https://wiki.r-consortium.org/view/Top_Level_Projects"/>
				<updated>2017-11-28T17:27:24Z</updated>
		
		<summary type="html">&lt;p&gt;Jmertic: Created page with &amp;quot;== Top-level Projects == ''Approved by R Consortium TSC on 2017-10-18''  This document outlines a proposed process by which an ISC project might graduate to top-level project,...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Top-level Projects ==&lt;br /&gt;
''Approved by R Consortium TSC on 2017-10-18''&lt;br /&gt;
&lt;br /&gt;
This document outlines a proposed process by which an ISC project might graduate to a top-level project, as well as the process by which such a project may be terminated. A top-level project implies long-term support (with a three-year review) by the R Consortium, regardless of the person running it. &lt;br /&gt;
=== Context ===&lt;br /&gt;
We currently have three top-level projects: &lt;br /&gt;
* R-hub, by Gabor Csardi.&lt;br /&gt;
* RUGS program (incl. small conferences)&lt;br /&gt;
* R-Ladies&lt;br /&gt;
Top-level projects imply long-term support and give the project a seat on the ISC.&lt;br /&gt;
=== Project Promotion === &lt;br /&gt;
Generally, we will consider the following factors when deciding if a project should become a top-level project:&lt;br /&gt;
# The project is important, and is having a significant impact on the R community.&lt;br /&gt;
# The project has completed one year of successful funding and delivered its first annual report.&lt;br /&gt;
# Commitment that is (to some extent) independent of the personnel on the initial project.&lt;br /&gt;
A project would be nominated by an ISC member and confirmed by a simple majority vote. The ISC chair would then reach out to the project to discuss the budget and other details.&lt;br /&gt;
=== Budget === &lt;br /&gt;
The project would prepare a rough 3 year plan, including discussion of personnel (i.e. either a commitment from the original grantee or a transition plan). &lt;br /&gt;
A top-level project would be allotted a line item on the ISC budget to ensure priority funding. &lt;br /&gt;
Upon graduation, any remaining funding from the initial grant will return to the ISC.&lt;br /&gt;
=== Reporting === &lt;br /&gt;
The project would then be expected to deliver a status report, presented by its ISC representative, at every regular meeting of the ISC.&lt;br /&gt;
&lt;br /&gt;
In lieu of regular project proposals, top-level projects would submit a proposed yearly budget by October 31 (in order to be budgeted for the following financial year). This would serve as a regular review point for top-level projects, and would occur in a separate meeting from the other project proposals, with the ISC member associated with the project recusing themselves. &lt;br /&gt;
=== Concluding support === &lt;br /&gt;
Top-level projects will be reviewed every three years. An explicit positive vote would be required to continue funding.&lt;br /&gt;
&lt;br /&gt;
In exceptional circumstances, a top-level project may be terminated at any time by a simple majority vote of the ISC.&lt;br /&gt;
&lt;br /&gt;
A top-level project will typically not be terminated if the grantee resigns, provided that a succession plan is in place.&lt;/div&gt;</summary>
		<author><name>Jmertic</name></author>	</entry>

	<entry>
		<id>https://wiki.r-consortium.org/view/Distributed_Computing_Working_Group_Progress_Report_2016</id>
		<title>Distributed Computing Working Group Progress Report 2016</title>
		<link rel="alternate" type="text/html" href="https://wiki.r-consortium.org/view/Distributed_Computing_Working_Group_Progress_Report_2016"/>
				<updated>2017-05-01T18:43:10Z</updated>
		
		<summary type="html">&lt;p&gt;MichaelLawrence: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Authors: Michael Lawrence and Indrajit Roy&lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Data sizes continue to increase, while single core performance has&lt;br /&gt;
stagnated. We scale computations by leveraging multiple cores and&lt;br /&gt;
machines. Large datasets are expensive to replicate, so we minimize&lt;br /&gt;
data movement by moving the computation to the data. Many systems,&lt;br /&gt;
such as Hadoop, Spark, and massively parallel processing (MPP)&lt;br /&gt;
databases, have emerged to support these strategies, and each exposes&lt;br /&gt;
its own unique interface, with little standardization. &lt;br /&gt;
&lt;br /&gt;
Developing and executing an algorithm in the distributed context is a&lt;br /&gt;
complex task that requires specific knowledge of and dependency on the&lt;br /&gt;
system storing the data. It is also a task orthogonal to the primary&lt;br /&gt;
role of a data scientist or statistician: extracting knowledge from&lt;br /&gt;
data. The task thus falls to the data analysis environment, which&lt;br /&gt;
should mask the complexity behind a familiar interface, maintaining&lt;br /&gt;
user productivity. However, it is not always feasible to automatically&lt;br /&gt;
determine the optimal strategy for a given problem, so user input is&lt;br /&gt;
often beneficial. The environment should only abstract the details to&lt;br /&gt;
the extent deemed appropriate by the user.&lt;br /&gt;
&lt;br /&gt;
R needs a standardized, layered and idiomatic abstraction for&lt;br /&gt;
computing on distributed data structures. R has many packages that&lt;br /&gt;
provide parallelism constructs as well as bridges to distributed&lt;br /&gt;
systems such as Hadoop. Unfortunately, each interface has its own&lt;br /&gt;
syntax, parallelism techniques, and supported platform(s).  As a&lt;br /&gt;
consequence, contributors are forced to learn multiple idiosyncratic&lt;br /&gt;
interfaces, and to restrict each implementation to a particular&lt;br /&gt;
interface, thus limiting the applicability and adoption of their&lt;br /&gt;
software and hampering interoperability.&lt;br /&gt;
&lt;br /&gt;
The idea of a unified interface stemmed from a cross-industry workshop&lt;br /&gt;
organized at HP Labs in early 2015. The workshop was attended by&lt;br /&gt;
different companies, universities, and R-core members. Immediately&lt;br /&gt;
after the workshop, Indrajit Roy, Edward Ma, and Michael Lawrence began&lt;br /&gt;
designing an abstraction that later became known as the CRAN package&lt;br /&gt;
ddR (Distributed Data in R)[1]. It declares a unified API for distributed&lt;br /&gt;
computing in R and ensures that R programs written using the API are&lt;br /&gt;
portable across different systems, such as Distributed R, Spark, etc.&lt;br /&gt;
&lt;br /&gt;
The ddR package has completed its initial phase of development; the&lt;br /&gt;
first release is now on CRAN. Three ddR machine-learning algorithms&lt;br /&gt;
are also on CRAN: randomForest.ddR, glm.ddR, and kmeans.ddR. Two&lt;br /&gt;
reference backends for ddR have been completed, one for R’s parallel&lt;br /&gt;
package, and one for HP Distributed R. Example code and scripts to run&lt;br /&gt;
algorithms and code on both of these backends are available in our&lt;br /&gt;
public repository at https://github.com/vertica/ddR.&lt;br /&gt;
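&lt;br /&gt;
To make the flavor of the API concrete, here is a minimal sketch of a ddR session (a sketch against the initial CRAN release’s dmapply/collect interface; exact signatures may have evolved since):&lt;br /&gt;
&lt;pre&gt;
library(ddR)  # Distributed Data in R

# The driver for R's built-in "parallel" package is the default backend,
# so no cluster setup is needed for a local test.
squares &lt;- dmapply(function(x) x^2, 1:8)

# The result is a distributed object; collect() gathers its pieces back
# into the master R session as an ordinary list.
collect(squares)
&lt;/pre&gt;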
&lt;br /&gt;
The overarching goal of the ddR project was for it to be a starting&lt;br /&gt;
point in a collaborative effort, ultimately leading to a standard API&lt;br /&gt;
for working with distributed data in R.  We decided that it was&lt;br /&gt;
natural for the R Consortium to sponsor the collaboration, as it&lt;br /&gt;
should involve both industry and R-core members. To this end, we&lt;br /&gt;
established the R Consortium Working Group on Distributed Computing,&lt;br /&gt;
with a planned duration of a single year and the following aims:&lt;br /&gt;
&lt;br /&gt;
# Agree on the goal of the group, i.e., that we should have a unifying framework for distributed computing, and define success metrics.&lt;br /&gt;
# Brainstorm on what primitives should be included in the API. We can use ddR’s API of distributed data structures and dmapply as the starting proposal. Understand its relationship with existing packages such as parallel, foreach, etc.&lt;br /&gt;
# Explore how a ddR-like interface will interact with databases. Are there connections or redundancies with dplyr and multiplyr?&lt;br /&gt;
# Decide on a reference implementation for the API.&lt;br /&gt;
# Decide on whether we should also implement a few ecosystem packages, e.g., distributed algorithms written using the API.&lt;br /&gt;
&lt;br /&gt;
We declared the following milestones:&lt;br /&gt;
&lt;br /&gt;
# Mid-year milestone: Finalize the API. Decide who will help develop the top-level implementation and backends.&lt;br /&gt;
# End-year milestone: Summary report and reference implementation. Socialize the final package.&lt;br /&gt;
&lt;br /&gt;
This report outlines the progress we have made on the above goals and&lt;br /&gt;
milestones, and how we plan to continue progress in the second half of&lt;br /&gt;
the working group term.&lt;br /&gt;
&lt;br /&gt;
== Results and Current Status ==&lt;br /&gt;
&lt;br /&gt;
The working group has achieved the first goal by agreeing that we&lt;br /&gt;
should aim for a unifying distributed computing abstraction, and we&lt;br /&gt;
have treated ddR as an informal API proposal.&lt;br /&gt;
&lt;br /&gt;
We have discussed many of the issues related to the second goal,&lt;br /&gt;
deciding which primitives should be part of the API.  We aim for the&lt;br /&gt;
API to support three shapes of data --- lists, arrays and data frames&lt;br /&gt;
--- and to enable the loading and basic manipulation of distributed&lt;br /&gt;
data, including multiple modes of functional iteration (e.g., apply()&lt;br /&gt;
operations). We aim to preserve consistency with base R data&lt;br /&gt;
structures and functions, so as to provide a simple path for users to&lt;br /&gt;
port computations to distributed systems.&lt;br /&gt;
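&lt;br /&gt;
As an illustration of the three shapes, the sketch below uses constructor names from the ddR package; the partitioning arguments shown are illustrative rather than authoritative:&lt;br /&gt;
&lt;pre&gt;
library(ddR)

# Distributed analogues of the supported shapes of data; dframe() plays
# the same role for data frames.
dl &lt;- dlist(1, 2, 3, 4)                                 # distributed list
da &lt;- darray(dim = c(4, 4), psize = c(2, 2), data = 1)  # distributed array

# Functional iteration mirrors base R's mapply(): each element is
# processed by the backend where it resides.
doubled &lt;- dmapply(function(x) 2 * x, dl)
collect(doubled)
&lt;/pre&gt;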
&lt;br /&gt;
The ddR constructs permit a user to express a wide variety of&lt;br /&gt;
applications, including machine-learning algorithms, that will run on&lt;br /&gt;
different backends. We have successfully implemented distributed&lt;br /&gt;
versions of algorithms such as k-means, regression, random forest, and&lt;br /&gt;
PageRank using the ddR API. Some of these ddR algorithms are now&lt;br /&gt;
available on CRAN.  In addition, the package provides several generic&lt;br /&gt;
definitions of common operators (such as colSums) that can be invoked&lt;br /&gt;
on distributed objects residing in the supporting backends.&lt;br /&gt;
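&lt;br /&gt;
For instance, continuing the sketch above:&lt;br /&gt;
&lt;pre&gt;
# colSums() is generic, so the same call that works on an ordinary
# matrix dispatches to a distributed method when given a darray, with
# the sums computed across the array's partitions.
da &lt;- darray(dim = c(4, 4), psize = c(2, 2), data = 1)
colSums(da)
&lt;/pre&gt;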
&lt;br /&gt;
Each custom ddR backend is encapsulated in its own driver package. In&lt;br /&gt;
the conventional style of functional OOP, the driver registers methods&lt;br /&gt;
for generics declared by the backend API, such that ddR can dispatch&lt;br /&gt;
the backend-specific instructions by only calling the generics.&lt;br /&gt;
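&lt;br /&gt;
In S4 terms the pattern looks roughly like the following; the class and generic names here are hypothetical stand-ins, not the literal ddR backend API:&lt;br /&gt;
&lt;pre&gt;
# Hypothetical driver sketch: the names are stand-ins for the real
# backend API, but the dispatch pattern is the one described above.
setClass("myBackendDriver", slots = c(name = "character"))

setGeneric("backend_mapply",
           function(driver, FUN, ...) standardGeneric("backend_mapply"))

# A driver package registers a method for the generic; ddR can then
# trigger backend-specific work by calling the generic alone.
setMethod("backend_mapply", "myBackendDriver", function(driver, FUN, ...) {
  mapply(FUN, ..., SIMPLIFY = FALSE)  # trivial single-machine "backend"
})

drv &lt;- new("myBackendDriver", name = "local")
backend_mapply(drv, function(x) x + 1, 1:3)
&lt;/pre&gt;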
&lt;br /&gt;
The working group explored potential new backends with the aim of&lt;br /&gt;
broadening the applicability of the ddR interface. We hosted&lt;br /&gt;
presentations from external speakers on Spark and TensorFlow, and also&lt;br /&gt;
considered a generic SQL backend. The discussion focused on Spark&lt;br /&gt;
integration, and the R Consortium-funded intern Clark Fitzgerald took&lt;br /&gt;
on the task of developing a prototype Spark backend. The development&lt;br /&gt;
of the Spark backend encountered some obstacles, including the&lt;br /&gt;
immaturity of Spark and its R interfaces. Development is currently&lt;br /&gt;
paused, as we await additional funding.&lt;br /&gt;
&lt;br /&gt;
During the monthly meetings, the working group deliberated on&lt;br /&gt;
different design improvements for ddR itself. We list two key topics&lt;br /&gt;
that were discussed.  First, Michael Kane and Bryan Lewis argued for a&lt;br /&gt;
lower-level API that directly operates on chunks of data. While ddR&lt;br /&gt;
supports chunk-wise data processing, via a combination of dmapply()&lt;br /&gt;
and parts(), its focus on distributed data structures means that&lt;br /&gt;
the chunk-based processing is exposed as the manipulation of these&lt;br /&gt;
data structures. Second, Clark Fitzgerald proposed restructuring the&lt;br /&gt;
ddR code into two layers that include chunk-wise processing while&lt;br /&gt;
retaining the emphasis on distributed data structures[2]. The lower-level&lt;br /&gt;
API, which will interface with backends, will use a Map()-like&lt;br /&gt;
primitive to evaluate functions on chunks of data, while the higher-level&lt;br /&gt;
ddR API will expose distributed data structures, dmapply, and&lt;br /&gt;
other convenience functions. This refactoring would facilitate the&lt;br /&gt;
implementation of additional backends.&lt;br /&gt;
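&lt;br /&gt;
A minimal, self-contained sketch of that layering (all names here are hypothetical):&lt;br /&gt;
&lt;pre&gt;
# Hypothetical two-layer sketch. Lower layer: a Map()-like primitive
# over chunks; a real backend would evaluate each call where its chunk
# resides.
chunk_map &lt;- function(FUN, chunks) Map(FUN, chunks)

# Upper layer: a dmapply-style wrapper that splits its input into
# chunks, delegates to the lower layer, and presents one logical result.
dmapply_sketch &lt;- function(FUN, x, nparts = 2) {
  chunks &lt;- split(x, cut(seq_along(x), nparts, labels = FALSE))
  pieces &lt;- chunk_map(function(chunk) lapply(chunk, FUN), chunks)
  unlist(pieces, recursive = FALSE)
}

dmapply_sketch(function(x) x^2, 1:6)
&lt;/pre&gt;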
&lt;br /&gt;
== Discussion and Future Plans ==&lt;br /&gt;
&lt;br /&gt;
The R Consortium-funded working group and internship have helped us&lt;br /&gt;
start a conversation on distributed computing APIs for R.  The ddR&lt;br /&gt;
CRAN package is a concrete outcome of this working group, and serves&lt;br /&gt;
as a platform for exploring APIs and their integration with different&lt;br /&gt;
backends. While ddR is still maturing, we have arrived at a consensus&lt;br /&gt;
for how we should improve and finalize the ddR API.&lt;br /&gt;
&lt;br /&gt;
As part of our goal for a reference implementation, we aim to develop&lt;br /&gt;
one or more prototype backends that will make the ddR interface useful&lt;br /&gt;
in practice. A good candidate backend is any open-source system that&lt;br /&gt;
is effective at R use cases and has strong community support. Spark&lt;br /&gt;
remains a viable candidate, and we also aim to further explore&lt;br /&gt;
TensorFlow.&lt;br /&gt;
&lt;br /&gt;
We plan for a second intern to perform three tasks: (1) refactor the&lt;br /&gt;
ddR API to a more final form, (2) compare Spark and TensorFlow in&lt;br /&gt;
detail, with an eye towards the feasibility of implementing a useful&lt;br /&gt;
backend, and (3) implement a prototype backend based on Spark or&lt;br /&gt;
TensorFlow, depending on the recommendation of the working group.&lt;br /&gt;
&lt;br /&gt;
By the conclusion of the working group, it will have produced:&lt;br /&gt;
* A stable version of the ddR package and at least one practical backend, released on CRAN,&lt;br /&gt;
* A list of requirements that are relevant and of interest to the community but have not yet been met by ddR, including alternative implementations that remain independent,&lt;br /&gt;
* A list of topics that the group believes worthy of further investigation.&lt;br /&gt;
&lt;br /&gt;
[1] http://h30507.www3.hp.com/t5/Behind-the-scenes-Labs/Enhancing-R-for-Distributed-Computing/ba-p/6795535#.VjE1K7erQQj&lt;br /&gt;
&lt;br /&gt;
[2] Clark Fitzgerald. https://github.com/vertica/ddR/wiki/Design&lt;/div&gt;</summary>
		<author><name>MichaelLawrence</name></author>	</entry>

	</feed>