R Native API meeting 2016-06-30

From R Consortium Wiki

Informal meeting after the end of the useR! 2016 conference.

Participants: Michael Sannella, Torsten Hothorn, Dirk Eddelbuettel, Karl Millar, Simon Urbanek, Mick Jordan, Lukas Stadler

Discussion topics:

  • (Dirk) From the POV of Rcpp, lots of useful functionality is hidden and not part of the official API. It hasn't changed in a long time, so why not make it available?
  • It's not uncommon for people to copy the code into their own packages to get access to it.
  • Comment on data.table: it has a tiny dependency trail, and keeps working with very old R versions.
  • (Torsten) Packages like stats do not export their functionality at the native level (or there are problems with dependency resolution).
  • Another case where people start copying out code.
  • Is it possible to get symbols from a specific package? Yes...
  • "eval" could be much more efficient if it had a "prepare" and an "execute" step, like prepared DB statements.
  • Combined with a concise API, this would allow many more R functions to be reused on the native side, without the need for an explicit C API.
  • Or have simple C wrappers, which can be replaced with a direct implementation in case of performance problems.
  • Do connection functions, e.g., have to be efficient?
  • Makes for good documentation - "behaves like as.integer" (maybe "sans S3/S4 dispatch")
  • Is it "future proofing the API" or "future proofing packages"?
  • Discussions related to CRAN:
  • Abandoned but popular packages sometimes get fixed by CRAN maintainers.
  • How could a larger set of changes produced by API renamings be handled?
  • Hard in the current system...
  • Having "master" versions of all packages on GitHub would help.
  • Licensing / openness concerns with GitHub?
  • Testing of GNU R with a modified API?
  • Many packages require additional steps, installed libraries, etc.
  • Maybe r-hub could help? (Lukas will contact Gabor Csardi)
  • Two levels where changes can cause packages to fail: installing (compiling) and testing (where examples exist)
  • What's the reason for the different prefixes?
  • Rf_..., R..., or no prefix, camel case, upper case, underscores, etc.
  • Historical reasons - cleanup could be done with tools or sed scripts.
  • USE_RINTERNALS does two things: additional functionality and better performance
  • the former could be achieved by different include files
  • the latter should not be necessary (why not have everything at top speed, but leave the API in a state that can be verified?)
  • it should be possible to create a wrapper around the API that checks the (documented) contract as tightly as possible
  • The manual still explains functionality that is generally considered to be wrong (e.g., "TYPE(x) = LANGSXP;")
  • There should be no global variables, only functions (or at least a contract that allows them to be implemented as functions)
  • Not only CRAN - we need to describe the universe of (important?) packages.
  • Dependencies between functions? (sic!)
  • General steps this WG should/could take:
  • Tighten API - remove stuff that is not used
  • Remove altogether, or deprecate (or hide behind a #define USE_DEPRECATED_API)
  • Renaming functions?
  • Maybe we want to introduce a new naming scheme?
  • Maybe have a period with both naming schemes
  • Document the functions
  • Describe each function's arguments and its contract.
  • Who could do that? For some functions only core R developers can give a real account of their intended contract.
  • Some functions are tightly related to R functions - maybe describe them in relation to these?
  • Breaking packages is ok, to a certain degree
  • You could do a lot via eval if the details of its behavior were defined well and non-surprising
  • Getting proper error context at the C level?
  • Java solved this with the JVM Tool Interface (JVM TI)
  • Maybe create shims of R functions as a new API? docs?
  • Immediate next step:
  • (Lukas) Define the "tighten API" task, what it entails, as a (student?) project, and find a "volunteer"
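
The "prepare" / "execute" split suggested for "eval" in the discussion can be illustrated without any R headers. The following is a minimal sketch in plain C, assuming a toy function registry in place of R's environments. Every name in it (toy_prepare, toy_execute, toy_eval, toy_lookup_count) is hypothetical and not part of R's API. It only shows that name resolution can be paid once and then reused across calls, much like a prepared DB statement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an R function that is callable from native code. */
typedef double (*toy_fn)(const double *args, size_t n);

double toy_sum(const double *args, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += args[i];
    return s;
}

double toy_prod(const double *args, size_t n) {
    double p = 1.0;
    for (size_t i = 0; i < n; i++) p *= args[i];
    return p;
}

/* Toy registry: maps a name to a function (assumption, not R's lookup). */
static const struct { const char *name; toy_fn fn; } toy_registry[] = {
    { "sum", toy_sum },
    { "prod", toy_prod },
};

/* Counts how often the (notionally expensive) name lookup runs. */
size_t toy_lookup_count = 0;

toy_fn toy_lookup(const char *name) {
    toy_lookup_count++;
    for (size_t i = 0; i < sizeof toy_registry / sizeof toy_registry[0]; i++) {
        if (strcmp(toy_registry[i].name, name) == 0) return toy_registry[i].fn;
    }
    return NULL;
}

/* One-shot evaluation: resolves the name on every call. */
int toy_eval(const char *name, const double *args, size_t n, double *out) {
    toy_fn fn = toy_lookup(name);
    if (fn == NULL) return -1;
    *out = fn(args, n);
    return 0;
}

/* Prepared form: holds the already resolved function. */
typedef struct { toy_fn fn; } toy_prepared;

/* "prepare" step: resolve once; returns 0 on success, -1 if unknown. */
int toy_prepare(const char *name, toy_prepared *p) {
    p->fn = toy_lookup(name);
    return p->fn != NULL ? 0 : -1;
}

/* "execute" step: no lookup, just the call. */
double toy_execute(const toy_prepared *p, const double *args, size_t n) {
    return p->fn(args, n);
}
```

A caller would run toy_prepare once outside a loop and toy_execute inside it. toy_lookup_count then stays at one however many times the prepared call is executed, whereas toy_eval increments it on every call.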
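
The idea of a wrapper around the API that checks the documented contract as tightly as possible can be sketched as follows. This is plain C with a toy vector type, under the assumption that the fast accessor documents its preconditions. toy_vec, toy_int_elt and toy_int_elt_checked are hypothetical names and do not correspond to R's internal structures or accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Toy type tags (assumption: only two vector types exist). */
typedef enum { TOY_INT, TOY_REAL } toy_type;

/* Toy vector object; in a real API the layout would stay private. */
typedef struct {
    toy_type type;
    size_t length;
    const int *ints;       /* valid when type == TOY_INT  */
    const double *reals;   /* valid when type == TOY_REAL */
} toy_vec;

/* Fast accessor. Documented contract: v is non-NULL, v->type is TOY_INT,
 * and idx < v->length. Nothing is checked here. */
int toy_int_elt(const toy_vec *v, size_t idx) {
    return v->ints[idx];
}

/* Outcome of a contract check. */
typedef enum {
    TOY_OK,
    TOY_ERR_NULL,
    TOY_ERR_TYPE,
    TOY_ERR_BOUNDS
} toy_status;

/* Checking wrapper: verifies each clause of the contract above before
 * delegating to the fast accessor. */
toy_status toy_int_elt_checked(const toy_vec *v, size_t idx, int *out) {
    if (v == NULL || out == NULL) return TOY_ERR_NULL;
    if (v->type != TOY_INT) return TOY_ERR_TYPE;
    if (idx >= v->length) return TOY_ERR_BOUNDS;
    *out = toy_int_elt(v, idx);
    return TOY_OK;
}
```

A test build could route all package calls through the checked variant while release builds use the fast one, so the API itself can run at full speed and still be verified.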
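
Hiding deprecated entry points behind "#define USE_DEPRECATED_API" and running a period with both naming schemes could look roughly like this. It is a sketch only: the macro name is taken from the notes above, while the "toyapi_" prefix and the function names are made up for illustration and are not proposals for actual R names.

```c
#include <assert.h>
#include <stddef.h>

/* A package that still needs the old names would normally define this
 * before including the header (for example with -DUSE_DEPRECATED_API).
 * It is defined here only so that the sketch compiles on its own. */
#define USE_DEPRECATED_API

/* New-scheme name: one prefix, lower case, underscores (hypothetical). */
int toyapi_is_null(const void *p) {
    return p == NULL;
}

#ifdef USE_DEPRECATED_API
/* Old-scheme name, kept as a thin alias during the transition period.
 * Packages that do not define USE_DEPRECATED_API never see it, so any
 * remaining use shows up as a compile-time error. */
#define isToyNull(p) toyapi_is_null(p)
#endif
```

Because the alias is a macro that forwards to the new function, there is only one implementation to maintain, and removing the old scheme later amounts to deleting the guarded block.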