R Native API meeting 2016-06-30
From R Consortium Wiki
Informal meeting after the end of the useR! 2016 conference.
Participants: Michael Sannella, Torsten Hothorn, Dirk Eddelbuettel, Karl Millar, Simon Urbanek, Mick Jordan, Lukas Stadler
Discussion topics:
- (Dirk) From the POV of Rcpp, lots of useful functionality is hidden and not part of the official API. It hasn't changed in a long time, so why not make it available?
- It's not uncommon for people to copy the code out into their own packages to get access to it.
- Comment on data.table: it has a tiny dependency trail, and keeps working with very old R versions.
- (Torsten) Packages like stats do not export their functionality at the native level (or there are problems with dependency resolution).
- Another case where people start copying out code.
- Is it possible to get symbols from a specific package? Yes...
- "eval" could be much more efficient if it had a "prepare" and an "execute" step, like prepared DB statements.
- Combined with a concise API, this would allow many more R functions to be reused on the native side, without the need for an explicit C API.
- Or have simple C wrappers, which can be replaced with a direct implementation in case of performance problems.
- Do connection functions, e.g., have to be efficient?
- Makes for good documentation - "behaves like as.integer" (maybe "sans S3/S4 dispatch")
- Is it "future proofing the API" or "future proofing packages"?
- Discussions related to CRAN:
- Abandoned but popular packages sometimes get fixed by CRAN maintainers.
- How could a larger set of changes produced by API renamings be handled?
- Hard in the current system...
- Having "master" versions of all packages on github would help.
- Licensing / openness concerns with github?
- Testing of GNU R with a modified API?
- Many packages require additional steps, installed libraries, etc.
- Maybe r-hub could help? (Lukas will contact Gabor Csardi)
- Two levels where changes can cause packages to fail: installing (compiling) and testing (where examples exist)
- What's the reason for the different prefixes?
- Rf_..., R..., or no prefix, camel case, upper case, underscores, etc.
- Historical reasons - cleanup could be done with tools or sed scripts.
- USE_RINTERNALS does two things: additional functionality and better performance
- the former could be achieved by different include files
- the latter should not be necessary (why not have everything at top speed, but leave the API in a state that can be verified?)
- it should be possible to create a wrapper around the API that checks the (documented) contract as tightly as possible
- The manual still explains functionality that is generally considered to be wrong (e.g., "TYPE(x) = LANGSXP;")
- There should be no global variables, only functions (or at least a contract that allows them to be implemented as functions)
- Not only CRAN - we need to describe the universe of (important?) packages.
- Dependencies between functions? (sic!)
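The USE_RINTERNALS points above (a fast accessor, a wrapper that checks the documented contract, and functions instead of global variables) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_obj`, `toy_int_elt`, `toy_int_elt_checked`, `toy_empty_int` and the `TOY_*` constants are hypothetical names invented for this sketch and are not part of R's headers or of any proposal made at the meeting.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an R object; deliberately NOT R's real SEXP layout. */
typedef enum { TOY_INT, TOY_REAL } toy_type;

typedef struct {
    toy_type   type;
    size_t     length;
    const int *ints;   /* meaningful only when type == TOY_INT */
} toy_obj;

typedef enum { TOY_OK = 0, TOY_ERR_NULL, TOY_ERR_TYPE, TOY_ERR_BOUNDS } toy_status;

/* Fast accessor: roughly what a macro-style accessor would expand to.
   The assert is a debug-only precondition and is compiled out under NDEBUG. */
static int toy_int_elt(const toy_obj *x, size_t i) {
    assert(x != NULL && x->type == TOY_INT && i < x->length);
    return x->ints[i];
}

/* Checked wrapper (hypothetical): verifies the documented contract
   explicitly and reports a status code instead of reading bad memory. */
static toy_status toy_int_elt_checked(const toy_obj *x, size_t i, int *out) {
    if (x == NULL || out == NULL) return TOY_ERR_NULL;
    if (x->type != TOY_INT)       return TOY_ERR_TYPE;
    if (i >= x->length)           return TOY_ERR_BOUNDS;
    *out = toy_int_elt(x, i);
    return TOY_OK;
}

/* A value that could have been a global variable, exposed as a function
   so that the implementation behind it is free to change. */
static const toy_obj *toy_empty_int(void) {
    static const toy_obj empty = { TOY_INT, 0, NULL };
    return &empty;
}
```

A package would normally call the fast accessor, while a verification build could route the same calls through the checked wrapper; on a contract violation the wrapper leaves `*out` untouched.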
- General steps this WG should/could take:
- Tighten API - remove stuff that is not used
- Remove altogether, or deprecate (or hide behind a #define USE_DEPRECATED_API)
- Renaming functions?
- Maybe we want to introduce a new naming scheme?
- Maybe have a period with both naming schemes
- Document the functions
- Describe each function's arguments and its contract.
- Who could do that? For some functions only core R developers can give a real account of their intended contract.
- Some functions are tightly related to R functions - maybe describe them in relation to these?
- Breaking packages is ok, to a certain degree
- You could do a lot via eval if the details of its behavior were defined well and non-surprising
- Getting proper error context at the C level?
- Java solved this with the JVM Tool Interface (JVMTI)
- Maybe create shims of R functions as a new API? docs?
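The ideas of hiding deprecated entry points behind `#define USE_DEPRECATED_API` and of keeping both naming schemes alive for a transition period could look roughly like the sketch below. Only the macro name comes from the notes; `toy_sum_ints` (new scheme) and `toySumInts` (old scheme) are hypothetical names chosen for illustration, and the function itself is a placeholder rather than anything in R's API.

```c
#include <assert.h>
#include <stddef.h>

/* A legacy package would opt in before including the API header.
   It is defined here only so that this single-file sketch shows both names. */
#define USE_DEPRECATED_API

/* New-scheme name (hypothetical): the single real implementation. */
static long toy_sum_ints(const int *x, size_t n) {
    assert(x != NULL || n == 0);   /* debug-only precondition */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}

#ifdef USE_DEPRECATED_API
/* Old-scheme name (hypothetical), kept as a thin forwarder during the
   transition period; it disappears for packages that do not opt in. */
static inline long toySumInts(const int *x, size_t n) {
    return toy_sum_ints(x, n);
}
#endif
```

Because the old name only forwards to the new one, there is a single implementation to document and test, and dropping the old scheme later amounts to deleting the guarded block.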
- Immediate next step:
- (Lukas) Define the "tighten API" task, what it entails, as a (student?) project, and find a "volunteer"