Get a report analogous to a Freeform Table visualization in CJA Workspace. The function uses the arguments to construct and execute a JSON-based query to the CJA API and then returns the results as a data frame.

cja_freeform_table(
  dataviewId = Sys.getenv("CJA_DATAVIEW_ID"),
  date_range = c(Sys.Date() - 30, Sys.Date() - 1),
  dimensions = c("page", "lasttouchchannel", "mobiledevicetype"),
  metrics = c("visits", "visitors"),
  top = c(5),
  page = 0,
  filterType = "breakdown",
  segmentId = NA,
  metricSort = "desc",
  include_unspecified = TRUE,
  search = NA,
  prettynames = FALSE,
  allowRemoteLoad = "default",
  useCache = TRUE,
  useResultsCache = FALSE,
  includeOberonXml = FALSE,
  includePlatformPredictiveObjects = FALSE,
  debug = FALSE,
  check_components = FALSE
)

Arguments

dataviewId

CJA Data View ID (dv). If an environment variable called CJA_DATAVIEW_ID exists in .Renviron or elsewhere and no dataviewId argument is provided, then the CJA_DATAVIEW_ID value will be used. Use cja_get_dataviews() to get a list of available dataviewId. Required

date_range

A length-2 vector with a start date and an end date. POSIXt objects are sent as is, for fine control over the date range. Numeric values are automatically converted to dates.

dimensions

A character vector of dimensions. There is currently a limit of 20 dimension breakdowns. Each dimension value that gets broken down by another dimension requires an additional API call, so the more dimensions that are included, the longer the function will take to return results. This is how the CJA API works. Use cja_get_dimensions() to get a list of available dimensions IDs.

metrics

A character vector of metrics. Use cja_get_metrics() and cja_get_calculatedmetrics() to get a list of available metrics IDs.

top

The number of values to be pulled for each dimension. The default is 5 and the "top" is based on the first metric value (along with metricSort). If there are multiple dimensions, then this argument can either be a vector that includes the number of values to include at each level (each breakdown) or, if a single value is used, then that will be the maximum number of values to return at each level. See the Details for information on the unique handling of daterange... values.

page

Used in combination with top to return the next page of results. Uses 0-based numbering (e.g., top = 50000 and page = 1 will return the top 50,000 items starting at 50,001).

filterType

This is a placeholder argument for use as additional functionality is added to the package. Currently, it defaults to breakdown, and that is the only supported value.

segmentId

A single segment ID or a vector of multiple segment IDs to apply to the overall report. If multiple segmentId values are included, the segments will be effectived ANDed together, just as if multiple segments were added to the header of an Analysis Workspace panel. Use cja_get_filters() to get a list of available segmentId values.

metricSort

Pre-sorts the table by metrics. Values are either asc (ascending) or desc (descending).

include_unspecified

Whether or not to include Unspecified values in the results. This is the equivalent of the Include Unspecified (None) checkbox in freeform tables in Analysis Workspace. This defaults to TRUE, which includes Unspecified values in the results.

search

Criteria to filter the results by one or more dimensions. Searches are case-insenstive. Refer to the Details for more information on constructing values for this argument.

prettynames

A logical that determines whether the column names in the results use the API field name (e.g., "mobiledevicetype", "pageviews") or the "pretty name" for the field (e.g., "Mobile Device Type", "Page Views"). This applies to both dimensions and metrics. The default value is FALSE, which returns the API field names. For custom eVars, props, and events, the non-pretty values are simply the variable number (e.g., "evar2", "prop3", "event15"). If TRUE, undoes any efficiency gains from setting check_components to FALSE.

allowRemoteLoad

Controls if Oberon should remote load data. Default behavior is true with fallback to false if remote data does not exist. The default is "default" but options include: "true", "false", or "default".

useCache

Use caching for faster requests (Use cached dimensions to speed up permission checks - This does not do any report caching). TRUE (default) or FALSE

useResultsCache

Use results caching for faster reporting times (This is a pass through to Oberon which manages the Cache) FALSE (default) or TRUE

includeOberonXml

Controls if Oberon XML should be returned in the response - DEBUG ONLY. FALSE (default) or TRUE

includePlatformPredictiveObjects

Controls if platform Predictive Objects should be returned in the response. Only available when using Anomaly Detection or Forecasting- DEBUG ONLY. FALSE (default) or TRUE

debug

Set to TRUE to publish the full JSON request(s) being sent to the API to the console when the function is called. The default is FALSE.

check_components

Logical, whether to check the validity of metrics and dimensions before running the query. Defaults to TRUE, but causes cja_freeform_report to request all dimensions and metrics from the API, which may be inefficient if you're running many queries. If you have many queries, it's more efficient to implement validity checking yourself on either side of your queries.

Value

A data frame with dimensions and metrics.

Details

This function is based on the Freeform Table visualization in Analysis Workspace. It is accessing the same API call type that is used to generate those visualizations.

Dimension Ordering

CJA only queries one dimension at a time, even though the results get returned in a single data frame (or table in the case of Analysis Workspace). The more dimensions are included in the report--the more breakdowns of the data--the more queries are required. As a result, the order of the dimensions can have a dramatic impact on the total query time, even if the resulting data is essentially identical.

One way to understand this is to consider how much dragging and dropping would be required to return the data in Analysis Workspace if you were not able to <Shift>-<click> to highlight multiple values before dragging a new dimension to break down existing values.

Consider a scenario where you are pulling metrics for the last 30 days (daterangeday) for Mobile Device Type (mobiledevicetype), which has 7 unique values. Setting dimensions = c("daterangeday", "mobiledevicetype") would make one query to get the values of the 30 days included. The query would then run a separate query for each of those 30 days to get the mobiledevicetype results for each day. So, this would be 31 API calls.

If, instead, the function was called with the dimension values reversed (dimensions = c("mobiledevicetype", "daterangeday")), then the first query would return the 7 mobiledevicetype values, and then would run an additional query for each of those 7 mobile device type values to return the results for the 30 days within each device type. This would be only 7 API calls.

Strategically ordering dimensions--and then wrangling the resulting data set as needed--is one of the best ways to improve query performance.

Date Handling

Date handling has several special characteristics that are worth getting familiar with:

  • The API names for day, week, month, etc. are prepended with daterange, so daily data uses daterangeday, weekly data uses daterangeweek, monthly data uses daterangemonth, etc.

  • When setting the argument for top, if the first (or only) dimension value is a daterange... object, then, if this argument is not explicitly specified or if it uses only a single value (e.g., top = 10), the function will still return all of the values that fall in that date range. For instance, if the date_range was set for a 30-day period and the first dimension value was daterangeday, and no value is specified for top, rather than simply returning the first 5 dates in the range, all 30 days will be returned. In the same scenario, if top = 10 was set, then all 30 days would still be returned, and the 10 would simply be applied to the additional dimensions.

  • If you want to return all of the date/time values but then have specific control over the number of values returned for each of the drilldown dimensions, then set 0 as the first value in the top argument and then specify different numbers for each breakdown (e.g., top = c(0, 3, 10) would return all of the date/time values for the specified date_range, the top 3 values for the second specified dimension, and then the top 10 values for each of the next dimension's results).

  • If you are using a daterange... value not as the first dimension, then simply using 0 at the same level in the top argument specification will return all of the values for that date/time value.

Search/Filtering

There are powerful filtering abilities within the function. However, to support that power requires a syntax that can feel a bit cumbersome for simple queries. Note: search filters are case-insensitive. This is CJA API functionality and can not be specified otherwise in queries.

The search argument takes a vector of search strings, with each value in the vector corresponding to the dimension value that is at the same position. These search strings support a range of operators, including AND, OR, NOT, MATCH, CONTAINS, BEGINS-WITH, and ENDS-WITH.

The default for any search string is to use CONTAINS. Consider a query where dimensions = c("mobiledevicetype", "lasttouchchannel"):

  • search = "CONTAINS 'mobile'" will return results where mobiledevicetype contains "mobile", so would return all rows for Mobile Phone.

  • This could be shortened to search = "'mobile'" and would behave exactly the same, since CONTAINS is the default operator

  • search = c("CONTAINS 'mobile'", "CONTAINS 'search'") will return results where mobiledevicetype contains "mobile" and, within those results, results where lasttouchchannel contains "search".

  • search = c("(CONTAINS 'mobile') OR (CONTAINS 'tablet')", "(MATCH 'paid search')") will return results where mobiledevicetype contains "mobile" or "tablet" and, within those results, will only include results where lasttouchchannel exactly matches "paid search" (but is case-insensitive, so would return "Paid Search" values).