datasketches

This extension uses the Apache DataSketches library to efficiently compute approximate distinct item counts and quantile estimates. The resulting sketches can be serialized, so they can be stored and merged later.

Maintainer(s): rustyconover

Installing and Loading

INSTALL datasketches FROM community;
LOAD datasketches;

About datasketches

For more information regarding usage, see the documentation.

Added Functions

| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| datasketch_cpc | aggregate | Creates a sketch_cpc data sketch by aggregating values or by aggregating other CPC data sketches | NULL | [datasketch_cpc(k, data)] |
| datasketch_cpc_describe | scalar | Return a string representation of the sketch | NULL | [datasketch_cpc_describe(sketch)] |
| datasketch_cpc_estimate | scalar | Return the estimate of the number of distinct items seen by the sketch | NULL | [datasketch_cpc_estimate(sketch)] |
| datasketch_cpc_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_cpc_is_empty(sketch)] |
| datasketch_cpc_lower_bound | scalar | Return the lower bound of the number of distinct items seen by the sketch | NULL | [datasketch_cpc_lower_bound(sketch, std_dev)] |
| datasketch_cpc_union | aggregate | Creates a sketch_cpc data sketch by aggregating other CPC data sketches | NULL | [datasketch_cpc_union(k, data)] |
| datasketch_cpc_upper_bound | scalar | Return the upper bound of the number of distinct items seen by the sketch | NULL | [datasketch_cpc_upper_bound(sketch, std_dev)] |
| datasketch_hll | aggregate | Creates a sketch_hll data sketch by aggregating values or by aggregating other HLL data sketches | NULL | [datasketch_hll(k, data)] |
| datasketch_hll_describe | scalar | Return a string representation of the sketch | NULL | [datasketch_hll_describe(sketch, include_summary, include_detail)] |
| datasketch_hll_estimate | scalar | Return the estimate of the number of distinct items seen by the sketch | NULL | [datasketch_hll_estimate(sketch)] |
| datasketch_hll_is_compact | scalar | Return whether the sketch is in compact form | NULL | [datasketch_hll_is_compact(sketch)] |
| datasketch_hll_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_hll_is_empty(sketch)] |
| datasketch_hll_lg_config_k | scalar | Return the value of log base 2 K for this sketch | NULL | [datasketch_hll_lg_config_k(sketch)] |
| datasketch_hll_lower_bound | scalar | Return the lower bound of the number of distinct items seen by the sketch | NULL | [datasketch_hll_lower_bound(sketch, std_dev)] |
| datasketch_hll_union | aggregate | Creates a sketch_hll data sketch by aggregating other HLL data sketches | NULL | [datasketch_hll_union(k, data)] |
| datasketch_hll_upper_bound | scalar | Return the upper bound of the number of distinct items seen by the sketch | NULL | [datasketch_hll_upper_bound(sketch, std_dev)] |
| datasketch_kll | aggregate | Creates a sketch_kll data sketch by aggregating values or by aggregating other KLL data sketches | NULL | [datasketch_kll(k, data)] |
| datasketch_kll_cdf | scalar | Return the Cumulative Distribution Function (CDF) of the sketch for a series of points | NULL | [datasketch_kll_cdf(sketch, points, inclusive)] |
| datasketch_kll_describe | scalar | Return a description of this sketch | NULL | [datasketch_kll_describe(sketch, include_levels, include_items)] |
| datasketch_kll_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_kll_is_empty(sketch)] |
| datasketch_kll_is_estimation_mode | scalar | Return a boolean indicating if the sketch is in estimation mode | NULL | [datasketch_kll_is_estimation_mode(sketch)] |
| datasketch_kll_k | scalar | Return the value of K for this sketch | NULL | [datasketch_kll_k(sketch)] |
| datasketch_kll_max_item | scalar | Return the maximum item in the sketch | NULL | [datasketch_kll_max_item(sketch)] |
| datasketch_kll_min_item | scalar | Return the minimum item in the sketch | NULL | [datasketch_kll_min_item(sketch)] |
| datasketch_kll_n | scalar | Return the number of items contained in the sketch | NULL | [datasketch_kll_n(sketch)] |
| datasketch_kll_normalized_rank_error | scalar | Return the normalized rank error of the sketch | NULL | [datasketch_kll_normalized_rank_error(sketch, is_pmf)] |
| datasketch_kll_num_retained | scalar | Return the number of retained items in the sketch | NULL | [datasketch_kll_num_retained(sketch)] |
| datasketch_kll_pmf | scalar | Return the Probability Mass Function (PMF) of the sketch for a series of points | NULL | [datasketch_kll_pmf(sketch, points, inclusive)] |
| datasketch_kll_quantile | scalar | Return the quantile of a rank in the sketch | NULL | [datasketch_kll_quantile(sketch, rank, inclusive)] |
| datasketch_kll_rank | scalar | Return the rank of an item in the sketch | NULL | [datasketch_kll_rank(sketch, item, inclusive)] |
| datasketch_quantiles | aggregate | Creates a sketch_quantiles data sketch by aggregating values or by aggregating other Quantiles data sketches | NULL | [datasketch_quantiles(k, data)] |
| datasketch_quantiles_cdf | scalar | Return the Cumulative Distribution Function (CDF) of the sketch for a series of points | NULL | [datasketch_quantiles_cdf(sketch, points, inclusive)] |
| datasketch_quantiles_describe | scalar | Return a description of this sketch | NULL | [datasketch_quantiles_describe(sketch, include_levels, include_items)] |
| datasketch_quantiles_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_quantiles_is_empty(sketch)] |
| datasketch_quantiles_is_estimation_mode | scalar | Return a boolean indicating if the sketch is in estimation mode | NULL | [datasketch_quantiles_is_estimation_mode(sketch)] |
| datasketch_quantiles_k | scalar | Return the value of K for this sketch | NULL | [datasketch_quantiles_k(sketch)] |
| datasketch_quantiles_max_item | scalar | Return the maximum item in the sketch | NULL | [datasketch_quantiles_max_item(sketch)] |
| datasketch_quantiles_min_item | scalar | Return the minimum item in the sketch | NULL | [datasketch_quantiles_min_item(sketch)] |
| datasketch_quantiles_n | scalar | Return the number of items contained in the sketch | NULL | [datasketch_quantiles_n(sketch)] |
| datasketch_quantiles_normalized_rank_error | scalar | Return the normalized rank error of the sketch | NULL | [datasketch_quantiles_normalized_rank_error(sketch, is_pmf)] |
| datasketch_quantiles_num_retained | scalar | Return the number of retained items in the sketch | NULL | [datasketch_quantiles_num_retained(sketch)] |
| datasketch_quantiles_pmf | scalar | Return the Probability Mass Function (PMF) of the sketch for a series of points | NULL | [datasketch_quantiles_pmf(sketch, points, inclusive)] |
| datasketch_quantiles_quantile | scalar | Return the quantile of a rank in the sketch | NULL | [datasketch_quantiles_quantile(sketch, rank, inclusive)] |
| datasketch_quantiles_rank | scalar | Return the rank of an item in the sketch | NULL | [datasketch_quantiles_rank(sketch, item, inclusive)] |
| datasketch_req | aggregate | Creates a sketch_req data sketch by aggregating values or by aggregating other REQ data sketches | NULL | [datasketch_req(k, data)] |
| datasketch_req_cdf | scalar | Return the Cumulative Distribution Function (CDF) of the sketch for a series of points | NULL | [datasketch_req_cdf(sketch, points, inclusive)] |
| datasketch_req_describe | scalar | Return a description of this sketch | NULL | [datasketch_req_describe(sketch, include_levels, include_items)] |
| datasketch_req_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_req_is_empty(sketch)] |
| datasketch_req_is_estimation_mode | scalar | Return a boolean indicating if the sketch is in estimation mode | NULL | [datasketch_req_is_estimation_mode(sketch)] |
| datasketch_req_k | scalar | Return the value of K for this sketch | NULL | [datasketch_req_k(sketch)] |
| datasketch_req_max_item | scalar | Return the maximum item in the sketch | NULL | [datasketch_req_max_item(sketch)] |
| datasketch_req_min_item | scalar | Return the minimum item in the sketch | NULL | [datasketch_req_min_item(sketch)] |
| datasketch_req_n | scalar | Return the number of items contained in the sketch | NULL | [datasketch_req_n(sketch)] |
| datasketch_req_num_retained | scalar | Return the number of retained items in the sketch | NULL | [datasketch_req_num_retained(sketch)] |
| datasketch_req_pmf | scalar | Return the Probability Mass Function (PMF) of the sketch for a series of points | NULL | [datasketch_req_pmf(sketch, points, inclusive)] |
| datasketch_req_quantile | scalar | Return the quantile of a rank in the sketch | NULL | [datasketch_req_quantile(sketch, rank, inclusive)] |
| datasketch_req_rank | scalar | Return the rank of an item in the sketch | NULL | [datasketch_req_rank(sketch, item, inclusive)] |
| datasketch_tdigest | aggregate | Creates a sketch_tdigest data sketch by aggregating values or by aggregating other TDigest data sketches | NULL | [datasketch_tdigest(k, data)] |
| datasketch_tdigest_cdf | scalar | Return the Cumulative Distribution Function (CDF) of the sketch for a series of points | NULL | [datasketch_tdigest_cdf(sketch, points)] |
| datasketch_tdigest_describe | scalar | Return a description of this sketch | NULL | [datasketch_tdigest_describe(sketch, include_centroids)] |
| datasketch_tdigest_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_tdigest_is_empty(sketch)] |
| datasketch_tdigest_k | scalar | Return the value of K for this sketch | NULL | [datasketch_tdigest_k(sketch)] |
| datasketch_tdigest_pmf | scalar | Return the Probability Mass Function (PMF) of the sketch for a series of points | NULL | [datasketch_tdigest_pmf(sketch, points)] |
| datasketch_tdigest_quantile | scalar | Return the quantile of a rank in the sketch | NULL | [datasketch_tdigest_quantile(sketch, rank)] |
| datasketch_tdigest_rank | scalar | Return the rank of an item in the sketch | NULL | [datasketch_tdigest_rank(sketch, item)] |
| datasketch_tdigest_total_weight | scalar | Return the total weight of this sketch | NULL | [datasketch_tdigest_total_weight(sketch)] |
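
The functions listed above are meant to be composed: an aggregate builds a sketch, and scalar functions then query it. The following is a minimal sketch of that pattern, not an authoritative reference. The tables `items` and `readings` are hypothetical, the `k` values (12 for HLL, 200 for KLL) are illustrative assumptions, and the example assumes that literal arguments are implicitly cast to whatever parameter types the functions declare.

```sql
-- Hypothetical data, used only for illustration.
CREATE TABLE items (id INTEGER);
INSERT INTO items SELECT i FROM range(100000) t(i);

-- Approximate distinct count: build one HLL sketch, then query it.
-- k = 12 is an assumed value; check the extension documentation for the valid range.
SELECT
    datasketch_hll_estimate(s)       AS approx_distinct,
    datasketch_hll_lower_bound(s, 2) AS lower_bound_2_std_dev,
    datasketch_hll_upper_bound(s, 2) AS upper_bound_2_std_dev
FROM (SELECT datasketch_hll(12, id) AS s FROM items) AS sketches;

-- Approximate median: build a KLL sketch over DOUBLE values.
CREATE TABLE readings (temp DOUBLE);
INSERT INTO readings SELECT i::DOUBLE FROM range(1, 1001) t(i);

-- rank = 0.5 asks for the median; inclusive = true is an assumed choice.
SELECT datasketch_kll_quantile(datasketch_kll(200, temp), 0.5, true) AS approx_median
FROM readings;
```

Larger values of `k` generally trade more memory for better accuracy, so the values above are starting points rather than recommendations.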

Added Types

| type_name | type_size | logical_type | type_category | internal |
|---|---|---|---|---|
| sketch_cpc | 16 | BLOB | NULL | true |
| sketch_hll | 16 | BLOB | NULL | true |
| sketch_kll_bigint | 16 | BLOB | NULL | true |
| sketch_kll_double | 16 | BLOB | NULL | true |
| sketch_kll_float | 16 | BLOB | NULL | true |
| sketch_kll_integer | 16 | BLOB | NULL | true |
| sketch_kll_smallint | 16 | BLOB | NULL | true |
| sketch_kll_tinyint | 16 | BLOB | NULL | true |
| sketch_kll_ubigint | 16 | BLOB | NULL | true |
| sketch_kll_uinteger | 16 | BLOB | NULL | true |
| sketch_kll_usmallint | 16 | BLOB | NULL | true |
| sketch_kll_utinyint | 16 | BLOB | NULL | true |
| sketch_quantiles_bigint | 16 | BLOB | NULL | true |
| sketch_quantiles_double | 16 | BLOB | NULL | true |
| sketch_quantiles_float | 16 | BLOB | NULL | true |
| sketch_quantiles_integer | 16 | BLOB | NULL | true |
| sketch_quantiles_smallint | 16 | BLOB | NULL | true |
| sketch_quantiles_tinyint | 16 | BLOB | NULL | true |
| sketch_quantiles_ubigint | 16 | BLOB | NULL | true |
| sketch_quantiles_uinteger | 16 | BLOB | NULL | true |
| sketch_quantiles_usmallint | 16 | BLOB | NULL | true |
| sketch_quantiles_utinyint | 16 | BLOB | NULL | true |
| sketch_req_bigint | 16 | BLOB | NULL | true |
| sketch_req_double | 16 | BLOB | NULL | true |
| sketch_req_float | 16 | BLOB | NULL | true |
| sketch_req_integer | 16 | BLOB | NULL | true |
| sketch_req_smallint | 16 | BLOB | NULL | true |
| sketch_req_tinyint | 16 | BLOB | NULL | true |
| sketch_req_ubigint | 16 | BLOB | NULL | true |
| sketch_req_uinteger | 16 | BLOB | NULL | true |
| sketch_req_usmallint | 16 | BLOB | NULL | true |
| sketch_req_utinyint | 16 | BLOB | NULL | true |
| sketch_tdigest_double | 16 | BLOB | NULL | true |
| sketch_tdigest_float | 16 | BLOB | NULL | true |
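
Since the sketch types above are BLOB-backed, a sketch can be kept in a table column and combined later, which matches the aggregate descriptions that mention aggregating other sketches. The example below is a rough illustration under several assumptions: the `events` and `daily_sketches` tables are hypothetical, `k = 12` is an illustrative value, and it assumes that `CREATE TABLE ... AS` preserves the `sketch_hll` column type.

```sql
-- Hypothetical event log, used only for illustration.
CREATE TABLE events (day DATE, user_id INTEGER);
INSERT INTO events
SELECT DATE '2024-01-01' + CAST(i % 7 AS INTEGER), CAST(i % 5000 AS INTEGER)
FROM range(50000) t(i);

-- Store one HLL sketch per day instead of the raw rows.
CREATE TABLE daily_sketches AS
SELECT day, datasketch_hll(12, user_id) AS users_sketch
FROM events
GROUP BY day;

-- Per-day approximate distinct users, read directly from the stored sketches.
SELECT day, datasketch_hll_estimate(users_sketch) AS approx_users
FROM daily_sketches
ORDER BY day;

-- Merge the stored sketches to estimate distinct users across all days.
SELECT datasketch_hll_estimate(datasketch_hll_union(12, users_sketch)) AS approx_users_all_days
FROM daily_sketches;
```

Merging sketches this way avoids rescanning the raw data, at the cost of the approximation error inherent to the sketch.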

Added Settings

| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| auto_fallback_to_full_download | Allows automatically falling back to full file downloads when possible. | BOOLEAN | GLOBAL | [] |
| ca_cert_file | Path to a custom certificate file for self-signed certificates. | VARCHAR | GLOBAL | [] |
| enable_curl_server_cert_verification | Enable server side certificate verification for CURL backend. | BOOLEAN | GLOBAL | [] |
| enable_server_cert_verification | Enable server side certificate verification. | BOOLEAN | GLOBAL | [] |
| force_download | Forces upfront download of file | BOOLEAN | GLOBAL | [] |
| hf_max_per_page | Debug option to limit number of items returned in list requests | UBIGINT | GLOBAL | [] |
| http_keep_alive | Keep alive connections. Setting this to false can help when running into connection failures | BOOLEAN | GLOBAL | [] |
| http_retries | HTTP retries on I/O error | UBIGINT | GLOBAL | [] |
| http_retry_backoff | Backoff factor for exponentially increasing retry wait time | FLOAT | GLOBAL | [] |
| http_retry_wait_ms | Time between retries | UBIGINT | GLOBAL | [] |
| http_timeout | HTTP timeout read/write/connection/retry (in seconds) | UBIGINT | GLOBAL | [] |
| httpfs_client_implementation | Select which is the HTTPUtil implementation to be used | VARCHAR | GLOBAL | [] |
| merge_http_secret_into_s3_request | Merges http secret params into S3 requests | BOOLEAN | GLOBAL | [] |
| s3_access_key_id | S3 Access Key ID | VARCHAR | GLOBAL | [] |
| s3_endpoint | S3 Endpoint | VARCHAR | GLOBAL | [] |
| s3_kms_key_id | S3 KMS Key ID | VARCHAR | GLOBAL | [] |
| s3_region | S3 Region | VARCHAR | GLOBAL | [] |
| s3_requester_pays | S3 use requester pays mode | BOOLEAN | GLOBAL | [] |
| s3_secret_access_key | S3 Access Key | VARCHAR | GLOBAL | [] |
| s3_session_token | S3 Session Token | VARCHAR | GLOBAL | [] |
| s3_uploader_max_filesize | S3 Uploader max filesize (between 50GB and 5TB) | VARCHAR | GLOBAL | [] |
| s3_uploader_max_parts_per_file | S3 Uploader max parts per file (between 1 and 10000) | UBIGINT | GLOBAL | [] |
| s3_uploader_thread_limit | S3 Uploader global thread limit | UBIGINT | GLOBAL | [] |
| s3_url_compatibility_mode | Disable Globs and Query Parameters on S3 URLs | BOOLEAN | GLOBAL | [] |
| s3_url_style | S3 URL style | VARCHAR | GLOBAL | [] |
| s3_use_ssl | S3 use SSL | BOOLEAN | GLOBAL | [] |
| unsafe_disable_etag_checks | Disable checks on ETag consistency | BOOLEAN | GLOBAL | [] |