This extension uses the Apache DataSketches library to efficiently compute approximate distinct item counts and quantile estimates. The resulting sketches can be serialized, so they can be stored and combined later.
Maintainer(s):
rustyconover
Installing and Loading
INSTALL datasketches FROM community;
LOAD datasketches;
About datasketches
For more information regarding usage, see the documentation.
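As a quick illustration of approximate distinct counting, the query below builds an HLL sketch over a column and reads back the estimate together with its bounds. This is a minimal sketch rather than a definitive recipe: the `items` table is hypothetical, and the values `12` for `k` (assumed here to be the log-base-2 sketch size) and `2` for `std_dev` are illustrative assumptions, so check the documentation for the accepted ranges.

```sql
-- Hypothetical table: every id from 1 to 100000 is inserted twice,
-- so the true distinct count is 100000.
CREATE TABLE items (id INTEGER);
INSERT INTO items SELECT * FROM range(1, 100001);
INSERT INTO items SELECT * FROM range(1, 100001);

-- Build one HLL sketch over the column, then query it.
-- k = 12 and std_dev = 2 are assumed values chosen for illustration.
WITH s AS (
    SELECT datasketch_hll(12, id) AS sketch FROM items
)
SELECT
    datasketch_hll_estimate(sketch)       AS approx_distinct,
    datasketch_hll_lower_bound(sketch, 2) AS lower_bound,
    datasketch_hll_upper_bound(sketch, 2) AS upper_bound
FROM s;
```

The estimate should land near 100000 despite the duplicate rows, and a larger `k` should generally trade memory for tighter bounds.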
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| datasketch_cpc | aggregate | Creates a sketch_cpc data sketch by aggregating values or by aggregating other CPC data sketches | NULL | [datasketch_cpc(k, data)] |
| datasketch_cpc_describe | scalar | Return a string representation of the sketch | NULL | [datasketch_cpc_describe(sketch)] |
| datasketch_cpc_estimate | scalar | Return the estimate of the number of distinct items seen by the sketch | NULL | [datasketch_cpc_estimate(sketch)] |
| datasketch_cpc_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_cpc_is_empty(sketch)] |
| datasketch_cpc_lower_bound | scalar | Return the lower bound of the number of distinct items seen by the sketch | NULL | [datasketch_cpc_lower_bound(sketch, std_dev)] |
| datasketch_cpc_union | aggregate | Creates a sketch_cpc data sketch by aggregating other CPC data sketches | NULL | [datasketch_cpc_union(k, data)] |
| datasketch_cpc_upper_bound | scalar | Return the upper bound of the number of distinct items seen by the sketch | NULL | [datasketch_cpc_upper_bound(sketch, std_dev)] |
| datasketch_hll | aggregate | Creates a sketch_hll data sketch by aggregating values or by aggregating other HLL data sketches | NULL | [datasketch_hll(k, data)] |
| datasketch_hll_describe | scalar | Return a string representation of the sketch | NULL | [datasketch_hll_describe(sketch, include_summary, include_detail)] |
| datasketch_hll_estimate | scalar | Return the estimate of the number of distinct items seen by the sketch | NULL | [datasketch_hll_estimate(sketch)] |
| datasketch_hll_is_compact | scalar | Return whether the sketch is in compact form | NULL | [datasketch_hll_is_compact(sketch)] |
| datasketch_hll_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_hll_is_empty(sketch)] |
| datasketch_hll_lg_config_k | scalar | Return the value of log base 2 K for this sketch | NULL | [datasketch_hll_lg_config_k(sketch)] |
| datasketch_hll_lower_bound | scalar | Return the lower bound of the number of distinct items seen by the sketch | NULL | [datasketch_hll_lower_bound(sketch, std_dev)] |
| datasketch_hll_union | aggregate | Creates a sketch_hll data sketch by aggregating other HLL data sketches | NULL | [datasketch_hll_union(k, data)] |
| datasketch_hll_upper_bound | scalar | Return the upper bound of the number of distinct items seen by the sketch | NULL | [datasketch_hll_upper_bound(sketch, std_dev)] |
| datasketch_kll | aggregate | Creates a sketch_kll data sketch by aggregating values or by aggregating other KLL data sketches | NULL | [datasketch_kll(k, data)] |
| datasketch_kll_cdf | scalar | Return the Cumulative Distribution Function (CDF) of the sketch for a series of points | NULL | [datasketch_kll_cdf(sketch, points, inclusive)] |
| datasketch_kll_describe | scalar | Return a description of this sketch | NULL | [datasketch_kll_describe(sketch, include_levels, include_items)] |
| datasketch_kll_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_kll_is_empty(sketch)] |
| datasketch_kll_is_estimation_mode | scalar | Return a boolean indicating if the sketch is in estimation mode | NULL | [datasketch_kll_is_estimation_mode(sketch)] |
| datasketch_kll_k | scalar | Return the value of K for this sketch | NULL | [datasketch_kll_k(sketch)] |
| datasketch_kll_max_item | scalar | Return the maximum item in the sketch | NULL | [datasketch_kll_max_item(sketch)] |
| datasketch_kll_min_item | scalar | Return the minimum item in the sketch | NULL | [datasketch_kll_min_item(sketch)] |
| datasketch_kll_n | scalar | Return the number of items contained in the sketch | NULL | [datasketch_kll_n(sketch)] |
| datasketch_kll_normalized_rank_error | scalar | Return the normalized rank error of the sketch | NULL | [datasketch_kll_normalized_rank_error(sketch, is_pmf)] |
| datasketch_kll_num_retained | scalar | Return the number of retained items in the sketch | NULL | [datasketch_kll_num_retained(sketch)] |
| datasketch_kll_pmf | scalar | Return the Probability Mass Function (PMF) of the sketch for a series of points | NULL | [datasketch_kll_pmf(sketch, points, inclusive)] |
| datasketch_kll_quantile | scalar | Return the quantile of a rank in the sketch | NULL | [datasketch_kll_quantile(sketch, rank, inclusive)] |
| datasketch_kll_rank | scalar | Return the rank of an item in the sketch | NULL | [datasketch_kll_rank(sketch, item, inclusive)] |
| datasketch_quantiles | aggregate | Creates a sketch_quantiles data sketch by aggregating values or by aggregating other Quantiles data sketches | NULL | [datasketch_quantiles(k, data)] |
| datasketch_quantiles_cdf | scalar | Return the Cumulative Distribution Function (CDF) of the sketch for a series of points | NULL | [datasketch_quantiles_cdf(sketch, points, inclusive)] |
| datasketch_quantiles_describe | scalar | Return a description of this sketch | NULL | [datasketch_quantiles_describe(sketch, include_levels, include_items)] |
| datasketch_quantiles_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_quantiles_is_empty(sketch)] |
| datasketch_quantiles_is_estimation_mode | scalar | Return a boolean indicating if the sketch is in estimation mode | NULL | [datasketch_quantiles_is_estimation_mode(sketch)] |
| datasketch_quantiles_k | scalar | Return the value of K for this sketch | NULL | [datasketch_quantiles_k(sketch)] |
| datasketch_quantiles_max_item | scalar | Return the maximum item in the sketch | NULL | [datasketch_quantiles_max_item(sketch)] |
| datasketch_quantiles_min_item | scalar | Return the minimum item in the sketch | NULL | [datasketch_quantiles_min_item(sketch)] |
| datasketch_quantiles_n | scalar | Return the number of items contained in the sketch | NULL | [datasketch_quantiles_n(sketch)] |
| datasketch_quantiles_normalized_rank_error | scalar | Return the normalized rank error of the sketch | NULL | [datasketch_quantiles_normalized_rank_error(sketch, is_pmf)] |
| datasketch_quantiles_num_retained | scalar | Return the number of retained items in the sketch | NULL | [datasketch_quantiles_num_retained(sketch)] |
| datasketch_quantiles_pmf | scalar | Return the Probability Mass Function (PMF) of the sketch for a series of points | NULL | [datasketch_quantiles_pmf(sketch, points, inclusive)] |
| datasketch_quantiles_quantile | scalar | Return the quantile of a rank in the sketch | NULL | [datasketch_quantiles_quantile(sketch, rank, inclusive)] |
| datasketch_quantiles_rank | scalar | Return the rank of an item in the sketch | NULL | [datasketch_quantiles_rank(sketch, item, inclusive)] |
| datasketch_req | aggregate | Creates a sketch_req data sketch by aggregating values or by aggregating other REQ data sketches | NULL | [datasketch_req(k, data)] |
| datasketch_req_cdf | scalar | Return the Cumulative Distribution Function (CDF) of the sketch for a series of points | NULL | [datasketch_req_cdf(sketch, points, inclusive)] |
| datasketch_req_describe | scalar | Return a description of this sketch | NULL | [datasketch_req_describe(sketch, include_levels, include_items)] |
| datasketch_req_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_req_is_empty(sketch)] |
| datasketch_req_is_estimation_mode | scalar | Return a boolean indicating if the sketch is in estimation mode | NULL | [datasketch_req_is_estimation_mode(sketch)] |
| datasketch_req_k | scalar | Return the value of K for this sketch | NULL | [datasketch_req_k(sketch)] |
| datasketch_req_max_item | scalar | Return the maximum item in the sketch | NULL | [datasketch_req_max_item(sketch)] |
| datasketch_req_min_item | scalar | Return the minimum item in the sketch | NULL | [datasketch_req_min_item(sketch)] |
| datasketch_req_n | scalar | Return the number of items contained in the sketch | NULL | [datasketch_req_n(sketch)] |
| datasketch_req_num_retained | scalar | Return the number of retained items in the sketch | NULL | [datasketch_req_num_retained(sketch)] |
| datasketch_req_pmf | scalar | Return the Probability Mass Function (PMF) of the sketch for a series of points | NULL | [datasketch_req_pmf(sketch, points, inclusive)] |
| datasketch_req_quantile | scalar | Return the quantile of a rank in the sketch | NULL | [datasketch_req_quantile(sketch, rank, inclusive)] |
| datasketch_req_rank | scalar | Return the rank of an item in the sketch | NULL | [datasketch_req_rank(sketch, item, inclusive)] |
| datasketch_tdigest | aggregate | Creates a sketch_tdigest data sketch by aggregating values or by aggregating other TDigest data sketches | NULL | [datasketch_tdigest(k, data)] |
| datasketch_tdigest_cdf | scalar | Return the Cumulative Distribution Function (CDF) of the sketch for a series of points | NULL | [datasketch_tdigest_cdf(sketch, points)] |
| datasketch_tdigest_describe | scalar | Return a description of this sketch | NULL | [datasketch_tdigest_describe(sketch, include_centroids)] |
| datasketch_tdigest_is_empty | scalar | Return a boolean indicating if the sketch is empty | NULL | [datasketch_tdigest_is_empty(sketch)] |
| datasketch_tdigest_k | scalar | Return the value of K for this sketch | NULL | [datasketch_tdigest_k(sketch)] |
| datasketch_tdigest_pmf | scalar | Return the Probability Mass Function (PMF) of the sketch for a series of points | NULL | [datasketch_tdigest_pmf(sketch, points)] |
| datasketch_tdigest_quantile | scalar | Return the quantile of a rank in the sketch | NULL | [datasketch_tdigest_quantile(sketch, rank)] |
| datasketch_tdigest_rank | scalar | Return the rank of an item in the sketch | NULL | [datasketch_tdigest_rank(sketch, item)] |
| datasketch_tdigest_total_weight | scalar | Return the total weight of this sketch | NULL | [datasketch_tdigest_total_weight(sketch)] |
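The quantile functions in the table above follow the same build-then-query pattern. The following is a minimal sketch using the KLL family, under stated assumptions: the `readings` table is hypothetical, `200` is an assumed value for `k`, and the argument types (a fractional rank between 0 and 1, an item of the column's type, a boolean `inclusive` flag) are inferred from the signatures listed above rather than confirmed.

```sql
-- Hypothetical sensor readings: the integers 1 through 1000.
CREATE TABLE readings (temp INTEGER);
INSERT INTO readings SELECT * FROM range(1, 1001);

-- Build one KLL sketch, then ask it several questions.
-- k = 200 is an assumed value chosen for illustration.
WITH s AS (
    SELECT datasketch_kll(200, temp) AS sketch FROM readings
)
SELECT
    datasketch_kll_quantile(sketch, 0.5, true) AS approx_median,  -- item at rank 0.5
    datasketch_kll_rank(sketch, 250, true)     AS rank_of_250,    -- fraction of items <= 250
    datasketch_kll_n(sketch)                   AS items_seen      -- items fed to the sketch
FROM s;
```

With this input the median should come out near 500 and the rank of 250 near 0.25. The `datasketch_quantiles_*` and `datasketch_req_*` functions list the same argument shapes, while the `datasketch_tdigest_*` variants omit the `inclusive` argument.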
Added Types
| type_name | type_size | logical_type | type_category | internal |
|---|---|---|---|---|
| sketch_cpc | 16 | BLOB | NULL | true |
| sketch_hll | 16 | BLOB | NULL | true |
| sketch_kll_bigint | 16 | BLOB | NULL | true |
| sketch_kll_double | 16 | BLOB | NULL | true |
| sketch_kll_float | 16 | BLOB | NULL | true |
| sketch_kll_integer | 16 | BLOB | NULL | true |
| sketch_kll_smallint | 16 | BLOB | NULL | true |
| sketch_kll_tinyint | 16 | BLOB | NULL | true |
| sketch_kll_ubigint | 16 | BLOB | NULL | true |
| sketch_kll_uinteger | 16 | BLOB | NULL | true |
| sketch_kll_usmallint | 16 | BLOB | NULL | true |
| sketch_kll_utinyint | 16 | BLOB | NULL | true |
| sketch_quantiles_bigint | 16 | BLOB | NULL | true |
| sketch_quantiles_double | 16 | BLOB | NULL | true |
| sketch_quantiles_float | 16 | BLOB | NULL | true |
| sketch_quantiles_integer | 16 | BLOB | NULL | true |
| sketch_quantiles_smallint | 16 | BLOB | NULL | true |
| sketch_quantiles_tinyint | 16 | BLOB | NULL | true |
| sketch_quantiles_ubigint | 16 | BLOB | NULL | true |
| sketch_quantiles_uinteger | 16 | BLOB | NULL | true |
| sketch_quantiles_usmallint | 16 | BLOB | NULL | true |
| sketch_quantiles_utinyint | 16 | BLOB | NULL | true |
| sketch_req_bigint | 16 | BLOB | NULL | true |
| sketch_req_double | 16 | BLOB | NULL | true |
| sketch_req_float | 16 | BLOB | NULL | true |
| sketch_req_integer | 16 | BLOB | NULL | true |
| sketch_req_smallint | 16 | BLOB | NULL | true |
| sketch_req_tinyint | 16 | BLOB | NULL | true |
| sketch_req_ubigint | 16 | BLOB | NULL | true |
| sketch_req_uinteger | 16 | BLOB | NULL | true |
| sketch_req_usmallint | 16 | BLOB | NULL | true |
| sketch_req_utinyint | 16 | BLOB | NULL | true |
| sketch_tdigest_double | 16 | BLOB | NULL | true |
| sketch_tdigest_float | 16 | BLOB | NULL | true |
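Because sketches are ordinary values of the types listed above, they can be kept in a table and combined later. The example below is a hedged sketch of that workflow: the `visits` and `daily_sketches` tables are hypothetical, `12` is an assumed value for `k`, and the stored column is expected (not verified here) to take the `sketch_hll` type.

```sql
-- Hypothetical raw data: which user visited on which day.
CREATE TABLE visits (day DATE, user_id INTEGER);
INSERT INTO visits VALUES
    (DATE '2024-01-01', 1), (DATE '2024-01-01', 2),
    (DATE '2024-01-02', 2), (DATE '2024-01-02', 3);

-- Store one HLL sketch per day instead of the raw rows.
CREATE TABLE daily_sketches AS
SELECT day, datasketch_hll(12, user_id) AS sketch
FROM visits
GROUP BY day;

-- Later: merge the stored sketches to estimate distinct users
-- across all days without rescanning the raw data.
SELECT datasketch_hll_estimate(datasketch_hll_union(12, sketch)) AS approx_users
FROM daily_sketches;
```

The merged estimate should be close to 3, since user 2 appears on both days and is counted once.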
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| auto_fallback_to_full_download | Allows automatically falling back to full file downloads when possible. | BOOLEAN | GLOBAL | [] |
| ca_cert_file | Path to a custom certificate file for self-signed certificates. | VARCHAR | GLOBAL | [] |
| enable_curl_server_cert_verification | Enable server side certificate verification for CURL backend. | BOOLEAN | GLOBAL | [] |
| enable_server_cert_verification | Enable server side certificate verification. | BOOLEAN | GLOBAL | [] |
| force_download | Forces upfront download of file | BOOLEAN | GLOBAL | [] |
| hf_max_per_page | Debug option to limit number of items returned in list requests | UBIGINT | GLOBAL | [] |
| http_keep_alive | Keep alive connections. Setting this to false can help when running into connection failures | BOOLEAN | GLOBAL | [] |
| http_retries | HTTP retries on I/O error | UBIGINT | GLOBAL | [] |
| http_retry_backoff | Backoff factor for exponentially increasing retry wait time | FLOAT | GLOBAL | [] |
| http_retry_wait_ms | Time between retries | UBIGINT | GLOBAL | [] |
| http_timeout | HTTP timeout read/write/connection/retry (in seconds) | UBIGINT | GLOBAL | [] |
| httpfs_client_implementation | Select which is the HTTPUtil implementation to be used | VARCHAR | GLOBAL | [] |
| merge_http_secret_into_s3_request | Merges http secret params into S3 requests | BOOLEAN | GLOBAL | [] |
| s3_access_key_id | S3 Access Key ID | VARCHAR | GLOBAL | [] |
| s3_endpoint | S3 Endpoint | VARCHAR | GLOBAL | [] |
| s3_kms_key_id | S3 KMS Key ID | VARCHAR | GLOBAL | [] |
| s3_region | S3 Region | VARCHAR | GLOBAL | [] |
| s3_requester_pays | S3 use requester pays mode | BOOLEAN | GLOBAL | [] |
| s3_secret_access_key | S3 Access Key | VARCHAR | GLOBAL | [] |
| s3_session_token | S3 Session Token | VARCHAR | GLOBAL | [] |
| s3_uploader_max_filesize | S3 Uploader max filesize (between 50GB and 5TB) | VARCHAR | GLOBAL | [] |
| s3_uploader_max_parts_per_file | S3 Uploader max parts per file (between 1 and 10000) | UBIGINT | GLOBAL | [] |
| s3_uploader_thread_limit | S3 Uploader global thread limit | UBIGINT | GLOBAL | [] |
| s3_url_compatibility_mode | Disable Globs and Query Parameters on S3 URLs | BOOLEAN | GLOBAL | [] |
| s3_url_style | S3 URL style | VARCHAR | GLOBAL | [] |
| s3_use_ssl | S3 use SSL | BOOLEAN | GLOBAL | [] |
| unsafe_disable_etag_checks | Disable checks on ETag consistency | BOOLEAN | GLOBAL | [] |