Utility functions for LSH models
ft_lsh_utils
Description
Utility functions for LSH models
Usage
ml_approx_nearest_neighbors(
model,
dataset,
key,
num_nearest_neighbors,
dist_col = "distCol"
)
ml_approx_similarity_join(
model,
dataset_a,
dataset_b,
threshold,
dist_col = "distCol"
)Arguments
| Arguments | Description |
|---|---|
| model | A fitted LSH model, returned by either ft_minhash_lsh() or ft_bucketed_random_projection_lsh(). |
| dataset | The dataset to search for nearest neighbors of the key. |
| key | Feature vector representing the item to search for. |
| num_nearest_neighbors | The maximum number of nearest neighbors. |
| dist_col | Output column for storing the distance between each result row and the key. |
| dataset_a | One of the datasets to join. |
| dataset_b | Another dataset to join. |
| threshold | The threshold for the distance of row pairs. |