marsi.nearest_neighbors package
Submodules
marsi.nearest_neighbors.model module
class marsi.nearest_neighbors.model.KNN(fingerprint, k, mode)
Bases: object
K-Nearest Neighbors runner object.
It is assigned to a model and runs that model's knn function.
Attributes
- fp (numpy.array) – The fingerprint values.
- k (int) – The maximum number of neighbors to retrieve.
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation, if available.
Methods
- __call__(nn)
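The runner classes in this module store the query arguments once and are then called with a model. A minimal sketch of that pattern follows. `ToyModel`, `KNNRunner`, and the bit-difference distance are hypothetical stand-ins written for this illustration; they are not marsi classes, and the real implementation may differ.

```python
class ToyModel:
    """Hypothetical stand-in for a nearest-neighbors model (not part of marsi)."""

    def __init__(self, entries):
        # entries maps an identifier to a fingerprint stored as a set of "on" bits
        self.entries = entries

    def knn(self, fingerprint, k, mode="native"):
        query = set(fingerprint)
        # toy distance: number of bits that differ between query and entry
        distances = {key: len(query ^ bits) for key, bits in self.entries.items()}
        nearest = sorted(distances.items(), key=lambda item: item[1])[:k]
        return dict(nearest)


class KNNRunner:
    """Stores the query once so it can be applied to any number of models."""

    def __init__(self, fingerprint, k, mode):
        self.fp = fingerprint
        self.k = k
        self.mode = mode

    def __call__(self, nn):
        # delegate to the knn function of whichever model is passed in
        return nn.knn(self.fp, self.k, mode=self.mode)


models = [
    ToyModel({"A": {1, 2, 3}, "B": {7, 8}}),
    ToyModel({"C": {1, 2}, "D": {9}}),
]
runner = KNNRunner([1, 2, 3], k=1, mode="native")
results = list(map(runner, models))
print(results)  # [{'A': 0}, {'C': 1}]
```

Because the runner is a plain callable, it can be handed to any `map`-style function, which is presumably what makes it convenient to use with a parallel view.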
class marsi.nearest_neighbors.model.RNN(fingerprint, radius, mode)
Bases: object
Radius-Nearest Neighbors runner object.
It is assigned to a model and runs that model's rnn function.
Attributes
- fp (numpy.array) – The fingerprint values.
- radius (float) – A distance radius in the interval ]0, 1].
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation, if available.
Methods
- __call__(nn)
class marsi.nearest_neighbors.model.Distance(fingerprint, mode)
Bases: object
Distance runner object.
It is assigned to a model and runs that model's distance function.
Attributes
- fp (numpy.array) – The fingerprint values.
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation, if available.
Methods
- __call__(nn)
class marsi.nearest_neighbors.model.DistributedNearestNeighbors(nns)
Bases: object
Distributed implementation of Nearest Neighbors.
Attributes
- index (numpy.array) – The index of all entries across multiple models.
Methods
- distance_matrix([mode]) – Generates a distance matrix between all elements in the models.
- distances(fingerprint[, mode, view]) – Retrieves the distance between a fingerprint and all elements in the model.
- feature(index) – Retrieves the fingerprint at a given index.
- k_nearest_neighbors(fingerprint[, k, mode, view]) – Retrieves the K nearest neighbors to a fingerprint.
- radius_nearest_neighbors(fingerprint[, …]) – Retrieves the nearest neighbors to a fingerprint within a distance radius.
k_nearest_neighbors(fingerprint, k=5, mode='native', view=<cameo.parallel.SequentialView object>)
Retrieves the K nearest neighbors to a fingerprint.
Parameters:
- fingerprint (list, np.array, tuple) – A fingerprint to use as query.
- k (int) – The number of neighbors to retrieve.
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation, if available.
- view (cameo.parallel.ParallelView, cameo.parallel.SequentialView) – A parallel mode runner.
Returns: A dictionary with the InChI Key as key and the distance as value.
Return type: dict
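When a query runs against several models, each model returns its own nearest neighbors, and those partial results still have to be reduced to the k closest overall. The sketch below shows one way to do that reduction. `merge_k_nearest` and the keys are hypothetical, and this is not necessarily how marsi combines its results.

```python
import heapq


def merge_k_nearest(partial_results, k):
    """Combine per-model {key: distance} dictionaries into the k closest overall."""
    merged = {}
    for result in partial_results:
        for key, distance in result.items():
            # keep the smallest distance if a key shows up in more than one model
            if key not in merged or distance < merged[key]:
                merged[key] = distance
    closest = heapq.nsmallest(k, merged.items(), key=lambda item: item[1])
    return dict(closest)


# Placeholder keys; real results would be keyed by InChI Key.
partial = [
    {"KEY-A": 0.10, "KEY-B": 0.40},
    {"KEY-C": 0.05, "KEY-D": 0.90},
]
top2 = merge_k_nearest(partial, k=2)
print(top2)  # {'KEY-C': 0.05, 'KEY-A': 0.1}
```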
radius_nearest_neighbors(fingerprint, radius=0.25, mode='native', view=<cameo.parallel.SequentialView object>)
Retrieves the nearest neighbors to a fingerprint within a distance radius.
Parameters:
- fingerprint (list, np.array, tuple) – A fingerprint to use as query.
- radius (float) – A distance radius in the interval ]0, 1].
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation, if available.
- view (cameo.parallel.ParallelView, cameo.parallel.SequentialView) – A parallel mode runner.
Returns: A dictionary with the InChI Key as key and the distance as value.
Return type: dict
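The radius is documented as lying in ]0, 1], that is, greater than 0 and at most 1. The sketch below applies such a radius to a dictionary of precomputed distances. `within_radius` is a hypothetical helper, and treating the boundary as inclusive (distance <= radius) is an assumption; the source does not say which comparison marsi uses.

```python
def within_radius(distances, radius):
    """Keep entries whose distance is at most `radius` (inclusive bound assumed)."""
    if not 0 < radius <= 1:
        raise ValueError("radius must lie in the interval ]0, 1]")
    return {key: d for key, d in distances.items() if d <= radius}


# Placeholder keys and distances for illustration only.
distances = {"KEY-A": 0.0, "KEY-B": 0.25, "KEY-C": 0.6}
hits = within_radius(distances, radius=0.25)
print(hits)  # {'KEY-A': 0.0, 'KEY-B': 0.25}
```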
distances(fingerprint, mode='native', view=<cameo.parallel.SequentialView object>)
Retrieves the distance between a fingerprint and all elements in the model.
Returns: A dictionary with the InChI Key as key and the distance as value.
Return type: dict
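To make the shape of the returned dictionary concrete, the sketch below computes a Jaccard distance between a query and every stored fingerprint. Representing a fingerprint as the list of its "on" bit positions is an assumption made for the example, and Jaccard is chosen because it is the default `metric` of `DBNearestNeighbors` on this page. Whether the 'native' and 'cl' implementations compute exactly this is not stated in the source.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of on-bit positions."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # two empty fingerprints are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


def all_distances(fingerprint, entries):
    """Return {key: distance} from the query to every stored fingerprint."""
    return {key: jaccard_distance(fingerprint, fp) for key, fp in entries.items()}


# Placeholder keys; real entries would be keyed by InChI Key.
entries = {"KEY-A": [1, 2, 3, 4], "KEY-B": [3, 4], "KEY-C": [9]}
result = all_distances([1, 2, 3, 4], entries)
print(result)  # {'KEY-A': 0.0, 'KEY-B': 0.5, 'KEY-C': 1.0}
```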
class marsi.nearest_neighbors.model.NearestNeighbors(index, features, features_lengths, use_cl=False, opencl_context=None)
Bases: marsi.nearest_neighbors.model_ext.CNearestNeighbors
Attributes
- cl_context
- data_frame – Create a DataFrame with this model.
- features
- features_lengths
- index
- program
- start_positions
Methods
- distances(fingerprint[, mode])
- distances_cl(*args, **kwargs)
- distances_py
- input_buffer(*args, **kwargs)
- knn(fingerprint, k[, mode]) – K-Nearest Neighbors
- max_memory_allocation_size()
- output_buffer(*args, **kwargs)
- queue()
- rnn(fingerprint, radius[, mode]) – Radius-Nearest Neighbors
- run_kernel(*args, **kwargs)
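The attribute names features, features_lengths, and start_positions suggest that variable-length fingerprints are packed into one flat array. The sketch below illustrates that kind of layout under that assumption only. The source does not describe the actual storage format, and `flatten` and `feature_at` are hypothetical helpers.

```python
from itertools import accumulate


def flatten(fingerprints):
    """Pack variable-length fingerprints into one flat list plus bookkeeping."""
    features = [bit for fp in fingerprints for bit in fp]
    lengths = [len(fp) for fp in fingerprints]
    # entry i starts where the previous entries end
    starts = [0, *accumulate(lengths)][:len(lengths)]
    return features, lengths, starts


def feature_at(features, lengths, starts, i):
    """Recover fingerprint i from the flat layout."""
    return features[starts[i]:starts[i] + lengths[i]]


features, lengths, starts = flatten([[1, 5, 9], [2], [4, 7]])
print(features)  # [1, 5, 9, 2, 4, 7]
print(lengths)   # [3, 1, 2]
print(starts)    # [0, 3, 4]
print(feature_at(features, lengths, starts, 2))  # [4, 7]
```

A flat layout like this is a common way to hand ragged data to compiled or OpenCL code, which generally expects contiguous buffers.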
knn(fingerprint, k, mode='native')
K-Nearest Neighbors
Returns: (Index -> Distance)
rnn(fingerprint, radius, mode='native')
Radius-Nearest Neighbors
Returns: (Index -> Distance)
data_frame
Create a DataFrame with this model.
Returns: A data frame.
Return type: pandas.DataFrame
class marsi.nearest_neighbors.model.DBNearestNeighbors(index, session, fingerprint_format, metric='jaccard')
Bases: object
Attributes
- data_frame – Create a DataFrame with this model.
- features
- index
- neighbors
Methods
- distances(fingerprint[, mode])
- knn(fingerprint, k[, mode]) – K-Nearest Neighbors
- rnn(fingerprint, radius[, mode]) – Radius-Nearest Neighbors
knn(fingerprint, k, mode='native')
K-Nearest Neighbors
Returns: (Index -> Distance)
rnn(fingerprint, radius, mode='native')
Radius-Nearest Neighbors
Returns: (Index -> Distance)
data_frame
Create a DataFrame with this model.
Returns: A data frame.
Return type: pandas.DataFrame
marsi.nearest_neighbors.model_ext module
class marsi.nearest_neighbors.model_ext.CNearestNeighbors
Bases: object
Attributes
- features
- features_lengths
- start_positions
Methods
- distances_py()
Module contents
marsi.nearest_neighbors.build_nearest_neighbors_model(database, fpformat='fp4', solubility='high', n_models=5, chunk_size=1000000.0, view=<class 'cameo.parallel.SequentialView'>)
Loads an NN model.
If a ‘default_model.pickle’ exists in data, the model is loaded from it. Otherwise, a model is built from the database, which can take several hours depending on the size of the database.
Parameters:
- database (marsi.io.mongodb.CollectionWrapper) – A database interface to the metabolites.
- chunk_size (int) – Maximum number of entries per chunk.
- fpformat (str) – The format of the fingerprint (see pybel.fps).
- solubility (str) – One of high, medium, low or all.
- view (cameo.parallel.SequentialView, cameo.parallel.MultiprocessingView) – A view to control parallelization.
- n_models (int) – The number of NearestNeighbors models.
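The load-if-present, build-otherwise behavior described above can be sketched as follows. `load_or_build` and `slow_build` are hypothetical names, and writing the freshly built model back to disk is an assumption added for the example; the source only states that an existing pickle is loaded.

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    """Load a pickled model if the file exists; otherwise build it and cache it."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    model = build()
    # caching the result is an assumption of this sketch, not documented behavior
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return model


calls = []


def slow_build():
    """Stand-in for the expensive build step; records each invocation."""
    calls.append(1)
    return {"KEY-A": [1, 2, 3]}


with tempfile.TemporaryDirectory() as data_dir:
    path = os.path.join(data_dir, "default_model.pickle")
    first = load_or_build(path, slow_build)   # builds and writes the pickle
    second = load_or_build(path, slow_build)  # reads the pickle instead

print(len(calls))  # 1
```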
marsi.nearest_neighbors.load_nearest_neighbors_model(chunk_size=1000000.0, fpformat='fp4', solubility='all', session=<sqlalchemy.orm.session.Session object>, view=<cameo.parallel.SequentialView object>, model_size=100000, source='db', costum_query=None)
Loads an NN model.
If a ‘default_model.pickle’ exists in data, the model is loaded from it. Otherwise, a model is built from the database, which can take several hours depending on the size of the database.
Parameters:
- chunk_size (int) – Maximum number of entries per chunk.
- fpformat (str) – The format of the fingerprint (see pybel.fps).
- solubility (str) – One of high, medium, low or all.
- view (cameo.parallel.SequentialView, cameo.parallel.MultiprocessingView) – A view to control parallelization.
- model_size (int) – The size of each NearestNeighbors model in the ensemble.
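The model_size parameter caps how many entries go into each model of the ensemble. The sketch below shows how a list of entries could be partitioned that way. `split_into_models` is a hypothetical helper, and marsi's actual partitioning may differ, for example in ordering or in how it streams entries from the database.

```python
def split_into_models(keys, model_size):
    """Partition `keys` into consecutive groups of at most `model_size` entries."""
    if model_size < 1:
        raise ValueError("model_size must be a positive integer")
    return [keys[i:i + model_size] for i in range(0, len(keys), model_size)]


# Placeholder keys for illustration only.
keys = ["KEY-%d" % i for i in range(7)]
chunks = split_into_models(keys, model_size=3)
print([len(chunk) for chunk in chunks])  # [3, 3, 1]
```

With the default model_size of 100000, a set of 250000 entries would produce three models under this scheme: two full ones and one holding the remaining 50000.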