marsi.nearest_neighbors package
Submodules
marsi.nearest_neighbors.model module
class marsi.nearest_neighbors.model.KNN(fingerprint, k, mode)
Bases: object
K-Nearest Neighbors runner object.
It is applied to a model and runs the model's knn function.
Attributes
- fp (numpy.array) – A numpy.array with the fingerprint values.
- k (int) – The maximum number of neighbors to retrieve.
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation if available.
Methods
- __call__(nn)
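The runner pattern described above can be sketched in plain Python. This is only an illustration under stated assumptions: ToyModel, KNNRunner and jaccard_distance are hypothetical names, fingerprints are assumed to be lists of "on" bit positions, and the Jaccard distance is assumed as the metric. It is not marsi's implementation.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two fingerprints given as 'on' bit positions."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


class ToyModel:
    """Hypothetical stand-in for a nearest-neighbors model (not marsi's class)."""

    def __init__(self, index, features):
        self.index = index
        self.features = features

    def knn(self, fingerprint, k, mode="native"):
        # Distance from the query to every stored fingerprint.
        distances = {
            key: jaccard_distance(fingerprint, feature)
            for key, feature in zip(self.index, self.features)
        }
        # Keep the k entries with the smallest distance.
        nearest = sorted(distances.items(), key=lambda item: item[1])[:k]
        return dict(nearest)


class KNNRunner:
    """Hypothetical runner: stores the query once, is called once per model."""

    def __init__(self, fingerprint, k, mode):
        self.fp = fingerprint
        self.k = k
        self.mode = mode

    def __call__(self, nn):
        return nn.knn(self.fp, self.k, self.mode)


models = [
    ToyModel(["A", "B"], [[1, 2, 3], [7, 8]]),
    ToyModel(["C", "D"], [[1, 2], [1, 2, 3, 4]]),
]
runner = KNNRunner([1, 2, 3], k=1, mode="native")
# Because the runner is a plain callable, it can be mapped over many models.
results = list(map(runner, models))
```

Storing the query in a callable object means the same object can be handed to any map-like function, sequential or parallel, without re-passing the arguments for each model.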
class marsi.nearest_neighbors.model.RNN(fingerprint, radius, mode)
Bases: object
Radius-Nearest Neighbors runner object.
It is applied to a model and runs the model's rnn function.
Attributes
- fp (numpy.array) – A numpy.array with the fingerprint values.
- radius (float) – A distance radius in the interval ]0, 1] (greater than 0 and at most 1).
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation if available.
Methods
- __call__(nn)
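A radius query can be sketched as a filter over precomputed distances. The function name radius_neighbors is hypothetical, and treating the radius as inclusive (distance <= radius) is an assumption; the source only states that the radius lies in ]0, 1].

```python
def radius_neighbors(distances, radius):
    """Keep the entries whose distance to the query is within the radius.

    `distances` maps an entry key to its distance from the query.
    Assumption: the boundary is inclusive (distance <= radius).
    """
    if not 0 < radius <= 1:
        raise ValueError("radius must be in the interval ]0, 1]")
    return {key: d for key, d in distances.items() if d <= radius}


toy_distances = {"A": 0.1, "B": 0.25, "C": 0.9}
within = radius_neighbors(toy_distances, 0.25)
```

Unlike a k-nearest query, the number of results is not fixed: a small radius may return nothing, and a radius of 1 returns every entry when the distances are normalized to [0, 1].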
class marsi.nearest_neighbors.model.Distance(fingerprint, mode)
Bases: object
Distance runner object.
It is applied to a model and runs the model's distances function.
Attributes
- fp (numpy.array) – A numpy.array with the fingerprint values.
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation if available.
Methods
- __call__(nn)
class marsi.nearest_neighbors.model.DistributedNearestNeighbors(nns)
Bases: object
Nearest Neighbors distributed implementation.
Attributes
- index (numpy.array) – The index of all entries across multiple models.
Methods
- distance_matrix([mode]) – Generates a distance matrix between all elements in the models.
- distances(fingerprint[, mode, view]) – Retrieves the distances between a fingerprint and all elements in the model.
- feature(index) – Retrieves the fingerprint at a given index.
- k_nearest_neighbors(fingerprint[, k, mode, view]) – Retrieves the K nearest neighbors to a fingerprint.
- radius_nearest_neighbors(fingerprint[, …]) – Retrieves the nearest neighbors to a fingerprint within a distance radius.
k_nearest_neighbors(fingerprint, k=5, mode='native', view=<cameo.parallel.SequentialView object>)
Retrieves the K nearest neighbors to a fingerprint.
Parameters:
- fingerprint (list, np.array, tuple) – A fingerprint to use as query.
- k (int) – The number of neighbors to retrieve.
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation if available.
- view (cameo.parallel.ParallelView, cameo.parallel.SequentialView) – A parallel mode runner.
Returns: A dictionary with the InChI Key as key and the distance as value.
Return type: dict
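When the entries are spread over several models, a global K-nearest result has to be assembled from the per-model results. The sketch below shows one way to do that merge, assuming each model returns a dictionary of key to distance and that keys are unique across models. The function name merge_k_nearest is hypothetical, and this is not necessarily how marsi combines the partial results.

```python
import heapq


def merge_k_nearest(partial_results, k):
    """Combine per-model {key: distance} dictionaries into a global top-k."""
    merged = {}
    for partial in partial_results:
        merged.update(partial)
    # nsmallest avoids fully sorting the merged candidates.
    nearest = heapq.nsmallest(k, merged.items(), key=lambda item: item[1])
    return dict(nearest)


per_model = [
    {"A": 0.0, "B": 0.5},   # top-2 from a first model
    {"C": 0.2, "D": 0.25},  # top-2 from a second model
]
top_two = merge_k_nearest(per_model, 2)
```

Each model must return its own k best candidates for the merge to be exact, because all k global neighbors could sit in a single model.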
radius_nearest_neighbors(fingerprint, radius=0.25, mode='native', view=<cameo.parallel.SequentialView object>)
Retrieves the nearest neighbors to a fingerprint within a distance radius.
Parameters:
- fingerprint (list, np.array, tuple) – A fingerprint to use as query.
- radius (float) – A distance radius in the interval ]0, 1] (greater than 0 and at most 1).
- mode (str) – ‘native’ to run the Python implementation or ‘cl’ to run the OpenCL implementation if available.
- view (cameo.parallel.ParallelView, cameo.parallel.SequentialView) – A parallel mode runner.
Returns: A dictionary with the InChI Key as key and the distance as value.
Return type: dict
distances(fingerprint, mode='native', view=<cameo.parallel.SequentialView object>)
Retrieves the distances between a fingerprint and all elements in the model.
Returns: A dictionary with the InChI Key as key and the distance as value.
Return type: dict
class marsi.nearest_neighbors.model.NearestNeighbors(index, features, features_lengths, use_cl=False, opencl_context=None)
Bases: marsi.nearest_neighbors.model_ext.CNearestNeighbors
Attributes
- cl_context
- data_frame – Create a DataFrame with this model.
- features
- features_lengths
- index
- program
- start_positions
Methods
- distances(fingerprint[, mode])
- distances_cl(*args, **kwargs)
- distances_py
- input_buffer(*args, **kwargs)
- knn(fingerprint, k[, mode]) – K-Nearest Neighbors
- max_memory_allocation_size()
- output_buffer(*args, **kwargs)
- queue()
- rnn(fingerprint, radius[, mode]) – Radius-Nearest Neighbors
- run_kernel(*args, **kwargs)

cl_context

program

knn(fingerprint, k, mode='native')
K-Nearest Neighbors
Returns: A mapping of index to distance (Index -> Distance).

rnn(fingerprint, radius, mode='native')
Radius-Nearest Neighbors
Returns: A mapping of index to distance (Index -> Distance).

distances_cl(*args, **kwargs)

input_buffer(*args, **kwargs)

output_buffer(*args, **kwargs)

run_kernel(*args, **kwargs)

index

data_frame
Create a DataFrame with this model.
Returns: A data frame.
Return type: pandas.DataFrame

class marsi.nearest_neighbors.model.DBNearestNeighbors(index, session, fingerprint_format, metric='jaccard')
Bases: object
Attributes
- data_frame – Create a DataFrame with this model.
- features
- index
- neighbors
Methods
- distances(fingerprint[, mode])
- knn(fingerprint, k[, mode]) – K-Nearest Neighbors
- rnn(fingerprint, radius[, mode]) – Radius-Nearest Neighbors

neighbors

knn(fingerprint, k, mode='native')
K-Nearest Neighbors
Returns: A mapping of index to distance (Index -> Distance).

rnn(fingerprint, radius, mode='native')
Radius-Nearest Neighbors
Returns: A mapping of index to distance (Index -> Distance).

index

features

data_frame
Create a DataFrame with this model.
Returns: A data frame.
Return type: pandas.DataFrame

marsi.nearest_neighbors.model_ext module
class marsi.nearest_neighbors.model_ext.CNearestNeighbors
Bases: object
Attributes
- features
- features_lengths
- start_positions
Methods
- distances_py()
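The attribute names features, features_lengths and start_positions suggest that variable-length fingerprints are stored in one flat array, with a length and a start offset per entry. The source does not document this, so the sketch below is an assumption about the layout, with hypothetical helper names, and not a description of the extension module's internals.

```python
def flatten_features(fingerprints):
    """Pack variable-length fingerprints into one flat list.

    Returns (features, features_lengths, start_positions), where entry i
    occupies features[start_positions[i]:start_positions[i] + features_lengths[i]].
    """
    features, lengths, starts = [], [], []
    for fingerprint in fingerprints:
        starts.append(len(features))    # offset where this entry begins
        lengths.append(len(fingerprint))
        features.extend(fingerprint)
    return features, lengths, starts


def feature_at(features, lengths, starts, i):
    """Recover the i-th fingerprint from the flat layout."""
    return features[starts[i]:starts[i] + lengths[i]]


flat, lengths, starts = flatten_features([[1, 2, 3], [7], [4, 5]])
third = feature_at(flat, lengths, starts, 2)
```

A flat layout of this kind is convenient for compiled code and for copying data to a device in a single buffer, since every entry is addressed by an offset and a length.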
Module contents
marsi.nearest_neighbors.build_nearest_neighbors_model(database, fpformat='fp4', solubility='high', n_models=5, chunk_size=1000000.0, view=<class 'cameo.parallel.SequentialView'>)
Loads a NN model.
If a ‘default_model.pickle’ file exists in data, the model is loaded from it. Otherwise a model is built from the database, which can take several hours depending on the size of the database.
Parameters:
- database (marsi.io.mongodb.CollectionWrapper) – A database interface to the metabolites.
- chunk_size (int) – Maximum number of entries per chunk.
- fpformat (str) – The format of the fingerprint (see pybel.fps).
- solubility (str) – One of high, medium, low or all.
- view (cameo.parallel.SequentialView, cameo.parallel.MultiprocessingView) – A view to control parallelization.
- n_models (int) – The number of NearestNeighbors models.
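The load-or-build behavior described above follows a common caching pattern, sketched here with the standard library only. The helper load_or_build and the toy builder are hypothetical, and writing the freshly built model back to disk is an assumption; the source only says that an existing pickle is loaded and that a model is built otherwise.

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    """Load a pickled model from `path`, or build it when the file is missing."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    model = build()
    # Assumption: the built model is cached so that later calls are fast.
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return model


calls = []


def build_toy_model():
    """Stand-in for an expensive build from a database."""
    calls.append(1)
    return {"index": ["A", "B"], "features": [[1, 2], [3]]}


with tempfile.TemporaryDirectory() as data_dir:
    model_path = os.path.join(data_dir, "default_model.pickle")
    first = load_or_build(model_path, build_toy_model)   # builds and caches
    second = load_or_build(model_path, build_toy_model)  # loads from disk
```

Only load pickle files from a trusted location, since unpickling can execute arbitrary code.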
marsi.nearest_neighbors.load_nearest_neighbors_model(chunk_size=1000000.0, fpformat='fp4', solubility='all', session=<sqlalchemy.orm.session.Session object>, view=<cameo.parallel.SequentialView object>, model_size=100000, source='db', costum_query=None)
Loads a NN model.
If a ‘default_model.pickle’ file exists in data, the model is loaded from it. Otherwise a model is built from the database, which can take several hours depending on the size of the database.
Parameters:
- chunk_size (int) – Maximum number of entries per chunk.
- fpformat (str) – The format of the fingerprint (see pybel.fps).
- solubility (str) – One of high, medium, low or all.
- view (cameo.parallel.SequentialView, cameo.parallel.MultiprocessingView) – A view to control parallelization.
- model_size (int) – The size of each NearestNeighbors model in the ensemble.
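The model_size parameter bounds how many entries each model in the ensemble holds. A minimal sketch of such a split, assuming entries are assigned to models in consecutive blocks (the function name split_into_models is hypothetical, and the actual assignment used by marsi is not documented here):

```python
def split_into_models(n_entries, model_size):
    """Split entry positions 0..n_entries-1 into consecutive blocks.

    Every block has `model_size` entries except possibly the last one.
    """
    if model_size <= 0:
        raise ValueError("model_size must be positive")
    return [
        range(start, min(start + model_size, n_entries))
        for start in range(0, n_entries, model_size)
    ]


blocks = split_into_models(25, 10)
sizes = [len(block) for block in blocks]
```

With the default model_size of 100000, a collection of 250000 entries would be split into three models under this scheme.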