ComputeMixin module
class graphistry.compute.ComputeMixin(*args, **kwargs)
Bases: object
chain(*args, **kwargs)
Experimental: Chain a list of operations.
Return the subgraph of matches according to the list of node and edge matchers. If any matchers are named, add a correspondingly named boolean-valued column to the output.
Parameters
    ops – List[ASTObject]: various node and edge matchers
Returns
    Plotter
Return type
    Plotter
Example: Find nodes of some type
    from graphistry.ast import n
    people_nodes_df = g.chain([ n({"type": "person"}) ])._nodes
Example: Find 2-hop edge sequences with some attribute
    from graphistry.ast import e_forward
    g_2_hops = g.chain([ e_forward({"interesting": True}, hops=2) ])
    g_2_hops.plot()
Example: Find any node 1-2 hops out from another node, and label each hop
    from graphistry.ast import n, e_undirected
    g_2_hops = g.chain([
        n({g._node: "a"}),
        e_undirected(name="hop1"),
        e_undirected(name="hop2")
    ])
    print('# first-hop edges:', len(g_2_hops._edges[ g_2_hops._edges.hop1 == True ]))
Example: Transaction nodes between two kinds of risky nodes
    from graphistry.ast import n, e_forward, e_reverse
    g_risky = g.chain([
        n({"risk1": True}),
        e_forward(to_fixed_point=True),
        n({"type": "transaction"}, name="hit"),
        e_reverse(to_fixed_point=True),
        n({"risk2": True})
    ])
    print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ]))
collapse(node, attribute, column, self_edges=False, unwrap=False, verbose=False)
Topology-aware collapse by the given column attribute, starting at node.
Traverses the directed graph from the start node node and collapses clusters of nodes that share the same property, so that topology is preserved.
Parameters
    node (Union[str, int]) – start node to begin traversal
    attribute (Union[str, int]) – the given attribute to collapse over within column
    column (Union[str, int]) – the column of the nodes DataFrame that contains attribute to collapse over
    self_edges (bool) – whether to include self edges in the collapsed graph
    unwrap (bool) – whether to unwrap the collapsed graph into a single node
    verbose (bool) – whether to print out collapse summary information
Returns
    A new Graphistry instance whose nodes and edges DataFrames contain the nodes and edges collapsed by column attribute. The nodes and edges DataFrames contain six new columns, collapse_{node | edges} and final_{node | edges}, while the original (node, src, dst) columns are left untouched.
Return type
    Plottable
drop_nodes(nodes)
Return g with any nodes and edges involving the node ID series nodes removed.
filter_edges_by_dict(*args, **kwargs)
Filter edges to those that match all values in filter_dict.
filter_nodes_by_dict(*args, **kwargs)
Filter nodes to those that match all values in filter_dict.
get_degrees(col='degree', degree_in='degree_in', degree_out='degree_out')
Decorate the nodes table with degree info.
Edges must be dataframe-like: pandas, cudf, …
Parameters determine the generated column names.
Warning: Self-cycles are currently double-counted. This may change.
Example: Generate degree columns
    import pandas as pd
    import graphistry
    edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
    g = graphistry.edges(edges, 's', 'd')
    print(g._nodes)  # None
    g2 = g.get_degrees()
    print(g2._nodes)  # pd.DataFrame with 'id', 'degree', 'degree_in', 'degree_out'
Parameters
    col (str) – name of the generated total-degree column
    degree_in (str) – name of the generated in-degree column
    degree_out (str) – name of the generated out-degree column
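To make the three degree columns concrete, here is a plain-Python sketch (not the library's dataframe implementation) that counts in- and out-degrees from an edge list. The name degrees_from_edges is hypothetical. It counts a self-loop once as outgoing and once as incoming, which is one plausible way the double-counting mentioned in the warning arises.

```python
from collections import Counter
from typing import Dict, List, Tuple


def degrees_from_edges(edges: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: per-node degree_in, degree_out, and total degree."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = sorted(set(out_deg) | set(in_deg))
    return {
        node: {
            "degree_in": in_deg[node],
            "degree_out": out_deg[node],
            # a self-loop (a, a) adds 1 to both counts, so it contributes 2 here
            "degree": in_deg[node] + out_deg[node],
        }
        for node in nodes
    }


edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e"), ("a", "a")]
table = degrees_from_edges(edges)
print(table["c"])  # {'degree_in': 2, 'degree_out': 1, 'degree': 3}
print(table["a"])  # {'degree_in': 1, 'degree_out': 2, 'degree': 3}
```

Node "a" ends up with degree 3 even though it touches only two distinct edges, because the self-loop is counted on both sides.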
get_indegrees(col='degree_in')
See get_degrees.
Parameters
    col (str) – name of the generated in-degree column
get_outdegrees(col='degree_out')
See get_degrees.
Parameters
    col (str) – name of the generated out-degree column
get_topological_levels(level_col='level', allow_cycles=True, warn_cycles=True, remove_self_loops=True)
Label nodes on column level_col based on topological sort depth. Supports pandas and cudf, using parallelism within each level computation.
Options:
    * allow_cycles: if False and a cycle is detected, raise a ValueError; otherwise, break the cycle by picking a lowest-in-degree node
    * warn_cycles: if True and a cycle is detected, proceed with a warning
    * remove_self_loops: preprocess by removing self-cycles, so that self-loops alone do not trigger the allow_cycles=False error or the warn_cycles=True warning
Example:
    import pandas as pd
    import graphistry
    edges_df = pd.DataFrame({'s': ['a', 'b', 'c', 'd'], 'd': ['b', 'c', 'e', 'e']})
    g = graphistry.edges(edges_df, 's', 'd')
    g2 = g.get_topological_levels()
    g2._nodes.info()  # pd.DataFrame with columns 'id', 'level'
Parameters
    level_col (str) – name of the generated level column
    allow_cycles (bool)
    warn_cycles (bool)
    remove_self_loops (bool)
Return type
    Plottable
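A minimal plain-Python sketch of level assignment, assuming a Kahn-style peeling order: each round labels every remaining node that has no incoming edges from other remaining nodes. It mirrors the options listed above, but topological_levels_sketch is a hypothetical name, it works on an edge list rather than a dataframe, and it recomputes in-degrees every round for clarity rather than speed.

```python
import warnings
from typing import Dict, List, Tuple


def topological_levels_sketch(
    edges: List[Tuple[str, str]],
    allow_cycles: bool = True,
    warn_cycles: bool = True,
    remove_self_loops: bool = True,
) -> Dict[str, int]:
    """Label each node with the round in which it is peeled off (Kahn-style)."""
    remaining = {n for edge in edges for n in edge}  # collect nodes before dropping any edges
    if remove_self_loops:
        edges = [(s, d) for s, d in edges if s != d]
    levels: Dict[str, int] = {}
    level = 0
    while remaining:
        # in-degree counted only over edges whose endpoints are both still unlabeled
        in_deg = {n: 0 for n in remaining}
        for s, d in edges:
            if s in remaining and d in remaining:
                in_deg[d] += 1
        frontier = [n for n in remaining if in_deg[n] == 0]
        if not frontier:  # every remaining node lies on or downstream of a cycle
            if not allow_cycles:
                raise ValueError("cycle detected")
            if warn_cycles:
                warnings.warn("cycle detected; breaking it at a lowest-in-degree node")
            # ties broken by node name so the result is deterministic
            frontier = [min(remaining, key=lambda n: (in_deg[n], n))]
        for n in frontier:
            levels[n] = level
        remaining -= set(frontier)
        level += 1
    return levels


edges = [("a", "b"), ("b", "c"), ("c", "e"), ("d", "e")]
print(sorted(topological_levels_sketch(edges).items()))
# [('a', 0), ('b', 1), ('c', 2), ('d', 0), ('e', 3)]
```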
hop(*args, **kwargs)
Given a graph and some source nodes, return the subgraph of all paths within k hops from the sources.
    g: Plotter
    nodes: dataframe with id column matching g._node. None signifies all nodes (default).
    hops: how many hops to consider, if any bound (default 1)
    to_fixed_point: keep hopping until no new nodes are found (ignores hops)
    direction: 'forward', 'reverse', 'undirected'
    edge_match: dict of key-value pairs to exact match (see also: filter_edges_by_dict)
    source_node_match: dict of key-value pairs to match nodes before hopping
    destination_node_match: dict of key-value pairs to match nodes after hopping (including intermediate)
    return_as_wave_front: only return the nodes/edges reached, ignoring past ones (primarily for internal use)
keep_nodes(nodes)
Limit nodes and edges to those selected by the parameter nodes. For edges, both the source and the destination must be in nodes.
nodes can be a list or series of node IDs, or a dictionary. When it is a dictionary, each key corresponds to a node column, and a node is included when all values match.
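The selection rules above can be sketched in plain Python over lists of records. This is an illustration, not the library's implementation: keep_nodes_sketch is hypothetical, nodes are dicts with an assumed "id" key, and edges are (src, dst) tuples.

```python
from typing import Dict, List, Tuple, Union

Node = Dict[str, object]
Edge = Tuple[str, str]


def keep_nodes_sketch(
    nodes: List[Node],
    edges: List[Edge],
    selection: Union[List[str], Dict[str, object]],
    node_col: str = "id",
) -> Tuple[List[Node], List[Edge]]:
    """Hypothetical helper mirroring the keep_nodes description."""
    if isinstance(selection, dict):
        # dictionary form: a node is kept only when every key/value pair matches
        kept = [n for n in nodes if all(n.get(k) == v for k, v in selection.items())]
    else:
        wanted = set(selection)
        kept = [n for n in nodes if n[node_col] in wanted]
    ids = {n[node_col] for n in kept}
    # an edge survives only when both of its endpoints survive
    kept_edges = [(s, d) for s, d in edges if s in ids and d in ids]
    return kept, kept_edges


nodes = [{"id": "a", "type": "person"}, {"id": "b", "type": "person"}, {"id": "c", "type": "account"}]
edges = [("a", "b"), ("b", "c")]
people, people_edges = keep_nodes_sketch(nodes, edges, {"type": "person"})
print([n["id"] for n in people], people_edges)  # ['a', 'b'] [('a', 'b')]
print(keep_nodes_sketch(nodes, edges, ["b", "c"])[1])  # [('b', 'c')]
```

The edge ("b", "c") is dropped in the first call because "c" is not a person, even though "b" is.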
materialize_nodes(reuse=True, engine='auto')
Generate g._nodes based on g._edges.
Uses g._node for the node ID if it exists, else 'id'.
Edges must be dataframe-like: cudf, pandas, …
When reuse=True and g._nodes is not None, use it.
Example: Generate nodes
    import pandas as pd
    import graphistry
    edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
    g = graphistry.edges(edges, 's', 'd')
    print(g._nodes)  # None
    g2 = g.materialize_nodes()
    print(g2._nodes)  # pd.DataFrame
Parameters
    reuse (bool)
    engine (Union[Engine, Literal['auto']])
Return type
    Plottable
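As a rough model of what materialize_nodes produces, the sketch below derives a node table from the unique IDs that appear as either source or destination, honoring the reuse and node-column rules described above. materialize_nodes_sketch is hypothetical, and the first-seen row order is an assumption; the library may order rows differently.

```python
from typing import Dict, List, Optional, Tuple


def materialize_nodes_sketch(
    edges: List[Tuple[str, str]],
    existing_nodes: Optional[List[Dict[str, str]]] = None,
    node_col: Optional[str] = None,
    reuse: bool = True,
) -> List[Dict[str, str]]:
    """Hypothetical helper: derive a node table from an edge list."""
    if reuse and existing_nodes is not None:
        return existing_nodes  # an existing node table wins when reuse=True
    col = node_col if node_col is not None else "id"  # fall back to 'id'
    seen: Dict[str, None] = {}
    for src, dst in edges:
        seen.setdefault(src, None)  # dict keys dedupe while keeping first-seen order
        seen.setdefault(dst, None)
    return [{col: node_id} for node_id in seen]


edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
print(materialize_nodes_sketch(edges))
# [{'id': 'a'}, {'id': 'c'}, {'id': 'b'}, {'id': 'e'}, {'id': 'd'}]
```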
prune_self_edges()
Chain
graphistry.compute.chain.chain(self, ops)
Experimental: Chain a list of operations.
Return the subgraph of matches according to the list of node and edge matchers. If any matchers are named, add a correspondingly named boolean-valued column to the output.
Parameters
    ops (List[ASTObject]) – various node and edge matchers
Returns
    Plotter
Return type
    Plotter
Example: Find nodes of some type
    from graphistry.ast import n
    people_nodes_df = g.chain([ n({"type": "person"}) ])._nodes
Example: Find 2-hop edge sequences with some attribute
    from graphistry.ast import e_forward
    g_2_hops = g.chain([ e_forward({"interesting": True}, hops=2) ])
    g_2_hops.plot()
Example: Find any node 1-2 hops out from another node, and label each hop
    from graphistry.ast import n, e_undirected
    g_2_hops = g.chain([
        n({g._node: "a"}),
        e_undirected(name="hop1"),
        e_undirected(name="hop2")
    ])
    print('# first-hop edges:', len(g_2_hops._edges[ g_2_hops._edges.hop1 == True ]))
Example: Transaction nodes between two kinds of risky nodes
    from graphistry.ast import n, e_forward, e_reverse
    g_risky = g.chain([
        n({"risk1": True}),
        e_forward(to_fixed_point=True),
        n({"type": "transaction"}, name="hit"),
        e_reverse(to_fixed_point=True),
        n({"risk2": True})
    ])
    print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ]))
Parameters
    self (Plottable)
graphistry.compute.chain.combine_steps(g, kind, steps)
Collect nodes and edges, taking care to deduplicate and tag any names.
Parameters
    g (Plottable)
    kind (str)
    steps (List[Tuple[ASTObject, Plottable]])
Return type
    DataFrame
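The "deduplicate and tag any names" step can be pictured with a small plain-Python sketch. Here each step is simplified to a (name or None, set of node IDs) pair instead of the documented (ASTObject, Plottable) tuples; combine_steps_sketch is hypothetical, and tagging a node True when any same-named step matched it is an assumption about how named columns are filled.

```python
from typing import Dict, List, Optional, Set, Tuple


def combine_steps_sketch(steps: List[Tuple[Optional[str], Set[str]]]) -> List[Dict[str, object]]:
    """Hypothetical helper: union per-step node IDs, dedupe, and tag named steps."""
    ordered: Dict[str, None] = {}
    for _, ids in steps:
        for node_id in sorted(ids):  # sorted only so the output order is deterministic
            ordered.setdefault(node_id, None)  # first occurrence wins; later ones are duplicates
    rows: List[Dict[str, object]] = []
    for node_id in ordered:
        row: Dict[str, object] = {"id": node_id}
        for name, ids in steps:
            if name is not None:
                # boolean column named after the step: did this step match the node?
                row[name] = bool(row.get(name, False)) or node_id in ids
        rows.append(row)
    return rows


steps = [(None, {"a"}), ("hit", {"b"}), (None, {"b", "c"})]
for row in combine_steps_sketch(steps):
    print(row)
# {'id': 'a', 'hit': False}
# {'id': 'b', 'hit': True}
# {'id': 'c', 'hit': False}
```

Node "b" appears in two steps but yields a single row, and only the named step contributes a column, which matches how chain adds a boolean column per named matcher.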
Cluster
class graphistry.compute.cluster.ClusterMixin(*args, **kwargs)
Bases: object
dbscan(min_dist=0.2, min_samples=1, cols=None, kind='nodes', fit_umap_embedding=True, target=False, verbose=False, engine_dbscan='sklearn', *args, **kwargs)
DBSCAN clustering on CPU or GPU, inferred automatically. Adds a _dbscan column to nodes or edges.
NOTE: g.transform_dbscan(..) is currently unsupported on GPU.
Examples:
    import graphistry
    # edf and ndf are existing edges and nodes DataFrames; extra keyword
    # arguments may be passed to each call below via **kwargs
    g = graphistry.edges(edf, 'src', 'dst').nodes(ndf, 'node')

    # cluster by UMAP embeddings; kind may be 'nodes' or 'edges'
    kind = 'nodes'
    g2 = g.umap(kind=kind).dbscan(kind=kind)
    print(g2._nodes['_dbscan'])  # for kind='edges': print(g2._edges['_dbscan'])

    # dbscan in the umap or featurize API
    g2 = g.umap(dbscan=True, min_dist=1.2, min_samples=2)
    # or, here dbscan is inferred from features, not UMAP embeddings
    g2 = g.featurize(dbscan=True, min_dist=1.2, min_samples=2)
    # and via chaining
    g2 = g.umap().dbscan(min_dist=1.2, min_samples=2)

    # cluster by feature embeddings
    g2 = g.featurize().dbscan()

    # cluster by a given set of feature column attributes, or with target=True
    g2 = g.featurize().dbscan(cols=['ip_172', 'location', 'alert'], target=False)

    # equivalent to the above: when cols is not None, the features dataframe is
    # used rather than UMAP embeddings, even with fit_umap_embedding=True
    g2 = g.umap().dbscan(cols=['ip_172', 'location', 'alert'], fit_umap_embedding=True)

    g2.plot()  # color by `_dbscan` column
Useful:
    Enriching the graph with cluster labels from UMAP is useful for visualizing clusters in the graph by color, size, etc., as well as for assessing metrics per cluster, e.g. https://github.com/graphistry/pygraphistry/blob/master/demos/ai/cyber/cyber-redteam-umap-demo.ipynb
Args:
    min_dist (float): the maximum distance between two samples for them to be considered as in the same neighborhood.
    kind (str): 'nodes' or 'edges'
    cols: list of columns to use for clustering, given that g.featurize has been run; a convenient way to slice features or targets by fragments of interest, e.g. ['ip_172', 'location', 'ssh', 'warnings']
    fit_umap_embedding (bool): whether DBSCAN clusters the UMAP embeddings or the features dataframe
    min_samples: the number of samples in a neighborhood for a point to be considered a core point. This includes the point itself.
    target: whether to use the target column as the clustering feature
Parameters
    min_dist (float)
    min_samples (int)
    cols (Union[List, str, None])
    kind (str)
    fit_umap_embedding (bool)
    target (bool)
    verbose (bool)
    engine_dbscan (str)
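The cols argument is described as a way to slice features by fragments of interest. One plausible reading, sketched below under that assumption, is substring matching against featurized column names (for example one-hot columns such as "ip_172.16.0.1"). select_columns_by_fragment and the column names are hypothetical, and the library may match columns differently.

```python
from typing import List, Union


def select_columns_by_fragment(columns: List[str], cols: Union[List[str], str, None]) -> List[str]:
    """Hypothetical helper: keep feature columns whose names contain any given fragment."""
    if cols is None:
        return list(columns)  # no slicing requested: use every feature column
    fragments = [cols] if isinstance(cols, str) else cols
    return [c for c in columns if any(frag in c for frag in fragments)]


feature_columns = ["ip_172.16.0.1", "ip_10.0.0.2", "location_us", "alert_count", "bytes"]
print(select_columns_by_fragment(feature_columns, ["ip_172", "location"]))
# ['ip_172.16.0.1', 'location_us']
```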
transform_dbscan(df, y=None, min_dist='auto', infer_umap_embedding=False, sample=None, n_neighbors=None, kind='nodes', return_graph=True, verbose=False)
Transforms a minibatch dataframe into one with a new column '_dbscan' containing the DBSCAN cluster labels for the minibatch, and generates a graph combining the minibatch and the original graph, with edges between the two inferred from the UMAP embedding or the features dataframe. Graph nodes or edges are colored by the '_dbscan' column.
Examples:
    fit:
        g = graphistry.edges(edf, 'src', 'dst').nodes(ndf, 'node')
        g2 = g.featurize().dbscan()
    predict:
        emb, X, _, ndf = g2.transform_dbscan(ndf, return_graph=False)
        # or
        g3 = g2.transform_dbscan(ndf, return_graph=True)
        g3.plot()
Likewise for umap:
    fit:
        g = graphistry.edges(edf, 'src', 'dst').nodes(ndf, 'node')
        g2 = g.umap(X=.., y=..).dbscan()
    predict:
        emb, X, y, ndf = g2.transform_dbscan(ndf, ndf, return_graph=False)
        # or
        g3 = g2.transform_dbscan(ndf, ndf, return_graph=True)
        g3.plot()
Args:
    df: dataframe to transform
    y: optional labels dataframe
    min_dist: the maximum distance between two samples for them to be considered as in the same neighborhood. Smaller values result in fewer edges between the minibatch and the original graph. Default 'auto', which infers min_dist from the mean and standard deviation of the distances from the new points to the original graph.
    infer_umap_embedding: whether to use UMAP embeddings or the features dataframe when inferring edges between the minibatch and the original graph. Default False, which uses the features dataframe.
    sample: number of samples to use when inferring edges between the minibatch and the original graph. If None, only the closest point to the minibatch is used; if greater than 0, the closest sample points in the existing graph are sampled to pull in more edges. Default None.
    kind: 'nodes' or 'edges'
    return_graph: whether to return a graph or the tuple (emb, X, y, minibatch df enriched with DBSCAN labels). Default True. The inferred graph supports kind='nodes' only.
    verbose: whether to print out progress, default False
Parameters
    df (DataFrame)
    y (Optional[DataFrame])
    min_dist (Union[float, str])
    infer_umap_embedding (bool)
    sample (Optional[int])
    n_neighbors (Optional[int])
    kind (str)
    return_graph (bool)
    verbose (bool)
graphistry.compute.cluster.dbscan_fit(g, dbscan, kind='nodes', cols=None, use_umap_embedding=True, target=False, verbose=False)
Fits clustering on UMAP embeddings if use_umap_embedding is True, otherwise on the features dataframe, or on the target dataframe if target is True.
Args:
    g: graphistry graph
    kind: 'nodes' or 'edges'
    cols: list of columns to use for clustering, given that g.featurize has been run
    use_umap_embedding: whether to use UMAP embeddings or the features dataframe for clustering (default: True)
Parameters
    g (Any)
    dbscan (Any)
    kind (str)
    cols (Union[List, str, None])
    use_umap_embedding (bool)
    target (bool)
    verbose (bool)
graphistry.compute.cluster.dbscan_predict(X, model)
DBSCAN has no predict per se, so we reverse-engineer one here, following https://stackoverflow.com/questions/27822752/scikit-learn-predicting-new-points-with-dbscan
Parameters
    X (DataFrame)
    model (Any)
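The reverse-engineered predict can be sketched as follows. This is a minimal version of the nearest-core-sample idea, assuming Euclidean distance: a new point takes the label of its nearest core sample if that sample is closer than eps, and is otherwise noise (-1). dbscan_predict_sketch is hypothetical. With a fitted scikit-learn DBSCAN model, the inputs would correspond to model.components_, model.labels_[model.core_sample_indices_], and model.eps.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dbscan_predict_sketch(
    new_points: List[Point],
    core_points: List[Point],
    core_labels: List[int],
    eps: float,
) -> List[int]:
    """Label each new point with the cluster of its nearest core sample within eps, else -1."""
    labels = []
    for p in new_points:
        best_label, best_dist = -1, math.inf
        for core, label in zip(core_points, core_labels):
            dist = math.dist(p, core)  # Euclidean distance (Python 3.8+)
            if dist < eps and dist < best_dist:
                best_label, best_dist = label, dist
        labels.append(best_label)
    return labels


core_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
core_labels = [0, 0, 1]
print(dbscan_predict_sketch([(0.05, 0.1), (5.1, 4.9), (10.0, 10.0)], core_points, core_labels, eps=0.5))
# [0, 1, -1]
```

Only core samples are consulted because a DBSCAN border point cannot pull new points into a cluster; this is the design choice behind the linked approach.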
graphistry.compute.cluster.get_model_matrix(g, kind, cols, umap, target)
Allows a single function to get the model matrix for both nodes and edges, as well as for targets, embeddings, and features.
Args:
    g: graphistry graph
    kind: 'nodes' or 'edges'
    cols: list of columns to use for clustering, given that g.featurize has been run
    umap: whether to use UMAP embeddings or the features dataframe
    target: whether to use the target dataframe or the features dataframe
Returns:
    pd.DataFrame: dataframe of the model matrix given the inputs
Parameters
    kind (str)
    cols (Union[List, str, None])
graphistry.compute.cluster.lazy_cudf_import_has_dependancy()
graphistry.compute.cluster.lazy_dbscan_import_has_dependency()
graphistry.compute.cluster.make_safe_gpu_dataframes(X, y, engine)
Helper method to coerce a dataframe to the correct type (pandas vs cudf).
graphistry.compute.cluster.resolve_cpu_gpu_engine(engine)
Parameters
    engine (Literal[typing.Literal['cuml', 'umap_learn'], 'auto'])
Return type
    Literal['cuml', 'umap_learn']
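The return type indicates that 'auto' resolves to either 'cuml' or 'umap_learn'. Below is a hypothetical sketch of such a resolver, assuming it prefers cuML when the package is importable and otherwise falls back to the CPU engine. resolve_engine_sketch is not the library's function, and the real resolution logic may check more than importability.

```python
import importlib.util
from typing import Literal

EngineName = Literal["cuml", "umap_learn"]


def resolve_engine_sketch(engine: str = "auto") -> EngineName:
    """Hypothetical resolver: prefer cuML when importable, else fall back to umap_learn."""
    if engine == "cuml":
        return "cuml"  # explicit choices are passed through unchanged
    if engine == "umap_learn":
        return "umap_learn"
    if engine != "auto":
        raise ValueError(f"unknown engine: {engine!r}")
    # find_spec checks availability without importing the (possibly heavy) package
    if importlib.util.find_spec("cuml") is not None:
        return "cuml"
    return "umap_learn"


print(resolve_engine_sketch("auto"))  # 'cuml' on a machine with cuML installed, else 'umap_learn'
```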
Collapse
graphistry.compute.collapse.check_default_columns_present_and_coerce_to_string(g)
Helper to set COLLAPSE columns on the nodes and edges dataframes, while converting src, dst, and node to dtype(str).
Parameters
    g (Plottable) – graphistry instance
Returns
    graphistry instance
graphistry.compute.collapse.check_has_set(ndf, parent, child)
graphistry.compute.collapse.collapse_algo(g, child, parent, attribute, column, seen)
Basically Candy Crush over graph properties, in a topology-aware manner.
Checks whether the child node has the desired property from the parent. The pair (start_node=parent: has_attribute, children nodes: has_attribute) is checked by case (T, T), (F, T), (T, F), and (F, F), and a recursive collapse is started (or not) on the children, reassigning nodes and edges:
    if (T, T), append the children nodes to start_node, re-assign the name of the node, and update the edge table with the new name;
    if (F, T), start k (potentially new) super nodes, with k the number of children of start_node. The start node keeps k outgoing edges;
    if (T, F), it is the end of the cluster: keep the new node as is and keep going;
    if (F, F), keep going.
Parameters
    seen (dict)
    g (Plottable) – graphistry instance
    child (Union[str, int]) – child node to start traversal; for the first traversal, set child=parent or vice versa.
    parent (Union[str, int]) – parent node to start traversal; in the main call, this is set to child.
    attribute (Union[str, int]) – attribute to collapse by
    column (Union[str, int]) – column in the nodes dataframe to collapse over
Returns
    graphistry instance with collapsed nodes.
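The net effect of the (T, T) merge rule can be approximated without the recursive traversal. The sketch below is a deliberately simplified stand-in, not the library's algorithm: it uses union-find to merge every edge whose two endpoints both carry the attribute, ignores the start node and traversal order, and names each super node by its sorted member IDs, in the style of "1 2 3". collapse_sketch and has_attr (True when a node's column value equals attribute) are hypothetical.

```python
from typing import Dict, List, Tuple


def collapse_sketch(
    edges: List[Tuple[str, str]], has_attr: Dict[str, bool]
) -> Dict[str, str]:
    """Map each node to the name of its super node (simplified, traversal-free)."""
    parent = {n: n for e in edges for n in e}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for src, dst in edges:
        # only the (T, T) case merges: both endpoints carry the attribute
        if has_attr.get(src, False) and has_attr.get(dst, False):
            parent[find(dst)] = find(src)

    members: Dict[str, List[str]] = {}
    for n in parent:
        members.setdefault(find(n), []).append(n)
    # name each super node by its sorted member ids, e.g. "1 2"
    names = {root: " ".join(sorted(ms)) for root, ms in members.items()}
    return {n: names[find(n)] for n in parent}


edges = [("1", "2"), ("2", "3"), ("3", "4"), ("4", "5")]
has_attr = {"1": True, "2": True, "3": False, "4": True, "5": True}
print(collapse_sketch(edges, has_attr))
# {'1': '1 2', '2': '1 2', '3': '3', '4': '4 5', '5': '4 5'}
```

Node "3" lacks the attribute, so it splits the chain into two separate super nodes; this is the topology preservation that a plain group-by on the column would lose.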
graphistry.compute.collapse.collapse_by(self, parent, start_node, attribute, column, seen, self_edges=False, unwrap=False, verbose=True)
Main call in collapse.py: collapses nodes and edges by attribute, and returns a normalized graphistry object.
Parameters
    self (Plottable) – graphistry instance
    parent (Union[str, int]) – parent node to start traversal; in the main call, this is set to child.
    start_node (Union[str, int])
    attribute (Union[str, int]) – attribute to collapse by
    column (Union[str, int]) – column in the nodes dataframe to collapse over
    seen (dict) – dict of previously collapsed pairs; (n1, n2) is seen as different from (n2, n1)
    verbose (bool) – default True
    self_edges (bool)
    unwrap (bool)
Returns
    graphistry instance with collapsed and normalized nodes.
Return type
    Plottable
graphistry.compute.collapse.collapse_nodes_and_edges(g, parent, child)
Asserts that the parent and child nodes in ndf should be collapsed into a super node, and sets a new ndf with COLLAPSE nodes in graphistry instance g.
This asserts that we SHOULD merge parent and child into a super node; outside logic controls when that is the case. For example, it assumes parent is already in the cluster keys of the COLLAPSE node.
Parameters
    g (Plottable) – graphistry instance
    parent (Union[str, int]) – node with attribute in column
    child (Union[str, int]) – node with attribute in column
Returns
    graphistry instance
graphistry.compute.collapse.get_children(g, node_id, hops=1)
Helper that gets the children at k hops from node node_id.
Returns
    graphistry instance of hops
Parameters
    g (Plottable)
    node_id (Union[str, int])
    hops (int)
graphistry.compute.collapse.get_cluster_store_keys(ndf, node)
Main innovation in finding and adding to a super node. Checks whether node is a segment of any collapse_node in the COLLAPSE column of the nodes DataFrame.
Parameters
    ndf (DataFrame) – node DataFrame
    node (Union[str, int]) – node to find
Returns
    DataFrame of bools indicating where wrap_key(node) exists in the COLLAPSE column
graphistry.compute.collapse.get_edges_in_out_cluster(g, node_id, attribute, column, directed=True)
Traverses the children of node_id and separates them into incluster and outcluster sets, depending on whether they have attribute in the node DataFrame column.
Parameters
    g (Plottable) – graphistry instance
    node_id (Union[str, int]) – node with attribute in column
    attribute (Union[str, int]) – attribute to collapse in column over
    column (Union[str, int]) – column to collapse over
    directed (bool)
graphistry.compute.collapse.get_edges_of_node(g, node_id, outgoing_edges=True, hops=1)
Gets the edges of a node at k hops from the node.
Parameters
    g (Plottable) – graphistry instance
    node_id (Union[str, int]) – node to find edges from
    outgoing_edges (bool) – if True, finds all outgoing edges of the node; default True
    hops (int) – the number of hops from the node to take; default 1
Returns
    DataFrame of edges
graphistry.compute.collapse.get_new_node_name(ndf, parent, child)
If child is in a cluster group, melts the name; otherwise, makes a new parent_name from parent and child.
Parameters
    ndf (DataFrame) – node DataFrame
    parent (Union[str, int]) – node with attribute in column
    child (Union[str, int]) – node with attribute in column
Returns
    new_parent_name
Return type
    str
graphistry.compute.collapse.has_edge(g, n1, n2, directed=True)
Checks whether n1 and n2 share an edge (directed or not).
Parameters
    g (Plottable) – graphistry instance
    n1 (Union[str, int]) – node to check for an edge to n2
    n2 (Union[str, int]) – node to check for an edge to n1
    directed (bool) – if True, checks only the outgoing edge n1 -> n2; otherwise, finds undirected edges
Return type
    bool
Returns
    bool, whether an edge exists between n1 and n2
graphistry.compute.collapse.has_property(g, ref_node, attribute, column)
Checks whether ref_node is in the node dataframe with attribute in column.
Parameters
    g (Plottable) – graphistry instance
    ref_node (Union[str, int]) – node to check for attribute in column
    attribute (Union[str, int])
    column (Union[str, int])
Return type
    bool
Returns
    bool
graphistry.compute.collapse.in_cluster_store_keys(ndf, node)
Checks whether node is in a collapse_node in the COLLAPSE column of the nodes DataFrame.
Parameters
    ndf (DataFrame) – nodes DataFrame
    node (Union[str, int]) – node to find
Return type
    bool
Returns
    bool
graphistry.compute.collapse.melt(ndf, node)
Reduces node if it is in the cluster store; otherwise, passes it through. For example, node = "4" will take any sequences from get_cluster_store_keys, such as "1 2 3" and "4 3 6", and return "1 2 3 4 6" when they have a common entry (3).
Parameters
    ndf (DataFrame) – node DataFrame
    node (Union[str, int]) – node to melt
Returns
    new_parent_name of the super node
Return type
    str
graphistry.compute.collapse.normalize_graph(g, self_edges=False, unwrap=False)
Final step after the collapse traversals are done: removes duplicates and moves the COLLAPSE columns into the respective (node, src, dst) columns of the nodes and edges dataframes of Graphistry instance g.
Parameters
    g (Plottable) – graphistry instance
    self_edges (bool) – whether to keep duplicates from ndf, edf; default False
    unwrap (bool) – whether to unwrap node text with ~; default False
Return type
    Plottable
Returns
    final graphistry instance
graphistry.compute.collapse.reduce_key(key)
Takes "1 1 2 1 2 3" -> "1 2 3".
Parameters
    key (Union[str, int]) – node name
Return type
    str
Returns
    new node name with duplicates removed
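A minimal sketch of the key reduction, assuming (as the example output suggests, though the signature does not guarantee it) that unique tokens come back in sorted string order. reduce_key_sketch is a hypothetical name. Concatenating the two cluster keys from the melt example, "1 2 3" and "4 3 6", and reducing the result gives the melted name "1 2 3 4 6".

```python
from typing import Union


def reduce_key_sketch(key: Union[str, int]) -> str:
    """Drop repeated tokens from a space-separated super-node name."""
    # assumption: unique tokens are returned in sorted (string) order
    return " ".join(sorted(set(str(key).split())))


print(reduce_key_sketch("1 1 2 1 2 3"))  # 1 2 3
print(reduce_key_sketch("1 2 3" + " " + "4 3 6"))  # 1 2 3 4 6
```

Because the sort is on strings, a name such as "10 2" stays in that order rather than being sorted numerically.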
graphistry.compute.collapse.unpack(g)
Helper method that unpacks a graphistry instance.
For example:
    ndf, edf, src, dst, node = unpack(g)
Parameters
    g (Plottable) – graphistry instance
Returns
    node DataFrame, edge DataFrame, source column, destination column, node column
graphistry.compute.collapse.unwrap_key(name)
Unwraps a node name: ~name~ -> name
Parameters
    name (Union[str, int]) – node to unwrap
Return type
    str
Returns
    unwrapped node name
graphistry.compute.collapse.wrap_key(name)
Wraps a node name: name -> ~name~
Parameters
    name (Union[str, int]) – node name
Return type
    str
Returns
    wrapped node name
Conditional
class graphistry.compute.conditional.ConditionalMixin(*args, **kwargs)
Bases: object
conditional_graph(x, given, kind='nodes', *args, **kwargs)
conditional_graph – p(x|given) = p(x, given) / p(given)
Useful for finding the conditional probability of a node or edge attribute. The returned dataframe sums to 1 on each column.
Parameters
    x – target column
    given – the column to condition on (held fixed)
    kind – 'nodes' or 'edges'
    args/kwargs – additional arguments for g.bind(…)
Returns
    a graphistry instance with the conditional graph edges weighted by the conditional probability. Edges are between x and given; keep in mind that g._edges.columns = [given, x, _probs].
conditional_probs(x, given, kind='nodes', how='index')
Produces a dense matrix of the conditional probability of x given y.
Args:
    x: the column variable of interest, given the column y=given
    given: the variable to hold constant
    df (pd.DataFrame): dataframe
    how (str, optional): one of 'column' or 'index'. Defaults to 'index'.
    kind (str, optional): 'nodes' or 'edges'. Defaults to 'nodes'.
Returns:
    pd.DataFrame: the conditional probability of x given the column y, as a dense array-like dataframe
graphistry.compute.conditional.conditional_probability(x, given, df)
Conditional probability function over categorical variables:
    p(x | given) = p(x, given) / p(given)
Args:
    x: the column variable of interest, given the column 'given'
    given: the variable to hold constant
    df: dataframe with columns [given, x]
Returns:
    pd.DataFrame: the conditional probability of x given the column 'given'
Parameters
    df (DataFrame)
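For categorical columns the formula reduces to a ratio of counts, because the 1/N normalizers of p(x, given) and p(given) cancel. The plain-Python sketch below illustrates this on (given, x) pairs; conditional_probability_sketch and the protocol/status values are hypothetical, and the library returns a DataFrame rather than a dict.

```python
from collections import Counter
from typing import Dict, List, Tuple


def conditional_probability_sketch(rows: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """rows are (given, x) pairs; returns p(x | given) keyed by (given, x)."""
    joint = Counter(rows)                                  # count(given, x)
    marginal = Counter(given_value for given_value, _ in rows)  # count(given)
    # p(x | given) = p(x, given) / p(given); the 1/N factors cancel, leaving count ratios
    return {(g, x): n / marginal[g] for (g, x), n in joint.items()}


rows = [("tcp", "alert"), ("tcp", "ok"), ("tcp", "ok"), ("udp", "ok")]
cond = conditional_probability_sketch(rows)
print(cond[("tcp", "ok")])  # 0.6666666666666666
print(cond[("udp", "ok")])  # 1.0
```

For each value of given, the probabilities over x sum to 1, which is the per-column normalization described for conditional_graph.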
graphistry.compute.conditional.probs(x, given, df, how='index')
Produces a dense matrix of the conditional probability of x given y=given.
Args:
    x: the column variable of interest, given the column 'y'
    given: the variable to hold constant
    df (pd.DataFrame): dataframe
    how (str, optional): one of 'column' or 'index'. Defaults to 'index'.
Returns:
    pd.DataFrame: the conditional probability of x given the column 'y', as a dense array-like dataframe
Parameters
    df (DataFrame)
Filter by Dictionary
graphistry.compute.filter_by_dict.filter_by_dict(df, filter_dict=None)
Return df where rows match all values in filter_dict.
Parameters
    filter_dict (Optional[dict])
Return type
    DataFrame
graphistry.compute.filter_by_dict.filter_edges_by_dict(self, filter_dict)
Filter edges to those that match all values in filter_dict.
Parameters
    self (Plottable)
    filter_dict (dict)
Return type
    Plottable
graphistry.compute.filter_by_dict.filter_nodes_by_dict(self, filter_dict)
Filter nodes to those that match all values in filter_dict.
Parameters
    self (Plottable)
    filter_dict (dict)
Return type
    Plottable
Hop
graphistry.compute.hop.hop(self, nodes=None, hops=1, to_fixed_point=False, direction='forward', edge_match=None, source_node_match=None, destination_node_match=None, return_as_wave_front=False)
Given a graph and some source nodes, return the subgraph of all paths within k hops from the sources.
    g: Plotter
    nodes: dataframe with id column matching g._node. None signifies all nodes (default).
    hops: how many hops to consider, if any bound (default 1)
    to_fixed_point: keep hopping until no new nodes are found (ignores hops)
    direction: 'forward', 'reverse', 'undirected'
    edge_match: dict of key-value pairs to exact match (see also: filter_edges_by_dict)
    source_node_match: dict of key-value pairs to match nodes before hopping
    destination_node_match: dict of key-value pairs to match nodes after hopping (including intermediate)
    return_as_wave_front: only return the nodes/edges reached, ignoring past ones (primarily for internal use)
Parameters
    self (Plottable)
    nodes (Optional[DataFrame])
    hops (Optional[int])
    to_fixed_point (bool)
    direction (str)
    edge_match (Optional[dict])
    source_node_match (Optional[dict])
    destination_node_match (Optional[dict])
Return type
    Plottable
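To make the hop semantics concrete, here is a plain-Python breadth-first sketch over an edge list. It is not the library's dataframe implementation: hop_sketch is hypothetical, it takes a set of source IDs instead of a nodes dataframe, it omits edge_match, source_node_match, destination_node_match, and return_as_wave_front, and it assumes the returned node set includes the sources.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def hop_sketch(
    edges: List[Edge],
    sources: Set[str],
    hops: int = 1,
    to_fixed_point: bool = False,
    direction: str = "forward",
) -> Tuple[Set[str], List[Edge]]:
    """Breadth-first expansion from sources; returns reached nodes and traversed edges."""
    if direction not in ("forward", "reverse", "undirected"):
        raise ValueError(f"unknown direction: {direction!r}")
    reached = set(sources)
    frontier = set(sources)
    kept: Dict[Edge, None] = {}  # dict keys: deduplicated edges in first-seen order
    step = 0
    # to_fixed_point ignores hops and stops once a round discovers no new nodes
    while frontier and (to_fixed_point or step < hops):
        discovered: Set[str] = set()
        for src, dst in edges:
            forward_hit = direction != "reverse" and src in frontier
            reverse_hit = direction != "forward" and dst in frontier
            if forward_hit:
                discovered.add(dst)
            if reverse_hit:
                discovered.add(src)
            if forward_hit or reverse_hit:
                kept[(src, dst)] = None
        frontier = discovered - reached  # only unseen nodes drive the next hop
        reached |= discovered
        step += 1
    return reached, list(kept)


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "a")]
print(sorted(hop_sketch(edges, {"a"})[0]))  # ['a', 'b']
print(sorted(hop_sketch(edges, {"a"}, to_fixed_point=True)[0]))  # ['a', 'b', 'c', 'd']
print(hop_sketch(edges, {"a"}, direction="undirected")[1])  # [('a', 'b'), ('x', 'a')]
```

Tracking only newly discovered nodes in the frontier is what lets to_fixed_point terminate on cyclic graphs.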