Creates a k-nearest-neighbor graph from a tf-idf matrix.

# S3 method for BOWER
snn_graph(bower, max_features = 100, remove_stopwords = FALSE, k = 5, ...)

# S3 method for list
snn_graph(gs, max_features = 100, remove_stopwords = FALSE, k = 5, ...)

Arguments

max_features

the number of top features, ranked by count, to keep in the bag-of-words matrix. The default value is 100.

remove_stopwords

a list of stopwords to use; by default, the vectorizer uses its built-in list of standard stopwords. The default value of this argument is FALSE.

k

the maximum number of nearest neighbors to search. The default value is set to 5.

...

passed to superml::TfIdfVectorizer

gs

a list of genesets.

Value

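For a BOWER object, returns the BOWER object with the genesets kNN graph attached, as in the example output below. The graph is derived from a matrix of tf-idf scores of tokens.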

Details

Given a list of text, it creates a sparse matrix of tf-idf scores for the tokens in the text. See https://github.com/saraswatmks/superml/blob/master/R/TfidfVectorizer.R. A k-nearest-neighbor graph is then computed from the overlap of the terms.
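
The two steps described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy genesets, the smoothed idf formula, the cosine similarity measure and the edge-list output are all assumptions made for illustration, and superml::TfIdfVectorizer may weight and normalise tokens differently.

```r
# Hypothetical toy genesets (not taken from the package data).
gs <- list(
  SET_A = c("TP53", "MDM2", "CDKN1A"),
  SET_B = c("TP53", "MDM2", "BAX"),
  SET_C = c("IL6", "STAT3", "JAK2"),
  SET_D = c("IL6", "STAT3", "SOCS3")
)

# Term-frequency matrix: one row per geneset, one column per token (gene).
vocab <- sort(unique(unlist(gs)))
tf <- t(vapply(gs, function(g) as.numeric(vocab %in% g), numeric(length(vocab))))
colnames(tf) <- vocab

# Assumed weighting: smoothed idf, then L2 normalisation of each row.
n_docs <- nrow(tf)
idf <- log((1 + n_docs) / (1 + colSums(tf > 0))) + 1
tfidf <- sweep(tf, 2, idf, `*`)
tfidf <- tfidf / sqrt(rowSums(tfidf^2))

# Cosine similarity between genesets; rows have unit length, so the
# cross product gives the cosine directly.
sim <- tcrossprod(tfidf)
diag(sim) <- -Inf  # a geneset is never its own neighbor

# Keep the k most similar genesets per row as a weighted edge list.
k <- 1
edges <- do.call(rbind, lapply(rownames(sim), function(i) {
  nb <- names(sort(sim[i, ], decreasing = TRUE))[seq_len(k)]
  data.frame(from = i, to = nb, weight = unname(sim[i, nb]),
             stringsAsFactors = FALSE)
}))
edges
```

With these toy sets, SET_A and SET_B pick each other as nearest neighbors because they share two genes, and likewise SET_C and SET_D. An edge list of this shape could be handed to a graph library to obtain a weighted graph similar in spirit to the one printed in the Examples section.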

Examples

gmt_file <- system.file("extdata", "h.all.v7.4.symbols.gmt", package = "bowerbird")
bwr <- bower(gmt_file)
bwr <- snn_graph(bwr)
bwr
#> BOWER class
#> number of genesets:  50 
#> genesets kNN Graph: 
#> IGRAPH c6b89b1 UNW- 50 124 -- 
#> + attr: name (v/c), weight (e/n)
#> + edges from c6b89b1 (vertex names):
#> [1] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_HYPOXIA                          
#> [2] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_TGF_BETA_SIGNALING               
#> [3] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_IL6_JAK_STAT3_SIGNALING          
#> [4] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_APOPTOSIS                        
#> [5] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_MYOGENESIS                       
#> [6] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_COMPLEMENT                       
#> [7] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION
#> + ... omitted several edges