snn_graph.Rd
Creates a k-nearest-neighbor graph from a tf-idf matrix.
# S3 method for BOWER
snn_graph(bower, max_features = 100, remove_stopwords = FALSE, k = 5, ...)

# S3 method for list
snn_graph(gs, max_features = 100, remove_stopwords = FALSE, k = 5, ...)
Argument | Description |
---|---|
max_features | number of top features, sorted by count, to use in the bag-of-words matrix. The default value is 100. |
remove_stopwords | a list of stopwords to use. By default it uses its inbuilt list of standard stopwords. The default value is FALSE. |
k | the maximum number of nearest neighbors to search. The default value is 5. |
... | additional arguments passed to superml::TfIdfVectorizer. |
gs | gene sets, supplied as a list. |
Returns a matrix of tf-idf scores of tokens.
Given a list of texts, it creates a sparse matrix of tf-idf scores for the tokens in the text. See https://github.com/saraswatmks/superml/blob/master/R/TfidfVectorizer.R. A k shortest-nearest neighbor graph is then computed using the overlap of the terms.
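The two steps described above (tf-idf scoring, then a nearest-neighbor search based on term overlap) can be sketched in base R. This is an illustrative sketch only, not the code used by snn_graph or superml: the toy gene sets, the smoothed idf formula, the use of cosine similarity, and every variable name below are assumptions made for the example.

```r
# Toy gene sets: a named list of character vectors (hypothetical data).
gs <- list(
  SET_A = c("TNF", "NFKB1", "IL6", "JUN"),
  SET_B = c("TNF", "IL6", "STAT3", "JAK1"),
  SET_C = c("MYOD1", "ACTA1", "MYH3"),
  SET_D = c("MYOD1", "ACTA1", "JUN")
)

# Term-count matrix: rows are gene sets, columns are genes (tokens).
vocab <- sort(unique(unlist(gs)))
counts <- t(vapply(gs, function(g) as.numeric(vocab %in% g),
                   numeric(length(vocab))))
colnames(counts) <- vocab

# tf-idf with a smoothed idf (an assumed weighting), then scale each
# row to unit length so that dot products equal cosine similarities.
n_docs <- nrow(counts)
doc_freq <- colSums(counts > 0)
idf <- log((1 + n_docs) / (1 + doc_freq)) + 1
tfidf <- sweep(counts, 2, idf, `*`)
tfidf <- tfidf / sqrt(rowSums(tfidf^2))

# Pairwise cosine similarity between gene sets.
sim <- tfidf %*% t(tfidf)
diag(sim) <- -Inf  # a gene set is never its own neighbor

# For each gene set, keep the names of the k most similar gene sets.
k <- 2
knn <- t(apply(sim, 1, function(s) {
  names(sort(s, decreasing = TRUE))[seq_len(k)]
}))
knn
```

Each row of knn lists the nearest neighbors of one gene set, most similar first. Gene sets that share more genes score higher, which is the sense in which the graph reflects the overlap of terms. The real function additionally stores the result as a weighted graph, as the example output shows.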
gmt_file <- system.file("extdata", "h.all.v7.4.symbols.gmt", package = "bowerbird")
bwr <- bower(gmt_file)
bwr <- snn_graph(bwr)
bwr
#> BOWER class
#> number of genesets: 50
#> genesets kNN Graph:
#> IGRAPH c6b89b1 UNW- 50 124 --
#> + attr: name (v/c), weight (e/n)
#> + edges from c6b89b1 (vertex names):
#> [1] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_HYPOXIA
#> [2] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_TGF_BETA_SIGNALING
#> [3] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_IL6_JAK_STAT3_SIGNALING
#> [4] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_APOPTOSIS
#> [5] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_MYOGENESIS
#> [6] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_COMPLEMENT
#> [7] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION
#> + ... omitted several edges