summarize_clusters — summarize_clusters.BOWER • bowerbird

Summarize the terms of each cluster using the PageRank algorithm.
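To illustrate the idea, the base-R sketch below ranks a few cleaned gene set names from one cluster by PageRank and keeps the top-scoring name as the cluster's summary. This is not bowerbird's implementation, which passes its extra arguments to textrank::textrank_sentences(). The example terms, the Jaccard word-overlap similarity used as edge weights, and the damping factor of 0.85 are all assumptions chosen for illustration.

```r
# Hypothetical cleaned gene set names belonging to one cluster
terms <- c("interferon alpha response",
           "interferon gamma response",
           "inflammatory response",
           "interferon gamma signaling")
tokens <- strsplit(terms, " ", fixed = TRUE)

# Edge weights: Jaccard overlap of word sets (an assumption; textrank
# applies its own similarity measure between sentences)
n <- length(terms)
sim <- matrix(0, n, n, dimnames = list(terms, terms))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      shared <- length(intersect(tokens[[i]], tokens[[j]]))
      total  <- length(union(tokens[[i]], tokens[[j]]))
      sim[i, j] <- shared / total
    }
  }
}

# Weighted PageRank computed by power iteration
pagerank <- function(W, d = 0.85, tol = 1e-10, max_iter = 1000) {
  n <- nrow(W)
  out <- rowSums(W)
  # Column-stochastic transition matrix; nodes with no edges jump uniformly
  P <- matrix(1 / n, n, n)
  has_out <- out > 0
  P[, has_out] <- t(W[has_out, , drop = FALSE] / out[has_out])
  r <- rep(1 / n, n)
  for (k in seq_len(max_iter)) {
    r_new <- (1 - d) / n + d * as.vector(P %*% r)
    if (sum(abs(r_new - r)) < tol) break
    r <- r_new
  }
  setNames(r_new, rownames(W))
}

scores <- pagerank(sim)
# The highest-ranked term serves as the cluster summary
summary_term <- names(which.max(scores))
print(summary_term)
```

The most central term (the one sharing the most words with the others) receives the highest score, which is why a PageRank-style ranking is a reasonable way to choose a representative label.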

# S3 method for BOWER
summarize_clusters(
  bower,
  cluster = NULL,
  pattern = NULL,
  sep = NULL,
  ncpus = NULL,
  disconnect_graph = FALSE,
  ...
)

# S3 method for igraph
summarize_clusters(
  graph,
  cluster = NULL,
  pattern = NULL,
  sep = NULL,
  ncpus = NULL,
  disconnect_graph = FALSE,
  ...
)

Arguments

bower	a BOWER object.
cluster	vector of cluster labels, one for each geneset.
pattern	search pattern to remove from the terms. If not specified, a built-in pattern is used.
sep	separator found in gene set names; each occurrence is replaced with a blank space. Defaults to underscore ('_').
ncpus	number of CPU cores used for parallel computation.
disconnect_graph	if TRUE, return a graph whose edges connect only nodes within the same cluster.
...	additional arguments passed to textrank::textrank_sentences().
graph	geneset overlap graph (an igraph object).
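The pattern and sep arguments describe a text-cleaning step applied to gene set names before they are summarized. The sketch below shows that step in base R. The helper name clean_terms and the pattern "^HALLMARK_" are hypothetical stand-ins; the package's built-in default pattern is not documented here and may differ.

```r
# Hypothetical helper; bowerbird's own cleaning step may differ
clean_terms <- function(x, pattern = "^HALLMARK_", sep = "_") {
  x <- gsub(pattern, "", x)             # remove the collection prefix (regex)
  x <- gsub(sep, " ", x, fixed = TRUE)  # replace the separator with spaces
  trimws(x)
}

genesets <- c("HALLMARK_TNFA_SIGNALING_VIA_NFKB",
              "HALLMARK_TGF_BETA_SIGNALING",
              "HALLMARK_IL6_JAK_STAT3_SIGNALING")
cleaned <- clean_terms(genesets)
print(cleaned)
```

The cleaned strings have the same form as the cluster labels shown in the example output (for instance "TGF BETA SIGNALING").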

Value

Returns a matrix of tf-idf scores of the tokens.

Details

Given a list of texts, it creates a sparse matrix of tf-idf scores for the tokens in the text. See https://github.com/saraswatmks/superml/blob/master/R/TfidfVectorizer.R. A k-nearest-neighbor graph is then computed from the overlap of the terms.
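The base-R sketch below walks through these two steps on a handful of hypothetical cleaned terms. It builds a dense (not sparse) tf-idf matrix using a plain tf × log(N/df) weighting, then links each term to its k most similar terms by cosine similarity. The weighting scheme, the use of cosine similarity as the overlap measure, and k = 1 are assumptions; superml's TfidfVectorizer and bowerbird's graph construction may smooth, normalize, or measure overlap differently.

```r
# Hypothetical cleaned terms, one "document" per gene set
terms <- c("interferon alpha response",
           "interferon gamma response",
           "inflammatory response",
           "tgf beta signaling",
           "il6 jak stat3 signaling")
tokens <- strsplit(terms, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Term frequency: rows = terms, columns = vocabulary tokens
tf <- t(vapply(tokens, function(tk) {
  as.numeric(table(factor(tk, levels = vocab))) / length(tk)
}, numeric(length(vocab))))
dimnames(tf) <- list(terms, vocab)

# Inverse document frequency: log(N / number of terms containing the token)
doc_freq <- colSums(tf > 0)
idf <- log(nrow(tf) / doc_freq)
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity between tf-idf rows (self-similarity zeroed out)
norms <- sqrt(rowSums(tfidf^2))
sim <- (tfidf %*% t(tfidf)) / outer(norms, norms)
diag(sim) <- 0

# k-nearest-neighbor edges: link each term to its k most similar terms
k <- 1
edges <- do.call(rbind, lapply(seq_len(nrow(sim)), function(i) {
  nn <- order(sim[i, ], decreasing = TRUE)[seq_len(k)]
  data.frame(from = terms[i], to = terms[nn], weight = unname(sim[i, nn]))
}))
print(edges)
```

Tokens shared by many terms (such as "response") get a low idf weight, so neighbors are driven mostly by rarer, more specific words like "interferon" or "signaling".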

Examples

gmt_file <- system.file("extdata", "h.all.v7.4.symbols.gmt", package = "bowerbird")
bwr <- bower(gmt_file)
bwr <- snn_graph(bwr)
bwr <- find_clusters(bwr)
bwr <- summarize_clusters(bwr, ncpus = 1)
bwr
#> BOWER class
#> number of genesets:  50 
#> genesets kNN Graph: 
#> IGRAPH 64116cd UNW- 50 124 -- 
#> + attr: name (v/c), cluster (v/n), geneset_size (v/n), terms (v/c),
#> | labels (v/c), weight (e/n)
#> + edges from 64116cd (vertex names):
#> [1] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_HYPOXIA                
#> [2] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_TGF_BETA_SIGNALING     
#> [3] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_IL6_JAK_STAT3_SIGNALING
#> [4] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_APOPTOSIS              
#> [5] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_MYOGENESIS             
#> [6] HALLMARK_TNFA_SIGNALING_VIA_NFKB--HALLMARK_COMPLEMENT             
#> + ... omitted several edges
#> number of geneset clusters:  9 
#> Core genes:
#> 	First six genes shown
#> 	 XENOBIOTIC METABOLISM : LIFR DNAJB9 CD36 ACOX1 IDH1 ECH1 ...
#> 	 E2F TARGETS : SAC3D1 KIF11 KIF23 RACGAP1 NUMA1 KIF2C ...
#> 	 ESTROGEN RESPONSE EARLY : JAG1 CTNNB1 GNAI1 FDFT1 DHCR7 FASN ...
#> 	 APOPTOSIS : ATF3 IER3 BIRC3 JUN EGR3 IL1B ...
#> 	 INTERFERON ALPHA RESPONSE : MX1 ISG15 IFIT3 IFI44 IFI35 IRF7 ...
#> 	 HEDGEHOG SIGNALING : VEGFA VLDLR MYH9 ERO1A DDIT4 STC2 ...
#> 	 APICAL JUNCTION : EGFR ADAM10 CLTC AP2M1 ARF1 MAPK1 ...
#> 	 TGF BETA SIGNALING : TGFB1 PMEPA1 SERPINE1 ID2 THBS1 PPP1R15A ...
#> 	 IL6 JAK STAT3 SIGNALING : IL4R IFNGR1 IL1R2 IL3RA TNFRSF1B CSF1 ...