Overview
What's aruku?
aruku is a random walk engine for Apache Spark. It lets you program and model a random walk easily, while a distributed, fault-tolerant, optimized engine takes care of running it.
How do I use it?
To run node2vec on an Apache Spark GraphX graph:
import aruku._
import aruku.implicits._
import aruku.walks._
import org.apache.spark.graphx._
import org.apache.spark.graphx.util._
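// `sc` is the active SparkContext (predefined in spark-shell)
val graph: Graph[Long, Int] = GraphGenerators
  .logNormalGraph(sc, numVertices = 150000)
val numWalkers = 150000 // number of walkers
val walkLength = 80     // steps per walk
val p = 0.5             // node2vec return parameter
val q = 2               // node2vec in-out parameter
// Edge attributes are used as transition weights
graph.randomWalk(edge => edge.attr.toDouble)(
  Node2Vec.config(numWalkers),
  Node2Vec.transition(p, q, walkLength)
)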
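To make the roles of p and q concrete, here is a minimal sketch of the standard node2vec second-order bias in plain Scala, with no Spark involved. It is not aruku's implementation: `Node2VecBias`, `biasedWeight`, and `transitionProbabilities` are hypothetical names chosen for illustration, and the graph is assumed to be a small undirected adjacency map with unit edge weights.

```scala
// Illustrative only: plain Scala, no Spark, not aruku's internals.
object Node2VecBias {
  // Assumption: an undirected graph stored as vertex -> neighbour set.
  type Adjacency = Map[Long, Set[Long]]

  /** Unnormalized weight of stepping to `next`, given that the walker
    * reached its current vertex from `previous`. */
  def biasedWeight(adj: Adjacency, previous: Long, next: Long,
                   edgeWeight: Double, p: Double, q: Double): Double =
    if (next == previous) edgeWeight / p // going back to the previous vertex
    else if (adj.getOrElse(previous, Set.empty[Long]).contains(next))
      edgeWeight                         // staying at distance 1 from `previous`
    else edgeWeight / q                  // moving away, to distance 2

  /** Normalizes the biased weights over the neighbours of `current`
    * (unit edge weights assumed). */
  def transitionProbabilities(adj: Adjacency, previous: Long, current: Long,
                              p: Double, q: Double): Map[Long, Double] = {
    val weights = adj.getOrElse(current, Set.empty[Long]).toSeq.map { next =>
      next -> biasedWeight(adj, previous, next, 1.0, p, q)
    }
    val total = weights.map(_._2).sum
    weights.map { case (vertex, w) => vertex -> w / total }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Edges: 1-2, 1-3, 2-3, 2-4. The walker has just moved from 1 to 2.
    val adj: Adjacency = Map(
      1L -> Set(2L, 3L),
      2L -> Set(1L, 3L, 4L),
      3L -> Set(1L, 2L),
      4L -> Set(2L)
    )
    transitionProbabilities(adj, previous = 1L, current = 2L, p = 0.5, q = 2.0)
      .toSeq.sortBy(_._1)
      .foreach { case (vertex, prob) => println(s"$vertex -> $prob") }
  }
}
```

With p = 0.5 and q = 2, the unnormalized weights from vertex 2 are 2.0 for returning to vertex 1, 1.0 for vertex 3 (a neighbour of the previous vertex), and 0.5 for vertex 4, so the walk tends to stay close to where it came from.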
Ready to install?
libraryDependencies += "com.github.pierrenodet" %% "aruku-core" % "0.1.0"
Acknowledgement
This library is inspired by the KnightKing engine and by Min Shen's talk at Spark Summit 2017.