Configuration

Walker

A walker is a simple Scala case class that contains an id, a step count, and data:

case class Walker[T](
  id: Long,
  step: Long,
  data: T
)

You cannot construct a walker yourself, though: aruku generates walkers automatically from the walker configuration.

Walker Configuration

The walker configuration defines how many walkers perform random walks, how many epochs are run, how a walker is initialized, how it is updated, and how it is assigned to a vertex.

modules/aruku/WalkerConfig.scala
case class WalkerConfig[T](
  numWalkers: Long,
  numEpochs: Int,
  parallelism: Int,
  init: VertexId => T,
  update: (Walker[T], VertexId, Edge[Double]) => T,
  start: StartingStrategy
)
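As a rough illustration of the role of init, the sketch below spawns one walker per starting vertex and fills its data by calling init. It uses local stand-ins so it compiles without Spark or aruku on the classpath. The spawn helper, the id assignment, and the rememberStart function are hypothetical; aruku's actual generation logic may differ.

```scala
// Local stand-ins: nothing here is aruku's real code.
object WalkerInitSketch {
  type VertexId = Long // GraphX defines VertexId as an alias for Long

  // Mirror of the Walker case class shown above.
  final case class Walker[T](id: Long, step: Long, data: T)

  // Hypothetical helper: one walker per starting vertex, at step 0,
  // with its data produced by `init`. Ids are simply the list positions.
  def spawn[T](starts: Seq[VertexId], init: VertexId => T): Seq[Walker[T]] =
    starts.zipWithIndex.map { case (vertex, i) =>
      Walker(id = i.toLong, step = 0L, data = init(vertex))
    }

  // Example `init`: each walker remembers the vertex it started from.
  val rememberStart: VertexId => Vector[VertexId] = v => Vector(v)

  val walkers: Seq[Walker[Vector[VertexId]]] =
    spawn(Seq(10L, 20L, 30L), rememberStart)
}
```

Because init only receives a VertexId, any per-walker state has to be derived from the vertex the walker starts on.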

A configuration can be constant, meaning the walker's data is not updated at every step. DeepWalk is one example of such a walker configuration:

modules/aruku/WalkerConfig.scala
object WalkerConfig {

  def constant[T](
    numWalkers: Long,
    numEpochs: Int,
    parallelism: Int,
    init: VertexId => T,
    start: StartingStrategy
  )

}
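One way to picture a constant configuration is as an update function that ignores the vertex and the edge and returns the walker's data unchanged. The sketch below shows that idea with local stand-ins for the GraphX and aruku types. The keepData name is hypothetical, and this is an assumption about the behaviour, not a copy of how aruku implements constant.

```scala
object ConstantSketch {
  type VertexId = Long // GraphX defines VertexId as an alias for Long

  // Stand-in for org.apache.spark.graphx.Edge, reduced to three fields.
  final case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)

  // Mirror of the Walker case class shown above.
  final case class Walker[T](id: Long, step: Long, data: T)

  // Hypothetical no-op update: the data set by `init` is kept as is.
  def keepData[T]: (Walker[T], VertexId, Edge[Double]) => T =
    (walker, _, _) => walker.data

  val walker: Walker[String] = Walker(id = 0L, step = 3L, data = "start=42")
  val update = keepData[String]

  // Whatever vertex and edge are passed in, the data comes back untouched.
  val after: String = update(walker, 7L, Edge(7L, 8L, 1.0))
}
```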

Or it can be dynamic, if the walker's data needs to be updated at every step.

modules/aruku/WalkerConfig.scala
object WalkerConfig {

  def updating[T](
    numWalkers: Long,
    numEpochs: Int,
    parallelism: Int,
    init: VertexId => T,
    update: (Walker[T], VertexId, Edge[Double]) => T,
    start: StartingStrategy
  )

}
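To make the update signature concrete, here is a minimal sketch of a walker that accumulates the weights of the edges it traverses. The types are local stand-ins, and the walk is replayed by hand with a fold rather than run by aruku. The sketch assumes that update is called once per step and that the step count grows by one each time; the meaning of the VertexId argument is not specified here, so the example ignores it.

```scala
object UpdatingSketch {
  type VertexId = Long // GraphX defines VertexId as an alias for Long

  // Stand-in for org.apache.spark.graphx.Edge, reduced to three fields.
  final case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)

  // Mirror of the Walker case class shown above.
  final case class Walker[T](id: Long, step: Long, data: T)

  // Example `init`: every walker starts with a total weight of 0.0.
  val init: VertexId => Double = _ => 0.0

  // Example `update`: add the weight of the traversed edge to the total.
  val update: (Walker[Double], VertexId, Edge[Double]) => Double =
    (walker, _, edge) => walker.data + edge.attr

  // Hand-rolled replay of the walk 1 -> 2 -> 3 (not aruku's engine).
  val path: Seq[Edge[Double]] = Seq(Edge(1L, 2L, 0.5), Edge(2L, 3L, 1.5))

  val finished: Walker[Double] =
    path.foldLeft(Walker(0L, 0L, init(1L))) { (w, e) =>
      w.copy(step = w.step + 1, data = update(w, e.srcId, e))
    }
}
```

Since update returns only the new data of type T, anything the walker should remember between steps has to live inside T.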

Walker Engine Configuration

For now, the engine itself is configured through the Spark property "spark.graphx.pregel.checkpointInterval", which checkpoints the computation periodically and so breaks the lineage of the random walk when you run a large number of steps.
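As a sketch of how this property might be set, the snippet below builds a SparkConf with a checkpoint interval and registers a checkpoint directory on the SparkContext, which Spark requires before it can write any checkpoint. The application name, the interval of 10, and the directory path are placeholder values chosen for illustration, not recommendations from aruku.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSetup {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("aruku-random-walk") // hypothetical application name
      .setMaster("local[*]")           // local mode, for experimentation only
      // Checkpoint every 10 Pregel iterations (illustrative value).
      .set("spark.graphx.pregel.checkpointInterval", "10")

    val sc = new SparkContext(conf)

    // Checkpointing needs a directory; this path is a placeholder.
    sc.setCheckpointDir("/tmp/aruku-checkpoints")

    // Build the graph and run the random walk here.

    sc.stop()
  }
}
```

A shorter interval truncates the lineage more often at the cost of extra writes to the checkpoint directory, so the right value depends on how many steps the walk performs.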