| randomSplit {SparkR} | R Documentation |
Return a list of SparkDataFrames produced by randomly splitting x according to the provided weights.
randomSplit(x, weights, seed)

## S4 method for signature 'SparkDataFrame,numeric'
randomSplit(x, weights, seed)
x | A SparkDataFrame. |
weights | A numeric vector of weights for the splits; the weights are normalized if they do not sum to 1. |
seed | A seed to use for the random split. |
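The normalization of weights can be illustrated in plain R, without a Spark session. This is only a minimal sketch: normalize_weights is a hypothetical helper, not a SparkR function, and its input checks (non-negative weights with a positive sum) are assumptions made for the illustration. It shows how weights such as c(2, 3, 5) translate into the approximate fraction of rows each split is expected to receive.

```r
# Hypothetical helper (NOT part of SparkR): rescales weights so they sum to 1,
# mirroring the normalization described for the `weights` argument.
normalize_weights <- function(weights) {
  # Assumed validity checks for this sketch only.
  stopifnot(is.numeric(weights), length(weights) > 0,
            all(weights >= 0), sum(weights) > 0)
  weights / sum(weights)
}

# Weights that do not sum to 1 are converted to fractions of the total.
fractions <- normalize_weights(c(2, 3, 5))
print(fractions)

# Expected (approximate) number of rows per split for a 1000-row input;
# actual counts vary because rows are assigned at random.
expected_rows <- fractions * 1000
print(expected_rows)
```

Because the split is random, the row counts returned by an actual call will generally be close to, but not exactly equal to, these expected values.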
randomSplit since 2.0.0
Other SparkDataFrame functions: SparkDataFrame-class,
agg, arrange,
as.data.frame, attach,
cache, coalesce,
collect, colnames,
coltypes,
createOrReplaceTempView,
crossJoin, dapplyCollect,
dapply, describe,
dim, distinct,
dropDuplicates, dropna,
drop, dtypes,
except, explain,
filter, first,
gapplyCollect, gapply,
getNumPartitions, group_by,
head, histogram,
insertInto, intersect,
isLocal, join,
limit, merge,
mutate, ncol,
nrow, persist,
printSchema, rbind,
registerTempTable, rename,
repartition, sample,
saveAsTable, schema,
selectExpr, select,
showDF, show,
storageLevel, str,
subset, take,
union, unpersist,
withColumn, with,
write.df, write.jdbc,
write.json, write.orc,
write.parquet, write.text
## Not run:
sparkR.session()
df <- createDataFrame(data.frame(id = 1:1000))
# Weights 2:3:5 are normalized to 0.2, 0.3, and 0.5; the seed is 0
df_list <- randomSplit(df, c(2, 3, 5), 0)
# df_list contains 3 SparkDataFrames with about 200, 300, and 500 rows, respectively
sapply(df_list, count)
## End(Not run)