pyspark.pandas.window.Expanding.quantile
- Expanding.quantile(quantile, accuracy=10000)
Calculate the expanding quantile of the values.
- Parameters
- quantile : float
Value between 0 and 1 providing the quantile to compute.
- accuracy : int, optional
Accuracy of the approximation; defaults to 10000. A larger value gives better accuracy. The relative error is 1.0 / accuracy. This is a pandas-on-Spark specific parameter.
- Returns
- Series or DataFrame
Returned object type is determined by the caller of the expanding calculation.
See also
pyspark.pandas.Series.expanding
Calling expanding with Series data.
pyspark.pandas.DataFrame.expanding
Calling expanding with DataFrames.
pyspark.pandas.Series.quantile
Aggregating quantile for Series.
pyspark.pandas.DataFrame.quantile
Aggregating quantile for DataFrame.
Notes
quantile in pandas-on-Spark uses a distributed percentile approximation algorithm, unlike pandas, so the result may differ from the pandas result. The result is similar to pandas with interpolation set to lower. The interpolation parameter is not supported yet.
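To make the comparison with lower interpolation concrete, the following is a minimal pure-Python sketch of an expanding quantile that always picks the lower of the two neighbouring values. The helper name expanding_quantile_lower and its min_periods argument are hypothetical and used only for illustration. This is not Spark's approximation algorithm; it only shows the exact behaviour that the approximate result is described as being similar to.

```python
import math


def expanding_quantile_lower(values, q, min_periods=1):
    """Expanding quantile with 'lower' interpolation (illustrative only).

    Hypothetical helper: not part of pyspark.pandas and not the
    approximation algorithm Spark uses.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    out = []
    for end in range(1, len(values) + 1):
        if end < min_periods:
            # Too few observations so far: no result for this row.
            out.append(math.nan)
            continue
        window = sorted(values[:end])
        # 'lower' interpolation: round the fractional position down.
        idx = math.floor(q * (len(window) - 1))
        out.append(float(window[idx]))
    return out


result = expanding_quantile_lower([1, 2, 3, 4], 0.5, min_periods=2)
print(result)  # [nan, 1.0, 2.0, 2.0]
```

For this small input the sketch gives the same values as the expanding(2) example in the Examples section. On larger data, the approximate result may deviate within the relative error set by accuracy.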
The current implementation of this API uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and could cause serious performance degradation. Avoid using this method on very large datasets.
Examples
The examples below show expanding quantile calculations with a minimum number of periods of two and three, respectively.
>>> s = ps.Series([1, 2, 3, 4])
>>> s.expanding(2).quantile(0.5)
0    NaN
1    1.0
2    2.0
3    2.0
dtype: float64

>>> s.expanding(3).quantile(0.5)
0    NaN
1    NaN
2    2.0
3    2.0
dtype: float64