pyspark.sql.functions.mask#
- pyspark.sql.functions.mask(col, upperChar=None, lowerChar=None, digitChar=None, otherChar=None)[source]#
Masks the given string value. This can be useful for creating copies of tables with sensitive information removed.
New in version 3.5.0.
- Parameters
- col: :class:`~pyspark.sql.Column` or str
target column to compute on.
- upperChar: :class:`~pyspark.sql.Column` or str, optional
character to replace upper-case characters with. Specify NULL to retain original character.
- lowerChar: :class:`~pyspark.sql.Column` or str, optional
character to replace lower-case characters with. Specify NULL to retain original character.
- digitChar: :class:`~pyspark.sql.Column` or str, optional
character to replace digit characters with. Specify NULL to retain original character.
- otherChar: :class:`~pyspark.sql.Column` or str, optional
character to replace all other characters with. Specify NULL to retain original character.
- Returns
Examples
>>> df = spark.createDataFrame([("AbCD123-@$#",), ("abcd-EFGH-8765-4321",)], ['data']) >>> df.select(mask(df.data).alias('r')).collect() [Row(r='XxXXnnn-@$#'), Row(r='xxxx-XXXX-nnnn-nnnn')] >>> df.select(mask(df.data, lit('Y')).alias('r')).collect() [Row(r='YxYYnnn-@$#'), Row(r='xxxx-YYYY-nnnn-nnnn')] >>> df.select(mask(df.data, lit('Y'), lit('y')).alias('r')).collect() [Row(r='YyYYnnn-@$#'), Row(r='yyyy-YYYY-nnnn-nnnn')] >>> df.select(mask(df.data, lit('Y'), lit('y'), lit('d')).alias('r')).collect() [Row(r='YyYYddd-@$#'), Row(r='yyyy-YYYY-dddd-dddd')] >>> df.select(mask(df.data, lit('Y'), lit('y'), lit('d'), lit('*')).alias('r')).collect() [Row(r='YyYYddd****'), Row(r='yyyy*YYYY*dddd*dddd')]