Cache Levels and Replicas
Spark can cache data implicitly or explicitly as it runs a pipeline. Spark controls implicit caching, which occurs, for example, when Spark performs shuffle operations. Implicitly cached data is stored in the Spark default location. At this time, the default location is memory and disk.
Explicit caching occurs when an origin or processor stage has the Cache Data stage property enabled.
The following pipeline properties define how Spark handles explicit caching:
- Cache Levels property
  Use the Cache Levels advanced pipeline property to configure how data is cached for a pipeline. The Cache Levels property provides the following caching levels:
  - None
  - Disk only
  - Memory only
  - Memory only with serialization
  - Memory and disk
  - Memory and disk with serialization
  - Off heap
- Cache Replicas property
  Use the Cache Replicas advanced pipeline property to determine how many replicas of the cache are kept.
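The two properties together resemble the flags that make up a Spark storage level. The following is a minimal sketch of that relationship, not the product's implementation. It assumes that each caching level corresponds to the Spark storage level of the same name and that Cache Replicas corresponds to the storage level's replication factor. The names `StorageLevelSketch`, `CACHE_LEVELS`, and `resolve_storage_level` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLevelSketch:
    """Hypothetical stand-in for the flags of a Spark storage level."""
    use_disk: bool
    use_memory: bool
    use_off_heap: bool
    deserialized: bool
    replication: int = 1


# Assumed mapping from each Cache Levels option to storage level flags:
# (use_disk, use_memory, use_off_heap, deserialized).
# The flag values follow Spark's Scala StorageLevel constants of the same name.
CACHE_LEVELS = {
    "None": (False, False, False, False),
    "Disk only": (True, False, False, False),
    "Memory only": (False, True, False, True),
    "Memory only with serialization": (False, True, False, False),
    "Memory and disk": (True, True, False, True),
    "Memory and disk with serialization": (True, True, False, False),
    "Off heap": (True, True, True, False),
}


def resolve_storage_level(cache_level: str, cache_replicas: int = 1) -> StorageLevelSketch:
    """Combine a caching level and a replica count into one set of flags."""
    if cache_level not in CACHE_LEVELS:
        raise ValueError(f"Unknown cache level: {cache_level!r}")
    if cache_replicas < 1:
        raise ValueError("Cache replicas must be at least 1")
    use_disk, use_memory, use_off_heap, deserialized = CACHE_LEVELS[cache_level]
    return StorageLevelSketch(
        use_disk, use_memory, use_off_heap, deserialized, cache_replicas
    )


# Example: cache in memory, spill to disk, and keep two replicas.
level = resolve_storage_level("Memory and disk", cache_replicas=2)
print(level)
```

In a PySpark application, a comparable set of flags could be passed as `StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication)` to `DataFrame.persist`. Whether the pipeline properties are applied this way internally is an assumption.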