# interface `BulkDelete`

The `BulkDelete` interface provides an API to perform bulk delete of files/objects in an object store or filesystem.
The API is designed to match the semantics of the AWS S3 Bulk Delete REST API call, but it is not exclusively restricted to this store. This is why the “provides no guarantees” restrictions do not state what the outcome will be when executed on other stores.
## Interface `org.apache.hadoop.fs.BulkDeleteSource`

The interface `BulkDeleteSource` is offered by a `FileSystem`/`FileContext` class if it supports the API. A default implementation in the base `FileSystem` class returns an instance of `org.apache.hadoop.fs.impl.DefaultBulkDeleteOperation`; the details of this default are covered in the sections below.
```java
@InterfaceAudience.Public
@InterfaceStability.Unstable
public interface BulkDeleteSource {
  BulkDelete createBulkDelete(Path path)
      throws UnsupportedOperationException, IllegalArgumentException, IOException;
}
```
## Interface `org.apache.hadoop.fs.BulkDelete`

This is the bulk delete implementation returned by the `createBulkDelete()` call.
```java
@InterfaceAudience.Public
@InterfaceStability.Unstable
public interface BulkDelete extends IOStatisticsSource, Closeable {
  int pageSize();
  Path basePath();
  List<Map.Entry<Path, String>> bulkDelete(List<Path> paths)
      throws IOException, IllegalArgumentException;
}
```
### `bulkDelete(paths)`

Preconditions:

```
if length(paths) > pageSize: throw IllegalArgumentException
```

All paths which refer to files are removed from the set of files:

```
FS'Files = FS.Files - [paths]
```

No other restrictions are placed upon the outcome.
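Because `bulkDelete(paths)` rejects any list longer than `pageSize()`, a caller deleting many objects must split the set into batches first. A minimal, pure-Java sketch of that partitioning step (using plain strings rather than Hadoop `Path` objects so it is self-contained; `BulkDeletePaging` and `partition` are illustrative names, not part of the API):

```java
import java.util.ArrayList;
import java.util.List;

public class BulkDeletePaging {

    // Split a list of paths into batches no larger than pageSize, so that
    // each batch satisfies the precondition length(paths) <= pageSize.
    static <T> List<List<T>> partition(List<T> paths, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("page size must be >= 1");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += pageSize) {
            pages.add(new ArrayList<>(
                paths.subList(i, Math.min(i + pageSize, paths.size()))));
        }
        return pages;
    }

    public static void main(String[] args) {
        // 5 paths with a page size of 2 yields 3 batches: the caller would
        // then invoke bulkDelete() once per batch.
        List<List<String>> pages =
            partition(List.of("/a", "/b", "/c", "/d", "/e"), 2);
        System.out.println(pages.size()); // 3
    }
}
```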
The `BulkDeleteSource` interface is exported by `FileSystem` and `FileContext` storage clients, and is available for all filesystems via `org.apache.hadoop.fs.impl.DefaultBulkDeleteSource`. So that integrations in applications such as Apache Iceberg work seamlessly, all implementations of this interface MUST NOT reject the request but instead return a `BulkDelete` instance whose page size is >= 1.
Use the `PathCapabilities` probe `fs.capability.bulk.delete`:

```java
store.hasPathCapability(path, "fs.capability.bulk.delete")
```
The need for many libraries to compile against very old versions of Hadoop means that most of the cloud-first `FileSystem` API calls cannot be used except through reflection, and the more complicated the API and its data types are, the harder that reflection is to implement.
To assist this, the class `org.apache.hadoop.io.wrappedio.WrappedIO` has a few methods which are intended to provide simple access to the API, especially through reflection.
```java
public static int bulkDeletePageSize(FileSystem fs, Path path)
    throws IOException;

public static List<Map.Entry<Path, String>> bulkDelete(
    FileSystem fs, Path base, Collection<Path> paths);
```
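The reflection pattern these methods are designed for can be sketched in plain Java. Here `StubWrappedIO` is a hypothetical stand-in for `org.apache.hadoop.io.wrappedio.WrappedIO` so the example is self-contained; real code would pass the Hadoop class name to `Class.forName()` instead:

```java
import java.lang.reflect.Method;

public class WrappedIOReflection {

    // Hypothetical stand-in for org.apache.hadoop.io.wrappedio.WrappedIO,
    // used here only so the sketch compiles without Hadoop on the classpath.
    public static class StubWrappedIO {
        public static int bulkDeletePageSize(Object fs, Object path) {
            return 1; // the base FileSystem default fixes the page size at 1
        }
    }

    public static void main(String[] args) throws Exception {
        // Look the class up by name so the caller has no compile-time
        // dependency on a Hadoop version that provides the API.
        Class<?> wrapped = Class.forName("WrappedIOReflection$StubWrappedIO");
        Method m = wrapped.getMethod("bulkDeletePageSize",
            Object.class, Object.class);
        int pageSize = (int) m.invoke(null, null, null);
        System.out.println(pageSize); // 1
    }
}
```

A caller would probe for the class once at startup and fall back to single-file `delete()` calls if it is absent.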
The default implementation of the `BulkDelete` interface, used by all `FileSystem` implementations that do not provide their own, is `org.apache.hadoop.fs.impl.DefaultBulkDeleteOperation`. It fixes the page size at 1 and calls `FileSystem.delete(path, false)` on the single path in the list.
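The behaviour of that default can be illustrated with a toy in-memory sketch. The `files` set and string paths below are stand-ins for a real `FileSystem` and Hadoop `Path` objects, and `DefaultBulkDeleteSketch` is an illustrative name, not the actual class:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DefaultBulkDeleteSketch {

    // Toy in-memory "filesystem"; the real implementation calls
    // FileSystem.delete(path, false) instead of mutating a set.
    static final Set<String> files =
        new HashSet<>(Set.of("/tmp/a", "/tmp/b"));

    // DefaultBulkDeleteOperation fixes the page size at 1.
    static final int PAGE_SIZE = 1;

    // Mirrors the documented contract: at most pageSize paths per call;
    // the returned list holds (path, error) entries for failed deletes.
    static List<Map.Entry<String, String>> bulkDelete(List<String> paths) {
        if (paths.size() > PAGE_SIZE) {
            throw new IllegalArgumentException(
                "too many paths: " + paths.size() + " > " + PAGE_SIZE);
        }
        for (String p : paths) {
            files.remove(p); // deleting a missing file is not an error
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        bulkDelete(List.of("/tmp/a"));
        System.out.println(files.contains("/tmp/a")); // false
    }
}
```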
The S3A implementation is `org.apache.hadoop.fs.s3a.impl.BulkDeleteOperation`, which implements the multi-object delete semantics of the AWS S3 Bulk Delete API. For more details, refer to the S3A performance documentation.