This is an experimental Swift library to show how to connect to a remote Apache Spark Connect Server and run SQL statements to manipulate remote data.
So far, this project tracks the upstream changes of the Apache Arrow project's Swift support, and it requires the following:
- Apache Spark 4.0.0 (May 2025)
- Swift 6.0 (2024) or 6.1 (2025)
- gRPC Swift 2.2 (May 2025)
- gRPC Swift Protobuf 1.3 (May 2025)
- gRPC Swift NIO Transport 1.1 (May 2025)
- FlatBuffers v25.2.10 (February 2025)
- Apache Arrow Swift
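To confirm that your local toolchain satisfies the Swift requirement above, you can check the installed compiler version:

$ swift --version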
Create a Swift project.
mkdir SparkConnectSwiftApp
cd SparkConnectSwiftApp
swift package init --name SparkConnectSwiftApp --type executable
Add the SparkConnect package to the dependencies in Package.swift as shown below.
$ cat Package.swift
// swift-tools-version:6.0
import PackageDescription

let package = Package(
  name: "SparkConnectSwiftApp",
  platforms: [
    .macOS(.v15)
  ],
  dependencies: [
    .package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
  ],
  targets: [
    .executableTarget(
      name: "SparkConnectSwiftApp",
      dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
    )
  ]
)
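After updating the manifest, you can resolve the dependency and compile the package to verify the setup:

$ swift build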
Use SparkSession from the SparkConnect module in Swift.
$ cat Sources/main.swift
import SparkConnect

// Create a SparkSession connected to the Spark Connect server
let spark = try await SparkSession.builder.getOrCreate()
print("Connected to Apache Spark \(await spark.version) Server")

// Run SQL statements to recreate and populate a table
let statements = [
  "DROP TABLE IF EXISTS t",
  "CREATE TABLE IF NOT EXISTS t(a INT) USING ORC",
  "INSERT INTO t VALUES (1), (2), (3)",
]
for s in statements {
  print("EXECUTE: \(s)")
  _ = try await spark.sql(s).count()
}

// Query the table and display the result
print("SELECT * FROM t")
try await spark.sql("SELECT * FROM t").cache().show()

// Write even numbers to ORC files with the DataFrame API, then read them back
try await spark.range(10).filter("id % 2 == 0").write.mode("overwrite").orc("/tmp/orc")
try await spark.read.orc("/tmp/orc").show()

await spark.stop()
Run your Swift application.
$ swift run
...
Connected to Apache Spark 4.0.0 Server
EXECUTE: DROP TABLE IF EXISTS t
EXECUTE: CREATE TABLE IF NOT EXISTS t(a INT) USING ORC
EXECUTE: INSERT INTO t VALUES (1), (2), (3)
SELECT * FROM t
+---+
| a |
+---+
| 2 |
| 1 |
| 3 |
+---+
+----+
| id |
+----+
| 2 |
| 6 |
| 0 |
| 8 |
| 4 |
+----+
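Beyond this walkthrough, the same DataFrame API can be combined for simple computations without writing any files. The following is a minimal sketch that reuses only the calls shown above to count the even values produced by range (the value 100 is arbitrary):

import SparkConnect

let spark = try await SparkSession.builder.getOrCreate()
// range(100) produces ids 0..<100; the filter keeps the 50 even ones
let evenCount = try await spark.range(100).filter("id % 2 == 0").count()
print("Number of even ids: \(evenCount)")  // prints 50
await spark.stop()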
You can find more complete examples, including Spark SQL REPL, Web Server, and Streaming applications, in the Examples directory.
This library also supports the SPARK_REMOTE environment variable to specify the Spark Connect connection string, which provides additional connection options.
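For example, assuming a Spark Connect server is listening on the default port 15002 on the local machine (the host and port below are illustrative), you can point the application at it like this:

$ SPARK_REMOTE="sc://localhost:15002" swift run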