Read files created by the stream
stream_read_csv
Description
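Reads data from a streaming source (files such as CSV, text, JSON, Parquet, or ORC, as well as Kafka, sockets, Delta, cloud files, and tables) as a Spark stream.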
Usage
stream_read_csv(
sc,
path,
name = NULL,
header = TRUE,
columns = NULL,
delimiter = ",",
quote = "\"",
escape = "\\",
charset = "UTF-8",
null_value = NULL,
options = list(),
...
)
stream_read_text(sc, path, name = NULL, options = list(), ...)
stream_read_json(sc, path, name = NULL, columns = NULL, options = list(), ...)
stream_read_parquet(
sc,
path,
name = NULL,
columns = NULL,
options = list(),
...
)
stream_read_orc(sc, path, name = NULL, columns = NULL, options = list(), ...)
stream_read_kafka(sc, name = NULL, options = list(), ...)
stream_read_socket(sc, name = NULL, columns = NULL, options = list(), ...)
stream_read_delta(sc, path, name = NULL, options = list(), ...)
stream_read_cloudfiles(sc, path, name = NULL, options = list(), ...)
stream_read_table(sc, path, name = NULL, options = list(), ...)
Arguments
| Arguments | Description |
|---|---|
| sc | A spark_connection. |
| path | The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3a://" and "file://" protocols. |
| name | The name to assign to the newly generated stream. |
| header | Boolean; should the first row of data be used as a header? Defaults to TRUE. |
| columns | A vector of column names or a named vector of column types. If specified, the elements can be "binary" for BinaryType, "boolean" for BooleanType, "byte" for ByteType, "integer" for IntegerType, "integer64" for LongType, "double" for DoubleType, "character" for StringType, "timestamp" for TimestampType and "date" for DateType. |
| delimiter | The character used to delimit each column. Defaults to ','. |
| quote | The character used as a quote. Defaults to '"'. |
| escape | The character used to escape other characters. Defaults to '\'. |
| charset | The character set. Defaults to "UTF-8". |
| null_value | The character to use for null, or missing, values. Defaults to NULL. |
| options | A list of strings with additional options. |
| ... | Optional arguments; currently unused. |
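The `columns` argument and the CSV parsing arguments can be combined as in the sketch below. The column names, the stream name `typed_events`, the `|` delimiter, and the helper `read_typed_csv` are hypothetical and chosen only for illustration; the type strings are the ones listed in the `columns` description above.

```r
# Hypothetical schema: a named vector maps column names to the type
# strings listed in the `columns` argument description.
col_types <- c(
  id = "integer",
  label = "character",
  amount = "double",
  created_at = "timestamp"
)

# Hypothetical helper wrapping stream_read_csv(). `sc` must be an open
# spark_connection and `path` a directory accessible from the cluster.
read_typed_csv <- function(sc, path) {
  sparklyr::stream_read_csv(
    sc,
    path,
    name = "typed_events",   # name assigned to the new stream
    header = TRUE,           # first row holds column names
    columns = col_types,     # explicit column types
    delimiter = "|",         # assumed pipe-delimited input
    null_value = "NA"        # string treated as a missing value
  )
}
```

With a connection in hand, the helper would be called as `read_typed_csv(sc, path)`, where `path` points at the input directory using one of the supported protocols.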
Examples
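library(sparklyr)
sc <- spark_connect(master = "local")
# Write a sample CSV file into a directory for the stream to read
dir.create("csv-in")
write.csv(iris, "csv-in/data.csv", row.names = FALSE)
# Build a file:// path to the input directory
csv_path <- file.path("file://", getwd(), "csv-in")
# Read the directory as a stream and write the results to "csv-out"
stream <- stream_read_csv(sc, csv_path) %>% stream_write_csv("csv-out")
# Stop the stream when finished
stream_stop(stream)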
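Sources that take no `path`, such as `stream_read_kafka()`, are configured entirely through `options`. The following is a minimal sketch, not a tested recipe: it assumes a Kafka broker at `localhost:9092`, a topic named `topic1`, and a Spark connection that was started with the Kafka connector available. The helper `read_topic` and the stream name `kafka_events` are hypothetical; `kafka.bootstrap.servers` and `subscribe` are option names of Spark's Kafka source.

```r
# Hypothetical helper; assumes `sc` was created with the Spark Kafka
# connector available and that the broker is reachable.
read_topic <- function(sc, servers = "localhost:9092", topic = "topic1") {
  sparklyr::stream_read_kafka(
    sc,
    name = "kafka_events",
    options = list(
      kafka.bootstrap.servers = servers,  # broker address (assumed)
      subscribe = topic                   # topic to subscribe to (assumed)
    )
  )
}
```

The returned stream can then be passed to one of the `stream_write_*()` functions and stopped with `stream_stop()`, as in the CSV example above.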