[Spark RDD] Persisting Spark RDDs across spark contexts/applications - options - Spark - [mail # user]
...Hi Boris, This is actually why Alluxio (then called Tachyon) was initially created in AMPLab. Check out the documentation https://docs.alluxio.io/os/user/stable/en/compute/Spark.html on persisting RDD...
   Author: Bin Fan , 2020-06-04, 17:43
Spark dataframe hdfs vs s3 - Spark - [mail # user]
...Try deploying Alluxio as a caching layer on top of S3, providing Spark a similar HDFS interface? Like in this article: https://www.alluxio.io/blog/accelerate-spark-and-hive-jobs-on-aws-s3-by-10...
   Author: Bin Fan , 2020-05-29, 18:09
What is directory "/path/_spark_metadata" for? - Spark - [mail # user]
...Hey Mark, I believe this is the name of the subdirectory that is used to store metadata about which files are valid; see the comment in the code: https://github.com/apache/spark/blob/v2.3.0/sql/core/src...
   Author: Bin Fan , 2019-11-11, 23:44
Low cache hit ratio when running Spark on Alluxio - Spark - [mail # user]
...Depending on the Alluxio version you are running, e.g., for 2.0, the metrics of the local short-circuit read are not turned on by default. So I would suggest you first turn on the metrics co...
   Author: Bin Fan , 2019-09-19, 18:03
Can I set the Alluxio WriteType in Spark applications? - Spark - [mail # user]
...Hi Mark, You can follow the instructions here: https://docs.alluxio.io/os/user/stable/en/compute/Spark.html#customize-alluxio-user-properties-for-individual-spark-jobs Something like this: $ spa...
   Author: Bin Fan , 2019-09-19, 17:43
How to fix ClosedChannelException - Spark - [mail # user]
...Hi, This *java.nio.channels.ClosedChannelException* is often caused by a connection timeout between your Spark executors and Alluxio workers. One simple and quick fix is to increase the timeout v...
   Author: Bin Fan , 2019-05-17, 05:27
How to configure alluxio cluster with spark in yarn - Spark - [mail # user]
...Hi Andy, Assuming you are running Spark with YARN, I would recommend deploying Alluxio in the same YARN cluster if you are looking for the best performance. Alluxio can also be deployed separat...
   Author: Bin Fan , 2019-05-17, 00:28
cache table vs. parquet table performance - Spark - [mail # user]
...Hi Tomas, One option is to cache your table as Parquet files in Alluxio (which can serve as an in-memory distributed caching layer for Spark in your case). The code on Spark will be like > ...
   Author: Bin Fan , 2019-04-18, 05:34
[grpc-io] Migrating RPC framework from Thrift to gRPC - gRPC - [mail # user]
...Hi, Two of the core developers of the Alluxio open source project recently posted a tech article documenting our RPC framework migration from Apache Thrift to gRPC, including the motivation, le...
   Author: Bin Fan , 2019-04-16, 01:57
How shall I configure the Spark executor memory size and the Alluxio worker memory size on a machine? - Spark - [mail # user]
...Oops, sorry for the confusion. I mean "20% of the size of your input dataset" allocated to Alluxio as memory resource as the starting point. After that, you can check out the cache hit ratio i...
   Author: Bin Fan , 2019-04-05, 05:27