Spark SQL Join

Date Tags Spark

Join in Hive

Common Join

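When tuning the performance of Hive queries, one aspect to pay attention to is the type of join used during execution. Common Join is the default join type in Hive, also known as Shuffle …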
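To make the shuffle idea concrete, here is a minimal plain-Scala sketch of a reduce-side join. It does not use Hive or Spark; it only imitates the three phases (tag rows by join key, group by key, pair rows per key) on in-memory collections. The `Order` and `Item` types and the `shuffleJoin` function are hypothetical names introduced for illustration.

```scala
// Plain-Scala sketch of a shuffle (reduce-side) join; no Hive or Spark required.
object ShuffleJoinSketch {
  // Hypothetical row types, for illustration only.
  final case class Order(itemId: Int, qty: Int)
  final case class Item(itemId: Int, name: String)

  def shuffleJoin(orders: Seq[Order], items: Seq[Item]): Seq[(Order, Item)] = {
    // "Map" phase: tag every row with its join key and its source table.
    val left: Seq[(Int, Either[Order, Item])]  = orders.map(o => (o.itemId, Left(o)))
    val right: Seq[(Int, Either[Order, Item])] = items.map(i => (i.itemId, Right(i)))

    // "Shuffle" phase: bring all rows with the same key together,
    // as if each key were routed to a single reducer.
    val shuffled: Map[Int, Seq[Either[Order, Item]]] =
      (left ++ right).groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }

    // "Reduce" phase: pair every left row with every right row sharing the key.
    shuffled.toSeq.sortBy(_._1).flatMap { case (_, rows) =>
      val os = rows.collect { case Left(o) => o }
      val is = rows.collect { case Right(i) => i }
      for (o <- os; i <- is) yield (o, i)
    }
  }
}
```

Every row of both inputs passes through the grouping step, including rows that never find a match. That is why this style of join is expensive when one side is large.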

more ...

Spark: Shuffling and Partitioning

Date Tags Spark

Shuffling

org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[366]

Think again about what happens when you have to do a groupBy or a groupByKey. Remember, our data is distributed! Did you notice anything odd?

val pairs = sc.parallelize(List((1, "one"), (2, "two"), (3, "three")))
pairs.groupByKey()

// res2: org.apache …
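The data movement behind a groupByKey can be imitated without a cluster. The following is a rough plain-Scala sketch, not Spark's implementation: partitions are modeled as nested `Seq`s, and `targetPartition` is a hypothetical helper that assigns a key to a partition by a non-negative modulo of its hash code, similar in spirit to hash partitioning.

```scala
// Plain-Scala sketch of the shuffle behind groupByKey; no Spark required.
object GroupByKeyShuffleSketch {
  // Hypothetical helper: pick a target partition from the key's hash code,
  // keeping the result non-negative.
  def targetPartition(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Input: the records as they sit on each partition before the shuffle.
  // Output: for each target partition, the values grouped by key.
  def groupByKey[K, V](
      partitions: Seq[Seq[(K, V)]],
      numPartitions: Int
  ): Vector[Map[K, Seq[V]]] = {
    // Route every record to the partition that owns its key.
    val routed = partitions.flatten.groupBy { case (k, _) =>
      targetPartition(k, numPartitions)
    }
    // Within each target partition, collect the values per key.
    Vector.tabulate(numPartitions) { p =>
      routed
        .getOrElse(p, Seq.empty)
        .groupBy(_._1)
        .map { case (k, kvs) => k -> kvs.map(_._2) }
    }
  }
}
```

For example, if key `1` appears on two different input partitions, both of its values must end up in the same output partition before they can be grouped. In a real cluster that routing step means sending records over the network.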
more ...

Debugging Spark Application

Date Tags Spark
    select
        *
    from
        table1 as A
    join
        table as B
    on
        A.item_id = B.item_id
    where
        A.id in (1139426, 1139436)
        and A.date >= '2018-12-01'
    

    yarn logs -applicationId <app ID> > output_file

    2019-02-21 19:23:41 ERROR ApplicationMaster:91 - User class threw exception: org.apache.spark …
more ...

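Spark RDD

Date Tags Spark
Key points of Spark RDD:
Spark RDD: Resilient Distributed Dataset
1. Introduction to RDDs
  - Overview of RDDs
  - Properties of RDDs
2. Ways to create an RDD
  - Creating an RDD by loading data from a file system
  - Via …
more ...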