GRAPH ATTENTION NETWORKS

official repo: https://github.com/PetarV-/GAT

reference: https://zhuanlan.zhihu.com/p/34232818

  • Inductive learning: first learn a pattern from the training samples, then use that pattern to make predictions on the test samples (i.e. go from the specific to the general, then from the general back to the specific); typical examples include Bayesian models.
  • Transductive learning: observe specific training samples and make predictions directly for specific test samples (from the specific to the specific); typical examples include k-nearest neighbors and SVM.

GRAPH ATTENTION NETWORKS

  1. Motivation

    • task: node classification
    • builds on GCN by introducing masked self-attentional layers
    • specifies different weights to different nodes in a neighborhood; intuitively this feels like replacing the adjacency matrix with an attention matrix?
  2. Arguments

    • properties of the attention architecture
      • parallelizable, computationally efficient
      • can be applied to graph nodes with different degrees (though an adjacency matrix handles that too)
      • directly applicable to inductive learning problems (as opposed to the semi-supervised transductive setting of the original GCN?)
      • the last two points feel a bit forced
    • GCN
      • avoids costly matrix operations
      • but relies on a fixed graph structure, so it cannot be applied directly to other graphs
  3. Methods

    • graph attentional layer

      • input: node features

        • $N$ nodes, each with an $F$-dimensional representation
        • $h=\{\overrightarrow {h_1},\overrightarrow {h_2},\dots,\overrightarrow {h_N} \}$, $\overrightarrow {h_i} \in R^F$
      • output: a new set of node features

        • $h^{\prime}=\{\overrightarrow {h_1^{\prime}},\overrightarrow {h_2^{\prime}},\dots,\overrightarrow {h_N^{\prime}} \}$, $\overrightarrow {h_i^{\prime}} \in R^{F^{\prime}}$
      • a weight matrix

        • $W \in R^{F^{\prime} \times F}$, so that $W\overrightarrow {h_i} \in R^{F^{\prime}}$
        • applied to every node
      • then self-attention

        • compute attention coefficients: $e_{ij} = a(W\overrightarrow {h_i},W\overrightarrow {h_j})$

          • the attention mechanism $a$ is a single-layer feedforward neural network followed by LeakyReLU (negative slope 0.2)
          • it is parameterized by a weight vector $\overrightarrow{a} \in R^{2F^{\prime}}$
        • softmax normalization over the neighborhood: $\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i}\exp(e_{ik})}$

        • overall expression

          • concatenate the two transformed feature vectors
          • then a fully-connected layer + LeakyReLU
          • then softmax: $\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\overrightarrow{a}^{T}[W\overrightarrow {h_i} \Vert W\overrightarrow {h_j}]\right)\right)}{\sum_{k \in \mathcal{N}_i}\exp\left(\mathrm{LeakyReLU}\left(\overrightarrow{a}^{T}[W\overrightarrow {h_i} \Vert W\overrightarrow {h_k}]\right)\right)}$

        • $e_{ij}$ expresses the importance of node $j$'s features to node $i$

        • masked attention: injects the graph structure by computing the importance only for nodes in node $i$'s neighborhood

        • neighborhood: the first-order neighbors of $i$ (including $i$ itself)

        • weighted sum + nonlinearity: $\overrightarrow {h_i^{\prime}} = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W \overrightarrow {h_j}\right)$

        • multi-head attention:

          • there are $K$ independent sets of trainable weights, so a node has $K$ sets of attention coefficients over its neighborhood
          • each set of weights produces its own new node feature (weighted sum + nonlinear unit); the $K$ results are combined by concat or average as the final output
          • concat (hidden layers): $\overrightarrow {h_i^{\prime}} = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} \overrightarrow {h_j}\right)$

          • if it is the last MHA layer of the network, average first, then apply the nonlinearity: $\overrightarrow {h_i^{\prime}} = \sigma\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} \overrightarrow {h_j}\right)$

        • overall: shared linear transform, masked attention coefficients, softmax, weighted sum, repeated per head (a minimal code sketch follows below)
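
A minimal PyTorch sketch of the layer described above, using a dense (N, N) adjacency mask. The names GATHead / GATLayer and the toy tensors are mine for illustration; this is not the official TensorFlow implementation linked at the top.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GATHead(nn.Module):
    """One attention head: returns sum_j alpha_ij * W h_j (no nonlinearity)."""

    def __init__(self, in_dim, out_dim, slope=0.2):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared transform, F -> F'
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention vector a in R^{2F'}
        self.leaky_relu = nn.LeakyReLU(slope)             # LeakyReLU(0.2) as in the paper

    def forward(self, h, adj):
        # h: (N, F) node features; adj: (N, N) 0/1 mask with self-loops
        Wh = self.W(h)                                     # (N, F')
        N = Wh.size(0)
        # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) for every pair (i, j)
        Wh_i = Wh.unsqueeze(1).expand(N, N, -1)
        Wh_j = Wh.unsqueeze(0).expand(N, N, -1)
        e = self.leaky_relu(self.a(torch.cat([Wh_i, Wh_j], dim=-1))).squeeze(-1)
        # masked attention: non-neighbors get -inf, so softmax assigns them weight 0
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = F.softmax(e, dim=1)                        # alpha_ij, each row sums to 1
        return alpha @ Wh                                  # (N, F') weighted sum


class GATLayer(nn.Module):
    """K heads; concat for hidden layers, average for the final layer."""

    def __init__(self, in_dim, out_dim, num_heads, concat=True):
        super().__init__()
        self.heads = nn.ModuleList([GATHead(in_dim, out_dim) for _ in range(num_heads)])
        self.concat = concat

    def forward(self, h, adj):
        outs = [head(h, adj) for head in self.heads]
        if self.concat:
            # hidden layers: nonlinearity per head, then concatenate -> (N, K*F')
            return torch.cat([F.elu(o) for o in outs], dim=-1)
        # final layer: average the heads; the task nonlinearity (e.g. softmax for
        # classification) is applied afterwards, typically inside the loss
        return torch.stack(outs).mean(dim=0)


# toy usage: 5 nodes, 8 input features, 2 heads with 4 hidden units each
h = torch.randn(5, 8)
adj = torch.eye(5)                  # self-loops; set 1s for the actual edges
adj[0, 1] = adj[1, 0] = 1.0
out = GATLayer(8, 4, num_heads=2, concat=True)(h, adj)    # shape (5, 8)
```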

      • comparisons to related work

        • the proposed GAT layer directly addresses several issues that were present in prior approaches

          • computationally efficient: the paper says it is more efficient than (dense) matrix operations; I don't quite get this
          • assigns different importance to nodes within a neighborhood; doesn't a GCN with a trainable adjacency matrix have the same property? not sure
          • supports directed graphs
          • enables inductive learning: it can be applied directly to inductive problems, i.e. to graph structures never seen during training; not sure why this works (presumably because the learned parameters $W$ and $\overrightarrow{a}$ are shared across all edges and their sizes do not depend on $N$ or on a fixed adjacency matrix)
        • mathematically, attention and the adjacency matrix are both essentially ways of modelling graph edges (contrast sketched below)

          • adjacency-trainable GCN: as in the DAG paper, the adjacency matrix itself is a trainable $(N, N)$ variable whose entries are updated during training
          • GAT: the attention mechanism instead introduces an extra learnable weight vector $\overrightarrow{a} \in R^{2F^{\prime}}$ shared by all edges (on top of the shared $W$), updated during training
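
To make the contrast concrete, a rough sketch; AdjTrainableGCNLayer is a hypothetical name for the adjacency-as-parameter variant noted above, and the row-softmax normalization is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn


class AdjTrainableGCNLayer(nn.Module):
    # Edge weights live in a free (N, N) parameter, so the layer is tied to one
    # particular graph with exactly N nodes (transductive only).
    def __init__(self, num_nodes, in_dim, out_dim):
        super().__init__()
        self.adj = nn.Parameter(torch.randn(num_nodes, num_nodes))  # learned edges, (N, N)
        self.W = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h):                                  # h: (N, F)
        return torch.relu(torch.softmax(self.adj, dim=1) @ self.W(h))


# In GAT (sketched earlier), the edge weights alpha_ij are computed from the node
# features using shared parameters W (F' x F) and a (2F'), whose sizes do not
# depend on N; the same trained layer can therefore be run on graphs (and graph
# sizes) never seen during training, i.e. the inductive setting.
```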