verseg

challenge

Large Scale Vertebrae Segmentation Challenge
- task1: Vertebra Labelling, keypoint detection
- task2: Vertebra Segmentation, multi-class segmentation
data
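1. variation: the affine axis orientation, image size, scan range, and FOV region all differ from scan to scan
2. Two main tools for parsing nii: with nibabel, the axis order of the loaded data matches the order of the axcodes, e.g. an orientation of ['R','A','S'] yields an xyz array, whereas sitk reads it exactly the other way round, so the sitk arr is zyx. When we previously wrote dicom to nii, we specified an affine other than np.eye(4) precisely to transpose these three axes.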
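The axis-order difference can be illustrated without either library. The sketch below uses plain NumPy arrays as stand-ins for what nibabel and sitk would return (an assumption for illustration; neither library is called), and shows that the two layouts differ by a full reversal of the axes.

```python
import numpy as np

# Hypothetical volume: 4 voxels along x, 5 along y, 6 along z.
# Stand-in for a nibabel-style array indexed as [x, y, z].
vol_xyz = np.arange(4 * 5 * 6).reshape(4, 5, 6)

# Stand-in for a sitk-style array indexed as [z, y, x]:
# the same data with the axis order reversed.
vol_zyx = vol_xyz.transpose(2, 1, 0)

# The same voxel is addressed with reversed indices in the two layouts.
x, y, z = 1, 2, 3
same_voxel = vol_xyz[x, y, z] == vol_zyx[z, y, x]
```

When mixing the two readers in one pipeline, converting once at load time (for example with the transpose above) keeps the rest of the code on a single convention.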
model
1. team paper \
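  - Three stages: stage 1, due to the large variation in FOV across the dataset, a coarse segmentation localizes the spine; stage 2, at higher resolution, multi-class keypoint localization finds the center of each vertebra, yielding each located vertebra; stage 3, binary segmentation for each located vertebra.
  - keywords: 1. uniform voxel spacing: do not resize arbitrarily, todo: trilinear interp; 2. on-the-fly data augmentation: using SimpleITK
  - Stage 1: Spine Localization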
    - Unet
    - regress the Gaussian heatmap of spinal centerline
    - L2-loss
    - uniform voxel spacing of 8mm
    - input shape: [64,64,128], pad?
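As a rough sketch of what "uniform voxel spacing of 8mm" plus a fixed [64,64,128] input implies, the code below computes the voxel count after resampling and zero-pads a smaller volume up to the network input shape. The helper names `resampled_shape` and `pad_to`, the example scan size and spacing, and the choice of symmetric zero padding are all assumptions; the notes leave the padding question open.

```python
import numpy as np

def resampled_shape(shape, spacing, target_spacing):
    """Voxel count per axis after resampling to an isotropic target spacing (mm)."""
    shape = np.asarray(shape, dtype=float)
    spacing = np.asarray(spacing, dtype=float)
    return np.maximum(1, np.round(shape * spacing / target_spacing)).astype(int)

def pad_to(vol, target_shape):
    """Symmetric zero padding up to target_shape (axes already large enough are untouched)."""
    pads = []
    for n, t in zip(vol.shape, target_shape):
        total = max(t - n, 0)
        pads.append((total // 2, total - total // 2))
    return np.pad(vol, pads, mode="constant")

# Hypothetical scan: 512 x 512 x 300 voxels at 0.8 x 0.8 x 2.0 mm.
coarse_shape = resampled_shape((512, 512, 300), (0.8, 0.8, 2.0), 8.0)
coarse = np.zeros(coarse_shape, dtype=np.float32)  # placeholder for the resampled image
net_input = pad_to(coarse, (64, 64, 128))
```

The physical extent is preserved (voxel count times spacing stays roughly constant), which is the point of resampling by spacing instead of resizing to a fixed shape.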
  - Stage 2: Vertebrae Localization
    - SpatialConfiguration-Net
    - regress each located vertebra's heatmap in an individual channel
    - resampling: bi/tricubic interpolation
    - norm: min-max over the whole dataset
    - uniform voxel spacing of 2mm
    - input shape: [96,96,128], z-axis random crop, xy-plane use ROI from stage1
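The per-vertebra heatmap targets of stage 2 can be sketched as one Gaussian per channel, centred on the vertebra centroid. The function name `gaussian_heatmaps`, the value of `sigma`, and the example centroids are assumptions for illustration, not values from the paper.

```python
import numpy as np

def gaussian_heatmaps(shape, centroids, sigma=2.0):
    """Return an array of shape (num_vertebrae, *shape), one Gaussian blob per channel."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    channels = []
    for centroid in centroids:
        # Squared Euclidean distance of every voxel to this centroid.
        dist2 = sum((g - c) ** 2 for g, c in zip(grids, centroid))
        channels.append(np.exp(-dist2 / (2.0 * sigma ** 2)))
    return np.stack(channels, axis=0)

# Hypothetical small volume with two vertebra centroids (voxel coordinates).
heatmaps = gaussian_heatmaps((24, 24, 32), [(12, 12, 8), (12, 13, 20)])
```

Each channel peaks at exactly 1.0 on its centroid, so the centroid can be read back later as the per-channel argmax.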
  - Stage 3: Vertebrae Segmentation
    - Unet
    - binary segmentation of the mask of each vertebra
    - sigmoid cross-entropy loss
    - uniform voxel spacing of 1mm
    - input shape: [128,128,96], crop origin image & heatmap image based on centroids
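The centroid-based crop of stage 3 might look like the sketch below: a fixed-size patch is cut around a centroid and zero-filled where it extends past the volume border. The helper name `crop_around` and the zero fill are assumptions; the same function would be applied to both the image and the heatmap so the two stay aligned.

```python
import numpy as np

def crop_around(vol, center, size):
    """Cut a patch of the given size centred on `center`, zero-padded at the borders."""
    out = np.zeros(size, dtype=vol.dtype)
    src, dst = [], []
    for c, s, n in zip(center, size, vol.shape):
        start = int(c) - s // 2              # patch origin in volume coordinates
        lo, hi = max(start, 0), min(start + s, n)
        src.append(slice(lo, hi))            # region read from the volume
        dst.append(slice(lo - start, hi - start))  # where it lands in the patch
    out[tuple(dst)] = vol[tuple(src)]
    return out

# Hypothetical volume and a centroid close to the border along the first axis.
volume = np.arange(1, 1001, dtype=np.float32).reshape(10, 10, 10)
patch = crop_around(volume, center=(1, 5, 5), size=(4, 4, 4))
```

For the real stage-3 input, `size` would be (128, 128, 96) on the 1mm-resampled volume.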
2. reference paper\
  - Core contributions: 1. MIP: combines the information across reformations, 3D to 2D; 2. discriminator-based training scheme: encodes local spine structure as an anatomical prior, reinforcing the spatial information about class and position between vertebrae
  - MIP:
    - localisation and identification rely on a large context
    - large receptive field
    - in full-body scans where spine is not spatially centred or is obstructed by the ribcage, such cases are handled with a pre-processing stage detecting the occluded spine
  - adversarial learning:
    - an FCN does the segmentation
    - an AE evaluates how good the segmentation is
    - do not 'pre-train' it (the AE)
    - loss: an anatomically-inspired supervision instead of the usual binary adversarial supervision (vanilla GAN)
  - First, the FCN: Btrfly Network
    - modelled as a regression problem: each keypoint corresponds to one channel holding a Gaussian heatmap, and the background channel is $1-\max_i (y_i)$
    - dual input, dual output (sagittal & coronal)
    - the feature maps of the two views are fused in the deep layers of the network, to learn their inter-dependency
    - Batch normalisation is used after every convolution layer, along with 20% dropout in the fused layers of Btrfly
    - loss: l2 distance + weighted ce
      $L_{sag} = ||Y_{sag} - \hat{Y}_{sag}||^2 + \omega CE(softmax(Y_{sag}), softmax(\hat{Y}_{sag}))$
      $\omega$ is the median frequency weighting map, boosting the learning of less frequent classes (ECB)
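A minimal NumPy sketch of the per-view loss $L_{sag}$ and the background channel follows. Several details are assumptions, since the notes do not fix them: the channel axis is first, $\omega$ is applied as one weight per class, and both terms are summed over all pixels. The names `add_background` and `btrfly_view_loss` are hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def add_background(y):
    """Append the background channel 1 - max_i(y_i) to heatmaps of shape (C, H, W)."""
    background = 1.0 - y.max(axis=0, keepdims=True)
    return np.concatenate([y, background], axis=0)

def btrfly_view_loss(y_true, y_pred, class_weights):
    """||Y - Y_hat||^2 + omega-weighted CE(softmax(Y), softmax(Y_hat)) for one view."""
    l2 = np.sum((y_true - y_pred) ** 2)
    p, q = softmax(y_true), softmax(y_pred)
    # Assumption: omega is one weight per class, broadcast over the spatial axes.
    w = np.asarray(class_weights).reshape(-1, *([1] * (y_true.ndim - 1)))
    ce = -np.sum(w * p * np.log(q + 1e-12))
    return l2 + ce

# Hypothetical annotation with 3 vertebra channels on a 5 x 5 projection.
rng = np.random.default_rng(0)
y = add_background(rng.random((3, 5, 5)))
weights = np.ones(y.shape[0])
loss_perfect = btrfly_view_loss(y, y, weights)
loss_shuffled = btrfly_view_loss(y, y[::-1], weights)
```

Note that the cross-entropy term does not reach zero even for a perfect prediction, because the softmax of a Gaussian heatmap is a soft distribution; only the l2 term vanishes.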
  - Next, the discriminator: energy-based adversary for encoding the prior
    - motivation: the Btrfly net is fully-convolutional, so its predictions across voxels are independent of each other owing to the spatial invariance of convolutions
    - to impose the anatomical prior of the spine’s shape onto the Btrfly net
    - look at $\hat{Y}_{sag}$ and $\hat{Y}_{cor}$ as a 3D volume and employ a 3D AE with a receptive field covering a part of the spine
    - $\hat{Y}_{sag}$ consists of Gaussians: less informative than an image, so avoid max-pooling and resort to average pooling
    - employ spatially dilated convolution kernels
    - mission of AE: predict the l2 distance between the input and its reconstruction as an energy E; D learns to discriminate by predicting a low E for real annotations, while G learns to generate annotations that would trick D (m is the margin)
      $L_D = D(Y_x) + \max(0, m-D(Y_g))\\ L_G = D(Y_g) + L_{fcn}$
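The two energy-based losses can be written down directly once the energy is defined as the reconstruction error of the AE. In the sketch below the autoencoder is replaced by a toy callable, and the energy is taken as the mean squared reconstruction error; both are assumptions for illustration, as are the function names.

```python
import numpy as np

def energy(y, autoencoder):
    """Energy D(y): l2 reconstruction error of the autoencoder on y (mean squared form)."""
    return float(np.mean((y - autoencoder(y)) ** 2))

def discriminator_loss(e_real, e_fake, margin):
    """Low energy for real annotations, energy of at least `margin` for generated ones."""
    return e_real + max(0.0, margin - e_fake)

def generator_loss(e_fake, l_fcn):
    """The generator lowers the energy of its predictions on top of its own FCN loss."""
    return e_fake + l_fcn

# Toy stand-in for the AE: it halves its input, so the energy grows with the input magnitude.
toy_autoencoder = lambda y: 0.5 * y
e_real = energy(np.full((4, 4, 4), 0.2), toy_autoencoder)
e_fake = energy(np.full((4, 4, 4), 1.0), toy_autoencoder)
loss_d = discriminator_loss(e_real, e_fake, margin=1.0)
loss_g = generator_loss(e_fake, l_fcn=0.5)
```

Once the energy of a generated sample exceeds the margin, the hinge term is zero and the discriminator stops pushing that sample further away.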
  - inference:
    - The values below a threshold (T) are ignored in order to remove noisy predictions
    - use the outer product, $\hat{Y}=\hat{Y}_{sag}\otimes\hat{Y}_{cor}$
    - the maximum of each channel is taken as the centroid
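The three inference steps above can be sketched as follows. The axis conventions are an assumption: the sagittal projection is taken to be indexed as (channel, y, z) and the coronal one as (channel, x, z), so the outer product shares the z axis. The threshold value and the name `centroids_from_views` are also hypothetical.

```python
import numpy as np

def centroids_from_views(y_sag, y_cor, threshold=0.1):
    """Combine per-view heatmaps into 3D and return one (x, y, z) centroid per channel."""
    # Step 1: suppress noisy low responses.
    sag = np.where(y_sag < threshold, 0.0, y_sag)
    cor = np.where(y_cor < threshold, 0.0, y_cor)
    # Step 2: outer product per channel, vol[c, x, y, z] = cor[c, x, z] * sag[c, y, z].
    vol = np.einsum("cyz,cxz->cxyz", sag, cor)
    # Step 3: the maximum of each channel gives the centroid (None if nothing survives).
    centroids = []
    for channel in vol:
        if channel.max() <= 0:
            centroids.append(None)
        else:
            idx = np.unravel_index(channel.argmax(), channel.shape)
            centroids.append(tuple(int(i) for i in idx))
    return centroids

# Hypothetical predictions: channel 0 has a clear peak, channel 1 only weak noise.
y_sag = np.full((2, 10, 12), 0.05)
y_cor = np.full((2, 8, 12), 0.05)
y_sag[0, 4, 5], y_sag[0, 4, 6] = 1.0, 0.5
y_cor[0, 3, 5], y_cor[0, 3, 6] = 1.0, 0.5
found = centroids_from_views(y_sag, y_cor)
```

Returning `None` for an empty channel is one way to express that a vertebra is not present in the scan; how the paper handles this case is not covered in these notes.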
  - experiments
    - [IMPORTANT] 10 MIPs are obtained from one 3D scan per view; each time, a random half of the slices of interest is drawn for that view and used to compute the MIP
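The random-half MIP sampling can be sketched as below. It assumes every slice along the projection axis counts as a "slice of interest" (the notes do not say how these are selected), and the function name `random_half_mips` is hypothetical.

```python
import numpy as np

def random_half_mips(vol, axis, n_mips=10, rng=None):
    """Draw n_mips maximum intensity projections, each from a random half of the slices."""
    rng = np.random.default_rng() if rng is None else rng
    n_slices = vol.shape[axis]
    mips = []
    for _ in range(n_mips):
        # Sample half of the slice indices without replacement.
        idx = np.sort(rng.choice(n_slices, size=n_slices // 2, replace=False))
        mips.append(np.take(vol, idx, axis=axis).max(axis=axis))
    return np.stack(mips)

# Hypothetical scan with 8 slices along axis 0; the same call with another axis gives the other view.
rng = np.random.default_rng(0)
scan = rng.random((8, 6, 7))
view_mips = random_half_mips(scan, axis=0, rng=rng)
```

Each sampled MIP is bounded above by the MIP over all slices, so the 10 projections act as mildly different views of the same scan.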

similar local appearance:
strong spatial configuration: for anything involving vertebra-wise information, start from the global information