UV-Net (review)

Why I am reading this paper closely

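I currently want to detect sub-shapes that appear commonly or repeatedly across STEP files, each of which represents a single part or product. For now I plan to detect a sub-shape by detecting the faces that make it up, which amounts to face segmentation on CAD models. I want to find out what information UV-Net embeds when it converts a STEP file into a face adjacency graph, and how that leads to its good results.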

My approach so far has been to convert the entities in a STEP file directly into the nodes (vertices) of a graph \(G(V,E)\) and then perform face detection. The information stored on each node was the type of the entity representing a Face, Loop, Edge, or Vertex. For example, in a STEP file a face has the type 'ADVANCED_FACE', a loop 'EDGE_LOOP', and an edge 'EDGE_CURVE'. With this representation, however, I could not recover the faces I had labeled. Most likely the individual nodes carried too little information to identify a face: they had no surface-type information (PLANE, CYLINDRICAL_SURFACE, CONICAL_SURFACE, etc.) and no geometry information either. The next approach I considered was to represent the STEP file as a heterogeneous graph and detect sub-shapes on it, but two points need to be addressed first. First, it is still unclear how to represent a STEP file effectively as a heterogeneous graph. Second, studies such as UV-Net and AutoMate obtained decent results by converting STEP files into homogeneous graphs and applying a GNN.

UV-Net validates the proposed representation and network architecture through three main experiments: part classification, segmentation, and self-supervised shape embeddings. All three are worth a look, but I will focus on the segmentation task first. UV-Net converts a STEP file into a face adjacency graph, using 1D convolutional embeddings on edges and 2D convolutional embeddings on faces, and then applies a GNN. My case differs in that I want to express the STEP file as a graph without any sampling step and apply a GNN to it. From the UV-Net paper I hope to get some intuition about which STEP entities to use to build a suitable homogeneous graph.

3. Method

3.1. Input representation

Topology

A face-adjacency graph is used.

Given a graph \(G(V,E)\), the faces of the B-rep are the nodes \(V\) and the B-rep edges are the graph edges \(E\).
Through its faces and edges, the face-adjacency graph can capture geometrically and topologically rich elements.
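As a rough illustration of the structure described above, the following sketch builds a face-adjacency graph from a mapping of B-rep edges to the faces they bound. The function name `face_adjacency_graph`, the `edge_to_faces` input format, and the toy face and edge ids are all assumptions made for this example; they are not taken from the UV-Net code.

```python
from collections import defaultdict
from itertools import combinations


def face_adjacency_graph(edge_to_faces):
    """Build G(V, E): nodes are face ids, graph edges are B-rep edges shared by faces.

    edge_to_faces (hypothetical format) maps a B-rep edge id to the ids of the
    faces that the edge bounds.
    """
    nodes = set()
    adjacency = defaultdict(set)
    for edge_id, faces in edge_to_faces.items():
        nodes.update(faces)
        # Every pair of distinct faces sharing this B-rep edge becomes adjacent.
        for f, g in combinations(sorted(set(faces)), 2):
            adjacency[f].add((g, edge_id))
            adjacency[g].add((f, edge_id))
    return nodes, dict(adjacency)


# Toy example: three mutually adjacent faces around a box corner.
edge_to_faces = {
    "e0": ["top", "front"],
    "e1": ["top", "side"],
    "e2": ["front", "side"],
}
nodes, adjacency = face_adjacency_graph(edge_to_faces)
```

Keeping the B-rep edge id on each adjacency entry makes it possible to attach the curve features to the graph edge later.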

Curve geometry

Consider a parametric curve \(\mathbf{ C } (u)\) that maps the parameter domain, an interval \([u_{min}, u_{max}] \subset \mathbb{R} \), to the geometry domain \(\mathbb{R}^3\).

The curve can be parameterized as a line, a circular arc, or a B-spline.

The idea of the paper is to discretize the curve geometry in the parameter domain into a regular 1D grid with step size \( \delta u = \frac{u_{max}-u_{min}}{M-1} \), where \(M\) is the number of samples. (Figure 2(c))

A set of features evaluated from the curve can be attached to each discretized point \( u_k \) of the parameter domain.
(e.g. the absolute point coordinates \( \mathbf{C}(u_k) \) and, optionally, the unit tangent vector \( \hat{\mathbf{C}}_u(u_k) \))
This 1D UV-grid becomes the set of input edge features of \(G\).
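The 1D UV-grid can be sketched in a few lines. The example below samples a quarter circular arc at \(M = 10\) parameter values and stores the point and the unit tangent at each sample, giving a \(10 \times 6\) grid. The helper `sample_curve_uv_grid` and the arc functions are hypothetical names for this illustration; a real pipeline would evaluate the curve through a CAD kernel instead of closed-form functions.

```python
import math


def sample_curve_uv_grid(curve, curve_du, u_min, u_max, M=10):
    """Sample a parametric curve on a regular 1D grid in the parameter domain.

    Returns M rows of [x, y, z, tx, ty, tz]: the point C(u_k) and the unit tangent.
    """
    du = (u_max - u_min) / (M - 1)
    grid = []
    for k in range(M):
        u = u_min + k * du
        point = curve(u)
        deriv = curve_du(u)
        length = math.sqrt(sum(c * c for c in deriv))
        tangent = [c / length for c in deriv]  # normalize C_u to unit length
        grid.append(list(point) + tangent)
    return grid


R = 2.0  # arc radius (arbitrary value for the example)


def arc(u):
    return (R * math.cos(u), R * math.sin(u), 0.0)


def arc_du(u):
    return (-R * math.sin(u), R * math.cos(u), 0.0)


grid = sample_curve_uv_grid(arc, arc_du, 0.0, math.pi / 2, M=10)
```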

Surface geometry

In a B-rep, each topological face has an associated surface geometry, which can be a plane, sphere, cylinder, cone, or freeform NURBS surface.
Each surface is trimmed so that only the portion inside the face boundary is visible (it is trimmed along the halfedge loops).
Consider a parametric surface \(\mathbf{S}(u,v)\) that maps the parameter domain, a 2D interval \( [u_{min},u_{max}] \times [v_{min},v_{max}] \subset \mathbb{R}^2 \), to the geometry domain \(\mathbb{R}^3\).
The parameter domain is discretized into a regular 2D grid of samples with step sizes \( \delta u = \frac{u_{max}-u_{min}}{M-1} \) and \( \delta v = \frac{v_{max}-v_{min}}{N-1} \), where \(M\) and \(N\) are the numbers of samples along each dimension.
The intervals \([u_{min},u_{max}]\) and \([v_{min},v_{max}]\) are chosen so that they tightly bound the loop that defines the visible region.
At each grid point, indexed by \((k, l)\), the surface geometry is encoded as local features stored as channels.
(1) The 3D absolute point position \( \mathbf{S}(u_k, v_l) \)
(The solid is normalized in scale to fit a cube of side length 2 centered at the origin.)
(2) Optionally, the 3D absolute surface normal \( \frac{\mathbf{S}_u(u_k, v_l) \times \mathbf{S}_v(u_k, v_l)} {\Vert \mathbf{S}_u(u_k, v_l) \times \mathbf{S}_v(u_k, v_l) \Vert} \), which always points outward.
(3) A trimming mask, in which samples are 1 in the visible region and 0 in the trimmed region.
This 2D UV-grid is defined as the input node features of \(G\).
- \( M=N=10 \) is used in all experiments.
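To make the channel layout concrete, here is a minimal sketch that samples a cylinder patch on a \(10 \times 10\) grid and stores 7 channels per sample: position (3), unit normal (3), and trimming mask (1). The function `sample_surface_uv_grid`, the `is_visible` predicate, and the trimming rule are assumptions for illustration only. The sketch also assumes that \(\mathbf{S}_u \times \mathbf{S}_v\) already points outward, whereas a real B-rep would need the face orientation flag to guarantee this.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sample_surface_uv_grid(S, S_u, S_v, is_visible, u_range, v_range, M=10, N=10):
    """Return an M x N grid of [x, y, z, nx, ny, nz, mask] samples."""
    (u_min, u_max), (v_min, v_max) = u_range, v_range
    du = (u_max - u_min) / (M - 1)
    dv = (v_max - v_min) / (N - 1)
    grid = []
    for k in range(M):
        row = []
        for l in range(N):
            u, v = u_min + k * du, v_min + l * dv
            n = cross(S_u(u, v), S_v(u, v))
            length = math.sqrt(sum(c * c for c in n))
            normal = [c / length for c in n]
            mask = 1.0 if is_visible(u, v) else 0.0  # 1 = visible, 0 = trimmed
            row.append(list(S(u, v)) + normal + [mask])
        grid.append(row)
    return grid


# Unit-radius cylinder around the z axis.
def S(u, v):
    return (math.cos(u), math.sin(u), v)


def S_u(u, v):
    return (-math.sin(u), math.cos(u), 0.0)


def S_v(u, v):
    return (0.0, 0.0, 1.0)


# Hypothetical trimming rule: everything above v = 0.5 is cut away.
grid = sample_surface_uv_grid(S, S_u, S_v, lambda u, v: v <= 0.5,
                              u_range=(0.0, math.pi), v_range=(-1.0, 1.0))
```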

Advantages

(1) Evaluating curves and surfaces at a set of parameters is fast for both primitive and spline surfaces.
(2) The representation is sparse (meaning it can be expressed with few numbers) and scales with the number of curves and surfaces. (The original sentence reads a little oddly: 'The representation is sparse and scales with the number curves and surfaces in B-rep.')
(3) The grid is largely insensitive to the exact parametrization.
For example, when a planar surface is converted into a NURBS patch, the embedding stays similar, whereas the raw curve/surface equations change considerably.
(4) Local neighborhoods in the parameter domain (UV-grids) correspond to local neighborhoods in the curve/surface geometry domain. Hierarchical feature extraction on the manifold is therefore possible.

3.2. Network architecture

Following the representation described above,
image convolutions are first performed on the curve and surface UV-grids.
These local curve/surface features are then propagated over the entire B-rep through graph convolutions. (Figure 3)

Curve & surface convolution

The surface CNN takes 2D UV-grids with 4 or 7 channels as input (point position and trimming mask, without or with the surface normal) and is defined as follows.
\[ \text{Conv}(4/7, 64,3) \rightarrow \text{Conv}(64,128,3) \rightarrow \text{Conv}(128,256,3) \rightarrow \text{Pool}(1,1) \rightarrow \text{FC}(256,64) \]
\( \text{Conv}(i,o,k) \): image convolutional layer with \(i\) input channels, \(o\) output channels, and kernel size \(k\).
\( \text{Pool}(n,n) \): adaptive average pooling layer that outputs an \(n \times n\) feature map.
\( \text{FC}(i,o) \): fully connected layer that maps an input \(i\text{-D vector}\) to an output \(o\text{-D vector}\).

The curve CNN takes the 1D UV-grids computed from the curves lying on the B-rep edges. It is defined similarly, with 1D convolutional and pooling layers.
The weights of the curve CNN are shared among all edges of the B-rep, and the weights of the surface CNN are shared among all faces.

Message passing

The outputs of the curve and surface CNNs serve as hidden features and become the input edge and node features of the graph neural network.
Given these initial features, the hidden node features \( h_v^{(k)} \) at graph layer \( k \in 1...K \) are computed by aggregating the input node features \( h_u^{(k-1)} \) from the one-hop neighborhood \( u \in N(v) \), conditioned on the edge features \( h_{uv}^{(k-1)} \).
(Judging from the equation below, "conditioning" here seems to mean taking a Hadamard product. The Hadamard product \( \odot \) is simply the element-wise product.)
\begin{equation*} h_v^{(k)} = \phi^{(k)} \left( (1+\epsilon^{(k)}) h_v^{(k-1)} + \sum_{u \in N(v)} {f_\Theta ( h_{uv}^{(k-1)} ) \odot h_u^{(k-1)}} \right), \tag{1} \end{equation*}
Here, \( \phi^{(k)} \) is a multi-layer perceptron (MLP) with two fully connected layers \( FC(64,64) \rightarrow FC(64,64) \), \( \epsilon^{(k)} \) is a learnable parameter that distinguishes the center node from its neighbors, and \( f_\Theta \) is a linear projection from the edge feature space to the node feature space. (See the notes at the end of this post for details.)
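Equation (1) can be traced by hand with a small pure-Python sketch. Everything here is an assumption made for illustration: the dictionary-based graph format, the name `node_update`, and the use of identity functions in place of the learned \( \phi^{(k)} \) and \( f_\Theta \). It is not the UV-Net implementation, only a direct transcription of the formula as written above.

```python
def node_update(h_node, h_edge, neighbors, f_theta, phi, eps=0.0):
    """Apply Eq. (1) once to every node.

    h_node:    {v: feature list}
    h_edge:    {frozenset((u, v)): feature list} for undirected edges
    neighbors: {v: list of adjacent node ids}
    f_theta:   callable projecting edge features to the node feature size
    phi:       callable standing in for the MLP
    """
    new_h = {}
    for v, h_v in h_node.items():
        acc = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            gate = f_theta(h_edge[frozenset((u, v))])
            # Hadamard product of the projected edge feature with the neighbor feature
            acc = [a + g * x for a, g, x in zip(acc, gate, h_node[u])]
        new_h[v] = phi(acc)
    return new_h


def identity(x):
    return list(x)


h_node = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
h_edge = {frozenset(("a", "b")): [0.5, 2.0]}
neighbors = {"a": ["b"], "b": ["a"]}

updated = node_update(h_node, h_edge, neighbors, f_theta=identity, phi=identity)
# Node "a": [1 + 0.5 * 3, 2 + 2 * 4] = [2.5, 10.0]
```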

Next, the hidden edge features are updated in a similar way by taking the features of the endpoint nodes into account.
\begin{equation*} h_{uv}^{(k)} = \psi^{(k)} \left( (1+\gamma^{(k)}) h_{uv}^{(k-1)} + f_\Xi( h_u^{(k-1)} + h_v^{(k-1)} ) \right), \tag{2} \end{equation*}
Here, \(\psi^{(k)}\) is a 2-layer MLP, \( \gamma^{(k)} \) is a learnable parameter that distinguishes the edge features from those of its neighbors, and \(f_\Xi\) is a linear projection from the node feature space to the edge feature space. (See the notes at the end of this post for details.)
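The edge update of Eq. (2) can be sketched the same way. As before, the data layout, the name `edge_update`, and the identity stand-ins for \( \psi^{(k)} \) and \( f_\Xi \) are assumptions for this example and not the authors' code.

```python
def edge_update(h_node, h_edge, f_xi, psi, gamma=0.0):
    """Apply Eq. (2) once to every undirected edge {u, v} (no self-loops)."""
    new_h = {}
    for key, h_uv in h_edge.items():
        u, v = key  # key is a frozenset of the two endpoint node ids
        endpoint_sum = [a + b for a, b in zip(h_node[u], h_node[v])]
        projected = f_xi(endpoint_sum)  # node space -> edge space
        new_h[key] = psi([(1.0 + gamma) * e + p for e, p in zip(h_uv, projected)])
    return new_h


def identity(x):
    return list(x)


h_node = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
h_edge = {frozenset(("a", "b")): [0.5, 2.0]}

updated_edges = edge_update(h_node, h_edge, f_xi=identity, psi=identity)
# Edge {a, b}: [0.5 + (1 + 3), 2.0 + (2 + 4)] = [4.5, 8.0]
```

Because the two endpoint features are summed, the result does not depend on the order in which the endpoints are visited.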

Finally, once all hidden node features \( \{ h_v^{(k)} | k \in 1...K \} \) have been obtained, an element-wise max-pooling operation is applied over the nodes to obtain hierarchical graph-level feature vectors \( \{ h^{(k)} | k \in 1...K \} \) from every layer, where \( h^{(k)} = maxpool_{v \in V}(h_v^{(k)}) \).
These features are linearly projected to 128D vectors and summed to obtain the final shape embedding.
\begin{equation*} h_G = \sum_{k=1}^{K}w^{(k)} \cdot h^{(k)} + b^{(k)}. \tag{3} \end{equation*}
The paper uses \( K = 2 \) graph layers in all experiments.
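The readout of Eq. (3) amounts to a per-layer max-pool over nodes followed by a linear projection and a sum over layers. The sketch below uses tiny 2D features and identity weight matrices instead of the learned 128D projections; the name `graph_embedding` and all numeric values are made up for the example.

```python
def graph_embedding(node_feats_per_layer, weights, biases):
    """Compute h_G = sum_k (W^(k) h^(k) + b^(k)), with h^(k) the node-wise max-pool."""
    out_dim = len(biases[0])
    h_G = [0.0] * out_dim
    for h_nodes, W, b in zip(node_feats_per_layer, weights, biases):
        # Element-wise max over all nodes of this layer
        pooled = [max(column) for column in zip(*h_nodes.values())]
        for i in range(out_dim):
            h_G[i] += sum(W[i][j] * pooled[j] for j in range(len(pooled))) + b[i]
    return h_G


# K = 2 layers, two nodes, 2D features (the paper projects to 128D).
layer1 = {"a": [1.0, 5.0], "b": [3.0, 2.0]}   # pooled: [3, 5]
layer2 = {"a": [0.0, 1.0], "b": [2.0, -1.0]}  # pooled: [2, 1]
eye = [[1.0, 0.0], [0.0, 1.0]]

h_G = graph_embedding([layer1, layer2],
                      weights=[eye, eye],
                      biases=[[0.0, 0.0], [1.0, 1.0]])
# h_G = [3, 5] + ([2, 1] + [1, 1]) = [6.0, 7.0]
```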

4. Experiments

4.2. Tasks

4.2.1 Classification

The paper shows the benefit of using both the geometry and the topology of the B-rep, which matters especially when shapes have similar topology but large geometric variance.
The network consists of the UV-Net encoder shown in Figure 3, followed by a non-linear classifier (2-layer MLP) that maps the 128D shape embedding to class logits.

4.2.2 Segmentation

For the problem of segmenting B-rep faces, the MFCAD and ABC datasets are used in the experiments.
To address the lack of labels in the ABC dataset, Autodesk Shape Manager is used to produce face labels such as ExtrudeSide, ExtrudeEnd, and Fillet.
The shape embedding is concatenated to each node embedding, and a non-linear classifier is then applied to output per-node logits. Curve tangent and surface normal information is additionally included.

Limitation

The representation is affected by rotation: "UV-grid features are not rotation-invariant"

We also believe there is tremendous potential to improve our self-supervised method for transfer learning from large datasets like ABC.

Points to check in more detail

~~What does the discretized form look like before it goes into the GNN?~~ Checked.
~~Is the surface normal vector of a face really obtained at every sample point?~~ Yes.
The trimming mask seems to carry area information in effect. In my approach it is not easy to include area information, so I wonder whether I should put richer information on the edges instead.

How the linear projection from edge to node feature space \( f_\Theta \)
and the linear projection from node to edge feature space \( f_\Xi \) are computed

Note: checked in the source code. edge_feats, node_feats, and out_feats denote feature dimensions (int).
\( f_\Theta \): uses nn.Linear(edge_feats, node_feats * out_feats).
\( f_\Xi \): used in the same way, as nn.Linear(edge_feats, edge_feats).
