2025-12-23

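Automatically generating test code with Claude Code GitHub Actions

This is the Day 23 article of the ZOZO Advent Calendar 2025. In this post I connect Claude Code to GitHub Actions and have a workflow generate test code automatically.

Preparation
Creating the code to test
- Creating the API server
- Creating the test code
Generating tests automatically
Summary

Preparation

First, connect Claude Code to GitHub Actions. The steps are covered in the documentation below, so I follow it for the setup.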
code.claude.com

Run the /install-github-app command for the repository you want to connect to Claude Code. Following the command's prompts produces the log below, and setup is complete.

> /install-github-app 
  ⎿  GitHub Actions setup complete!

Once /install-github-app has finished, try running the following job on Actions.

name: Auto Update Readme

on:
  pull_request:
    branches:
      - 'main'
    types: [opened, synchronize, closed]

permissions:
  contents: write
  pull-requests: write
  id-token: write
jobs:
  create-pr:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          ref: ${{ github.head_ref }}

      - uses: anthropics/claude-code-action@v1
        with:
          prompt: |
            Please carry out the following tasks:
            - Switch to the current branch.
            - Add the sentence "This is an auto-generated README file." to README.md. If it has already been added, do nothing.
            - After making the change, commit it to the current branch.
            - Push the change.
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          claude_args: "--allowedTools Edit,Write,Read,Bash"
          show_full_output: true

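When the job finishes, a commit that adds the text "This is an auto-generated README file." to README.md has been pushed to the branch.

Creating the code to test

Creating the API server

With the Claude Code and GitHub Actions integration confirmed, the next step is the API server under test. I create a /users endpoint that creates a user. Because this is only a test, it always returns a user with id: 1.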

main.py

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class User(BaseModel):
    name: str = Field(..., min_length=1, max_length=100)
    email: str
    age: int = Field(..., ge=0, le=150)

class UserResponse(BaseModel):
    id: int
    name: str
    email: str

@app.post("/users", response_model=UserResponse)
async def create_user(user: User):
    return {"id": 1, "name": user.name, "email": user.email}

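An example call to the API is shown below.

Creating the test code

Next, I write the baseline test code. The parameters POSTed in the example above serve as the normal case, and parameters that violate the constraints defined on the BaseModel are added as error cases.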

main_test.py

import pytest
from httpx import AsyncClient, ASGITransport
from main import app


@pytest.fixture
def anyio_backend():
    return "asyncio"


@pytest.fixture
async def client():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac

@pytest.mark.anyio
async def test_create_user(client):
    user_data = {
        "name": "山田太郎",
        "email": "yamada@example.com",
        "age": 30
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 200
    data = response.json()
    assert data["name"] == "山田太郎"
    assert data["email"] == "yamada@example.com"


@pytest.mark.anyio
async def test_create_user_invalid_name(client):
    user_data = {
        "name": "",  
        "email": "test@example.com",
        "age": 25
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422  # Validation Error

@pytest.mark.anyio
async def test_create_user_invalid_age(client):
    user_data = {
        "name": "テスト",
        "email": "test@example.com",
        "age": 500
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422

@pytest.mark.anyio
async def test_create_user_missing_field(client):
    user_data = {
        "name": "テスト"
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422

Generating tests automatically

With the baseline code in place, I change the job so that it instructs Claude Code to write tests.

name: create tests

on:
  pull_request:
    branches:
      - 'main'
    types: [opened, synchronize, closed]

permissions:
  contents: write
  pull-requests: write
  id-token: write
jobs:
  create-pr:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          ref: ${{ github.head_ref }}

      - uses: anthropics/claude-code-action@v1
        with:
          prompt: |
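            Please carry out the following tasks:
            - Switch to the current branch.
            - Add test code for the routes defined in main.py to main_test.py. Cover both normal and error cases.
            - Confirm that the added test code runs correctly.
            - Write a commit message that describes the change.
            - After making the change, commit it to the current branch.
            - Push the change.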
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          claude_args: "--allowedTools Edit,Write,Read,Bash"
          show_full_output: true

Next, I add a new endpoint to main.py that returns user information, for example at /users/1.

main.py

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class User(BaseModel):
    name: str = Field(..., min_length=1, max_length=100)
    email: str
    age: int = Field(..., ge=0, le=150)

class UserResponse(BaseModel):
    id: int
    name: str
    email: str

@app.post("/users", response_model=UserResponse)
async def create_user(user: User):
    return {"id": 1, "name": user.name, "email": user.email}

# 新しく追加したエンドポイント。
@app.get("/users/{user_id}", response_model=UserResponse)
async def get_user(user_id: int):
    return {"id": user_id, "name": "田中", "email": "tanaka@example.com"}

Pushing the code above triggers the job. The job log shows that tests for /users/{user_id} were added.

The added test code is shown below.

main_test.py (collapsed in the original post because the code is long)

import pytest
from httpx import AsyncClient, ASGITransport
from main import app


@pytest.fixture
def anyio_backend():
    return "asyncio"


@pytest.fixture
async def client():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac

@pytest.mark.anyio
async def test_create_user(client):
    user_data = {
        "name": "山田太郎",
        "email": "yamada@example.com",
        "age": 30
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 200
    data = response.json()
    assert data["name"] == "山田太郎"
    assert data["email"] == "yamada@example.com"


@pytest.mark.anyio
async def test_create_user_invalid_name(client):
    user_data = {
        "name": "",  
        "email": "test@example.com",
        "age": 25
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422  # Validation Error

@pytest.mark.anyio
async def test_create_user_invalid_age(client):
    user_data = {
        "name": "テスト",
        "email": "test@example.com",
        "age": 500
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422

@pytest.mark.anyio
async def test_create_user_missing_field(client):
    user_data = {
        "name": "テスト"
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


# Normal cases: boundary value tests
@pytest.mark.anyio
async def test_create_user_min_name_length(client):
    """Test the minimum name length (1 character)."""
    user_data = {
        "name": "A",
        "email": "a@example.com",
        "age": 0
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 200
    data = response.json()
    assert data["name"] == "A"
    assert data["email"] == "a@example.com"


@pytest.mark.anyio
async def test_create_user_max_name_length(client):
    """Test the maximum name length (100 characters)."""
    long_name = "A" * 100
    user_data = {
        "name": long_name,
        "email": "long@example.com",
        "age": 150
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 200
    data = response.json()
    assert data["name"] == long_name


@pytest.mark.anyio
async def test_create_user_min_age(client):
    """Test the minimum age (0)."""
    user_data = {
        "name": "赤ちゃん",
        "email": "baby@example.com",
        "age": 0
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 200
    data = response.json()
    assert data["name"] == "赤ちゃん"


@pytest.mark.anyio
async def test_create_user_max_age(client):
    """Test the maximum age (150)."""
    user_data = {
        "name": "最高齢者",
        "email": "oldest@example.com",
        "age": 150
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 200
    data = response.json()
    assert data["name"] == "最高齢者"


# Error cases: values beyond the boundaries
@pytest.mark.anyio
async def test_create_user_name_too_long(client):
    """The name exceeds the maximum length (100 characters)."""
    too_long_name = "A" * 101
    user_data = {
        "name": too_long_name,
        "email": "toolong@example.com",
        "age": 30
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


@pytest.mark.anyio
async def test_create_user_negative_age(client):
    """The age is negative."""
    user_data = {
        "name": "テスト",
        "email": "negative@example.com",
        "age": -1
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


@pytest.mark.anyio
async def test_create_user_age_exceeds_max(client):
    """The age exceeds the maximum (150)."""
    user_data = {
        "name": "テスト",
        "email": "tooold@example.com",
        "age": 151
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


# Error cases: wrong data types
@pytest.mark.anyio
async def test_create_user_invalid_age_type(client):
    """The age is a string."""
    user_data = {
        "name": "テスト",
        "email": "test@example.com",
        "age": "thirty"
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


@pytest.mark.anyio
async def test_create_user_invalid_name_type(client):
    """The name is a number."""
    user_data = {
        "name": 123,
        "email": "test@example.com",
        "age": 30
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


@pytest.mark.anyio
async def test_create_user_missing_all_fields(client):
    """All fields are missing."""
    user_data = {}
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


@pytest.mark.anyio
async def test_create_user_missing_email(client):
    """The email field is missing."""
    user_data = {
        "name": "テスト",
        "age": 30
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


@pytest.mark.anyio
async def test_create_user_missing_age(client):
    """The age field is missing."""
    user_data = {
        "name": "テスト",
        "email": "test@example.com"
    }
    response = await client.post("/users", json=user_data)
    assert response.status_code == 422


# Tests for GET /users/{user_id}
# Normal cases
@pytest.mark.anyio
async def test_get_user(client):
    """Normal case: fetch user information by user ID."""
    user_id = 1
    response = await client.get(f"/users/{user_id}")
    assert response.status_code == 200
    data = response.json()
    assert data["id"] == user_id
    assert data["name"] == "田中"
    assert data["email"] == "tanaka@example.com"


@pytest.mark.anyio
async def test_get_user_with_large_id(client):
    """Fetch user information with a large user ID."""
    user_id = 999999
    response = await client.get(f"/users/{user_id}")
    assert response.status_code == 200
    data = response.json()
    assert data["id"] == user_id
    assert data["name"] == "田中"
    assert data["email"] == "tanaka@example.com"


@pytest.mark.anyio
async def test_get_user_with_zero_id(client):
    """Fetch user information with user ID 0."""
    user_id = 0
    response = await client.get(f"/users/{user_id}")
    assert response.status_code == 200
    data = response.json()
    assert data["id"] == user_id


# Error cases
@pytest.mark.anyio
async def test_get_user_with_negative_id(client):
    """Request with a negative user ID (FastAPI accepts negative ints too)."""
    user_id = -1
    response = await client.get(f"/users/{user_id}")
    assert response.status_code == 200
    data = response.json()
    assert data["id"] == user_id


@pytest.mark.anyio
async def test_get_user_with_invalid_id_type(client):
    """Request with a string user ID."""
    response = await client.get("/users/invalid")
    assert response.status_code == 422


@pytest.mark.anyio
async def test_get_user_with_float_id(client):
    """Request with a fractional user ID."""
    response = await client.get("/users/1.5")
    assert response.status_code == 422


@pytest.mark.anyio
async def test_get_user_without_id(client):
    """Request without a user ID (a different path)."""
    response = await client.get("/users/")
    assert response.status_code == 307

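Summary

I connected Claude Code to GitHub Actions and built a workflow that writes tests automatically. It generated tests for the types and the validation constraints defined in FastAPI, which saved the effort of writing them by hand. For tests that follow business requirements more closely, the prompt will need more care. Beyond writing tests, the same setup looks useful for other tasks such as generating a README or checking code.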

2025-12-16

Running E2E tests with the net/http/httptest package

This is the Day 16 article of the ZOZO Advent Calendar 2025. In this post I use the net/http/httptest package to start an API server and write code that runs E2E tests against it.

pkg.go.dev

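Preparation
- Language
- Creating the API server under test
Writing the test code
Summary

Preparation

First, I note the language version and create the API server under test.

Language

go 1.25.4

Creating the API server under test

I create an API server that returns user information when it receives a GET request with the JSON parameter below.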

# JSON parameter to send
{
    "id": 1
}

# JSON returned
{
    "id": 1,
    "name": "Alice",
    "email": "example@example.com"
}

The logic is implemented in the code below.

package main

import (
    "encoding/json"
    "net/http"
)

type UserIDRequest struct {
    ID int `json:"id"`
}

type User struct {
    ID    int    `json:"id"`
    Name  string `json:"name"`
    Email string `json:"email"`
}

var users = []User{
    {ID: 1, Name: "Alice", Email: "example@example.com"},
    {ID: 2, Name: "Bob", Email: "bob@example.com"},
    {ID: 3, Name: "Charlie", Email: "charlie@example.com"},
}

func userHandler(w http.ResponseWriter, r *http.Request) {
    var req UserIDRequest
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        // The request body could not be decoded as JSON.
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(400)
        json.NewEncoder(w).Encode(map[string]string{"error": "Invalid request body"})
        return
    }
    for _, user := range users {
        if user.ID == req.ID {
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(200)
            json.NewEncoder(w).Encode(user)
            return
        }
    }
    // User not found
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(404)
    json.NewEncoder(w).Encode(map[string]string{"error": "User not found"})
}

func initRouter() *http.ServeMux {
    mux := http.NewServeMux()
    mux.HandleFunc("/sample", userHandler)
    return mux
}

func main() {
    server := http.Server{
        Addr: ":8080",
    }

    server.Handler = initRouter()
    server.ListenAndServe()
}

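I started the API server and sent a GET request. The information for the specified user ID is returned.

Writing the test code

I run E2E tests with net/http/httptest. Calling NewServer from this package starts a goroutine internally and begins listening on a system-chosen port on the local loopback interface; the address is available through the server's URL field. This lets a single test both start the API server and run requests against it. As a first test, I request the user with ID: 1 and check that the user information is returned.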

package main

import (
    "bytes"
    "encoding/json"
    "net/http"
    "net/http/httptest"
    "net/url"
    "testing"
)

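// postUserRequest sends a GET request with a JSON body to url and returns the response.
// Despite its name, it issues a GET because the handler is called with GET in this article.
func postUserRequest(t *testing.T, url string, body string) (*http.Response, error) {
    t.Helper()
    req, err := http.NewRequest(http.MethodGet, url, bytes.NewBufferString(body))
    // Check the error before touching req; req is nil when NewRequest fails.
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    return resp, nil
}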

func TestE2E(t *testing.T) {
    router := initRouter()
    ts := httptest.NewServer(router)
    defer ts.Close() // shut the test server down when the test ends
    userUrl, err := url.JoinPath(ts.URL, "sample")
    if err != nil {
        t.Fatalf("Failed to construct URL: %v", err)
    }
    t.Run("Get ID Request", func(t *testing.T) {
        requestBody := `{"id": 1}`
        resp, err := postUserRequest(t, userUrl, requestBody)
        if err != nil {
            t.Fatalf("Failed to send request: %v", err)
        }
        defer resp.Body.Close()

        if resp.StatusCode != 200 {
            t.Fatalf("Expected status code 200, got %d", resp.StatusCode)
        }
        var user User
        if err := json.NewDecoder(resp.Body).Decode(&user); err != nil {
            t.Fatalf("Failed to decode response: %v", err)
        }
        if user.ID != 1 || user.Name != "Alice" || user.Email != "example@example.com" {
            t.Fatalf("Unexpected user data: %+v", user)
        }

        t.Logf("Get user: %v", user)
    })

    t.Run("Get Unknown ID Request", func(t *testing.T) {
        requestBody := `{"id": 4}`
        resp, err := postUserRequest(t, userUrl, requestBody)
        if err != nil {
            t.Fatalf("Failed to send request: %v", err)
        }
        defer resp.Body.Close()

        if resp.StatusCode != 404 {
            t.Fatalf("Expected status code 404, got %d", resp.StatusCode)
        }
        t.Logf("status code is %v", resp.StatusCode)

    })
}

Running the code above produces the result below. It confirms that a GET request sent to the API server started by NewServer returns the intended JSON.

user@dummy test-http % go test -v -run TestE2E
=== RUN   TestE2E
=== RUN   TestE2E/Get_ID_Request
    main_test.go:52: Get user: {1 Alice example@example.com}
=== RUN   TestE2E/Get_Unknown_ID_Request
    main_test.go:66: status code is 404
--- PASS: TestE2E (0.00s)
    --- PASS: TestE2E/Get_ID_Request (0.00s)
    --- PASS: TestE2E/Get_Unknown_ID_Request (0.00s)
PASS
ok      teste2e 0.653s
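As a compact variant of the test above, the sketch below starts a throwaway server for each request and also covers the invalid-body case. The lookupRequest type, the names map, newHandler, and statusFor are hypothetical stand-ins rather than the article's actual code, and the handler is mounted at the root path instead of /sample.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// lookupRequest and names are hypothetical stand-ins for the article's
// UserIDRequest type and users slice.
type lookupRequest struct {
	ID int `json:"id"`
}

var names = map[int]string{1: "Alice", 2: "Bob"}

// newHandler answers 200 for a known ID, 404 for an unknown ID,
// and 400 when the body is not valid JSON.
func newHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req lookupRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		name, ok := names[req.ID]
		if !ok {
			http.Error(w, "user not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{"id": req.ID, "name": name})
	})
}

// statusFor starts a test server, sends one GET request carrying body,
// and returns the response status code.
func statusFor(body string) (int, error) {
	ts := httptest.NewServer(newHandler())
	defer ts.Close() // release the listener when done

	req, err := http.NewRequest(http.MethodGet, ts.URL, strings.NewReader(body))
	if err != nil {
		return 0, err
	}
	resp, err := ts.Client().Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Known ID, unknown ID, and a body that is not JSON.
	for _, body := range []string{`{"id": 1}`, `{"id": 4}`, `not json`} {
		code, err := statusFor(body)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Println(body, "->", code)
	}
}
```

Calling Close on the server releases the listener, so each case starts from a clean state; in a real test file the same loop would usually live inside t.Run subtests that share one server.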

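Summary

I ran E2E tests with the net/http/httptest package. Because the test code handles everything from starting the API server to running the tests, there is no need to implement waiting logic between server startup and test execution, and the implementation stays short. It is also easy to integrate into CI, which I found convenient.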

2025-12-09

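Working with BigQuery policy tags from Python

This is the Day 9 article of the ZOZO Advent Calendar 2025. In this post I show how to work with BigQuery policy tags from Python.

Preparation

First, I describe the table created for the test environment and the library versions used.

Libraries used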

python 3.12  
google-cloud-bigquery==3.38.0

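Creating the test table

I create the test table with the schema below. The address column is intended to hold addresses and the mail column email addresses. Because I want access control on the address and mail columns, I attach a policy tag to each.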

CREATE TABLE `sample-project.temp.sample`
(
  id STRING,
  address STRING,
  mail STRING,
  description STRING
)

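The policy tags are created with the following types.

I attach the created policy tags to the sample table.

Getting policy tags

First, I show how to read the policy tag information of a table. It is available from the schema of the table returned by the get_table method. The schema is a list of SchemaField objects, so I read policy_tags from each one.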

from google.cloud import bigquery

def get_policy_tag_information(project_id, dataset_id, table_id):
    client = bigquery.Client(project=project_id)
    table_ref = client.dataset(dataset_id).table(table_id)
    table = client.get_table(table_ref)
    table_schemas = {}
    for schema in table.schema:
        table_schemas[schema.name] = {"policy_tags": schema.policy_tags}
    return table_schemas

if __name__ == "__main__":
    project_id = "sample-project"
    dataset_id = "temp"
    table_id = "sample"
    table_schemas = get_policy_tag_information(
          project_id,
          dataset_id,
          table_id
    )
    print(table_schemas)

Running the code returns the following result.

{
    "id": {
        "policy_tags": None
    },
    "address": {
        "policy_tags": PolicyTagList(
            names=(
                "projects/sample/locations/us/taxonomies/12345/policyTags/6789",
            )
        )
    },
    "mail": {
        "policy_tags": PolicyTagList(
            names=(
                "projects/sample/locations/us/taxonomies/12345/policyTags/2345",
            )
        )
    },
    "description": {
        "policy_tags": None
    }
}

Changing policy tags (removal)

Next, I show how to remove policy tags that are already attached. Use the update_table method to update a table's policy tag information, specifying the fields to update. Here I specify schema.

from google.cloud import bigquery

def update_policy_tag(project_id, dataset_id, table_id):
    client = bigquery.Client(project=project_id)
    table_ref = client.dataset(dataset_id).table(table_id)
    table = client.get_table(table_ref)

    updated_schema = []
    for field in table.schema:
        updated_field = bigquery.SchemaField(
            name=field.name,
            field_type=field.field_type,
            mode=field.mode,
            description=field.description,
            policy_tags=bigquery.PolicyTagList(names=[])
        )
        updated_schema.append(updated_field)
    table.schema = updated_schema
    table = client.update_table(table, ["schema"])

if __name__ == "__main__":
    project_id = "sample-project"
    dataset_id = "temp"
    table_id = "sample"
    update_policy_tag(
          project_id,
          dataset_id,
          table_id
    )

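After running the code above, checking the table shows that the policy tags have been removed.

Changing policy tags (addition)

Adding policy tags also uses the update_table method, the same as removal. Specify the ID of the policy tag to attach.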

from google.cloud import bigquery

def update_policy_tag(project_id, dataset_id, table_id):
    client = bigquery.Client(project=project_id)
    table_ref = client.dataset(dataset_id).table(table_id)
    table = client.get_table(table_ref)
    # Policy tags to attach, keyed by column name.
    policy_tag_list = {
        "address": bigquery.PolicyTagList(names=["projects/sample/locations/us/taxonomies/12345/policyTags/6789"]),
        "mail": bigquery.PolicyTagList(names=["projects/sample/locations/us/taxonomies/12345/policyTags/2345"])
    }
    updated_schema = []
    for field in table.schema:
        if field.name in policy_tag_list:
            # Rebuild the field with the new policy tag; other attributes are copied as-is.
            new_policy_tags = policy_tag_list[field.name]
            updated_field = bigquery.SchemaField(
                name=field.name,
                field_type=field.field_type,
                mode=field.mode,
                description=field.description,
                policy_tags=new_policy_tags
            )
            updated_schema.append(updated_field)
        else:
            updated_schema.append(field)

    table.schema = updated_schema
    table = client.update_table(table, ["schema"])

if __name__ == "__main__":
    project_id = "sample-project"
    dataset_id = "temp"
    table_id = "sample"
    update_policy_tag(
          project_id,
          dataset_id,
          table_id
    )

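After running the code, checking the table shows that the policy tags have been attached.

Summary

This was a brief introduction to working with policy tags from Python. It should be useful when policy tags need to be changed across multiple tables.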

2025-12-02

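Implementing DAG tests for Cloud Composer (Airflow)

This is the Day 2 article of the ZOZO Advent Calendar 2025. In this post I show how to run tests for Cloud Composer (Airflow) in a local environment.

Preparation
- Setting up the environment
- Creating the DAG
Testing the DAG
- Running the tests
Summary

Preparation

Before getting to the main topic, I set up an environment in which the DAG can run.

Setting up the environment

I build the test environment with composer-local-dev. Installing the command is not covered here.

The following commands create the dev environment.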

$ composer-dev create \
  --from-image-version composer-2.15.4-airflow-2.9.3  \
  test --debug

composer-dev start test

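These commands start the local Composer environment. Once it is up, open http://localhost:8080 and check that the Airflow UI is running.

The UI came up without problems.

Creating the DAG

I create the DAG under test. It accepts parameters at run time, and its output changes depending on the parameters passed. The DAG code is shown below.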

composer-test/composer/test/dags/tutorial.py

from datetime import timedelta
from airflow import DAG
from airflow.providers.standard.operators.python import (
    PythonOperator,
)
from airflow.models.param import Param


def get_username_from_ids(**kwargs):
    users = {
       1: {"username": "Yamada", "email": "user_1@example.com"},
       2: {"username": "Tanaka", "email": "user_2@example.com"},
       3: {"username": "Sato", "email": "user_3@example.com"},
       4: {"username": "Suzuki", "email": "user_4@example.com"},
    }
    user_ids = kwargs['params'].get("test_parameters").get("user_ids", [])
    return_user_info = []
    for user_id in user_ids:
      user = users.get(user_id)
      return_user_info.append(user)
    return return_user_info

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),

}
dag = DAG(
    'test',
    default_args=default_args,
    description='Only use test DAG',
    tags=['example'],
    params={
        "test_parameters": Param(
            default={"key": "default_value"},
            type="object",
        )
    }
)

run_first = PythonOperator(task_id="print_the_context", python_callable=get_username_from_ids, dag=dag)

run_first

To trigger the DAG above manually, enter the parameters in the order shown in the screenshots below.

For this manual run I enter the following JSON.

{
  "user_ids": [1, 2]
}

The run log shows that the user information is returned.

[2025-12-01, 19:01:49 UTC] {python.py:198} INFO - Done. Returned value was: [{'username': 'Yamada', 'email': 'user_1@example.com'}, {'username': 'Tanaka', 'email': 'user_2@example.com'}]

Testing the DAG

Now for the DAG test itself.

tests
├── expected
│   ├── test1.json
│   └── test2.json
├── sample_test.py
└── testdata
    ├── test1.json
    └── test2.json

Create a directory like the one above under the dags directory. Put the input data in the testdata directory and the expected output data in the expected directory. As an example, I created the following two input files in testdata.

testdata/test1.json

{
  "user_ids": [1, 2]
}

testdata/test2.json

{
  "user_ids": [3, 4]
}

Place the following two JSON files in the expected directory.

expected/test1.json

[
  {
    "username": "Yamada",
    "email": "user_1@example.com"
  },
  {
    "username": "Tanaka",
    "email": "user_2@example.com"
  }
]

expected/test2.json

[
  {
    "username": "Sato",
    "email": "user_3@example.com"
  },
  {
    "username": "Suzuki",
    "email": "user_4@example.com"
  }
]

Finally, implement the Python code that runs the tests. It reads both the expected and testdata directories and runs the Airflow DAG through a method call.

sample_test.py

import json
import os
from pathlib import Path

from airflow.models import DagBag

testcases = [
  {"name": "Test case 1", "testcase_name": "test1"},
  {"name": "Test case 2", "testcase_name": "test2"},
]

def _load_testdata(testcase_name):
    # Input parameters passed to the DAG run.
    testdata_path = Path(__file__).parent / "testdata" / f"{testcase_name}.json"
    with open(testdata_path, "r") as f:
        testdata = json.load(f)
    return testdata

def _load_expected_result(testcase_name):
    # Value the task is expected to return for the same test case.
    expected_result_path = Path(__file__).parent / "expected" / f"{testcase_name}.json"
    with open(expected_result_path, "r") as f:
        expected_result = json.load(f)
    return expected_result

def test_dag_response():
    dag_bag = DagBag(dag_folder=os.environ.get('DAGS_FOLDER'), include_examples=False)
    dag = dag_bag.get_dag(dag_id='test')
    assert dag is not None, "DAG 'test' not found in DagBag"

    for testcase in testcases:
        testcase_name = testcase["testcase_name"]
        testdata = _load_testdata(testcase_name)

        dr = dag.test(run_conf={"test_parameters": testdata})
        ti = dr.get_task_instance(task_id="print_the_context")
        result = ti.xcom_pull()
        print("---- Test Result ----")

        expected_result = _load_expected_result(testcase_name)
        assert result == expected_result, f"Test case {testcase_name} failed: expected {expected_result}, got {result}"
        print(f"Testcase: {testcase_name}, Result: {result}" )

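The dag.test call runs the DAG, and get_task_instance fetches the task's information. The value pulled from it is compared against the expected data with assert.

Running the tests

Let's check that the test can run. Use the following command to enter the container.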

docker exec -ti composer-local-dev-test /bin/bash

Once inside the container, run pytest.

$ pytest

The run confirms that the test passes.

airflow@15756e2b1ac3:~$ pytest -v 
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.11.8, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /home/airflow
plugins: anyio-4.11.0
collected 1 item                                                                                                                                                                       

gcs/dags/tests/sample_test.py::test_dag_response PASSED                                                                                                                          [100%]

================================================================================== 1 passed in 1.40s ===================================================================================

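Summary

This was a brief overview of how to test DAGs used with Cloud Composer. I had assumed that tests for workflow tools were hard to write, but Airflow provides a method for testing, which made the implementation easy.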

2024-12-25

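Implementing the Visitor pattern in Go

This is the Day 25 article of the ZOZO Advent Calendar 2024. In this post I give a brief introduction to the Visitor pattern.

About the Visitor pattern
Implementation example
Summary

About the Visitor pattern

In object-oriented programming and software engineering, the Visitor pattern is a design pattern for separating an algorithm from the structure of the objects it operates on. A practical result of this separation is the ability to add new operations to existing objects without modifying their structure.

Quoted from Wikipedia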

Visitor パターン - Wikipedia

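The description alone makes it hard to picture an implementation, but the key point is that new operations can be added to existing objects without modifying their structure.
If the objects are thought of as parcels, it feels like inserting a delivery step that handles each parcel on its way.

Implementation example

As an example, I implement logic that converts a received string to JSON or YAML.

First, define the Visitor interface.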

type Visitor interface {
    VisitParseJSON(*ParseJSON)
    VisitParseYAML(*ParseYAML)
}

Next is the JSON conversion.

type ParseJSON struct {
    Name   string
    Sample string
}

func (p *ParseJSON) Accept(visitor Visitor) {
    visitor.VisitParseJSON(p)
}

type RawData struct {
    Data []byte
}

// VisitParseJSON decodes the raw data into p and prints it re-encoded as JSON.
func (c *RawData) VisitParseJSON(p *ParseJSON) {
    if err := json.Unmarshal(c.Data, p); err != nil {
        fmt.Println(err)
        return
    }
    marshalJSON, err := json.Marshal(p)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(marshalJSON))
}

Next is the YAML conversion.

type ParseYAML struct {
    Name   string `yaml:"Name"`
    Sample string `yaml:"Sample"`
}

func (p *ParseYAML) Accept(visitor Visitor) {
    visitor.VisitParseYAML(p)
}

// VisitParseYAML decodes the raw data into p and prints it re-encoded as YAML.
func (c *RawData) VisitParseYAML(p *ParseYAML) {
    if err := yaml.Unmarshal(c.Data, p); err != nil {
        fmt.Println(err)
        return
    }
    marshalYaml, err := yaml.Marshal(p)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(marshalYaml))
}

With the JSON and YAML conversion logic complete, I write the code that calls these functions.

func main() {
    jsonData := `{"Name": "json_value", "Sample": "bar"}`

    rawData := RawData{Data: []byte(jsonData)}
    parseJSON := ParseJSON{}
    parseJSON.Accept(&rawData)

    // Use a lowercase name so the variable does not shadow the ParseYAML type.
    parseYAML := ParseYAML{}
    parseYAML.Accept(&rawData)

}

As the calls above show, the JSON and YAML conversions are invoked through the Accept method. To add another conversion, add a method to the Visitor interface and implement the processing.

The complete code is below.

package main

import (
    "encoding/json"
    "fmt"

    "gopkg.in/yaml.v3"
)

type Element interface {
    Accept(visitor Visitor)
}

type Visitor interface {
    VisitParseJSON(*ParseJSON)
    VisitParseYAML(*ParseYAML)
}

type ParseJSON struct {
    Name   string
    Sample string
}

func (p *ParseJSON) Accept(visitor Visitor) {
    visitor.VisitParseJSON(p)
}

type RawData struct {
    Data []byte
}

// VisitParseJSON decodes the raw data into p and prints it re-encoded as JSON.
func (c *RawData) VisitParseJSON(p *ParseJSON) {
    if err := json.Unmarshal(c.Data, p); err != nil {
        fmt.Println(err)
        return
    }
    marshalJSON, err := json.Marshal(p)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(marshalJSON))
}

type ParseYAML struct {
    Name   string `yaml:"Name"`
    Sample string `yaml:"Sample"`
}

func (p *ParseYAML) Accept(visitor Visitor) {
    visitor.VisitParseYAML(p)
}

// VisitParseYAML decodes the raw data into p and prints it re-encoded as YAML.
func (c *RawData) VisitParseYAML(p *ParseYAML) {
    if err := yaml.Unmarshal(c.Data, p); err != nil {
        fmt.Println(err)
        return
    }
    marshalYaml, err := yaml.Marshal(p)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(marshalYaml))
}

func main() {
    jsonData := `{"Name": "json_value", "Sample": "bar"}`
    fmt.Println("Convert JSON")
    rawData := RawData{Data: []byte(jsonData)}
    parseJSON := ParseJSON{}
    parseJSON.Accept(&rawData)

    fmt.Println("Convert YAML")
    // Use a lowercase name so the variable does not shadow the ParseYAML type.
    parseYAML := ParseYAML{}
    parseYAML.Accept(&rawData)

}

The output is as follows.

$ go run  main.go
Convert JSON
{"Name":"json_value","Sample":"bar"}
Convert YAML
Name: json_value
Sample: bar
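To illustrate the point quoted from Wikipedia, that new operations can be added without modifying the existing objects, here is a minimal sketch in the more common arrangement where each operation is its own Visitor implementation. The Text, Number, Renderer, and Counter types are hypothetical and are not part of the code above; the sketch uses only the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// Visitor declares one method per element type.
type Visitor interface {
	VisitText(*Text)
	VisitNumber(*Number)
}

// Element is anything that can accept a Visitor.
type Element interface {
	Accept(Visitor)
}

// Text and Number are the element types; they only know how to dispatch.
type Text struct{ Value string }
type Number struct{ Value int }

func (t *Text) Accept(v Visitor)   { v.VisitText(t) }
func (n *Number) Accept(v Visitor) { v.VisitNumber(n) }

// Renderer is one operation: it collects a string form of each element.
type Renderer struct{ Parts []string }

func (r *Renderer) VisitText(t *Text)     { r.Parts = append(r.Parts, strings.ToUpper(t.Value)) }
func (r *Renderer) VisitNumber(n *Number) { r.Parts = append(r.Parts, fmt.Sprint(n.Value)) }

// Counter is a second operation added later; Text and Number are untouched.
type Counter struct{ Texts, Numbers int }

func (c *Counter) VisitText(*Text)     { c.Texts++ }
func (c *Counter) VisitNumber(*Number) { c.Numbers++ }

// apply walks the elements and lets the visitor handle each one.
func apply(v Visitor, elems []Element) {
	for _, e := range elems {
		e.Accept(v)
	}
}

func main() {
	elems := []Element{&Text{Value: "foo"}, &Number{Value: 42}, &Text{Value: "bar"}}

	r := &Renderer{}
	apply(r, elems)
	fmt.Println(strings.Join(r.Parts, ",")) // FOO,42,BAR

	c := &Counter{}
	apply(c, elems)
	fmt.Println(c.Texts, c.Numbers) // 2 1
}
```

The trade-off is the usual one for this pattern: adding an operation such as Counter needs no change to the elements, while adding a new element type means extending the Visitor interface and every existing visitor.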

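Summary

This was a brief introduction to the Visitor pattern. When a program is likely to keep gaining features, adopting the Visitor pattern up front makes those additions easier, so it is worth knowing how to implement it as one of the options.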

2024-12-18

In Go, json.Marshal encodes a nil slice as null

This is the Day 18 article of the ZOZO Advent Calendar 2024.
This one is more of a tip: when encoding JSON with Go's Marshal function, passing a nil slice produces null. I think this behavior needs care, so I cover it here.

About nil slices
How Marshal behaves when given a nil slice
Declaring nil slices and empty slices
Summary

About nil slices

Nil slices are introduced in A Tour of Go. go.dev

As explained there, the zero value of a slice is nil, so an uninitialized slice is nil. A nil slice also has no underlying array. Below I check how a nil slice differs from the empty slice it is often compared with.

package main

import (
    "fmt"
    "reflect"
)

func main() {
    // nil slice
    var nc []int
    // empty slice
    s := []int{}

    fmt.Println("nil slice:", nc, len(nc), cap(nc), nc == nil, reflect.TypeOf(nc))
    fmt.Println("empty slice:", s, len(s), cap(s), s == nil, reflect.TypeOf(s))
}

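Running the code above gives the result below. The length and capacity are the same for the nil slice and the empty slice, but the comparison with nil gives different results. Length and capacity alone cannot distinguish a nil slice from an empty slice, so if you skip the nil check and pass the slice to something like the encoding/json package described later, you may get behavior you did not intend.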

$ go run main.go
nil slice: [] 0 0 true []int
empty slice: [] 0 0 false []int

How Marshal behaves when given a nil slice

Before getting to the main topic, note that the behavior examined here is documented in the description of the Marshal function.

Array and slice values encode as JSON arrays, except that []byte encodes as a base64-encoded string, and a nil slice encodes as the null JSON value.

With that in mind, let's check how the Marshal function in the encoding/json package behaves.

First, the code below converts a struct to JSON.

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

type PrintMessage struct {
    Message string `json:"message"`
}

func main() {
    message := PrintMessage{Message: "Hello World!"}
    marshaledMessage, err := json.Marshal(message)
    if err != nil {
        log.Fatal("Error converting to json")
    }
    fmt.Println(string(marshaledMessage))
}

The output is as follows.

$ go run main.go
{"message":"Hello World!"}

Next, we pass a nil slice and an empty slice to the json.Marshal function used in the code above.

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

type PrintMessage struct {
    Message string `json:"message"`
}

func main() {
    // nil slice
    var nilSliceMessage []PrintMessage
    // empty slice
    emptySliceMessage := []PrintMessage{}

    marshaledEmptySliceMessage, err := json.Marshal(emptySliceMessage)
    if err != nil {
        log.Fatal("Error converting to json")
    }
    marshaledNilSliceMessage, err := json.Marshal(nilSliceMessage)
    if err != nil {
        log.Fatal("Error converting to json")
    }
    fmt.Println("empty slice:", string(marshaledEmptySliceMessage))
    fmt.Println("nil slice:", string(marshaledNilSliceMessage))
}

The output is as follows. The empty slice is encoded as an empty array ([]), while the nil slice is encoded as null.

$ go run main.go
empty slice: []
nil slice: null

Declaring nil slices and empty slices

Finally, let's touch on how nil slices and empty slices are declared. There are three common ways to declare a slice:

var hoge []int
hoge := []int{}
hoge := make([]int, 0)

Let's check whether each of these three declarations produces a nil slice or an empty slice.

package main

import (
    "fmt"
    "reflect"
)

func main() {
    var s1 []int
    s2 := []int{}
    s3 := make([]int, 0)

    fmt.Println("s1 slice:", s1, len(s1), cap(s1), s1 == nil, reflect.TypeOf(s1))
    fmt.Println("s2 slice:", s2, len(s2), cap(s2), s2 == nil, reflect.TypeOf(s2))
    fmt.Println("s3 slice:", s3, len(s3), cap(s3), s3 == nil, reflect.TypeOf(s3))
}

The result of the code above is as follows.

$ go run main.go
s1 slice: [] 0 0 true []int # nil slice
s2 slice: [] 0 0 false []int # empty slice
s3 slice: [] 0 0 false []int # empty slice

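Summary

We confirmed how json.Marshal behaves in light of the difference between nil slices and empty slices. This ended up being a fairly shallow article, so I would like to dig a bit deeper into the internals of slices and json.Marshal to improve my understanding.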

2024-12-11

Implementing a SQL lexical analyzer (tokenizer)

This is the article for day 11 of the ZOZO Advent Calendar 2024.
This time I would like to implement a lexical analyzer (tokenizer) for SQL.

Introduction
Environment
How tokens are split
Implementation
Summary

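Introduction

Suppose you want to extract the table names referenced in a SQL statement. The simplest implementation would be to pull them out with a regular expression. However, taking BigQuery as an example, the FROM clause can be followed by a subquery or by a table written with the project name omitted, and a regular expression that tries to cover every case becomes complicated.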

cloud.google.com

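So this time I will implement a SQL lexical analyzer (tokenizer) that splits the words in a SQL statement into tokens, so that the table names in the FROM clause can be extracted.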
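
To make the problem concrete, here is a minimal sketch of the naive regular-expression approach. The pattern, the function name naive_table_names, and the sample queries are assumptions made for illustration and are not part of the implementation in this article. It works on a simple query but picks up a name from a comment and mangles a subquery.

```python
import re

# Hypothetical naive approach: take whatever non-space text follows FROM or JOIN.
NAIVE_TABLE_PATTERN = re.compile(r"(?i)\b(?:FROM|JOIN)\s+(\S+)")


def naive_table_names(sql):
    """Return every chunk of text that directly follows FROM or JOIN."""
    return NAIVE_TABLE_PATTERN.findall(sql)


simple_sql = "SELECT * FROM table1 JOIN table2 ON table1.id = table2.id"
tricky_sql = """
-- copied FROM old_table
SELECT * FROM (SELECT id FROM table1)
"""

print(naive_table_names(simple_sql))  # ['table1', 'table2']
# The comment and the subquery both confuse the pattern:
print(naive_table_names(tricky_sql))  # ['old_table', '(SELECT', 'table1)']
```

Patching the pattern for each of these cases is possible, but every new case makes it harder to read, which is the motivation for tokenizing first.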

Environment

Python 3.13.1

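How tokens are split

Before implementing the lexical analyzer, we need to decide how finely to split words into tokens. The right granularity depends on the use case; here I will show one way of splitting, aimed at extracting the table name that follows a FROM clause. In this implementation, words are extracted and classified into token types as shown in the table below.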

Extracted text	Token name
--, #, /* , */	Comment
whitespace, newline	NewlineAndWhitespace
FROM, JOIN	FromORJoin
,	Comma
any other word	Something
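
Because "any other word" is a catch-all, a word such as FROM fits two rows of the table, so the rows need a priority. The sketch below assumes a first-match-wins policy and shows why the catch-all has to come last. RULES and classify are hypothetical names used only here, and the whitespace and comment rows are left out to keep it short.

```python
import re

# Rules are tried top to bottom and the first match wins (assumed policy).
RULES = [
    (r"(?i)(JOIN|FROM)\b", "FromORJoin"),
    (r",", "Comma"),
    (r"\S+", "Something"),  # catch-all, so it must come last
]


def classify(word, rules=RULES):
    """Return the token name of the first rule that matches the whole word."""
    for pattern, token_name in rules:
        if re.fullmatch(pattern, word):
            return token_name
    return None


for word in ["FROM", "join", ",", "table1"]:
    print(word, "->", classify(word))

# With the order reversed, the catch-all swallows the keyword.
print(classify("FROM", rules=list(reversed(RULES))))  # Something
```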

Implementation

I implemented a Tokenizer class that performs the lexical analysis as follows.

import re

class Tokenizer:
    def __init__(self):
        self._SQL_REGEX = [
            # Comment
            (r'(--|#).*?(\r\n|\r|\n|$)', "Comment"),
            (r'(?s)/\*.*?\*/', "Comment"),
            # Newline and Whitespace
            (r'(\r\n|\r|\n|\s+?)', "NewlineAndWhitespace"),
            # FROM and JOIN
            (r'(?i)(JOIN|FROM)\b', "FromORJoin"),
            # Comma
            (r',', "Comma"),
            # Other
            (r'(.+?)(?=\s|\n|$)', "Something"),         
        ]

    def _match_token(self, sql, index):
        for pattern, token in self._SQL_REGEX:
            match_token_pattern = re.compile(pattern).match(sql, index)
            if match_token_pattern:
                return token, match_token_pattern
        raise RuntimeError(f"Did not match any token pattern: '{sql}'")

    def lexer(self, sql):
        match_tokens = list()
        index = 0
        sql_length = len(sql)
        while index < sql_length:
            token_type, match_sql_text = self._match_token(sql, index)
            match_tokens.append({"token_type": token_type, "text": match_sql_text.group()})
            index = match_sql_text.end()
        return match_tokens

The class is used as shown in the code below. Passing a SQL string to the lexer method returns a list of token types and their matching text.

def main():
    tokenizer = Tokenizer()
    sql = """
    -- テスト
    SELECT * FROM table1 JOIN table2 WHERE column1 = 'value1'
    """
    tokens = tokenizer.lexer(sql)
    for token in tokens:
        print(token)


if __name__ == "__main__":
    main()

Running the code above gives the following result.

$ python tokenizer.py
{'token_type': 'NewlineAndWhitespace', 'text': '\n'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Comment', 'text': '-- テスト\n'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Something', 'text': 'SELECT'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Something', 'text': '*'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'FromORJoin', 'text': 'FROM'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Something', 'text': 'table1'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'FromORJoin', 'text': 'JOIN'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Something', 'text': 'table2'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Something', 'text': 'WHERE'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Something', 'text': 'column1'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Something', 'text': '='}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'Something', 'text': "'value1'"}
{'token_type': 'NewlineAndWhitespace', 'text': '\n'}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}
{'token_type': 'NewlineAndWhitespace', 'text': ' '}

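We confirmed that the text matching each token can be extracted.

Next, we extract table names based on the token types. The Tokenizer class above also emitted NewlineAndWhitespace, Comment, and Comma tokens, but they are not needed just to extract table names, so we stop emitting them.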

import re

class Tokenizer:
    def __init__(self):
        self._SQL_REGEX = [
            # Comment
            (r'(--|#).*?(\r\n|\r|\n|$)', "Comment"),
            (r'(?s)/\*.*?\*/', "Comment"),
            # Newline and Whitespace
            (r'(\r\n|\r|\n|\s+?)', "NewlineAndWhitespace"),
            # FROM and JOIN
            (r'(?i)(JOIN|FROM)\b', "FromORJoin"),
            # Comma
            (r',', "Comma"),
            # Other
            (r'(.+?)(?=\s|\n|$)', "Something"),         
        ]

    def _match_token(self, sql, index):
        for pattern, token in self._SQL_REGEX:
            match_token_pattern = re.compile(pattern).match(sql, index)
            if match_token_pattern:
                return token, match_token_pattern
        raise RuntimeError(f"Did not match any token pattern: '{sql}'")

    def lexer(self, sql):
        match_tokens = list()
        index = 0
        sql_length = len(sql)
        while index < sql_length:
            token_type, match_sql_text = self._match_token(sql, index)
            # Skip NewlineAndWhitespace, Comment, and Comma tokens; only the table names after FROM are needed
            if token_type not in ("NewlineAndWhitespace", "Comment", "Comma"):
                match_tokens.append({"token_type": token_type, "text": match_sql_text.group()})
            index = match_sql_text.end()
        return match_tokens



def main():
    tokenizer = Tokenizer()
    sql = """
    -- テスト
    SELECT * FROM table1 JOIN table2 WHERE column1 = 'value1'
    """
    tokens = tokenizer.lexer(sql)
    table_ids = list()
    index = 0
    tokens_length = len(tokens)
    while index < tokens_length:
        token_type = tokens[index]["token_type"]
        # Take the word that follows FROM or JOIN (if there is one)
        if token_type == "FromORJoin" and index + 1 < tokens_length:
            table_ids.append(tokens[index + 1]["text"])
            index += 2
        else:
            index += 1
    print(table_ids)


if __name__ == "__main__":
    main()

The output is as follows. We confirmed that the table names contained in the SQL string are extracted.

$ python tokenizer.py
['table1', 'table2']

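Summary

By implementing lexical analysis, I was able to extract the table names. However, handling subqueries and the FROM used inside the EXTRACT function requires a more careful extraction approach. Changing the token types should allow applications beyond table name extraction, so exploring other uses looks interesting.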
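
As one possible direction for the subquery and EXTRACT cases, here is a rough heuristic sketch. It assumes tokens shaped like the dictionaries returned by the lexer after whitespace, comments, and commas are dropped. The function name extract_table_ids and the hand-written token list are hypothetical, and the checks are simple string tests, not a real parser, so a query formatted differently (for example with a space after "EXTRACT(") would not be handled.

```python
def extract_table_ids(tokens):
    """Collect the word after FROM/JOIN, skipping EXTRACT(... FROM ...) and subqueries."""
    table_ids = []
    for i, token in enumerate(tokens):
        if token["token_type"] != "FromORJoin" or i + 1 >= len(tokens):
            continue
        prev_text = tokens[i - 1]["text"].upper() if i > 0 else ""
        next_text = tokens[i + 1]["text"]
        # FROM inside EXTRACT(YEAR FROM col) is not a table reference.
        if "EXTRACT(" in prev_text:
            continue
        # FROM followed by "(" starts a subquery, not a table name.
        if next_text.startswith("("):
            continue
        # Drop a closing parenthesis glued to the name, as in "table1)".
        table_ids.append(next_text.rstrip(")"))
    return table_ids


def tok(token_type, text):
    return {"token_type": token_type, "text": text}


# Hand-written tokens for:
# SELECT EXTRACT(YEAR FROM created_at) FROM (SELECT * FROM table1) JOIN table2
tokens = [
    tok("Something", "SELECT"), tok("Something", "EXTRACT(YEAR"),
    tok("FromORJoin", "FROM"), tok("Something", "created_at)"),
    tok("FromORJoin", "FROM"), tok("Something", "(SELECT"),
    tok("Something", "*"), tok("FromORJoin", "FROM"),
    tok("Something", "table1)"), tok("FromORJoin", "JOIN"),
    tok("Something", "table2"),
]
print(extract_table_ids(tokens))  # ['table1', 'table2']
```

A sturdier version would probably emit parentheses as their own token type so that nesting can be tracked explicitly.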