feat: add gRPC Go skill

2026-02-25 21:57:09 +07:00
parent bb40f76957
commit 3e0c2fc3f2
2 changed files with 651 additions and 0 deletions
--- a/skills/grpc-golang/SKILL.md
+++ b/skills/grpc-golang/SKILL.md
@@ -0,0 +1,103 @@
+---
+name: grpc-golang
+description: "Build production-ready gRPC services in Go with mTLS, streaming, and observability. Use when designing Protobuf contracts with Buf or implementing secure service-to-service transport."
+risk: safe
+source: self
+---
+
+# gRPC Golang (gRPC-Go)
+
+## Overview
+
+Comprehensive guide for designing and implementing production-grade gRPC services in Go. Covers contract standardization with Buf, transport layer security via mTLS, and deep observability with OpenTelemetry interceptors.
+
+## Use this skill when
+
+- Designing microservices communication with gRPC in Go.
+- Building high-performance internal APIs using Protobuf.
+- Implementing streaming workloads (unidirectional or bidirectional).
+- Standardizing API contracts using Protobuf and Buf.
+- Configuring mTLS for service-to-service authentication.
+
+## Do not use this skill when
+
+- Building pure REST/HTTP public APIs without gRPC requirements.
+- Modifying legacy `.proto` files without the ability to introduce a new API version (e.g., `api.v2`) or ensure backward compatibility.
+- Managing service mesh traffic routing (e.g., Istio/Linkerd), which is outside the application code scope.
+
+## Step-by-Step Guide
+
+1. **Confirm Technical Context**: Identify Go version, gRPC-Go version, and whether the project uses Buf or raw protoc.
+2. **Confirm Requirements**: Identify mTLS needs, load patterns (unary/streaming), SLOs, and message size limits.
+3. **Plan Schema**: Define package versioning (e.g., `api.v1`), resource types, and error mapping.
+4. **Security Design**: Implement mTLS for service-to-service authentication.
+5. **Observability**: Configure interceptors for tracing, metrics, and structured logging.
+6. **Verification**: Always run `buf lint` and breaking change checks before finalizing code generation.
+
+Refer to `resources/implementation-playbook.md` for detailed patterns, code examples, and anti-patterns.
+
+## Examples
+
+### Example 1: Defining a Service & Message (v1 API)
+
+```proto
+syntax = "proto3";
+package api.v1;
+option go_package = "github.com/org/repo/gen/api/v1;apiv1";
+
+service UserService {
+  rpc GetUser(GetUserRequest) returns (GetUserResponse);
+}
+
+message User {
+  string id = 1;
+  string name = 2;
+}
+
+message GetUserRequest {
+  string id = 1;
+}
+
+message GetUserResponse {
+  User user = 1;
+}
+```
+
+## Best Practices
+
+- ✅ **Do:** Use Buf to standardize your toolchain and linting with `buf.yaml` and `buf.gen.yaml`.
+- ✅ **Do:** Always use semantic versioning in package paths (e.g., `package api.v1`).
+- ✅ **Do:** Enforce mTLS for all internal service-to-service communication.
+- ✅ **Do:** Handle `ctx.Done()` in all streaming handlers to prevent resource leaks.
+- ✅ **Do:** Map domain errors to standard gRPC status codes (e.g., `codes.NotFound`).
+- ❌ **Don't:** Return raw internal error strings or stack traces to gRPC clients.
+- ❌ **Don't:** Create a new `grpc.ClientConn` per request; always reuse connections.
+
+## Troubleshooting
+
+- **Error: Inconsistent Gen**: If the generated code does not match the schema, run `buf generate` and verify the `go_package` option.
+- **Error: Context Deadline**: Check client timeouts and ensure the server is not blocking infinitely in streaming handlers.
+- **Error: mTLS Handshake**: Ensure the CA certificate is correctly added to the `x509.CertPool` on both client and server sides.
+
+## Limitations
+
+- Does not cover service mesh traffic routing (Istio/Linkerd configuration).
+- Does not cover gRPC-Web or browser-based gRPC integration.
+- Assumes Go 1.21+ and gRPC-Go v1.60+; older versions may have different APIs (e.g., `grpc.Dial` vs `grpc.NewClient`).
+- Does not cover L7 gRPC-aware load balancer configuration (e.g., Envoy, NGINX).
+- Does not address Protobuf schema registry or large-scale schema governance beyond Buf lint.
+
+## Resources
+
+- `resources/implementation-playbook.md` for detailed patterns, code examples, and anti-patterns.
+- [Google API Design Guide](https://cloud.google.com/apis/design)
+- [Buf Docs](https://buf.build/docs)
+- [gRPC-Go Docs](https://grpc.io/docs/languages/go/)
+- [OpenTelemetry Go Instrumentation](https://opentelemetry.io/docs/instrumentation/go/)
+
+## Related Skills
+
+- @golang-pro - General Go patterns and performance optimization outside the gRPC layer.
+- @go-concurrency-patterns - Advanced goroutine lifecycle management for streaming handlers.
+- @api-design-principles - Resource naming and versioning strategy before writing `.proto` files.
+- @docker-expert - Containerizing gRPC services and configuring TLS cert injection via Docker secrets.
--- a/skills/grpc-golang/resources/implementation-playbook.md
+++ b/skills/grpc-golang/resources/implementation-playbook.md
@@ -0,0 +1,548 @@
+# gRPC Golang Implementation Playbook
+
+This file contains detailed patterns, checklists, and code samples referenced by the skill.
+
+## Schema Design Standards
+
+### Protobuf Definition
+
+- **Syntax**: Use proto3 only.
+- **Versioning**: Use package versioning (e.g., `api.v1`).
+- **Pagination**: Use `page_token` and `page_size` for list operations.
+- **Timezone**: Always use `google.protobuf.Timestamp` with UTC values at the server level.
+- **Idempotency**: Use idempotency keys or design side-effect-free methods to allow safe retries.
+- **Validation**: Adopt a schema-level validation approach (e.g., Buf validation rules or `protoc-gen-validate`) and ensure generated code is enforced server-side.
+
+```proto
+syntax = "proto3";
+package api.v1;
+option go_package = "github.com/org/repo/gen/api/v1;apiv1";
+
+import "google/protobuf/timestamp.proto";
+
+service UserService {
+  rpc GetUser(GetUserRequest) returns (GetUserResponse);
+  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
+  rpc WatchUsers(WatchUsersRequest) returns (stream UserEvent);
+}
+
+message User {
+  string id = 1;
+  string name = 2;
+  string email = 3;
+  google.protobuf.Timestamp created_at = 4;
+}
+
+message GetUserRequest {
+  string id = 1;
+}
+
+message GetUserResponse {
+  User user = 1;
+}
+
+message ListUsersRequest {
+  int32 page_size = 1;
+  string page_token = 2;
+}
+
+message ListUsersResponse {
+  repeated User users = 1;
+  string next_page_token = 2;
+}
+
+message WatchUsersRequest {
+  // Empty; streams all user events from the current point.
+}
+
+message UserEvent {
+  enum EventType {
+    EVENT_TYPE_UNSPECIFIED = 0;
+    EVENT_TYPE_CREATED = 1;
+    EVENT_TYPE_UPDATED = 2;
+    EVENT_TYPE_DELETED = 3;
+  }
+  EventType type = 1;
+  User user = 2;
+  google.protobuf.Timestamp occurred_at = 3;
+}
+```
+
+## Code Generation
+
+- **Toolchain**: Use `google.golang.org/protobuf/cmd/protoc-gen-go` and `protoc-gen-go-grpc`.
+- **Management**: Use `buf.gen.yaml` to manage plugin versions and generation parameters.
+- **Compatibility**: Ensure plugins use Protobuf Go v2 API (`google.golang.org/protobuf`). Do not mix with the deprecated v1 API (`github.com/golang/protobuf`).
+
+### buf.gen.yaml Example
+
+```yaml
+version: v2
+plugins:
+  - remote: buf.build/protocolbuffers/go
+    out: gen
+    opt: paths=source_relative
+  - remote: buf.build/grpc/go
+    out: gen
+    opt: paths=source_relative
+```
+
+## Server Implementation
+
+### Full Server Setup with Graceful Shutdown
+
+```go
+package main
+
+import (
+	"context"
+	"log"
+	"net"
+	"os"
+	"os/signal"
+	"syscall"
+	"time"
+
+	"google.golang.org/grpc"
+	"google.golang.org/grpc/health"
+	healthpb "google.golang.org/grpc/health/grpc_health_v1"
+	"google.golang.org/grpc/keepalive"
+
+	apiv1 "github.com/org/repo/gen/api/v1"
+)
+
+func main() {
+	srv := grpc.NewServer(
+		grpc.ChainUnaryInterceptor(
+			recoveryInterceptor,
+			loggingInterceptor,
+			otelUnaryInterceptor,
+		),
+		grpc.KeepaliveParams(keepalive.ServerParameters{
+			MaxConnectionIdle: 5 * time.Minute,
+			Time:              1 * time.Minute,
+			Timeout:           20 * time.Second,
+		}),
+		grpc.MaxRecvMsgSize(4<<20), // 4 MB
+		grpc.MaxSendMsgSize(4<<20), // 4 MB
+	)
+
+	// Register application services.
+	apiv1.RegisterUserServiceServer(srv, newUserService())
+
+	// Register health check with fully-qualified service name.
+	healthSrv := health.NewServer()
+	healthpb.RegisterHealthServer(srv, healthSrv)
+	healthSrv.SetServingStatus(
+		"api.v1.UserService",
+		healthpb.HealthCheckResponse_SERVING,
+	)
+
+	lis, err := net.Listen("tcp", ":50051")
+	if err != nil {
+		log.Fatalf("listen: %v", err)
+	}
+
+	// Graceful shutdown: GracefulStop with a fallback timeout to Stop.
+	go func() {
+		sigCh := make(chan os.Signal, 1)
+		signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
+		<-sigCh
+
+		log.Println("shutting down gRPC server...")
+		healthSrv.SetServingStatus(
+			"api.v1.UserService",
+			healthpb.HealthCheckResponse_NOT_SERVING,
+		)
+
+		ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
+		defer cancel()
+
+		stopped := make(chan struct{})
+		go func() {
+			srv.GracefulStop()
+			close(stopped)
+		}()
+
+		select {
+		case <-stopped:
+			log.Println("server stopped gracefully")
+		case <-ctx.Done():
+			log.Println("graceful stop timed out, forcing stop")
+			srv.Stop()
+		}
+	}()
+
+	log.Printf("gRPC server listening on %s", lis.Addr())
+	if err := srv.Serve(lis); err != nil {
+		log.Fatalf("serve: %v", err)
+	}
+}
+```
+
+## mTLS Setup
+
+```go
+package main
+
+import (
+	"crypto/tls"
+	"crypto/x509"
+	"fmt"
+	"log"
+	"os"
+
+	"google.golang.org/grpc"
+	"google.golang.org/grpc/credentials"
+)
+
+// loadServerTLS configures mTLS for the server side.
+func loadServerTLS() grpc.ServerOption {
+	tlsCert, err := tls.LoadX509KeyPair("server.crt", "server.key")
+	if err != nil {
+		log.Fatalf("load server cert: %v", err)
+	}
+
+	caCert, err := os.ReadFile("ca.crt")
+	if err != nil {
+		log.Fatalf("read CA cert: %v", err)
+	}
+
+	caPool := x509.NewCertPool()
+	if !caPool.AppendCertsFromPEM(caCert) {
+		log.Fatal("failed to append CA cert")
+	}
+
+	tlsCfg := &tls.Config{
+		Certificates: []tls.Certificate{tlsCert},
+		ClientCAs:    caPool,
+		ClientAuth:   tls.RequireAndVerifyClientCert,
+		MinVersion:   tls.VersionTLS13,
+	}
+	return grpc.Creds(credentials.NewTLS(tlsCfg))
+}
+
+// dialWithMTLS creates a client connection using mTLS.
+func dialWithMTLS(target string) (*grpc.ClientConn, error) {
+	clientCert, err := tls.LoadX509KeyPair("client.crt", "client.key")
+	if err != nil {
+		return nil, fmt.Errorf("load client cert: %w", err)
+	}
+
+	caCert, err := os.ReadFile("ca.crt")
+	if err != nil {
+		return nil, fmt.Errorf("read CA cert: %w", err)
+	}
+
+	caPool := x509.NewCertPool()
+	if !caPool.AppendCertsFromPEM(caCert) {
+		return nil, fmt.Errorf("failed to append CA cert")
+	}
+
+	creds := credentials.NewTLS(&tls.Config{
+		Certificates: []tls.Certificate{clientCert},
+		RootCAs:      caPool,
+		MinVersion:   tls.VersionTLS13,
+	})
+
+	// Note: for gRPC-Go v1.63+, grpc.NewClient is the recommended replacement.
+	conn, err := grpc.Dial(target, grpc.WithTransportCredentials(creds))
+	if err != nil {
+		return nil, fmt.Errorf("dial %s: %w", target, err)
+	}
+	return conn, nil
+}
+```
+
+## Client Best Practices
+
+### Connection Reuse
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"log"
+	"os"
+	"time"
+
+	"google.golang.org/grpc"
+	"google.golang.org/grpc/credentials"
+
+	apiv1 "github.com/org/repo/gen/api/v1"
+)
+
+// Initialize once at startup; reuse across the application lifetime.
+var userConn *grpc.ClientConn
+
+func initClients(creds credentials.TransportCredentials) {
+	var err error
+	// Note: for gRPC-Go v1.63+, use grpc.NewClient instead.
+	userConn, err = grpc.Dial(
+		os.Getenv("USER_SVC_ADDR"),
+		grpc.WithTransportCredentials(creds),
+	)
+	if err != nil {
+		log.Fatalf("dial user-svc: %v", err)
+	}
+}
+
+func callListUsers(ctx context.Context) (*apiv1.ListUsersResponse, error) {
+	// Always set a deadline per call, not per connection.
+	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
+	defer cancel()
+
+	client := apiv1.NewUserServiceClient(userConn)
+	resp, err := client.ListUsers(ctx, &apiv1.ListUsersRequest{PageSize: 20})
+	if err != nil {
+		return nil, fmt.Errorf("list users: %w", err)
+	}
+	return resp, nil
+}
+```
+
+### Retry Policy
+
+Only enable retries for idempotent calls. Use exponential backoff.
+
+```go
+import "google.golang.org/grpc"
+
+// Service config with retry policy for idempotent methods.
+const retryPolicy = `{
+  "methodConfig": [{
+    "name": [{"service": "api.v1.UserService", "method": "GetUser"}],
+    "retryPolicy": {
+      "maxAttempts": 3,
+      "initialBackoff": "0.1s",
+      "maxBackoff": "1s",
+      "backoffMultiplier": 2,
+      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
+    }
+  }]
+}`
+
+// Note: for gRPC-Go v1.63+, use grpc.NewClient instead of grpc.Dial.
+conn, err := grpc.Dial(
+	target,
+	grpc.WithTransportCredentials(creds),
+	grpc.WithDefaultServiceConfig(retryPolicy),
+)
+```
+
+## Observability
+
+### Interceptor Labels
+
+- **Logging**: Include `grpc.method`, `grpc.service`, `grpc.code`, `latency_ms`, and `trace_id`.
+- **Metrics**: Export request count, latency histogram, and in-flight stream count.
+
+### OpenTelemetry Integration
+
+```go
+import (
+	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
+	"google.golang.org/grpc"
+)
+
+srv := grpc.NewServer(
+	grpc.StatsHandler(otelgrpc.NewServerHandler()),
+)
+
+// Note: for gRPC-Go v1.63+, use grpc.NewClient instead of grpc.Dial.
+conn, err := grpc.Dial(
+	target,
+	grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
+)
+```
+
+## Testing
+
+### bufconn In-Process Test
+
+```go
+package service_test
+
+import (
+	"context"
+	"net"
+	"testing"
+	"time"
+
+	"google.golang.org/grpc"
+	"google.golang.org/grpc/credentials/insecure"
+	"google.golang.org/grpc/status"
+	"google.golang.org/grpc/codes"
+	"google.golang.org/grpc/test/bufconn"
+
+	apiv1 "github.com/org/repo/gen/api/v1"
+)
+
+func TestListUsers(t *testing.T) {
+	lis := bufconn.Listen(1 << 20)
+	srv := grpc.NewServer()
+	apiv1.RegisterUserServiceServer(srv, &fakeUserSvc{})
+	go func() {
+		if err := srv.Serve(lis); err != nil {
+			t.Logf("server exited: %v", err)
+		}
+	}()
+	t.Cleanup(srv.GracefulStop)
+
+	// Note: for gRPC-Go v1.63+, use grpc.NewClient instead of grpc.DialContext.
+	conn, err := grpc.DialContext(context.Background(),
+		"bufnet",
+		grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
+			return lis.DialContext(ctx)
+		}),
+		grpc.WithTransportCredentials(insecure.NewCredentials()),
+	)
+	if err != nil {
+		t.Fatalf("dial bufnet: %v", err)
+	}
+	t.Cleanup(func() { conn.Close() })
+
+	client := apiv1.NewUserServiceClient(conn)
+	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+	defer cancel()
+
+	resp, err := client.ListUsers(ctx, &apiv1.ListUsersRequest{PageSize: 10})
+	if code := status.Code(err); code != codes.OK {
+		t.Fatalf("expected OK, got %v: %v", code, err)
+	}
+	if resp == nil {
+		t.Fatal("expected non-nil response")
+	}
+}
+```
+
+## Streaming Handler Pattern
+
+Always check `ctx.Done()` in streaming loops. Never expose raw internal errors to clients.
+
+```go
+func (s *userService) WatchUsers(
+	req *apiv1.WatchUsersRequest,
+	stream apiv1.UserService_WatchUsersServer,
+) error {
+	ctx := stream.Context()
+
+	events := s.subscribeUserEvents()
+	defer s.unsubscribe(events)
+
+	for {
+		select {
+		case <-ctx.Done():
+			// Client disconnected or deadline exceeded; exit cleanly.
+			return status.Error(codes.Canceled, "client disconnected")
+
+		case event, ok := <-events:
+			if !ok {
+				// Channel closed; server is shutting down.
+				return status.Error(codes.Unavailable, "service shutting down")
+			}
+
+			if err := stream.Send(event); err != nil {
+				// Log the raw error server-side for diagnostics.
+				log.Printf("stream send failed: %v", err)
+				// Return a generic message to the client; never leak raw err.
+				return status.Error(codes.Internal, "failed to send event")
+			}
+		}
+	}
+}
+```
+
+## Error Mapping
+
+Map domain errors to gRPC status codes consistently:
+
+Only return `err.Error()` to clients when it is a safe, user-facing domain message (not an internal error string).
+
+```go
+package service
+
+import (
+	"errors"
+
+	"google.golang.org/grpc/codes"
+	"google.golang.org/grpc/status"
+)
+
+var (
+	ErrNotFound       = errors.New("resource not found")
+	ErrAlreadyExists  = errors.New("resource already exists")
+	ErrInvalidInput   = errors.New("invalid input")
+	ErrPermission     = errors.New("permission denied")
+)
+
+// toGRPCError maps a domain error to a gRPC status error.
+func toGRPCError(err error) error {
+	if err == nil {
+		return nil
+	}
+	switch {
+	case errors.Is(err, ErrNotFound):
+		return status.Error(codes.NotFound, err.Error())
+	case errors.Is(err, ErrAlreadyExists):
+		return status.Error(codes.AlreadyExists, err.Error())
+	case errors.Is(err, ErrInvalidInput):
+		return status.Error(codes.InvalidArgument, err.Error())
+	case errors.Is(err, ErrPermission):
+		return status.Error(codes.PermissionDenied, err.Error())
+	default:
+		return status.Error(codes.Internal, "internal error")
+	}
+}
+```
+
+## Project Layout
+
+```
+project/
+  buf.gen.yaml
+  buf.yaml
+  proto/
+    api/
+      v1/
+        user_service.proto
+  gen/                          # Generated code (committed or gitignored)
+    api/
+      v1/
+        user_service.pb.go
+        user_service_grpc.pb.go
+  internal/
+    service/
+      user.go                  # Service implementation
+      user_test.go             # bufconn tests
+    domain/
+      errors.go                # Domain error definitions
+  cmd/
+    server/
+      main.go                  # Server entrypoint with graceful shutdown
+  config/
+    config.go                  # Env-based config (timeouts, TLS paths, limits)
+```
+
+## Safety Checklist
+
+- Default to TLS/mTLS for all production traffic.
+- Enforce limits (`MaxRecvMsgSize`, `MaxSendMsgSize`, metadata size) to reduce resource exhaustion.
+- Treat client-sent metadata as untrusted; validate and allowlist keys used for auth or tenant routing.
+- Disable gRPC reflection in production to avoid exposing internal service schemas.
+- Check `context.Done()` in every iteration of a streaming handler to prevent goroutine leaks.
+
+## Anti-Patterns
+
+| Anti-Pattern                                  | Why It Hurts                                                                                  | Fix                                                          |
+| --------------------------------------------- | --------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
+| Create new `grpc.ClientConn` per request      | Exhausts OS sockets and disables HTTP/2 multiplexing, causing high latency and resource leaks | Initialize once, reuse globally                              |
+| Mix Protobuf v1 and v2 libraries              | Causes silent marshaling bugs; `proto.Marshal` from v1 and v2 are NOT interchangeable         | Pin to `google.golang.org/protobuf` (v2) throughout          |
+| Expose raw internal error strings to clients  | Leaks stack traces and internal service names; a security and UX risk                         | Map errors with `status.Errorf` using appropriate gRPC codes |
+| Ignore `context.Done()` in streaming handlers | Goroutine and connection leak when client disconnects                                         | Check `ctx.Err()` in every iteration of a streaming loop     |
+| Skip error handling with `_ =`                | Hides failures silently; production outages become undiagnosable                              | Always check and handle errors explicitly                    |
+| Use `grpc.Dial` without health checks         | Connection failures are deferred and may surface as runtime errors                            | Use health checks and monitor connection state               |
+
+> **Migration note**: For gRPC-Go v1.63+ (Jan 2024), `grpc.NewClient` is the newer API recommended by the gRPC-Go project for new code. For older versions (or when following existing codebases and official grpc.io examples), using `grpc.Dial` / `grpc.DialContext` is still common.