Testing Guide

Complete guide to testing Shannot: running tests, writing new tests, and ensuring code quality.

Quick Start

# Install dev dependencies
pip install -e ".[dev,mcp]"

# Run all tests
pytest

# Run only unit tests (works on any platform)
pytest -m "not linux_only and not requires_bwrap"

# Run with coverage
pytest --cov=shannot --cov-report=term

# Run specific test file
pytest tests/test_tools.py -v

# Get help
pytest --help

Test Structure

tests/
├── conftest.py              # Shared fixtures and pytest configuration
├── test_sandbox.py          # Core sandbox functionality tests
├── test_cli.py              # CLI command tests
├── test_integration.py      # Integration tests (requires Linux + bwrap)
├── test_tools.py            # MCP tools layer tests
├── test_mcp_server.py       # MCP server tests
├── test_mcp_integration.py  # MCP integration tests
└── test_mcp_security.py     # MCP security tests

Running Tests

# Run with verbose output
pytest -v

# Run tests matching a pattern
pytest -k "test_mcp"

Test Categories

1. Unit Tests (Run on any platform)

# All unit tests
pytest tests/test_tools.py tests/test_mcp_server.py -v

# Specific test class
pytest tests/test_tools.py::TestCommandInput -v

# Specific test
pytest tests/test_tools.py::TestRunCommand::test_successful_command -v

Count: 41 tests, all passing on macOS, Windows, and Linux

2. Integration Tests (Require Linux + bubblewrap)

# Run integration tests (Linux only)
pytest tests/test_mcp_integration.py -v

# Run with integration marker
pytest -m integration

Count: 19 tests covering real sandbox execution

3. Security Tests (Require Linux + bubblewrap)

# Run security tests (Linux only)
pytest tests/test_mcp_security.py -v

# Run specific security test class
pytest tests/test_mcp_security.py::TestCommandInjectionPrevention -v

Count: 27 tests covering security validation

Test Markers

Tests are organized with pytest markers:

# Skip Linux-only tests
pytest -m "not linux_only"

# Skip tests requiring bubblewrap
pytest -m "not requires_bwrap"

# Skip integration tests
pytest -m "not integration"

# Run only unit tests (skip Linux/bwrap requirements)
pytest -m "not linux_only and not requires_bwrap"

Test Coverage

Current Status

Total Tests: 112
├── Unit Tests: 66 (tools, mcp_server, cli, sandbox)
├── Integration Tests: 19 (mcp_integration)
└── Security Tests: 27 (mcp_security)

Pass Rate: 100% (on macOS: 63 passed, 49 skipped as Linux-only)

Coverage by Module

Module          Unit Tests   Integration Tests   Security Tests   Total
tools.py                25                  10                0      35
mcp_server.py           16                   3                0      19
sandbox.py               7                   0                0       7
cli.py                   4                   0                0       4
Security                 0                   6               27      33
Total                   52                  19               27      98

What's Tested

✅ tools.py

  • [x] SandboxDeps initialization
  • [x] Input model validation (CommandInput, FileReadInput, etc.)
  • [x] Output model validation (CommandOutput)
  • [x] run_command tool
  • [x] read_file tool
  • [x] list_directory tool (with options)
  • [x] check_disk_usage tool
  • [x] check_memory tool
  • [x] search_files tool
  • [x] grep_content tool (simple and recursive)
  • [x] Error handling

✅ mcp_server.py

  • [x] Server initialization
  • [x] Profile loading
  • [x] Profile discovery
  • [x] Tool registration
  • [x] Tool description generation
  • [x] Command output formatting
  • [x] Resource listing
  • [x] Resource reading
  • [x] Error handling
  • [x] Multiple profile support

✅ Integration Tests

  • [x] Real sandbox execution
  • [x] Command execution (ls, echo, cat, etc.)
  • [x] File reading
  • [x] Directory listing
  • [x] Disk usage check
  • [x] Memory check
  • [x] Command duration tracking
  • [x] Disallowed command blocking
  • [x] Ephemeral /tmp

✅ Security Tests

  • [x] Command injection prevention (semicolon, pipe, backticks, $())
  • [x] Path traversal mitigation
  • [x] Command allowlist enforcement
  • [x] Read-only enforcement
  • [x] Network isolation
  • [x] Input validation
  • [x] Special character handling

Writing New Tests

Test Template

import pytest
from shannot.sandbox import ProcessResult  # assumed location; adjust if ProcessResult lives elsewhere
from shannot.tools import CommandInput, run_command

@pytest.mark.asyncio
async def test_my_new_feature(sandbox_deps):
    """Test description."""
    # Arrange
    mock_result = ProcessResult(
        command=("ls",),
        stdout="output",
        stderr="",
        returncode=0,
        duration=0.1,
    )
    sandbox_deps.manager.run.return_value = mock_result

    # Act
    cmd_input = CommandInput(command=["ls"])
    result = await run_command(sandbox_deps, cmd_input)

    # Assert
    assert result.succeeded is True
    assert result.stdout == "output"

Integration Test Template

import pytest
from shannot.tools import CommandInput, SandboxDeps, run_command

@pytest.mark.linux_only
@pytest.mark.requires_bwrap
@pytest.mark.integration
@pytest.mark.asyncio
async def test_real_execution(profile_json_minimal, bwrap_path):
    """Test with real sandbox execution."""
    deps = SandboxDeps(profile_path=profile_json_minimal, bwrap_path=bwrap_path)

    cmd_input = CommandInput(command=["echo", "hello"])
    result = await run_command(deps, cmd_input)

    assert result.succeeded is True
    assert "hello" in result.stdout

Security Test Template

import pytest
from shannot.tools import CommandInput, run_command

@pytest.mark.linux_only
@pytest.mark.requires_bwrap
@pytest.mark.asyncio
async def test_command_injection_blocked(security_test_deps):
    """Test that command injection is prevented."""
    # Try to inject a second command
    cmd_input = CommandInput(command=["ls", "/; rm -rf /"])
    result = await run_command(security_test_deps, cmd_input)

    # Semicolon should be treated as literal argument
    assert "rm" not in result.stdout

Test Fixtures

Available Fixtures (from conftest.py)

@pytest.fixture
def temp_dir() -> Path:
    """Temporary directory cleaned up after test."""

@pytest.fixture
def minimal_profile() -> SandboxProfile:
    """Minimal valid sandbox profile."""

@pytest.fixture
def bwrap_path() -> Path:
    """Path to bubblewrap executable."""

@pytest.fixture
def profile_json_minimal(temp_dir) -> Path:
    """Minimal profile JSON file."""

@pytest.fixture
def sandbox_deps(mock_profile, mock_manager):
    """Mocked sandbox dependencies for unit tests."""

Continuous Integration

GitHub Actions

Tests run automatically on:

  • Push to main branch
  • Pull requests
  • Manual workflow dispatch

See .github/workflows/test.yml for configuration.

Test Matrix

  • Python: 3.9, 3.10, 3.11, 3.12, 3.13
  • OS: Ubuntu (integration tests), macOS (unit tests), Windows (unit tests)

Coverage Reports

Generate coverage report:

# Run tests with coverage
pytest --cov=shannot --cov-report=html --cov-report=term

# Open HTML report
open htmlcov/index.html

Current coverage: ~85% for MCP integration code

Debugging Failed Tests

Verbose Output

# Show full output
pytest -v -s

# Show local variables on failure
pytest -l

# Drop into debugger on failure
pytest --pdb
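
Two more built-in pytest flags help narrow a failing run:

# Rerun only the tests that failed last time
pytest --lf

# Stop at the first failure
pytest -x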

Common Issues

1. "ProcessResult missing argument: command"

Fix: Ensure all ProcessResult instances include command parameter:

# Wrong
mock_result = ProcessResult(stdout="out", stderr="", returncode=0, duration=0.1)

# Correct
mock_result = ProcessResult(
    command=("ls",),
    stdout="out",
    stderr="",
    returncode=0,
    duration=0.1
)

2. "Test requires Linux platform"

Cause: Test marked with @pytest.mark.linux_only

Fix: Run on Linux or skip with: pytest -m "not linux_only"

3. "Test requires bubblewrap"

Cause: Test requires bwrap to be installed

Fix: Install bubblewrap or skip with: pytest -m "not requires_bwrap"
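
Both markers typically reduce to platform checks applied at collection time. A hypothetical sketch of the skip logic (the real implementation lives in tests/conftest.py and may differ):

# conftest.py — hypothetical skip logic
import shutil
import sys

import pytest

def pytest_collection_modifyitems(config, items):
    on_linux = sys.platform == "linux"
    has_bwrap = shutil.which("bwrap") is not None
    for item in items:
        if item.get_closest_marker("linux_only") and not on_linux:
            item.add_marker(pytest.mark.skip(reason="Test requires Linux platform"))
        if item.get_closest_marker("requires_bwrap") and not has_bwrap:
            item.add_marker(pytest.mark.skip(reason="Test requires bubblewrap"))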

Best Practices

Do's ✅

  • Use async/await for tool tests (tools are async)
  • Mock subprocess calls in unit tests
  • Use real execution in integration tests
  • Add descriptive docstrings to all tests
  • Test error cases as well as success cases (see the sketch after these lists)
  • Use appropriate markers (linux_only, requires_bwrap, integration)

Don'ts ❌

  • Don't skip markers without good reason
  • Don't test implementation details (test behavior)
  • Don't make tests depend on each other
  • Don't hardcode paths (use fixtures)
  • Don't forget to test error handling
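
For the error-case point above, a hedged sketch mirroring the unit Test Template (it assumes CommandOutput exposes stderr alongside stdout and succeeded):

@pytest.mark.asyncio
async def test_failed_command_reports_stderr(sandbox_deps):
    """A non-zero exit code should surface as succeeded=False."""
    sandbox_deps.manager.run.return_value = ProcessResult(
        command=("cat", "/nonexistent"),
        stdout="",
        stderr="cat: /nonexistent: No such file or directory",
        returncode=1,
        duration=0.1,
    )

    cmd_input = CommandInput(command=["cat", "/nonexistent"])
    result = await run_command(sandbox_deps, cmd_input)

    assert result.succeeded is False
    assert "No such file" in result.stderr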

Performance Benchmarks

Expected Test Duration

  • Unit tests: < 1 second
  • Integration tests (single): < 1 second
  • Full integration suite: < 10 seconds
  • Security suite: < 15 seconds
  • All tests: < 30 seconds

If tests take longer, consider:

  1. Mocking expensive operations
  2. Reducing test data size
  3. Parallelizing with pytest-xdist
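
For the third option, pytest-xdist spreads tests across all available CPU cores:

# Run tests in parallel (requires pytest-xdist)
pip install pytest-xdist
pytest -n auto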

Future Improvements

Planned

  • [ ] Add performance benchmarks
  • [ ] Add stress tests (many concurrent operations)
  • [ ] Add end-to-end tests with real Claude Desktop
  • [ ] Add fuzz testing for input validation
  • [ ] Increase coverage to 95%+

Nice to Have

  • [ ] Visual regression tests for CLI output
  • [ ] Load testing for MCP server
  • [ ] Property-based testing with Hypothesis
  • [ ] Mutation testing with mutmut

Contributing

When adding new features:

  1. Write tests first (TDD)
  2. Ensure all tests pass: pytest
  3. Check coverage: pytest --cov=shannot
  4. Run linter: ruff check .
  5. Run type checker: basedpyright
  6. Update this document if adding new test patterns

Last Updated: 2025-10-20
Test Count: 112 (63 passing, 49 skipped on macOS)
Coverage: ~85% for MCP code