# Testing Guide
Complete guide to testing Shannot - running tests, writing new tests, and ensuring code quality.
## Quick Start

```bash
# Install dev dependencies
pip install -e ".[dev,mcp]"

# Run all tests
pytest

# Run only unit tests (works on any platform)
pytest -m "not linux_only and not requires_bwrap"

# Run with coverage
pytest --cov=shannot --cov-report=term

# Run specific test file
pytest tests/test_tools.py -v

# Get help
pytest --help
```
## Test Structure

```text
tests/
├── conftest.py              # Shared fixtures and pytest configuration
├── test_sandbox.py          # Core sandbox functionality tests
├── test_cli.py              # CLI command tests
├── test_integration.py      # Integration tests (requires Linux + bwrap)
├── test_tools.py            # MCP tools layer tests (NEW)
├── test_mcp_server.py       # MCP server tests (NEW)
├── test_mcp_integration.py  # MCP integration tests (NEW)
└── test_mcp_security.py     # MCP security tests (NEW)
```
## Running Tests

The Quick Start commands above cover installation and running the full suite. A few more useful invocations:

```bash
# Run with verbose output
pytest -v

# Run specific test file
pytest tests/test_tools.py

# Run tests matching a pattern
pytest -k "test_mcp"
```
### Test Categories

#### 1. Unit Tests (Run on any platform)

```bash
# All unit tests
pytest tests/test_tools.py tests/test_mcp_server.py -v

# Specific test class
pytest tests/test_tools.py::TestCommandInput -v

# Specific test
pytest tests/test_tools.py::TestRunCommand::test_successful_command -v
```

**Coverage:** 41 tests, all passing on macOS/Windows/Linux
#### 2. Integration Tests (Require Linux + bubblewrap)

```bash
# Run integration tests (Linux only)
pytest tests/test_mcp_integration.py -v

# Run with integration marker
pytest -m integration
```

**Coverage:** 19 tests for real sandbox execution
#### 3. Security Tests (Require Linux + bubblewrap)

```bash
# Run security tests (Linux only)
pytest tests/test_mcp_security.py -v

# Run specific security test class
pytest tests/test_mcp_security.py::TestCommandInjectionPrevention -v
```

**Coverage:** 27 tests for security validation
### Test Markers

Tests are organized with pytest markers:

```bash
# Skip Linux-only tests
pytest -m "not linux_only"

# Skip tests requiring bubblewrap
pytest -m "not requires_bwrap"

# Skip integration tests
pytest -m "not integration"

# Run only unit tests (skip Linux/bwrap requirements)
pytest -m "not linux_only and not requires_bwrap"
```
## Test Coverage

### Current Status

```text
Total Tests: 112
├── Unit Tests: 66 (tools, mcp_server, cli, sandbox)
├── Integration Tests: 19 (mcp_integration)
└── Security Tests: 27 (mcp_security)
```

**Pass rate:** 100% (63 passed on macOS, 49 skipped as Linux-only)
### Coverage by Module

| Module          | Unit Tests | Integration Tests | Security Tests | Total  |
|-----------------|------------|-------------------|----------------|--------|
| `tools.py`      | 25         | 10                | 0              | 35     |
| `mcp_server.py` | 16         | 3                 | 0              | 19     |
| `sandbox.py`    | 7          | 0                 | 0              | 7      |
| `cli.py`        | 4          | 0                 | 0              | 4      |
| Security        | 0          | 6                 | 27             | 33     |
| **Total**       | **52**     | **19**            | **27**         | **98** |
### What's Tested

#### ✅ tools.py
- [x] SandboxDeps initialization
- [x] Input model validation (CommandInput, FileReadInput, etc.)
- [x] Output model validation (CommandOutput)
- [x] run_command tool
- [x] read_file tool
- [x] list_directory tool (with options)
- [x] check_disk_usage tool
- [x] check_memory tool
- [x] search_files tool
- [x] grep_content tool (simple and recursive)
- [x] Error handling
#### ✅ mcp_server.py
- [x] Server initialization
- [x] Profile loading
- [x] Profile discovery
- [x] Tool registration
- [x] Tool description generation
- [x] Command output formatting
- [x] Resource listing
- [x] Resource reading
- [x] Error handling
- [x] Multiple profile support
#### ✅ Integration Tests
- [x] Real sandbox execution
- [x] Command execution (ls, echo, cat, etc.)
- [x] File reading
- [x] Directory listing
- [x] Disk usage check
- [x] Memory check
- [x] Command duration tracking
- [x] Disallowed command blocking
- [x] Ephemeral /tmp
#### ✅ Security Tests
- [x] Command injection prevention (semicolon, pipe, backticks, $())
- [x] Path traversal mitigation
- [x] Command allowlist enforcement
- [x] Read-only enforcement
- [x] Network isolation
- [x] Input validation
- [x] Special character handling
## Writing New Tests

### Test Template

```python
import pytest

from shannot.sandbox import ProcessResult  # adjust if ProcessResult lives elsewhere
from shannot.tools import CommandInput, run_command


@pytest.mark.asyncio
async def test_my_new_feature(sandbox_deps):
    """Test description."""
    # Arrange
    mock_result = ProcessResult(
        command=("ls",),
        stdout="output",
        stderr="",
        returncode=0,
        duration=0.1,
    )
    sandbox_deps.manager.run.return_value = mock_result

    # Act
    cmd_input = CommandInput(command=["ls"])
    result = await run_command(sandbox_deps, cmd_input)

    # Assert
    assert result.succeeded is True
    assert result.stdout == "output"
```
### Integration Test Template

```python
import pytest

from shannot.tools import CommandInput, SandboxDeps, run_command


@pytest.mark.linux_only
@pytest.mark.requires_bwrap
@pytest.mark.integration
@pytest.mark.asyncio
async def test_real_execution(profile_json_minimal, bwrap_path):
    """Test with real sandbox execution."""
    deps = SandboxDeps(profile_path=profile_json_minimal, bwrap_path=bwrap_path)

    cmd_input = CommandInput(command=["echo", "hello"])
    result = await run_command(deps, cmd_input)

    assert result.succeeded is True
    assert "hello" in result.stdout
```
### Security Test Template

```python
import pytest

from shannot.tools import CommandInput, run_command


@pytest.mark.linux_only
@pytest.mark.requires_bwrap
@pytest.mark.asyncio
async def test_command_injection_blocked(security_test_deps):
    """Test that command injection is prevented."""
    # Try to inject a second command
    cmd_input = CommandInput(command=["ls", "/; rm -rf /"])
    result = await run_command(security_test_deps, cmd_input)

    # The semicolon should be treated as a literal argument, not a shell separator
    assert "rm" not in result.stdout
```
## Test Fixtures

### Available Fixtures (from conftest.py)

```python
@pytest.fixture
def temp_dir() -> Path:
    """Temporary directory cleaned up after the test."""

@pytest.fixture
def minimal_profile() -> SandboxProfile:
    """Minimal valid sandbox profile."""

@pytest.fixture
def bwrap_path() -> Path:
    """Path to the bubblewrap executable."""

@pytest.fixture
def profile_json_minimal(temp_dir) -> Path:
    """Minimal profile JSON file."""

@pytest.fixture
def sandbox_deps(mock_profile, mock_manager):
    """Mocked sandbox dependencies for unit tests."""
```
## Continuous Integration

### GitHub Actions

Tests run automatically on:

- Push to the main branch
- Pull requests
- Manual workflow dispatch

See `.github/workflows/test.yml` for the configuration.

### Test Matrix

- Python: 3.9, 3.10, 3.11, 3.12, 3.13
- OS: Ubuntu (integration tests), macOS (unit tests), Windows (unit tests)
## Coverage Reports

Generate a coverage report:

```bash
# Run tests with coverage
pytest --cov=shannot --cov-report=html --cov-report=term

# Open the HTML report
open htmlcov/index.html
```

Current coverage: ~85% for the MCP integration code.
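To keep that figure from regressing in CI, pytest-cov can also fail the run when coverage drops below a threshold (the 85% floor here simply mirrors the current number):

```bash
# Fail the test run if total coverage drops below 85%
pytest --cov=shannot --cov-fail-under=85
```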
## Debugging Failed Tests

### Verbose Output

```bash
# Show full output (disable capture)
pytest -v -s

# Show local variables on failure
pytest -l

# Drop into the debugger on failure
pytest --pdb
```
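A few more standard pytest flags that help narrow down failures:

```bash
# Re-run only the tests that failed last time
pytest --lf

# Stop at the first failure
pytest -x

# Show the 10 slowest tests
pytest --durations=10
```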
### Common Issues

#### 1. "ProcessResult missing argument: command"

**Fix:** ensure all `ProcessResult` instances include the `command` parameter:

```python
# Wrong
mock_result = ProcessResult(stdout="out", stderr="", returncode=0, duration=0.1)

# Correct
mock_result = ProcessResult(
    command=("ls",),
    stdout="out",
    stderr="",
    returncode=0,
    duration=0.1,
)
```
2. "Test requires Linux platform"¶
Cause: Test marked with @pytest.mark.linux_only
Fix: Run on Linux or skip with: pytest -m "not linux_only"
3. "Test requires bubblewrap"¶
Cause: Test requires bwrap to be installed
Fix: Install bubblewrap or skip with: pytest -m "not requires_bwrap"
## Best Practices

### Do's ✅
- Use async/await for tool tests (tools are async)
- Mock subprocess calls in unit tests
- Use real execution in integration tests
- Add descriptive docstrings to all tests
- Test error cases as well as success cases
- Use the appropriate markers (`linux_only`, `requires_bwrap`, `integration`)
### Don'ts ❌
- Don't skip markers without good reason
- Don't test implementation details (test behavior)
- Don't make tests depend on each other
- Don't hardcode paths (use fixtures)
- Don't forget to test error handling
## Performance Benchmarks

### Expected Test Duration

- Unit tests: < 1 second
- Integration tests (single test): < 1 second
- Full integration suite: < 10 seconds
- Security suite: < 15 seconds
- All tests: < 30 seconds

If tests take longer, consider:

1. Mocking expensive operations
2. Reducing test data size
3. Parallelizing with pytest-xdist (see the example below)
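For example, pytest-xdist distributes tests across CPU cores with a single flag:

```bash
# Install pytest-xdist and spread the unit tests across all available cores
pip install pytest-xdist
pytest -n auto -m "not linux_only and not requires_bwrap"
```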
## Future Improvements

### Planned
- [ ] Add performance benchmarks
- [ ] Add stress tests (many concurrent operations)
- [ ] Add end-to-end tests with real Claude Desktop
- [ ] Add fuzz testing for input validation
- [ ] Increase coverage to 95%+
### Nice to Have
- [ ] Visual regression tests for CLI output
- [ ] Load testing for MCP server
- [ ] Property-based testing with Hypothesis
- [ ] Mutation testing with mutmut
## Contributing

When adding new features:

- Write tests first (TDD)
- Ensure all tests pass: `pytest`
- Check coverage: `pytest --cov=shannot`
- Run the linter: `ruff check .`
- Run the type checker: `basedpyright`
- Update this document if adding new test patterns
## Questions?

- Check existing tests for examples
- See the [pytest documentation](https://docs.pytest.org/)
- See the [pytest-asyncio documentation](https://pytest-asyncio.readthedocs.io/)
- Ask in GitHub issues
---

*Last updated: 2025-10-20 · Test count: 112 (63 passing, 49 skipped on macOS) · Coverage: ~85% for MCP code*