Smoke Test

Definition and Purpose

A smoke test is a lightweight automated verification procedure that executes a software pipeline end-to-end on representative or minimal input data to confirm that the pipeline runs to completion without crashing and produces output files with the expected structure. The test performs a binary pass/fail assessment: if any step raises an unhandled exception, produces no output, or generates output missing critical columns, the test fails and the build is rejected immediately.

The term originates from hardware engineering, where the metaphor is literal. When engineers powered on a newly assembled circuit board for the first time, they would watch for smoke escaping from burned-out components. If smoke appeared, the board had a catastrophic failure and all further testing stopped. The defective board was either repaired or discarded before anyone invested time in detailed diagnostics. Software smoke testing applies the identical principle: run the code and check for “smoke” (crashes, uncaught exceptions, missing outputs) before investing time in detailed validation.

In the context of inspection software pipelines, a smoke test validates that every stage of the processing chain can execute on representative data. For TarmacView, this means feeding a small runway or pavement image through the entire pipeline (image ingestion, crack detection, defect classification, surface assessment, survey mapping, and visualization) and verifying that each stage produces output without errors. This is particularly critical for aviation software, where pipeline failures can delay airport infrastructure assessments that directly impact flight safety.

The term build verification test (BVT) is often used interchangeably with smoke test, Microsoft being the most prominent adopter of this terminology. Microsoft’s internal development processes, particularly for Windows and Office, institutionalized smoke testing as a mandatory quality gate in the 1990s. Steve McConnell’s Code Complete recommends the “daily build and smoke test” as a best practice for integration. Google’s Site Reliability Engineering (SRE) framework places smoke tests under “system tests”, describing them as among the simplest type and as a way to short-circuit more expensive testing.

The purpose of smoke testing is threefold:

  1. Fail fast: detect catastrophic failures within seconds of a code change, before expensive full-pipeline runs consume compute resources and developer time.
  2. Gate keep: prevent unstable builds from progressing to staging or production environments, where they would waste testing infrastructure and block other teams.
  3. Feedback: provide developers with an immediate signal that their changes did not break core pipeline execution, often within 60 seconds of commit.

Smoke tests act as the first gate in the test sequence, running before or alongside unit tests and ahead of integration tests and end-to-end regression suites. They are designed to complete in under 60 seconds for a typical pipeline (up to about 3 minutes for a multi-stage inspection pipeline), making them suitable for execution on every code commit in CI/CD. In high-maturity software organizations, smoke tests run on every push to any branch, not just the main branch, providing the earliest possible detection of integration failures.

Historical Origins and Industry Adoption

The smoke testing concept predates software engineering by decades. The first documented use of the phrase appears in plumbing and stove testing from the 19th century. Stove manufacturers would light a fire inside a newly assembled stove, close all dampers, and observe where smoke escaped: if smoke emerged from unintended locations, the stove had construction defects. This same logic of “apply minimal power and observe where things break” migrated to electronics testing in the mid-20th century and finally to software engineering in the 1980s and 1990s.

Microsoft is widely credited with bringing smoke testing into mainstream software engineering practice. In the 1990s, Microsoft’s Windows and Office divisions adopted what they called Build Verification Testing (BVT) as a mandatory gating process. Every nightly build had to pass a suite of BVTs before the build was released to internal testers. If the BVT failed, the build was “broken” and the developer responsible was paged regardless of the time of day. This immediate accountability for build quality became foundational to Microsoft’s engineering culture and was documented extensively on MSDN and in Microsoft Press books.

The Crosslake Technologies taxonomy (derived directly from Microsoft practice) draws a distinction between smoke tests and BVTs. Smoke tests are described as “cursory” checks that “ensure basic functionality works”, running in minutes and focused on critical functions. BVTs are described as “a superset of smoke tests” that are “slightly more thorough” but still run in minutes, not hours. Both serve the same gate-keeping function, but BVTs include a wider set of critical-path scenarios.

Google adopted smoke testing through its SRE (Site Reliability Engineering) practices. The Google Testing Blog treats smoke testing as a known, assumed practice, focusing more on how to weight smoke test results alongside other test types. Google’s engineering culture emphasizes precision in unit tests and weighted confidence scoring for smoke-test-style broad checks, treating smoke tests as complementary to, rather than replacements for, rigorous unit testing.

In the aviation software domain, smoke testing maps most closely to the hardware/software integration testing described in DO-178C Section 6.4.3. DO-178C does not explicitly name “smoke testing”, but confirming that the integrated system starts and that basic functions work before deeper testing proceeds serves the same purpose. For airport inspection software operating under ICAO Annex 14, which governs aerodrome design and operations, smoke tests provide a first level of software reliability assurance for the Pavement Condition Index (PCI) assessments that feed into airport safety management systems.

Smoke Test vs Unit Test vs Integration Test

Understanding where smoke tests fit in the broader testing taxonomy is essential for building a balanced quality assurance strategy. The four main testing levels serve complementary but distinct roles, each targeting different failure modes at different stages of the pipeline lifecycle.

Dimension | Smoke Test | Unit Test | Integration Test | System Test
Scope | Entire pipeline end-to-end | Single function or method | Two or more interacting components | Full system in production-like environment
Data | Minimal representative sample | Mocked or stubbed inputs | Real but limited data | Production-scale data
Execution time | Seconds to under 1 minute | Milliseconds | Minutes to hours | Hours to days
What it catches | Crashes, missing outputs, import failures | Logic errors within a function | Interface mismatches, protocol errors | End-to-end correctness, performance
Dependencies | Real (non-mocked) | Mocked or stubbed | Real subset | Full production stack
Frequency | Every commit in CI | Every commit in CI | Per build or nightly | Per release
Failure impact | Halts the pipeline | Isolated to single function | Blocks integration branch | Blocks release

Unit tests validate that individual functions produce correct outputs for given inputs. They mock all external dependencies: databases, file systems, network services, hardware accelerators. A unit test for a crack-detection function might verify that it correctly identifies cracks in a synthetic 10x10 pixel array with known crack positions. Unit tests are narrow and deep: they verify the logic of a single function exhaustively, covering edge cases, boundary conditions, and error paths. However, unit tests cannot catch integration failures because dependencies are mocked; the test never exercises the actual import chain, configuration loading, or inter-module data flow.

Integration tests validate that two or more components work together correctly. They use real but controlled instances of dependencies. An integration test for TarmacView might verify that the image ingestion stage correctly passes data to the crack-detection model in the expected tensor format, with the correct channel ordering and normalization parameters. Integration tests are narrower than smoke tests in pipeline scope but deeper in interaction validation: they focus on specific component boundaries rather than the full pipeline.

Smoke tests validate that the entire pipeline executes without crashing. They run every stage in sequence with real, though small, data. Smoke tests are broad and shallow: they cover the full pipeline but only verify that execution completes and outputs exist, not that numerical results are correct. A crack-length calculation that returns 47.2 pixels instead of the correct 42.1 pixels passes a smoke test as long as the column length_px exists and contains a floating-point value.

System tests (also called end-to-end tests) run the complete system with production-scale data in a production-like environment. They verify that the system meets its functional and non-functional requirements, including accuracy, performance, and reliability. System tests are the most expensive to run and maintain, and they execute only on release candidates, not on every commit.

In the test sequence, smoke tests act as an early gate: they run first or alongside unit tests, quickly and frequently. If a smoke test fails, the remaining test stages for that build are typically skipped or flagged as unreliable. This saves compute resources and developer time by failing fast when fundamental pipeline execution is broken.

Smoke Test Design — Representative Minimal Input

Effective smoke test design centers on the concept of representative minimal input: the smallest dataset that exercises every stage of the pipeline without triggering data-dependent edge cases. This is the single most important design decision in building a smoke test suite.

Principles of Smoke Test Input Selection

Principle 1: Minimal but not trivial. The input must be large enough to pass through every pipeline stage without taking code paths that bypass real processing logic. A single-pixel image is trivial: it would pass through image loading, but the processing algorithms would take degenerate code paths that never execute on real data. A 256x256 pixel image of actual runway pavement is minimal yet representative: it exercises tile-based processing algorithms, color normalization routines, and model inference paths without requiring excessive compute time.

Principle 2: Representative of production data. The input should have the same file format, color depth, metadata structure, and statistical properties as production data. If production data comes from a Phase One iXM-RS150F aerial camera capturing 16-bit TIFF files with embedded EXIF GPS metadata, the smoke test input must match these characteristics. Using synthetic data that differs from production data defeats the purpose of the smoke test, because pipeline failures often stem from unexpected properties of real data: missing EXIF tags, unexpected color profiles, non-standard GeoTIFF projection strings.

Principle 3: Fixed and version-controlled. Smoke test inputs must be checked into version control alongside the code, either as binary files or managed via Git LFS (Large File Storage). They should never change without explicit review through pull requests. A changing input makes it impossible to distinguish pipeline regressions from test data changes: a smoke test that passes today and fails tomorrow could indicate either a code regression or a modified test input. Version-controlling inputs eliminates this ambiguity.

Principle 4: Fast to process. Total smoke test suite execution should not exceed 60 seconds for a typical pipeline and 3 minutes for complex multi-stage pipelines like inspection software. This constraint drives the input size. For image-based inspection pipelines, a single 512x512 pixel image is typically sufficient. For video pipelines, a single frame or a 2-second clip suffices; for LiDAR pipelines, a single flight-line segment covering 100 meters of pavement.

Principle 5: Contains known features. The test input must contain the features that each pipeline stage is designed to detect. A smoke test for crack detection is useless if the input image contains no cracks: the pipeline could silently skip crack detection and still pass. Test inputs must be curated to contain at least one instance of each detectable feature, with known ground truth that can be referenced during failure analysis.
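Principle 3 can be backed by an automated check. The sketch below is one possible approach, not a documented TarmacView mechanism: it computes a SHA-256 digest of the smoke input and compares it with a digest recorded at review time. The function names and the idea of storing the expected digest alongside the test are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def input_is_unchanged(path: Path, expected_digest: str) -> bool:
    """True when the file on disk still matches the recorded digest."""
    return sha256_of(path) == expected_digest
```

If the digest check fails, the smoke run can report a changed input rather than a pipeline regression, which keeps the two causes apart.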

Smoke Test Input Example for TarmacView

For TarmacView’s inspection pipeline, the smoke test input is a single 1024x768 pixel RGB orthophoto strip of airport pavement captured at 1 mm ground sample distance (GSD). The image contains at least one visible crack, one surface defect (e.g. raveling or spalling), and clear pavement markings. This single image:

  • Exercises the image ingestion stage: decode, color correct, georeference
  • Exercises the crack detection model: at least one crack present
  • Exercises the defect classification model: at least one defect present
  • Exercises the survey mapping stage: geospatial coordinate assignment
  • Exercises the visualization stage: overlay rendering with annotations
  • Exercises the assessment stage: PCI calculation or condition scoring

The expected processing time for this single image through the full pipeline is under 30 seconds on CI hardware, leaving headroom for the remaining smoke tests in the suite. The test image is stored in the repository under tests/data/smoke/ and is version-controlled via Git LFS.

TarmacView Smoke Tests

TarmacView’s inspection pipeline is validated by a suite of dedicated smoke test scripts, each covering a specific pipeline phase. These scripts live in the scripts/ directory, following the naming convention smoke_<phase>.py. Each script is designed to be runnable both independently (for development debugging) and as part of the CI suite (for automated gating).

smoke_analyze.py

This script validates the image analysis phase. It reads a single representative orthophoto strip, runs the full analysis pipeline through the image processing module, and verifies:

  • The image is loaded and decoded without errors from the TIFF container format
  • Color correction and radiometric normalization complete successfully, converting raw sensor values to reflectance
  • Georeferencing metadata is parsed and applied, including UTM zone and datum information
  • The processed image has the expected dimensions (1024x768 pixels) and data type (float32)
  • Output analysis file is written to the expected path in the analysis output directory
  • Minimum and maximum pixel values fall within valid radiometric ranges

The script accepts a --input argument pointing to the test image and a --output-dir argument specifying where the pipeline should write results. If the pipeline crashes during any processing step, the exception is caught and the test returns a structured failure report.
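The command-line behavior described above can be sketched as follows. This is a minimal illustration rather than the actual smoke_analyze.py: the helper names parse_args and run_smoke, and the convention that a stage is a callable returning the paths it wrote, are assumptions.

```python
import argparse
import time
import traceback
from pathlib import Path
from typing import Callable, Iterable, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two arguments the text describes: --input and --output-dir."""
    parser = argparse.ArgumentParser(description="Smoke test wrapper (sketch)")
    parser.add_argument("--input", required=True, type=Path)
    parser.add_argument("--output-dir", required=True, type=Path)
    return parser.parse_args(argv)


def run_smoke(name: str,
              stage: Callable[[Path, Path], Iterable[Path]],
              input_path: Path,
              output_dir: Path) -> dict:
    """Run one stage and turn any exception into a structured failure report."""
    started = time.monotonic()
    report = {"test_name": name, "passed": False, "output_files": [],
              "failure_reason": None, "input_path": str(input_path)}
    try:
        output_dir.mkdir(parents=True, exist_ok=True)
        report["output_files"] = [str(p) for p in stage(input_path, output_dir)]
        report["passed"] = True
    except Exception:  # a smoke test treats any exception as a failure
        report["failure_reason"] = traceback.format_exc()
    report["duration_ms"] = int((time.monotonic() - started) * 1000)
    return report
```

Catching the exception inside the wrapper, instead of letting the process die, is what allows the failure to be reported in the same structured form as a pass.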

smoke_assess.py

This script validates the pavement condition assessment phase. It takes the output of the analysis phase (or a pre-generated analysis file stored in version control), runs the condition assessment algorithm, and verifies:

  • The assessment function executes without errors on the input data
  • Condition indices (Pavement Condition Index, PCI, or equivalent) are computed
  • Output assessment file contains expected columns: pavement_id, condition_index, severity, extent, date_assessed
  • Assessment scores fall within valid numerical range (0-100 for PCI, where 0 is failed and 100 is perfect)
  • The output file is written as a Parquet file with the expected schema

The assessment algorithm in TarmacView follows the ASTM D5340 standard for airport PCI calculation, adapted for automated image-based inspection. The smoke test does not verify that the PCI values are numerically correct; it only verifies that the calculation runs, produces results in the expected format, and writes them to disk.

smoke_crack.py

This script validates the crack detection phase. It runs the crack-detection model on a test image known to contain cracks and verifies:

  • The model loads successfully from its saved weights file and produces inference without errors
  • PyTorch or TensorFlow runtime initializes correctly, including CUDA/GPU initialization if available
  • Output crack mask has the same dimensions as the input image (1024x768)
  • At least one crack pixel is detected, confirming the test image contains known cracks and the model is responsive
  • Crack properties file contains columns: crack_id, length_px, width_px, orientation, confidence, classification
  • Output files are written to the expected paths in the cracks output directory

The crack detection model used by TarmacView is a U-Net architecture with a ResNet-50 backbone, trained on labeled pavement images from multiple airports. The smoke test input image was specifically selected from the validation set, meaning the model was not trained on it directly but it was available during model development, and it is not part of the held-out test set.

smoke_defect.py

This script validates the surface defect classification phase. It runs the defect classifier on a test image containing known defects (raveling, spalling, patching, weathering) and verifies:

  • The classifier loads and produces predictions without errors
  • Output classification map matches input image dimensions
  • At least one defect class is predicted, confirming known defects are present in the input
  • Defect inventory file contains columns: defect_id, defect_type, area_px, severity, confidence, timestamp
  • Output geospatial file contains valid coordinates in the expected CRS (Coordinate Reference System)
  • Defect polygon counts are reasonable (not zero and not suspiciously high)

The defect classifier uses a Mask R-CNN architecture that produces instance segmentation masks for each detected defect. The smoke test checks that the output mask has the correct dimensions and that the inventory file contains the expected schema columns. It does not check that the defect classifications are correct — that requires a separate validation suite with labeled ground truth data.

smoke_survey.py

This script validates the survey mapping phase. It takes crack and defect data produced by upstream stages, runs the geospatial survey mapping module, and verifies:

  • Geospatial coordinate assignment completes without errors, converting pixel coordinates to real-world coordinates
  • Output survey file contains valid latitude and longitude columns with values within the expected geographic bounds
  • Spatial relationships between detected features are preserved (cracks that are adjacent in pixel space remain adjacent in geographic space)
  • Survey output file contains expected projection metadata, including EPSG code
  • File is written in the required output format (GeoJSON or Shapefile)
  • Geometry types are correct (LineString for cracks, Polygon for defects, Point for survey markers)

The survey mapping module uses the orthophoto georeferencing metadata to perform a projective transformation from pixel coordinates to geographic coordinates. The smoke test verifies that this transformation produces valid geographic coordinates within the expected bounding box for the test image. A photo of the test image location shot at the airport provides a visual cross-reference for failure analysis.
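The bounding-box check can be illustrated with a simplified stand-in. The sketch below uses a six-coefficient affine geotransform in the GDAL ordering instead of a full projective transformation, and it works in projected map units; reprojection to latitude and longitude is left out. The function names and the example coefficients are hypothetical.

```python
from typing import Tuple

# GDAL-style affine geotransform:
# (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]
Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def pixel_to_geo(col: float, row: float, gt: GeoTransform) -> Tuple[float, float]:
    """Map a pixel (col, row) to map coordinates (x, y) with an affine transform."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def within_bounds(point: Tuple[float, float], bounds: Bounds) -> bool:
    """True when the point lies inside the expected bounding box (same CRS)."""
    x, y = point
    min_x, min_y, max_x, max_y = bounds
    return min_x <= x <= max_x and min_y <= y <= max_y


# Hypothetical north-up transform at 1 mm per pixel (negative pixel height).
EXAMPLE_GT: GeoTransform = (500000.0, 0.001, 0.0, 4000000.0, 0.0, -0.001)
corner = pixel_to_geo(1024, 768, EXAMPLE_GT)
```

A smoke test would apply within_bounds to a few transformed points (for example the image corners) and fail if any of them falls outside the box recorded for the test image.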

smoke_seg_head.py (Visualization Phase)

This script validates the visualization and segmentation head phase, the final stage of the pipeline, which produces human-readable output. It takes the processed pipeline outputs, runs the visualization renderer, and verifies:

  • Overlay images render without errors, combining crack masks, defect polygons, and assessment scores onto the base orthophoto
  • Output visualization matches input dimensions and color space
  • Legend and annotation elements are present, including scale bar, north arrow, and color-coded severity legend
  • Report generation completes successfully, producing an HTML or PDF summary
  • Final output files are written to the expected paths with correct file extensions (.png for images, .html or .pdf for reports)
  • File sizes are non-zero and within expected ranges

The visualization phase is often the first place where cumulative pipeline errors become visible. A crack detection stage that produces a mask in the wrong coordinate space will generate a visualization where crack overlays are shifted relative to the base image. The smoke test detects this only indirectly, if the rendering crashes; it is the visual inspection regression test, not the smoke test, that catches visual quality regressions.

Test Orchestration and Results Format

These smoke tests are designed to run in sequence but can also execute independently if upstream outputs are cached. The full suite completes in under 3 minutes on CI hardware. Each test outputs a structured JSON report with fields:

Field | Type | Description
test_name | string | Unique name of the test (e.g. “smoke_crack”)
passed | boolean | Whether the test passed (true) or failed (false)
duration_ms | integer | Wall-clock execution time in milliseconds
output_files | array of strings | Paths to output files created during the test
failure_reason | string or null | Human-readable failure reason if test failed
input_path | string | Path to the input data used for the test
pipeline_version | string | Git commit SHA or version tag of the pipeline code

The test runner aggregates individual JSON reports into a suite summary that is posted to the CI dashboard and optionally sent to Slack or email for real-time notification.
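The report fields listed above map naturally onto a small data class. The following sketch shows one way the per-test reports could be aggregated; the summary keys (suite_passed, failed_tests, and so on) are assumptions, since the source does not specify the suite summary format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SmokeReport:
    """One per-test report, using the field names from the table above."""
    test_name: str
    passed: bool
    duration_ms: int
    input_path: str
    pipeline_version: str
    output_files: List[str] = field(default_factory=list)
    failure_reason: Optional[str] = None


def summarize(reports: List[SmokeReport]) -> dict:
    """Aggregate individual reports into a suite summary (hypothetical schema)."""
    failed = [r.test_name for r in reports if not r.passed]
    return {
        "suite_passed": not failed,
        "total": len(reports),
        "failed_tests": failed,
        "total_duration_ms": sum(r.duration_ms for r in reports),
        "reports": [asdict(r) for r in reports],
    }
```

Because the summary contains only plain dictionaries, lists, and scalars, it can be passed directly to json.dumps before being posted to a dashboard or notification channel.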

What Smoke Tests Verify

Smoke tests are deliberately narrow in what they assert. They verify four categories of conditions and nothing more. This narrow scope is intentional: it keeps tests fast, deterministic, and easy to maintain.

1. Pipeline Runs to Completion

The most fundamental assertion: does the code execute without raising an unhandled exception? This covers:

  • Import chains resolve correctly: all Python module imports succeed, transitive dependencies are installed, version conflicts do not exist
  • Configuration files parse without errors: YAML, JSON, or TOML configuration files are syntactically valid and contain expected keys
  • External dependencies are accessible: model weight files are present at expected paths, database connections succeed, API endpoints are reachable
  • Hardware accelerators are available: GPU is present and CUDA initializes correctly if the pipeline requires GPU inference
  • File system paths exist and are writable: output directories are created and have correct write permissions
  • Memory limits are not exceeded: the pipeline processes the test input without triggering out-of-memory errors

A pipeline that crashes on startup or mid-execution fails the smoke test immediately. The failure reason is captured from the exception traceback, providing developers with a direct pointer to the code location of the failure.
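Two of the checks in the list above, import resolution and writable output paths, can be expressed as cheap preflight helpers. This is a sketch with hypothetical function names, not part of the TarmacView scripts.

```python
import importlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def failed_imports(module_names: Iterable[str]) -> Dict[str, str]:
    """Try to import each module; return {module: error} for those that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or errors raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def is_writable_dir(path: Path) -> bool:
    """Create the directory if needed and confirm a file can be written in it."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

Running these before the pipeline itself gives a clearer failure message than a traceback from deep inside the first stage.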

2. Output Files Exist

After execution completes, the smoke test checks that output files exist at the expected paths. This covers:

  • Main output files are present: results, reports, processed images, model outputs
  • Intermediate files are present: if the pipeline contract specifies that intermediate outputs must be persisted, the smoke test checks for them
  • File sizes are non-zero: a zero-byte output file indicates the stage produced no data, which is a failure
  • File format signatures are valid: the file header matches the expected format (e.g. valid PNG magic bytes, valid GeoJSON structure, valid Parquet magic bytes)

A pipeline that claims to finish but produces no output files fails the smoke test. This catches silent failures where the pipeline exits cleanly but skips critical write operations due to misconfigured output paths or conditional logic that evaluates to false unexpectedly.
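The existence, size, and signature checks can be combined in one helper. The sketch below covers only PNG and Parquet signatures and selects the check by file extension; the function name and the failure messages are illustrative assumptions.

```python
import os
from pathlib import Path
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG signature
PARQUET_MAGIC = b"PAR1"           # present at both the start and end of a Parquet file


def output_problem(path: Path) -> Optional[str]:
    """Return a failure reason for one output file, or None if it looks sane."""
    if not path.is_file():
        return f"missing output: {path}"
    size = path.stat().st_size
    if size == 0:
        return f"zero-byte output: {path}"
    suffix = path.suffix.lower()
    with path.open("rb") as handle:
        head = handle.read(8)
        if suffix == ".png" and head != PNG_MAGIC:
            return f"bad PNG signature: {path}"
        if suffix == ".parquet":
            if size < 8:
                return f"truncated Parquet file: {path}"
            handle.seek(-4, os.SEEK_END)
            if head[:4] != PARQUET_MAGIC or handle.read(4) != PARQUET_MAGIC:
                return f"bad Parquet magic bytes: {path}"
    return None
```

A signature check only confirms that the file starts (and, for Parquet, ends) the way the format requires; it does not prove the file is fully readable.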

3. Key Columns Present in Tabular Output

For any tabular output (CSV, Parquet, GeoJSON feature collection), the smoke test verifies that expected columns exist by name. This is a structural integrity check: the column schema must match what downstream consumers expect.

Output Type | Required Columns
Crack inventory | crack_id, length_px, width_px, orientation, confidence, classification
Defect inventory | defect_id, defect_type, area_px, severity, confidence, timestamp
Condition assessment | pavement_id, condition_index, severity, extent, date_assessed, method
Survey mapping | feature_id, latitude, longitude, geometry_type, crs_epsg, accuracy_m

The test uses a column-existence check (not a type check, not a value-range check). This is intentional: column existence is the minimum structural integrity guarantee. Type and range checks belong in integration and validation tests, where the cost of running them is justified by the depth of information they provide.
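A column-existence check needs only the header row. The sketch below uses a CSV file and the standard library so that it stays self-contained; a Parquet output would need a Parquet reader instead. The column names are taken from the table above, while the dictionary and function names are assumptions.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Column names copied from the required-columns table.
REQUIRED_COLUMNS: Dict[str, List[str]] = {
    "crack_inventory": ["crack_id", "length_px", "width_px",
                        "orientation", "confidence", "classification"],
    "defect_inventory": ["defect_id", "defect_type", "area_px",
                         "severity", "confidence", "timestamp"],
}


def missing_columns(csv_path: Path, required: List[str]) -> List[str]:
    """Return required column names absent from the CSV header (existence only)."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in required if name not in header]
```

The function deliberately ignores data rows, so a wrong value in length_px does not affect the result, which matches the scope described above.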

4. Data Format Compatibility

The fourth category of verification that smoke tests perform, often overlooked, is data format compatibility. The smoke test verifies that output files can be read back by the expected downstream consumers. For TarmacView, this means:

  • Parquet files can be read by Pandas without schema errors
  • GeoJSON files can be parsed by shapely and geopandas
  • GeoTIFF files can be opened by rasterio with correct CRS metadata
  • JSON reports can be deserialized without syntax errors
  • CSV files have the same number of fields in every row

This catches format version mismatches, for example if the Parquet library is upgraded and changes its encoding, or if the GeoJSON specification evolves and the pipeline output no longer conforms.
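Read-back checks amount to opening each output with a parser and confirming that it does not raise. The sketch below uses only the standard library as a stand-in for the Pandas, geopandas, and rasterio readers named above; the function names are hypothetical, and the GeoJSON check looks only at the top-level structure.

```python
import csv
import json
from pathlib import Path


def geojson_reads_back(path: Path) -> bool:
    """True when the file parses as JSON and looks like a FeatureCollection."""
    try:
        with path.open(encoding="utf-8") as handle:
            doc = json.load(handle)
    except (OSError, ValueError):  # JSONDecodeError is a subclass of ValueError
        return False
    return (isinstance(doc, dict)
            and doc.get("type") == "FeatureCollection"
            and isinstance(doc.get("features"), list))


def csv_rows_consistent(path: Path) -> bool:
    """True when every non-empty row has the same number of fields."""
    with path.open(newline="", encoding="utf-8") as handle:
        widths = {len(row) for row in csv.reader(handle) if row}
    return len(widths) == 1
```

In a real suite, the read-back would use the same library and version as the downstream consumer, since that is where version mismatches surface.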

What Smoke Tests Do NOT Verify

Equally important is understanding what smoke tests deliberately exclude. Misunderstanding this leads to false confidence in pipeline correctness: a passing smoke test does not mean the pipeline is correct, only that it is not catastrophically broken.

Numerical Accuracy

Smoke tests do not verify that computed values are correct. A crack-length calculation that returns 47.2 pixels instead of the correct 42.1 pixels passes a smoke test as long as the column length_px exists and contains a float. Accuracy validation belongs in unit tests, where the correct value is hard-coded, and in integration tests, where results are compared against manual measurements from certified inspectors.

For aviation inspection pipelines operating under ICAO Annex 14, numerical accuracy is critical because condition assessments directly inform maintenance prioritization and budget allocation. A pipeline that passes smoke tests but produces inaccurate PCI scores could lead to incorrect maintenance decisions. This is why smoke tests are only the first gate: they must be followed by accuracy-focused validation tests.

Edge Cases

Smoke tests use representative but non-adversarial inputs. They do not test:

  • Empty images: all-black or all-white images that contain no features
  • Corrupted files: truncated TIFF headers, missing EXIF data, zero-byte files
  • Extreme lighting conditions: overexposed or underexposed images, shadowed images, night-time captures
  • Images beyond expected resolution limits: 100-megapixel images or sub-100-pixel thumbnails
  • Missing metadata: images without georeferencing, without EXIF GPS, without camera calibration data
  • Network timeouts: API calls that hang or return 503 errors
  • Concurrent access: multiple pipeline instances writing to the same output directory

Edge-case handling is validated by dedicated edge-case tests that specifically target each boundary condition. These tests are more expensive to run and are typically executed nightly rather than on every commit.

Performance Characteristics

Smoke tests verify that the pipeline completes, not that it completes within a performance budget. A pipeline that takes 10 seconds per image in smoke tests but is expected to process 100 images per second in production passes smoke tests without issue. Performance validation requires dedicated benchmark tests with production-scale data and timing assertions.

For airport inspection pipelines, performance is critical because airports process hundreds of pavement images per survey. A 10x performance regression might still pass smoke tests on a single image but would make full surveys infeasible. Performance benchmarks with timing thresholds are the appropriate tool for detecting such regressions.

Regression in Non-Critical Features

Smoke tests cover only the core execution path. Features not in the critical path (alternative output formats, optional logging, telemetry, audit trails, experimental export features) are not covered. Regression in these areas must be caught by regression test suites that specifically target non-critical functionality.

Data Quality

Smoke tests do not verify that data values are internally consistent. For example, a smoke test checks that the crack_id column exists but does not verify that all crack_id values are unique, non-null, or within expected ranges. Data quality validation requires dedicated data quality tests using frameworks like Great Expectations or Pandera that define data contracts and validate datasets against them.
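The difference between the two levels of checking can be made concrete. The plain-Python sketch below is a stand-in for what a data-contract framework would express declaratively; it does not use the Great Expectations or Pandera APIs, and both function names are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def has_crack_id_column(rows: List[Row]) -> bool:
    """Smoke-level check: the crack_id key exists in every row."""
    return all("crack_id" in row for row in rows)


def crack_id_problems(rows: List[Row]) -> List[str]:
    """Data-quality check: crack_id values must be non-null and unique."""
    problems = []
    ids = [row.get("crack_id") for row in rows]
    if any(value is None or value == "" for value in ids):
        problems.append("null crack_id")
    present = [value for value in ids if value not in (None, "")]
    if len(set(present)) != len(present):
        problems.append("duplicate crack_id")
    return problems
```

A dataset with duplicated or missing identifiers passes has_crack_id_column but is reported by crack_id_problems, which is exactly the gap a separate data quality suite is meant to cover.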

Smoke Tests in CI/CD

Smoke tests deliver maximum value when integrated into the continuous integration and continuous deployment (CI/CD) pipeline. Their placement in the pipeline workflow determines their effectiveness as a quality gate.

Pipeline Placement

In a typical CI/CD workflow, smoke tests execute early, at the build verification stage, shortly after the build and before the more expensive test suites:

Code Commit → Build → Lint → Unit Tests → Smoke Tests → Integration Tests → Validation Tests → Deploy

The exact ordering depends on the project’s testing strategy. Some organizations run unit tests before smoke tests, reasoning that unit tests are faster and catch logic errors. Others run smoke tests first, reasoning that a pipeline that crashes on basic execution should be rejected without spending time on unit tests. The TarmacView pipeline runs unit tests and smoke tests in parallel after a successful build, since they cover independent failure modes and have no interdependencies.

This placement ensures that if the build produces a pipeline that crashes on basic execution, no further test infrastructure is consumed. The feedback loop is measured in minutes rather than hours. A developer who pushes a commit that breaks the import chain receives a CI failure notification within 2-3 minutes, rather than waiting 4 hours for the integration test suite to fail.

Gate Logic

The CI/CD pipeline uses a hard gate: if any smoke test fails, the pipeline halts and does not proceed to subsequent stages. The build is marked as failed, and developers are notified with the smoke test failure report. No manual override is permitted in the default configuration: a passing smoke test suite is a necessary condition for deployment.

This gate logic prevents the following scenarios:

  • A team runs integration tests for 4 hours, only to discover that the pipeline crashes on data load, wasting compute and blocking other builds
  • A release candidate is deployed to staging and breaks there because a database schema was silently changed during deployment configuration
  • A production deployment breaks due to a missing import or file path that was introduced in the last commit

The gate is implemented in the CI platform configuration. For CircleCI, this means configuring workflow dependencies so that the smoke-test job runs before integration-test and deploy. For GitHub Actions, this means using the needs: keyword to enforce job ordering.
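Outside a CI platform, the same hard-gate behavior can be reproduced with a small local runner. The sketch below is an illustration, not part of the TarmacView tooling: run_gate is a hypothetical helper that runs commands in order and stops at the first non-zero exit code.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def run_gate(commands: Sequence[List[str]]) -> Tuple[bool, Optional[int]]:
    """Run commands in order and halt at the first failure.

    Returns (True, None) if every command exits with code 0, otherwise
    (False, index_of_first_failing_command). Later commands are not run.
    """
    for index, command in enumerate(commands):
        completed = subprocess.run(command)
        if completed.returncode != 0:
            return False, index
    return True, None
```

A wrapper script would pass the smoke test command lines first and the integration and deploy steps after them, then exit non-zero when the first element of the result is False so that callers see the gate as closed.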

Parallel Execution and Concurrency

Smoke tests within the suite can run in parallel if they test independent pipeline stages. TarmacView’s smoke test suite uses parallelism where possible:

  • smoke_crack.py and smoke_defect.py have no interdependencies and can execute concurrently, reducing total suite time by 40%
  • smoke_analyze.py must complete before smoke_assess.py, since assessment consumes analysis output
  • smoke_survey.py depends on crack and defect outputs
  • smoke_seg_head.py depends on all upstream outputs

The parallel execution strategy is configured in the CI pipeline YAML using job-level parallelism. Each job runs in its own container with isolated dependencies, preventing resource contention.
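The dependency rules listed above can be turned into an execution plan mechanically. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group tests into waves that may run concurrently. The `DEPENDENCIES` mapping is an assumption derived from the bullet list: it treats smoke_analyze.py as having no prerequisites and reads "all upstream outputs" as the five other tests.

```python
from graphlib import TopologicalSorter

# Assumed dependency map: each test maps to the tests that must finish first.
DEPENDENCIES = {
    "smoke_crack": set(),
    "smoke_defect": set(),
    "smoke_analyze": set(),
    "smoke_assess": {"smoke_analyze"},
    "smoke_survey": {"smoke_crack", "smoke_defect"},
    "smoke_seg_head": {
        "smoke_crack", "smoke_defect", "smoke_analyze",
        "smoke_assess", "smoke_survey",
    },
}


def execution_waves(deps):
    """Group tests into waves; tests within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the map contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
    print(number, wave)
# 1 ['smoke_analyze', 'smoke_crack', 'smoke_defect']
# 2 ['smoke_assess', 'smoke_survey']
# 3 ['smoke_seg_head']
```

Each wave could then be dispatched to a process pool or to separate CI jobs. Keeping the dependency map in one place makes it harder for the CI configuration and the documented ordering to drift apart.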

Reporting and Notification

CI/CD integration includes automated reporting with multiple channels:

  • Pass/Fail status on the CI dashboard with color-coded indicators (green for pass, red for fail)
  • Duration tracked per test for trend analysis — a gradual increase in smoke test duration may indicate performance regression
  • Failure logs captured as CI artifacts for developer inspection with full stack traces
  • Slack or email notifications on test failure with links to logs and the offending commit
  • Trend graphs showing pass rate over time, with a target of >99% on the main branch
  • Flaky test detection — a test that alternates between pass and fail on the same commit is flagged for investigation
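Flaky test detection and pass-rate tracking, as described in the list above, reduce to simple bookkeeping over the run history. The following sketch assumes a hypothetical history format of `(commit_sha, test_name, passed)` tuples; a real CI platform would expose this data through its own API or export.

```python
from collections import defaultdict


def find_flaky(runs):
    """Return tests that both passed and failed on the same commit.

    runs: iterable of (commit_sha, test_name, passed) tuples (assumed format).
    """
    outcomes = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(bool(passed))
    # Two distinct outcomes for one (commit, test) pair means the test is flaky.
    return sorted({test for (_, test), seen in outcomes.items() if len(seen) == 2})


def pass_rate(runs):
    """Fraction of runs that passed; 0.0 for an empty history."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(1 for _, _, passed in runs if passed) / len(runs)


history = [
    ("abc123", "smoke_crack", True),
    ("abc123", "smoke_crack", False),   # same commit, different outcome
    ("abc123", "smoke_defect", True),
    ("def456", "smoke_defect", False),  # different commit, so not flaky
]
print(find_flaky(history))  # ['smoke_crack']
print(pass_rate(history))   # 0.5
```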

CI/CD Configuration Example

# .circleci/config.yml
version: 2.1

jobs:
  smoke-test:
    docker:
      - image: tarmacview/pipeline:ci
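    # `parallelism` is omitted: in CircleCI it starts N identical containers that
    # each run every step, so it only helps when work is split across them.
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: python scripts/smoke_analyze.py --input tests/data/smoke_image.tif --output-dir /tmp/smoke-output
      - run:
          name: "Parallel smoke tests"
          command: |
            python scripts/smoke_crack.py --input tests/data/smoke_image.tif --output-dir /tmp/smoke-output &
            crack_pid=$!
            python scripts/smoke_defect.py --input tests/data/smoke_image.tif --output-dir /tmp/smoke-output &
            defect_pid=$!
            # A bare `wait` returns 0 even when a background job failed,
            # so wait on each PID and fail the step if either test failed.
            status=0
            wait "$crack_pid" || status=1
            wait "$defect_pid" || status=1
            exit "$status"
      - run: python scripts/smoke_assess.py --input /tmp/smoke-output/cracks.parquet --output-dir /tmp/smoke-output
      - run: python scripts/smoke_survey.py --input /tmp/smoke-output --output-dir /tmp/smoke-output
      - run: python scripts/smoke_seg_head.py --input /tmp/smoke-output --output-dir /tmp/smoke-output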
      - store_artifacts:
          path: /tmp/smoke-output

workflows:
  version: 2
  build-and-test:
    jobs:
      - build
      - unit-tests:
          requires: [build]
      - smoke-test:
          requires: [build]
      - integration-tests:
          requires: [smoke-test]
      - deploy:
          requires: [integration-tests]

Writing Smoke Tests

Writing effective smoke tests requires discipline. The test must be fast, reliable, and focused only on what it is designed to detect. Every smoke test follows the same fundamental pattern: run the pipeline stage, check for output existence, verify minimal schema correctness, and report pass/fail in structured JSON.

Structure of a Smoke Test

# smoke_crack.py — simplified pattern
import sys
import json
import time
from pathlib import Path
import pandas as pd

def test_smoke_crack(input_path: str, output_dir: str) -> dict:
    result = {
        "test_name": "smoke_crack",
        "passed": False,
        "duration_ms": 0,
        "output_files": [],
        "failure_reason": None,
        "input_path": input_path,
        "pipeline_version": "unknown"
    }

    start = time.time()

    try:
        # Step 1: Get pipeline version
        from tarmacview import __version__
        result["pipeline_version"] = __version__

        # Step 2: Run the pipeline stage with real imports
        from tarmacview.pipeline.crack import detect_cracks
        output = detect_cracks(input_path, output_dir)

        # Step 3: Verify output file exists and is non-empty
        output_path = Path(output["crack_mask_path"])
        assert output_path.exists(), f"Output file not found: {output_path}"
        assert output_path.stat().st_size > 0, f"Output file is empty: {output_path}"
        result["output_files"].append(str(output_path))

        # Step 4: Verify key columns in tabular output
        if output.get("crack_inventory_path"):
            inv_path = Path(output["crack_inventory_path"])
            assert inv_path.exists(), f"Inventory file not found: {inv_path}"
            df = pd.read_parquet(inv_path)
            required_columns = [
                "crack_id", "length_px", "width_px",
                "orientation", "confidence", "classification"
            ]
            for col in required_columns:
                assert col in df.columns, f"Missing required column: {col}"
            result["output_files"].append(str(inv_path))

        # Step 5: Verify output mask dimensions match input
        from PIL import Image
        input_img = Image.open(input_path)
        mask_img = Image.open(output_path)
        assert input_img.size == mask_img.size, \
            f"Mask dimensions {mask_img.size} do not match input {input_img.size}"

        result["passed"] = True

    except Exception as e:
        result["failure_reason"] = f"{type(e).__name__}: {str(e)}"
        result["passed"] = False

    result["duration_ms"] = int((time.time() - start) * 1000)
    return result

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output-dir", default="/tmp/smoke-output")
    args = parser.parse_args()

    result = test_smoke_crack(args.input, args.output_dir)
    print(json.dumps(result, indent=2))
    sys.exit(0 if result["passed"] else 1)

Design Rules

  1. One assertion per category. Assert that execution completes. Assert that output exists. Assert that columns exist. Do not assert numerical ranges, exact file sizes, or data quality metrics. Each additional assertion increases test maintenance cost and reduces test reliability.

  2. Use real imports. Do not mock pipeline modules. The smoke test must exercise the actual import chain to catch import errors and missing dependencies. A smoke test that uses unittest.mock to suppress import errors defeats its own purpose.

  3. Minimal setup and teardown. The test should require no database setup, no service startup, and no external configuration beyond the test input file. If setup is required, it should be part of the CI job definition, not the test script. Each smoke test should be runnable on a developer’s laptop with a single command.

  4. Deterministic. The same input must always produce the same pass/fail result. Random seeds must be fixed in the pipeline configuration. Test input files must be version-controlled. Non-deterministic smoke tests are worse than useless: they erode trust in the entire testing infrastructure.

  5. Self-contained structured output. The test should print structured JSON output on completion, making it easy for CI systems to parse results without custom log parsing. The JSON schema should be consistent across all smoke tests in the suite.

  6. Fast. Each test must complete in under 60 seconds on CI hardware. If a test consistently exceeds this limit, optimize the test before adding it to the suite. A smoke test that takes 5 minutes will be skipped by developers running tests locally.
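Rules 5 and 6 can be checked mechanically on the CI side. The sketch below validates a result dictionary against the field names used in the smoke_crack.py pattern shown earlier and flags results that exceed the 60-second budget. The `validate_result` helper and the `EXPECTED_FIELDS` table are illustrative assumptions, not part of TarmacView.

```python
# Field names mirror the result dict in the smoke_crack.py pattern;
# the expected types are an assumption for illustration.
EXPECTED_FIELDS = {
    "test_name": str,
    "passed": bool,
    "duration_ms": int,
    "output_files": list,
    "failure_reason": (str, type(None)),
    "input_path": str,
    "pipeline_version": str,
}
MAX_DURATION_MS = 60_000  # rule 6: under 60 seconds


def validate_result(result):
    """Return a list of schema or budget problems; an empty list means OK."""
    problems = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in result:
            problems.append(f"missing field: {field}")
        elif not isinstance(result[field], expected):
            problems.append(f"wrong type for {field}: {type(result[field]).__name__}")
    duration = result.get("duration_ms")
    if isinstance(duration, int) and duration > MAX_DURATION_MS:
        problems.append("duration exceeds 60 s budget")
    return problems


good = {
    "test_name": "smoke_crack", "passed": True, "duration_ms": 4200,
    "output_files": ["mask.png"], "failure_reason": None,
    "input_path": "tests/data/smoke_image.tif", "pipeline_version": "unknown",
}
slow = dict(good, duration_ms=61_000)
del slow["passed"]

print(validate_result(good))  # []
print(validate_result(slow))  # ['missing field: passed', 'duration exceeds 60 s budget']
```

Running a check like this over every test's JSON output keeps the suite's schema consistent without relying on reviewers to notice drift.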

Common Pitfalls to Avoid

  • Testing too much. Adding accuracy assertions or value-range checks turns a smoke test into an integration test. Keep it shallow. The test should be quick to write and take less than 60 seconds to execute.
  • Ignoring import errors. A smoke test that catches all exceptions with a generic except Exception can bury import failures among other errors. Let some exceptions propagate, or at minimum log import errors separately.
  • Using production data. Production data is large, uncontrolled, and may contain sensitive information, including airport security zones and copyrighted imagery. Use a fixed, small, synthetic or curated input.
  • Mutable test inputs. Test inputs stored on network drives or cloud storage can change without version control. Always version-control test inputs in the repository or via Git LFS.
  • Assuming GPU availability. If the pipeline requires a GPU for inference, the smoke test should gracefully handle GPU absence by either skipping the GPU-dependent test or running on CPU fallback.
  • Hard-coded file paths. Never hard-code absolute file paths in smoke tests. Use relative paths and the --output-dir argument to allow flexible execution environments.
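One way to log import errors separately, as the second pitfall suggests, is to probe the import chain in its own phase before running any pipeline stage. This is a minimal sketch using the standard library's `importlib.import_module`; the `check_imports` helper and the module names passed to it are hypothetical.

```python
import importlib


def check_imports(module_names):
    """Try each import in isolation and return {module: error} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # also catches its subclass ModuleNotFoundError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# "json" ships with Python; the second name is deliberately nonexistent.
failures = check_imports(["json", "tarmacview_missing_module_for_demo"])
for module, error in failures.items():
    print(f"import failure in {module}: {error}")
```

Reporting these failures in a dedicated field or log line keeps a broken dependency from being mistaken for a failure inside the pipeline stage itself.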

Interpreting Smoke Test Failures

When a smoke test fails, the development workflow must shift from “developing” to “diagnosing and fixing”. Effective failure interpretation follows a structured approach that maps failure symptoms to likely root causes.

Failure Categories

| Failure Symptom | Likely Root Cause | Immediate Action |
| --- | --- | --- |
| ImportError | Missing dependency, renamed module, removed module, version conflict | Check requirements.txt and recent import changes in the commit diff |
| ModuleNotFoundError | Package not installed or not in Python path | Verify environment matches lock file; check for conditional imports |
| FileNotFoundError | Output path changed, write permission issue, missing directory | Check output path configuration in recent commits; verify directory creation logic |
| PermissionError | Insufficient filesystem permissions in CI container | Check Dockerfile permissions and CI runner user |
| Empty output file | Stage completed but skipped processing logic due to a conditional that evaluated unexpectedly | Add debug logging to trace execution path; check for early returns |
| Missing column | Schema change in upstream stage, renamed column, removed column, type mismatch | Compare column schemas between pipeline stages; check recent PRs touching schema definitions |
| Segfault or OOM | Memory leak, unbounded memory allocation, native extension crash | Profile with reduced input; check for infinite loops; verify CUDA memory management |
| Timeout | Infinite loop, blocking I/O, deadlock, slow external API | Add timeout wrapper; check for thread safety issues; verify network connectivity |
| AssertionError on dimensions | Resize operation removed, model input size changed, data type conversion changed shape | Check image processing parameters; verify model input specifications |
| CUDA error | GPU driver mismatch, CUDA version conflict, insufficient GPU memory | Verify CUDA toolkit version; check Docker image GPU drivers |
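Because the smoke test pattern shown earlier writes failure_reason as the exception type followed by a colon and the message, the exception-type rows of the table can be applied automatically. The sketch below is one hypothetical way to do that; the `triage` function is illustrative, and symptoms that are not exception types (empty output, timeouts, segfaults) would need separate detection.

```python
# Suggested first actions for exception-type symptoms, taken from the table above.
TRIAGE = {
    "ImportError": "Check requirements.txt and recent import changes in the commit diff",
    "ModuleNotFoundError": "Verify environment matches lock file; check for conditional imports",
    "FileNotFoundError": "Check output path configuration; verify directory creation logic",
    "PermissionError": "Check Dockerfile permissions and CI runner user",
}
FALLBACK = "No table entry for this exception type; read the full failure message"


def triage(failure_reason):
    """Split 'ExcType: message' and look up a suggested first action."""
    exc_type = failure_reason.split(":", 1)[0].strip() if failure_reason else ""
    return exc_type, TRIAGE.get(exc_type, FALLBACK)


exc_type, action = triage("ModuleNotFoundError: No module named 'example_pkg'")
print(exc_type)  # ModuleNotFoundError
print(action)    # Verify environment matches lock file; check for conditional imports
```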

Failure Response Workflow

When a smoke test fails in CI, the recommended response workflow is:

  1. Check the failure report. The structured JSON output from the smoke test includes the failure_reason field, which contains the exception type and message. This is the fastest path to understanding what broke.

  2. Identify whether the failure is a code regression or an infrastructure issue. A ModuleNotFoundError on a package that was installed yesterday suggests a regression. A Timeout that appears on all builds simultaneously suggests an infrastructure issue (network outage, CI runner degradation).

  3. Revert if the root cause is not obvious within 15 minutes. The smoke test suite is fast specifically to enable quick reversion. A build that still fails smoke tests after that window should be reverted to unblock the CI pipeline for other developers.

  4. Investigate with the test input. Run the failing smoke test locally with the same version-controlled test input. Reproducing the failure locally eliminates CI-specific variables (container differences, resource limits, parallelism issues).

  5. Fix and add a unit test. Once the root cause is identified, fix the code and add a unit test at the appropriate level that would have caught the failure. This prevents the same class of regression from recurring.

  6. Verify the fix passes smoke tests. Run the full smoke test suite locally before pushing the fix. This ensures the fix does not introduce new failures in other pipeline stages.

False Positives and Flaky Tests

Smoke tests can produce false positives — the test fails but the pipeline is actually functional. Common causes include:

  • Resource contention in CI — multiple builds running simultaneously exhaust memory or disk space
  • Network-dependent tests — tests that contact external APIs or download model weights from remote storage
  • Timing-dependent tests — tests that assume certain operations complete within a specific time window
  • File system race conditions — parallel tests writing to overlapping output directories

Flaky smoke tests erode trust in the testing infrastructure. Developers who see smoke tests fail randomly will start ignoring failures. The solution is to de-flake aggressively: identify the root cause of flakiness and harden the test. If a test cannot be made reliable with reasonable effort, it should be moved to a lower-priority test suite or replaced with a more robust design.
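File system race conditions are often the easiest source of flakiness to remove: give every test run its own output directory. The sketch below uses the standard library's `tempfile.mkdtemp`, which creates a uniquely named directory on each call. The `isolated_output_dir` helper is a hypothetical name, and its result would be passed to a smoke test through the --output-dir argument.

```python
import shutil
import tempfile
from pathlib import Path


def isolated_output_dir(test_name, base=None):
    """Create a fresh, uniquely named output directory for one test run."""
    # mkdtemp guarantees a new directory per call, so parallel runs never collide.
    return Path(tempfile.mkdtemp(prefix=f"{test_name}-", dir=base))


first = isolated_output_dir("smoke_crack")
second = isolated_output_dir("smoke_crack")
print(first != second)  # True: two runs of the same test get separate directories

# Clean up the demo directories; CI would typically archive them as artifacts first.
shutil.rmtree(first)
shutil.rmtree(second)
```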

Smoke Tests in the Aviation Software Context

For aviation software, including airport inspection pipelines, smoke tests carry additional significance due to the safety-critical nature of the domain. Under DO-178C (Software Considerations in Airborne Systems and Equipment Certification), software verification follows a rigorous requirements-based process. DO-178C does not explicitly name “smoke testing”, but the requirements-based hardware/software integration testing described in Section 6.4.3 serves a related purpose: it checks that the software operates correctly in its target environment, which includes the basic expectation that the integrated system starts and its core functions run.

Under ICAO Annex 14 Volume I — Aerodrome Design and Operations — the condition of airport pavements directly affects aircraft safety. Pavement failure on a runway can cause aircraft damage or loss of control during takeoff and landing. Software used for automated pavement inspection must therefore meet reliability standards commensurate with the safety implications of its output. Smoke tests provide the first line of defense against software errors that could compromise pavement condition assessments.

For TarmacView, smoke tests are part of a broader software quality assurance framework that includes:

  • Requirements traceability — each pipeline feature links to a documented requirement
  • Unit test coverage — >90% line coverage for core algorithms
  • Smoke test coverage — 100% of pipeline stages covered by smoke tests
  • Integration test coverage — each component boundary tested with real data
  • Validation tests — pipeline results compared against manual inspection by certified airport inspectors
  • Performance benchmarks — throughput and latency tracked per release

This multi-layered approach ensures that smoke tests serve as the first gate without being the only gate. A passing smoke test means the pipeline is structurally sound. But it is the integration tests, validation tests, and performance benchmarks that provide the confidence needed to use TarmacView’s output for real airport maintenance decisions.

Conclusion

Smoke testing is a foundational software quality practice that delivers outsized value relative to its implementation cost. For inspection software pipelines like TarmacView, smoke tests catch integration failures (broken imports, missing files, schema mismatches) in seconds rather than hours. They serve as the first gate in CI/CD, rejecting unstable builds before they consume testing infrastructure. The TarmacView smoke test suite covers every pipeline phase, from image ingestion through crack detection, defect classification, surface assessment, survey mapping, and visualization, with each test verifying that the stage runs, produces output, and maintains expected column schemas.

The key to effective smoke testing is understanding its narrow scope. Smoke tests verify that the pipeline runs and produces output. They do not verify accuracy, performance, edge-case handling, or data quality. These are validated by other test types in the broader quality assurance framework. When smoke tests are designed with this discipline (representative minimal inputs, fast execution times, broad but shallow coverage), they provide the rapid, reliable failure detection that makes CI/CD pipelines effective for safety-critical aviation software.
