Defect Head Evaluation and Smoke Testing
The defect head smoke test validates that TarmacView's structural defect detection pipeline — DINOv3 backbone + 5-label MLP head for crack/spalling/efflorescenc...
A smoke test is a quick, end-to-end verification that a software pipeline executes without crashing on representative data and produces the expected outputs. TarmacView’s scripts/smoke_*.py tests validate each pipeline phase (analyze, assess, crack, defect, survey, visualize). This article covers smoke test design, what smoke tests verify compared with fuller test suites, and their role in CI/CD.
A smoke test is a lightweight automated verification procedure that executes a software pipeline end-to-end on representative or minimal input data. It confirms that the pipeline runs to completion without crashing and produces output files with the expected structure. The test performs a binary pass/fail assessment: if any step raises an unhandled exception, produces no output, or generates output missing critical columns, the test fails and the build is rejected immediately.
The term originates from hardware engineering, where the metaphor is literal. When engineers powered on a newly assembled circuit board for the first time, they would watch for smoke escaping from burned-out components. If smoke appeared, the board had a catastrophic failure and all further testing stopped. The defective board was either repaired or discarded before anyone invested time in detailed diagnostics. Software smoke testing applies the same principle: run the code and check for “smoke” (crashes, uncaught exceptions, missing outputs) before investing time in detailed validation.
In the context of inspection software pipelines, a smoke test validates that every stage of the processing chain can execute on representative data. For TarmacView, this means feeding a small runway or pavement image through the entire pipeline and verifying that each stage produces output without errors. The stages are image ingestion, crack detection, defect classification, surface assessment, survey mapping, and visualization. This is particularly important for aviation software, where pipeline failures can delay airport infrastructure assessments that directly affect flight safety.
The term build verification test (BVT) is often used interchangeably with smoke test, and Microsoft is the most prominent adopter of this terminology. Microsoft’s internal development processes, particularly for Windows and Office, institutionalized smoke testing as a mandatory quality gate in the 1990s. Steve McConnell’s Code Complete recommends the “daily build and smoke test” as a core practice for keeping integration problems visible. Google’s Site Reliability Engineering (SRE) book describes smoke tests as among the simplest type of system test, used specifically to short-circuit additional, more expensive testing.
The purpose of smoke testing is threefold:
Smoke tests act as the first gate of the testing process, executing before integration tests and end-to-end regression suites. They are designed to complete in under 60 seconds for a typical inspection pipeline, which makes them suitable for running on every commit in CI/CD. In high-maturity software organizations, smoke tests run on every push to any branch, not just the main branch. This provides the earliest possible detection of integration failures.
The smoke testing concept predates software engineering by decades. The first documented uses of the phrase come from 19th-century plumbing and stove testing. Stove manufacturers would light a fire inside a newly assembled stove, close all dampers, and observe where smoke escaped. If smoke emerged from unintended locations, the stove had construction defects. The same logic of “apply minimal power and observe where things break” migrated to electronics testing in the mid-20th century and finally to software engineering in the 1980s and 1990s.
Microsoft is widely credited with bringing smoke testing into mainstream software engineering practice. In the late 1990s, Microsoft’s Windows and Office divisions adopted what they called Build Verification Testing (BVT) as a mandatory gating process. Every nightly build had to pass a suite of BVTs before it was released to internal testers. If the BVT failed, the build was “broken” and the responsible developer was paged regardless of the time of day. This culture of immediate accountability for build quality became foundational to Microsoft’s engineering culture and was documented extensively in MSDN documentation and Microsoft Press books.
The Crosslake Technologies taxonomy (derived directly from Microsoft practice) distinguishes smoke tests from BVTs. Smoke tests are described as “cursory” checks that “ensure basic functionality works”, running in minutes and focused on critical functions. BVTs are described as “a superset of smoke tests” that are “slightly more thorough” but still run in minutes, not hours. Both serve the same gatekeeping function, but BVTs include a wider set of critical-path scenarios.
Google adopted smoke testing through its Site Reliability Engineering (SRE) practices. The Google Testing Blog treats smoke testing as an established, assumed practice and focuses more on how to weight smoke test results alongside other test types. Google’s engineering culture emphasizes precision in unit tests and weighted confidence scoring for broad, smoke-test-style checks. It treats smoke tests as a complement to rigorous unit testing, not a replacement for it.
In the aviation software domain, smoke testing maps to the hardware/software integration testing described in DO-178C Section 6.4.3. DO-178C does not explicitly name smoke testing. However, confirming that the integrated system boots and that basic functions work is a practical precondition for the deeper requirements-based testing the standard requires, and that is the role a smoke test plays. ICAO Annex 14 governs aerodrome design and operations. For airport inspection software operating under it, smoke tests provide part of the software reliability assurance needed to support Pavement Condition Index (PCI) assessments that feed into airport safety management systems.
Understanding where smoke tests fit in the broader testing taxonomy is essential for building a balanced quality assurance strategy. The four main testing levels serve complementary but distinct roles, each targeting different failure modes at different stages of the pipeline lifecycle.
| Dimension | Smoke Test | Unit Test | Integration Test | System Test |
|---|---|---|---|---|
| Scope | Entire pipeline end-to-end | Single function or method | Two or more interacting components | Full system in production-like environment |
| Data | Minimal representative sample | Mocked or stubbed inputs | Real but limited data | Production-scale data |
| Execution time | Seconds to under 1 minute | Milliseconds | Minutes to hours | Hours to days |
| What it catches | Crashes, missing outputs, import failures | Logic errors within a function | Interface mismatches, protocol errors | End-to-end correctness, performance |
| Dependencies | Real (non-mocked) | Mocked or stubbed | Real subset | Full production stack |
| Frequency | Every commit in CI | Every commit in CI | Per build or nightly | Per release |
| Failure impact | Halts the pipeline | Isolated to single function | Blocks integration branch | Blocks release |
Unit tests validate that individual functions produce correct outputs for given inputs. They mock all external dependencies, such as databases, file systems, network services, and hardware accelerators. A unit test for a crack-detection function might verify that it correctly identifies cracks in a synthetic 10x10 pixel array with known crack positions. Unit tests are narrow and deep: they verify the logic of a single function exhaustively, covering edge cases, boundary conditions, and error paths. However, unit tests cannot catch integration failures. Because dependencies are mocked, the test never exercises the actual import chain, configuration loading, or inter-module data flow.
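Here is a minimal sketch of a unit test in that style. `detect_crack_pixels` is a hypothetical threshold-based stand-in for TarmacView's real crack detector, which is a learned model. It was chosen so that the synthetic 10x10 image has exact, known crack positions to assert against. The tests are written for pytest, which collects plain functions named `test_*`.

```python
"""Unit-test sketch for a hypothetical crack detector on a synthetic 10x10 image."""


def detect_crack_pixels(image, threshold=50):
    """Hypothetical detector: pixels darker than `threshold` are classified as crack.

    `image` is a list of rows of 8-bit grayscale intensities.
    Returns the set of (row, col) coordinates classified as crack.
    """
    return {
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value < threshold
    }


def make_synthetic_image(size=10, crack_col=4, background=200, crack_value=10):
    """Bright size x size image with one dark vertical crack in column `crack_col`."""
    return [
        [crack_value if c == crack_col else background for c in range(size)]
        for _ in range(size)
    ]


# pytest-style tests: each exercises the function in isolation, with no I/O and no model.
def test_finds_known_vertical_crack():
    assert detect_crack_pixels(make_synthetic_image()) == {(r, 4) for r in range(10)}


def test_blank_image_has_no_cracks():
    assert detect_crack_pixels([[200] * 10 for _ in range(10)]) == set()


def test_threshold_is_exclusive():
    # Boundary condition: a pixel exactly at the threshold is not a crack pixel.
    assert detect_crack_pixels([[50] * 10 for _ in range(10)], threshold=50) == set()
```

Because nothing outside the function is touched, a passing suite like this says nothing about whether the real pipeline can import, load its model, or write outputs. That gap is what smoke tests fill.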
Integration tests validate that two or more components work together correctly, using real but controlled instances of dependencies. An integration test for TarmacView might verify that the image ingestion stage passes data to the crack-detection model in the expected tensor format, with the correct channel ordering and normalization parameters. Integration tests are narrower than smoke tests in pipeline scope but deeper in interaction validation: they focus on specific component boundaries rather than the full pipeline.
Smoke tests validate that the entire pipeline executes without crashing. They run every stage in sequence with real, though small, data. Smoke tests are broad and shallow: they cover the full pipeline but only verify that execution completes and that outputs exist, not that numerical results are correct. A crack-length calculation that returns 47.2 pixels instead of the correct 42.1 pixels passes a smoke test as long as the length_px column exists and contains a floating-point value.
System tests (also called end-to-end tests) run the complete system with production-scale data in a production-like environment. They verify that the system meets its functional and non-functional requirements, including accuracy, performance, and reliability. System tests are the most expensive to run and maintain, and they execute only on release candidates, not on every commit.
In the CI gating sequence, smoke tests form the first layer: they run early, fast, and most frequently. If a smoke test fails, downstream test stages on that build are typically skipped, or their results are flagged as unreliable. This saves compute resources and developer time by failing fast when fundamental pipeline execution is broken.
Effective smoke test design centers on the concept of representative minimal input: the smallest dataset that exercises every stage of the pipeline without triggering data-dependent edge cases. This is the single most important design decision in building a smoke test suite.
Principle 1: Minimal but not trivial. The input must be large enough to pass through every pipeline stage without taking code paths that bypass real processing logic. A single-pixel image is trivial. It would pass through image loading, but the processing algorithms would take degenerate code paths that never execute on real data. A 256x256 pixel image of actual runway pavement is minimal yet representative. It exercises tile-based processing algorithms, color normalization routines, and model inference paths without requiring excessive compute time.
Principle 2: Representative of production data. The input should have the same file format, color depth, metadata structure, and statistical properties as production data. Suppose production data comes from a Phase One iXM-RS150F aerial camera capturing 16-bit TIFF files with embedded EXIF GPS metadata. In that case, the smoke test input must match these characteristics. Synthetic data that differs from production data defeats the purpose of the smoke test, because pipeline failures often stem from unexpected properties of real data: missing EXIF tags, unexpected color profiles, or non-standard GeoTIFF projection strings.
Principle 3: Fixed and version-controlled. Smoke test inputs must be checked into version control alongside the code, either as binary files or via Git LFS (Large File Storage). They should never change without explicit review through pull requests. A changing input makes it impossible to distinguish pipeline regressions from test data changes: a smoke test that passes today and fails tomorrow could indicate either a code regression or a modified test input. Version-controlling inputs eliminates this ambiguity.
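One lightweight way to enforce this principle is to pin a SHA-256 digest for each fixture in the repository and check it before the smoke tests run, so that an unreviewed change to the input fails loudly. This is a sketch: `verify_fixture` and the `pinned_digests` mapping (filename to hex digest, maintained by hand and changed only through review) are assumptions, not part of TarmacView or Git LFS.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large fixtures never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixture(path, pinned_digests):
    """Return None if the fixture matches its pinned digest, else a failure reason string."""
    path = Path(path)
    expected = pinned_digests.get(path.name)
    if expected is None:
        return f"No pinned digest for fixture {path.name}"
    if not path.exists():
        return f"Fixture missing: {path}"
    actual = sha256_of(path)
    if actual != expected:
        return f"Fixture {path.name} changed: expected {expected[:12]}..., got {actual[:12]}..."
    return None
```

This check also catches a Git LFS pointer file that was never fetched. Such a pointer is a small text stub rather than the real image, so its digest will not match.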
Principle 4: Fast to process. Total smoke test suite execution should not exceed 60 seconds for a typical pipeline, or 3 minutes for complex multi-stage pipelines such as inspection software. This constraint drives the input size:

- Image-based inspection pipelines: a single 512x512 pixel image is typically sufficient.
- Video pipelines: a single frame or a 2-second clip.
- LiDAR pipelines: a single flight-line segment covering 100 meters of pavement.
Principle 5: Contains known features. The test input must contain the features that each pipeline stage is designed to detect. A smoke test for crack detection is useless if the input image contains no cracks, because the pipeline could silently skip crack detection and still pass. Test inputs must be curated to contain at least one instance of each detectable feature, with known ground truth that can be referenced during failure analysis.
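Principle 5 becomes checkable if the ground truth is stored as a small manifest alongside the fixture. The sketch below makes two assumptions, both hypothetical:

- A manifest format: a list of annotations, each with a `type` and a bounding box.
- A `STAGE_FEATURES` mapping, derived from the features TarmacView's smoke image is described as containing.

The function reports which required feature types are missing from a stage's input.

```python
# Hypothetical mapping from pipeline stage to the feature types its smoke input must contain.
STAGE_FEATURES = {
    "crack": {"crack"},
    "defect": {"surface_defect"},
    "visualize": {"crack", "surface_defect", "pavement_marking"},
}


def missing_features(manifest, stage):
    """Return the feature types required by `stage` that the manifest does not annotate.

    `manifest` is a list of ground-truth annotations such as
    {"type": "crack", "bbox": [x0, y0, x1, y1]}. Unknown stages require nothing.
    """
    present = {annotation["type"] for annotation in manifest}
    return STAGE_FEATURES.get(stage, set()) - present
```

A non-empty result should fail the fixture review rather than the pipeline, because it means the input itself cannot exercise the stage.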
For TarmacView’s inspection pipeline, the smoke test input is a single 1024x768 pixel RGB orthophoto strip of airport pavement captured at 1 mm ground sample distance (GSD). The image contains at least one visible crack, one surface defect (e.g., raveling or spalling), and clear pavement markings. This single image:
The expected processing time for this single image through the full pipeline is under 30 seconds on CI hardware, leaving headroom for the remaining smoke tests in the suite. The test image is stored in the repository under tests/data/smoke/ and is version-controlled via Git LFS.
TarmacView’s inspection pipeline is validated by a suite of dedicated smoke test scripts, each covering a specific pipeline phase. These scripts live in the scripts/ directory and follow the naming convention smoke_<phase>.py. Each script is designed to be runnable both independently (for development debugging) and as part of the CI suite (for automated gating).
smoke_analyze.py validates the image analysis phase. It reads a single representative orthophoto strip, runs the full analysis pipeline through the image processing module, and verifies:
The script accepts an --input argument pointing to the test image and an --output-dir argument specifying where the pipeline should write results. If the pipeline crashes during any processing step, the exception is caught and the test returns a structured failure report.
smoke_assess.py validates the pavement condition assessment phase. It takes the output of the analysis phase (or a pre-generated analysis file stored in version control), runs the condition assessment algorithm, and verifies:
- Expected output columns: pavement_id, condition_index, severity, extent, date_assessed

The assessment algorithm in TarmacView follows the ASTM D5340 standard for airport PCI calculation, adapted for automated image-based inspection. The smoke test does not verify that the PCI values are numerically correct. It only verifies that the calculation runs, produces results in the expected format, and writes them to disk.
smoke_crack.py validates the crack detection phase. It runs the crack-detection model on a test image known to contain cracks and verifies:
- Expected output columns: crack_id, length_px, width_px, orientation, confidence, classification

The crack detection model used by TarmacView is a U-Net architecture with a ResNet-50 backbone, trained on labeled pavement images from multiple airports. The smoke test input image was selected from the validation set. The model was never fit on it, although it was used for model selection during training. It is also not part of the held-out test set used for final evaluation.
smoke_defect.py validates the surface defect classification phase. It runs the defect classifier on a test image containing known defects (raveling, spalling, patching, weathering) and verifies:
- Expected output columns: defect_id, defect_type, area_px, severity, confidence, timestamp

The defect classifier uses a Mask R-CNN architecture that produces instance segmentation masks for each detected defect. The smoke test checks that the output mask has the correct dimensions and that the inventory file contains the expected schema columns. It does not check that the defect classifications are correct; that requires a separate validation suite with labeled ground truth data.
smoke_survey.py validates the survey mapping phase. It takes the crack and defect data produced by upstream stages, runs the geospatial survey mapping module, and verifies:
The survey mapping module uses the orthophoto's georeferencing metadata to perform a projective transformation from pixel coordinates to geographic coordinates. The smoke test verifies that this transformation produces valid geographic coordinates within the expected bounding box for the test image. A ground-level photo of the test image location, taken at the airport, provides a visual cross-reference for failure analysis.
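A minimal sketch of the bounding-box check follows. It assumes the georeferencing metadata has already been reduced to a 3x3 homography `H` that maps homogeneous pixel coordinates (col, row, 1) to (x, y, w). An affine geotransform, which is typical for orthophotos, is the special case whose last row is (0, 0, 1). The function names, the nested-list matrix, and the bounding-box convention are illustrative, not TarmacView's actual API. The sketch also speaks of generic map coordinates rather than latitude/longitude specifically.

```python
def pixel_to_map(H, col, row):
    """Apply a 3x3 projective transform H (nested lists) to one pixel coordinate.

    (col, row, 1) -> (x, y, w); the map coordinate is (x / w, y / w).
    """
    x = H[0][0] * col + H[0][1] * row + H[0][2]
    y = H[1][0] * col + H[1][1] * row + H[1][2]
    w = H[2][0] * col + H[2][1] * row + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError(f"Pixel ({col}, {row}) maps to a point at infinity")
    return x / w, y / w


def pixels_outside_bbox(H, pixels, bbox):
    """Return the pixels whose map coordinates fall outside bbox = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bbox
    outside = []
    for col, row in pixels:
        x, y = pixel_to_map(H, col, row)
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            outside.append((col, row))
    return outside
```

Consider an orthophoto in a metre-based projected CRS at 1 mm GSD. There, `H` would be an affine matrix such as `[[0.001, 0, x0], [0, -0.001, y0], [0, 0, 1]]`, where (x0, y0) is the map position of the top-left pixel. The negative y scale reflects that image rows increase downward. Checking the four image corners is usually enough for a smoke-level assertion.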
smoke_seg_head.py validates the visualization and segmentation head phase, the final stage of the pipeline, which produces human-readable output. It takes the processed pipeline outputs, runs the visualization renderer, and verifies:
The visualization phase is often the first place where cumulative pipeline errors become visible. A crack detection stage that produces a mask in the wrong coordinate space will generate a visualization in which crack overlays are shifted relative to the base image. The smoke test catches this only indirectly, if the rendering crashes. It is the visual inspection regression test, not the smoke test, that catches visual quality regressions.
These smoke tests are designed to run in sequence, but they can also execute independently if upstream outputs are cached. The full suite completes in under 3 minutes on CI hardware. Each test outputs a structured JSON report with the following fields:
| Field | Type | Description |
|---|---|---|
| test_name | string | Unique name of the test (e.g., “smoke_crack”) |
| passed | boolean | Whether the test passed (true) or failed (false) |
| duration_ms | integer | Wall-clock execution time in milliseconds |
| output_files | array of strings | Paths to output files created during the test |
| failure_reason | string or null | Human-readable failure reason if the test failed; null otherwise |
| input_path | string | Path to the input data used for the test |
| pipeline_version | string | Git commit SHA or version tag of the pipeline code |
The test runner aggregates individual JSON reports into a suite summary that is posted to the CI dashboard and optionally sent to Slack or email for real-time notification.
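The aggregation step could look roughly like the sketch below. It checks each report against the fields in the table above, then rolls the reports up into a summary. The summary keys (`total`, `passed`, `failed`, `all_passed`, `total_duration_ms`) are assumptions for illustration. The document does not specify the summary format or how it is delivered to the dashboard, Slack, or email.

```python
# Expected report fields and their allowed Python types, following the report schema table.
REPORT_FIELDS = {
    "test_name": (str,),
    "passed": (bool,),
    "duration_ms": (int,),
    "output_files": (list,),
    "failure_reason": (str, type(None)),
    "input_path": (str,),
    "pipeline_version": (str,),
}


def schema_errors(report):
    """Return a list of problems with one smoke test report (empty if it matches the schema)."""
    errors = []
    for field, types in REPORT_FIELDS.items():
        if field not in report:
            errors.append(f"missing field: {field}")
        elif not isinstance(report[field], types):
            allowed = " or ".join(t.__name__ for t in types)
            errors.append(f"{field}: expected {allowed}, got {type(report[field]).__name__}")
    return errors


def summarize(reports):
    """Roll individual reports up into a suite summary; malformed reports count as failures."""
    failed = [
        r.get("test_name", "<unnamed>")
        for r in reports
        if r.get("passed") is not True or schema_errors(r)
    ]
    return {
        "total": len(reports),
        "passed": len(reports) - len(failed),
        "failed": failed,
        "all_passed": bool(reports) and not failed,  # an empty suite is not a pass
        "total_duration_ms": sum(
            r["duration_ms"] for r in reports if isinstance(r.get("duration_ms"), int)
        ),
    }
```

Treating a malformed report as a failure keeps the gate strict. A script that crashes before printing valid JSON should never count as passing.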
Smoke tests are deliberately narrow in what they assert. They verify four categories of conditions and nothing more. This narrow scope is intentional: it keeps tests fast, deterministic, and easy to maintain.
The most fundamental assertion: does the code execute without raising an unhandled exception? This covers:
A pipeline that crashes on startup or mid-execution fails the smoke test immediately. The failure reason is captured from the exception traceback, giving developers a direct pointer to the code location of the failure.
After execution completes, the smoke test checks that output files exist at the expected paths. This covers:
A pipeline that claims to finish but produces no output files fails the smoke test. This catches silent failures where the pipeline exits cleanly but skips critical write operations due to misconfigured output paths or conditional logic that evaluates to false unexpectedly.
For any tabular output (CSV, Parquet, GeoJSON feature collection), the smoke test verifies that expected columns exist by name. This is a structural integrity check: the column schema must match what downstream consumers expect.
| Output Type | Required Columns |
|---|---|
| Crack inventory | crack_id, length_px, width_px, orientation, confidence, classification |
| Defect inventory | defect_id, defect_type, area_px, severity, confidence, timestamp |
| Condition assessment | pavement_id, condition_index, severity, extent, date_assessed, method |
| Survey mapping | feature_id, latitude, longitude, geometry_type, crs_epsg, accuracy_m |
The test uses a column-existence check, not a type check or a value-range check. This is intentional: column existence is the minimum structural integrity guarantee. Type and range checks belong in integration and validation tests, where the cost of running them is justified by the depth of information they provide.
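For CSV outputs, a column-existence check only needs the header row. The sketch below encodes the required columns from the table above and reports which ones are missing. It does not read or type-check any data rows. Parquet and GeoJSON outputs would need their own schema readers, which are left out here. `missing_columns` and the output-type keys are hypothetical names.

```python
import csv

# Required columns per output type, copied from the table above.
REQUIRED_COLUMNS = {
    "crack_inventory": ["crack_id", "length_px", "width_px", "orientation", "confidence", "classification"],
    "defect_inventory": ["defect_id", "defect_type", "area_px", "severity", "confidence", "timestamp"],
    "condition_assessment": ["pavement_id", "condition_index", "severity", "extent", "date_assessed", "method"],
    "survey_mapping": ["feature_id", "latitude", "longitude", "geometry_type", "crs_epsg", "accuracy_m"],
}


def missing_columns(csv_path, output_type):
    """Return required columns absent from the CSV header (existence only; no type or range checks)."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), [])  # an empty file yields no header at all
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS[output_type] if col not in present]
```

Returning every missing column at once, instead of stopping at the first one, makes a renamed-column regression easier to diagnose from a single CI run.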
The fourth category of verification, often overlooked, is data format compatibility: the smoke test verifies that output files can be read back by the expected downstream consumers. For TarmacView, this means:
This catches format version mismatches. Examples include a Parquet library upgrade that changes its encoding, or an evolution of the GeoJSON specification after which the pipeline output no longer conforms.
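A read-back check for GeoJSON output can be done with the standard library alone. This sketch verifies only the top-level structure that RFC 7946 requires of a FeatureCollection:

- a `type` member,
- a `features` array,
- `Feature` objects that each carry `geometry` and `properties` members.

It is a structural sketch, not a full GeoJSON validator, and the function name is hypothetical.

```python
import json


def geojson_readback_errors(path):
    """Re-read a GeoJSON file and return structural problems (empty list if it looks valid)."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"unreadable: {type(exc).__name__}: {exc}"]
    if not isinstance(data, dict) or data.get("type") != "FeatureCollection":
        return ["top-level object is not a FeatureCollection"]
    features = data.get("features")
    if not isinstance(features, list):
        return ["'features' is missing or not an array"]
    errors = []
    for i, feature in enumerate(features):
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            errors.append(f"feature {i}: type is not 'Feature'")
            continue
        # RFC 7946 requires both members, although either may be null.
        for member in ("geometry", "properties"):
            if member not in feature:
                errors.append(f"feature {i}: missing '{member}' member")
    return errors
```

Using the same parser that downstream consumers use (here, the standard `json` module) is the point of the check. A file that only the pipeline's own writer can read back has not been proven compatible.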
Equally important is understanding what smoke tests deliberately exclude. Misunderstanding this leads to false confidence in pipeline correctness. A passing smoke test does not mean the pipeline is correct, only that it is not catastrophically broken.
Smoke tests do not verify that computed values are correct. A crack-length calculation that returns 47.2 pixels instead of the correct 42.1 pixels passes a smoke test as long as the length_px column exists and contains a float. Accuracy validation belongs in two places. Unit tests hard-code the correct value. Integration tests compare results against manual measurements from certified inspectors.
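The difference is easiest to see side by side. In the sketch below, both checks inspect the same hypothetical crack inventory row. The smoke-level check accepts the wrong 47.2 px value. The unit-test-level check compares against a hard-coded expected length and rejects it. The 0.5 px tolerance is an assumption for illustration.

```python
import math


def smoke_check(row):
    """Smoke-level assertion: the column exists and holds a float. Nothing about correctness."""
    return "length_px" in row and isinstance(row["length_px"], float)


def accuracy_check(row, expected_length_px, tolerance_px=0.5):
    """Unit-test-level assertion: the value matches a hard-coded ground truth within a tolerance."""
    return math.isclose(row["length_px"], expected_length_px, abs_tol=tolerance_px)


row = {"crack_id": 1, "length_px": 47.2}
print(smoke_check(row))                              # True: structure is fine
print(accuracy_check(row, expected_length_px=42.1))  # False: the value is wrong by 5.1 px
```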
For aviation inspection pipelines operating under ICAO Annex 14, numerical accuracy is critical because condition assessments directly inform maintenance prioritization and budget allocation. A pipeline that passes smoke tests but produces inaccurate PCI scores could lead to incorrect maintenance decisions. This is why smoke tests are only the first gate and must be followed by accuracy-focused validation tests.
Smoke tests use representative but non-adversarial inputs. They do not test:
Edge-case handling is validated by dedicated edge-case tests that specifically target each boundary condition. These tests are more expensive to run and are typically executed nightly rather than on every commit.
Smoke tests verify that the pipeline completes, not that it completes within a performance budget. Consider a pipeline that takes 10 seconds per image in smoke tests but is expected to process 100 images per second in production. It passes smoke tests without issue. Performance validation requires dedicated benchmark tests with production-scale data and timing assertions.
For airport inspection pipelines, performance is critical because a single pavement survey can involve hundreds of images. A 10x performance regression might still pass smoke tests on a single image but would make full surveys infeasible. Performance benchmarks with timing thresholds are the appropriate tool for detecting such regressions.
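A performance benchmark, by contrast, asserts on timing. The sketch below measures the throughput of an arbitrary per-image callable over a batch and compares it with a minimum rate. `process_image`, the warm-up count, and the threshold are placeholders. In practice they would be replaced with the real stage and a budget derived from survey requirements.

```python
import time


def measure_throughput(process_image, images, warmup=2):
    """Return images processed per second, excluding warm-up calls (model loading, caches)."""
    for image in images[:warmup]:
        process_image(image)
    timed = images[warmup:]
    if not timed:
        raise ValueError("Need more images than warm-up iterations")
    start = time.perf_counter()  # monotonic, high-resolution clock for interval timing
    for image in timed:
        process_image(image)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed if elapsed > 0 else float("inf")


def check_throughput(process_image, images, min_images_per_s):
    """Return (passed, measured_rate) so CI can report the number as well as the verdict."""
    rate = measure_throughput(process_image, images)
    return rate >= min_images_per_s, rate
```

Excluding warm-up calls matters for model-based stages. The first inference often includes one-off costs that would otherwise dominate a small benchmark.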
Smoke tests cover only the core execution path. Features outside the critical path are not covered: alternative output formats, optional logging, telemetry, audit trails, and experimental export features. Regressions in these areas must be caught by regression test suites that specifically target non-critical functionality.
Smoke tests do not verify that data values are internally consistent. For example, a smoke test checks that the crack_id column exists but does not verify that all crack_id values are unique, non-null, or within expected ranges. Data quality validation requires dedicated data quality tests using frameworks such as Great Expectations or Pandera, which define data contracts and validate datasets against them.
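Here is the kind of check that belongs in a data-quality suite rather than a smoke test. It is written in plain Python, not Great Expectations or Pandera, to avoid guessing at either library's API. The rules themselves are illustrative assumptions: a unique, non-null `crack_id` and a confidence within [0, 1].

```python
def data_quality_errors(rows):
    """Check value-level rules that a smoke test deliberately skips. `rows` is a list of dicts."""
    errors = []
    seen = set()
    for i, row in enumerate(rows):
        crack_id = row.get("crack_id")
        if crack_id is None:
            errors.append(f"row {i}: crack_id is null")
        elif crack_id in seen:
            errors.append(f"row {i}: duplicate crack_id {crack_id}")
        else:
            seen.add(crack_id)
        confidence = row.get("confidence")
        if confidence is None or not 0.0 <= confidence <= 1.0:
            errors.append(f"row {i}: confidence {confidence!r} outside [0, 1]")
    return errors
```

The same inventory file can pass the column-existence smoke check and still fail every rule here. That is exactly the gap this section describes.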
Smoke tests deliver maximum value when integrated into the continuous integration and continuous deployment (CI/CD) pipeline. Their placement in the pipeline workflow determines their effectiveness as a quality gate.
In a typical CI/CD workflow, smoke tests execute at the build verification stage, shortly after the build and before the slower integration and validation suites:
Code Commit → Build → Lint → Unit Tests → Smoke Tests → Integration Tests → Validation Tests → Deploy
The exact ordering depends on the project’s testing strategy. Some organizations run unit tests before smoke tests, reasoning that unit tests are faster and catch logic errors. Others run smoke tests first, reasoning that a pipeline that crashes on basic execution should be rejected without spending time on unit tests. The TarmacView pipeline runs unit tests and smoke tests in parallel after a successful build, since they cover independent failure modes and have no interdependencies.
This placement ensures that if the build produces a pipeline that crashes on basic execution, no further test infrastructure is consumed. The feedback loop is measured in minutes rather than hours. A developer who pushes a commit that breaks the import chain receives a CI failure notification within 2 to 3 minutes, rather than waiting 4 hours for the integration test suite to fail.
The CI/CD pipeline uses a hard gate: if any smoke test fails, the pipeline halts and does not proceed to subsequent stages. The build is marked as failed, and developers are notified with the smoke test failure report. No manual override is permitted in the default configuration, because a passing smoke test suite is a necessary condition for deployment.
This gate logic prevents the following scenarios:
The gate is implemented in the CI platform configuration. For CircleCI, this means declaring workflow dependencies with requires: so that the smoke-test job must succeed before the integration-tests and deploy jobs run. For GitHub Actions, this means using the needs: keyword to enforce job ordering.
Smoke tests within the suite can run in parallel when they test independent pipeline stages. TarmacView’s smoke test suite uses parallelism where possible:
- smoke_crack.py and smoke_defect.py have no interdependencies and can execute concurrently, reducing total suite time by 40%
- smoke_analyze.py must complete before smoke_assess.py, since assessment consumes analysis output
- smoke_survey.py depends on crack and defect outputs
- smoke_seg_head.py depends on all upstream outputs

The parallel execution strategy is configured in the CI pipeline YAML. With job-level parallelism, each job runs in its own container with isolated dependencies, preventing resource contention. The simplified CircleCI example below approximates this inside a single job by running the two independent scripts as background processes.
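The dependencies described above form a small directed acyclic graph. Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) can compute which smoke scripts may run concurrently at each step. This is a sketch, not TarmacView's actual CI tooling. The graph encodes only the dependencies stated in the text. In particular, smoke_crack.py and smoke_defect.py are assumed to need nothing but the raw smoke image.

```python
from graphlib import TopologicalSorter

# Each script maps to the set of scripts whose outputs it consumes.
SMOKE_DEPENDENCIES = {
    "smoke_analyze.py": set(),
    "smoke_crack.py": set(),
    "smoke_defect.py": set(),
    "smoke_assess.py": {"smoke_analyze.py"},
    "smoke_survey.py": {"smoke_crack.py", "smoke_defect.py"},
    "smoke_seg_head.py": {"smoke_analyze.py", "smoke_assess.py", "smoke_crack.py",
                          "smoke_defect.py", "smoke_survey.py"},
}


def parallel_waves(dependencies):
    """Group scripts into waves; every script in a wave can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable, readable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(parallel_waves(SMOKE_DEPENDENCIES), start=1):
    print(f"wave {i}: {', '.join(wave)}")
```

Under these assumptions, the first wave places smoke_analyze.py alongside the crack and defect scripts. The CircleCI example runs analysis first, which is also valid and slightly more conservative.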
CI/CD integration includes automated reporting with multiple channels:
```yaml
# .circleci/config.yml (excerpt: the build, unit-tests, integration-tests,
# and deploy job definitions are omitted for brevity)
version: 2.1

jobs:
  smoke-test:
    docker:
      - image: tarmacview/pipeline:ci
    # Note: `parallelism: N` would run N identical copies of this whole job
    # (it is meant for test splitting), so it is not used here.
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: python scripts/smoke_analyze.py --input tests/data/smoke_image.tif --output-dir /tmp/smoke-output
      - run:
          name: "Parallel smoke tests"
          command: |
            # Run the two independent stages concurrently and fail the step
            # if either one fails (a bare `wait` always returns 0).
            status=0
            python scripts/smoke_crack.py --input tests/data/smoke_image.tif --output-dir /tmp/smoke-output &
            crack_pid=$!
            python scripts/smoke_defect.py --input tests/data/smoke_image.tif --output-dir /tmp/smoke-output &
            defect_pid=$!
            wait "$crack_pid" || status=1
            wait "$defect_pid" || status=1
            exit "$status"
      - run: python scripts/smoke_assess.py --input /tmp/smoke-output/cracks.parquet --output-dir /tmp/smoke-output
      - run: python scripts/smoke_survey.py --input /tmp/smoke-output --output-dir /tmp/smoke-output
      - run: python scripts/smoke_seg_head.py --input /tmp/smoke-output --output-dir /tmp/smoke-output
      - store_artifacts:
          path: /tmp/smoke-output

workflows:
  version: 2
  build-and-test:
    jobs:
      - build
      - unit-tests:
          requires: [build]
      - smoke-test:
          requires: [build]
      - integration-tests:
          requires: [smoke-test]
      - deploy:
          requires: [integration-tests]
```
Writing effective smoke tests requires discipline. The test must be fast, reliable, and focused only on what it is designed to detect. Every smoke test follows the same fundamental pattern: run the pipeline stage, check for output existence, verify minimal schema correctness, and report pass/fail in structured JSON.
```python
# smoke_crack.py: simplified pattern
import argparse
import json
import sys
import time
from pathlib import Path

import pandas as pd


def check(condition: bool, message: str) -> None:
    """Raise AssertionError explicitly; plain `assert` statements are stripped under `python -O`."""
    if not condition:
        raise AssertionError(message)


def test_smoke_crack(input_path: str, output_dir: str) -> dict:
    result = {
        "test_name": "smoke_crack",
        "passed": False,
        "duration_ms": 0,
        "output_files": [],
        "failure_reason": None,
        "input_path": input_path,
        "pipeline_version": "unknown",
    }
    start = time.perf_counter()  # monotonic clock, suitable for measuring durations
    try:
        # Step 1: Record the pipeline version (a real import, never mocked)
        from tarmacview import __version__
        result["pipeline_version"] = __version__

        # Step 2: Run the pipeline stage with real imports
        from tarmacview.pipeline.crack import detect_cracks
        output = detect_cracks(input_path, output_dir)

        # Step 3: Verify the mask output file exists and is non-empty
        output_path = Path(output["crack_mask_path"])
        check(output_path.exists(), f"Output file not found: {output_path}")
        check(output_path.stat().st_size > 0, f"Output file is empty: {output_path}")
        result["output_files"].append(str(output_path))

        # Step 4: Verify required columns exist in the tabular output (existence only)
        if output.get("crack_inventory_path"):
            inv_path = Path(output["crack_inventory_path"])
            check(inv_path.exists(), f"Inventory file not found: {inv_path}")
            df = pd.read_parquet(inv_path)
            required_columns = [
                "crack_id", "length_px", "width_px",
                "orientation", "confidence", "classification",
            ]
            missing = [col for col in required_columns if col not in df.columns]
            check(not missing, f"Missing required columns: {missing}")
            result["output_files"].append(str(inv_path))

        # Step 5: Verify the output mask dimensions match the input image
        from PIL import Image
        with Image.open(input_path) as input_img, Image.open(output_path) as mask_img:
            check(
                input_img.size == mask_img.size,
                f"Mask dimensions {mask_img.size} do not match input {input_img.size}",
            )

        result["passed"] = True
    except Exception as e:
        # Record "ExceptionType: message" in the structured report
        result["failure_reason"] = f"{type(e).__name__}: {e}"
        result["passed"] = False
    result["duration_ms"] = int((time.perf_counter() - start) * 1000)
    return result


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Smoke test for the crack detection stage")
    parser.add_argument("--input", required=True, help="Path to the smoke test image")
    parser.add_argument("--output-dir", default="/tmp/smoke-output", help="Directory for pipeline outputs")
    args = parser.parse_args()
    result = test_smoke_crack(args.input, args.output_dir)
    print(json.dumps(result, indent=2))
    sys.exit(0 if result["passed"] else 1)
```
One assertion per category. Assert that execution completes, that output exists, and that columns exist. Do not assert numerical ranges, exact file sizes, or data quality metrics. Each additional assertion increases test maintenance cost and reduces test reliability.
Use real imports. Do not mock pipeline modules. The smoke test must exercise the actual import chain to catch import errors and missing dependencies. A smoke test that uses unittest.mock to suppress import errors defeats its own purpose.
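A cheap complement to running the stages is to import every pipeline module explicitly and report each failure. A broken import chain then shows up with the exact module name. The module list below is hypothetical. Only `tarmacview` and `tarmacview.pipeline.crack` appear in the example script.

```python
import importlib


def import_failures(module_names):
    """Import each module for real (no mocks) and return {module: "ExceptionType: message"} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, but also errors raised while the module body executes
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical module list; extend it with every pipeline package the smoke suite relies on.
PIPELINE_MODULES = ["tarmacview", "tarmacview.pipeline.crack"]
```

Catching `Exception` rather than only `ImportError` is deliberate here. A module can import cleanly but raise during its own initialization, for example while reading a missing configuration file.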
Minimal setup and teardown. The test should require no database setup, no service startup, and no external configuration beyond the test input file. If setup is required, it should be part of the CI job definition, not the test script. Each smoke test should be runnable on a developer’s laptop with a single command.
Deterministic. The same input must always produce the same pass/fail result. Random seeds must be fixed in the pipeline configuration, and test input files must be version-controlled. Non-deterministic smoke tests are worse than useless, because they erode trust in the entire testing infrastructure.
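Fixing seeds is usually a few lines at the start of the pipeline entry point. The sketch assumes the pipeline draws randomness from Python's `random` module and possibly NumPy's legacy global generator. Deep-learning frameworks need their own call (for PyTorch, `torch.manual_seed`), and seeding alone does not guarantee deterministic GPU kernels. The function name and default seed are illustrative.

```python
import importlib.util
import os
import random


def set_global_seeds(seed=1234):
    """Seed the RNGs a pipeline is likely to use, skipping optional libraries that are not installed."""
    random.seed(seed)
    # Affects only subprocesses started after this point; the running
    # interpreter's hash seed is fixed when it starts.
    os.environ["PYTHONHASHSEED"] = str(seed)
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np
        np.random.seed(seed)  # legacy global RNG; np.random.default_rng(seed) is preferred in new code
    return seed
```

Calling this once, before any stage runs, keeps the seed in one place. Scattering `seed=` arguments across modules is how one unseeded call slips through.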
Self-contained structured output. The test should print structured JSON output on completion making it easy for CI systems to parse results without custom log parsing. The JSON schema should be consistent across all smoke tests in the suite.
Fast. Each test must complete in under 60 seconds on CI hardware. If a test consistently exceeds this limit, optimize it before adding it to the suite. A smoke test that takes 5 minutes will be skipped by developers running tests locally.
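One way to enforce the time limit from outside the script is to run each smoke script as a subprocess with a timeout. A hung test then still produces a report. In this sketch, the 60-second default mirrors the budget above. `run_smoke_script` and the synthesized failure report are hypothetical. The script under test is assumed to print its JSON report on stdout, as the example smoke script does.

```python
import json
import subprocess
import sys
import time
from pathlib import Path


def _failure_report(script, start, reason):
    """Build a report with the same fields as the smoke scripts' own JSON output."""
    return {
        "test_name": Path(script).stem,
        "passed": False,
        "duration_ms": int((time.perf_counter() - start) * 1000),
        "output_files": [],
        "failure_reason": reason,
        "input_path": "",
        "pipeline_version": "unknown",
    }


def run_smoke_script(args, timeout_s=60):
    """Run `python <args...>` with a hard timeout and return the parsed JSON report.

    subprocess.run kills the child when the timeout expires, so a hung stage
    yields a failure report instead of stalling the CI job.
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, *args], capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return _failure_report(args[0], start, f"Timeout: exceeded {timeout_s} s")
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        stderr_tail = proc.stderr.strip()[-500:]
        return _failure_report(
            args[0], start, f"No JSON report (exit code {proc.returncode}): {stderr_tail}"
        )
```

A typical call would be `run_smoke_script(["scripts/smoke_crack.py", "--input", "tests/data/smoke_image.tif"])`. The fallback report keeps the same fields, so the aggregation step can treat crashes, hangs, and ordinary failures uniformly.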
- A broad except Exception converts import failures into ordinary test failures, which makes them easy to overlook. Let some exceptions propagate, or at minimum log import errors separately.
- Use an --output-dir argument to allow flexible execution environments.

When a smoke test fails, the development workflow must shift from “developing” to “diagnosing and fixing”. Effective failure interpretation follows a structured approach that maps failure symptoms to likely root causes.
| Failure Symptom | Likely Root Cause | Immediate Action |
|---|---|---|
| ImportError | Missing dependency, renamed or removed module, version conflict | Check requirements.txt and recent import changes in the commit diff |
| ModuleNotFoundError | Package not installed or not on the Python path | Verify the environment matches the lock file; check for conditional imports |
| FileNotFoundError | Output path changed, write-permission issue, missing directory | Check output path configuration in recent commits; verify directory creation logic |
| PermissionError | Insufficient filesystem permissions in the CI container | Check Dockerfile permissions and the CI runner user |
| Empty output file | Stage completed but skipped its processing logic because a conditional evaluated unexpectedly | Add debug logging to trace the execution path; check for early returns |
| Missing column | Schema change in an upstream stage: renamed column, removed column, type mismatch | Compare column schemas between pipeline stages; check recent PRs touching schema definitions |
| Segfault or OOM | Memory leak, unbounded memory allocation, native extension crash | Profile with reduced input; check for infinite loops; verify CUDA memory management |
| Timeout | Infinite loop, blocking I/O, deadlock, slow external API | Add a timeout wrapper; check for thread-safety issues; verify network connectivity |
| AssertionError on dimensions | Resize operation removed, model input size changed, data type conversion changed shape | Check image processing parameters; verify model input specifications |
| CUDA error | GPU driver mismatch, CUDA version conflict, insufficient GPU memory | Verify the CUDA toolkit version; check GPU drivers in the Docker image |
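The symptom table lends itself to a first-pass triage helper. The sketch below is hypothetical: it assumes failure_reason strings take the form “ExceptionType: message”, and that CUDA failures surface with “CUDA” in the message text (PyTorch, for example, typically wraps them in a RuntimeError). Symptoms that raise no exception, such as a segfault or an empty output file, need a separate check.

```python
from typing import Optional

# Hypothetical mapping from exception type name to the table's first action.
TRIAGE = {
    "ImportError": "Check requirements.txt and recent import changes in the commit diff",
    "ModuleNotFoundError": "Verify the environment matches the lock file; check for conditional imports",
    "FileNotFoundError": "Check output path configuration in recent commits; verify directory creation logic",
    "PermissionError": "Check Dockerfile permissions and the CI runner user",
    "MemoryError": "Profile with reduced input; check for infinite loops; verify CUDA memory management",
    "Timeout": "Add a timeout wrapper; check for thread-safety issues; verify network connectivity",
    "TimeoutError": "Add a timeout wrapper; check for thread-safety issues; verify network connectivity",
    "AssertionError": "Check image processing parameters; verify model input specifications",
}
CUDA_ACTION = "Verify the CUDA toolkit version; check GPU drivers in the Docker image"
DEFAULT_ACTION = "Unrecognized failure; reproduce locally with the version-controlled test input"


def suggest_action(failure_reason: Optional[str]) -> str:
    """Map a 'Type: message' failure_reason to a first triage step."""
    if not failure_reason:
        return "No failure reported"
    # CUDA problems often arrive wrapped in a generic RuntimeError, so check the message first.
    if "CUDA" in failure_reason:
        return CUDA_ACTION
    exc_type = failure_reason.split(":", 1)[0].strip()
    return TRIAGE.get(exc_type, DEFAULT_ACTION)


if __name__ == "__main__":
    print(suggest_action("ModuleNotFoundError: No module named 'cv2'"))
```

A lookup like this only shortens the first minute of diagnosis; the root cause still has to be confirmed against the commit diff and a local reproduction.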
When a smoke test fails in CI, the recommended response workflow is:
Check the failure report. The structured JSON output from the smoke test includes the failure_reason field, which contains the exception type and message. This is the fastest path to understanding what broke.
Identify whether the failure is a code regression or an infrastructure issue. A ModuleNotFoundError on a package that imported fine in yesterday’s builds suggests a regression. A Timeout that appears on all builds simultaneously suggests an infrastructure issue (network outage, CI runner degradation).
Revert if the root cause is not obvious within 15 minutes. The smoke test suite is fast precisely so that a revert can be verified quickly. A build that fails smoke tests should be reverted rather than left broken, to unblock the CI pipeline for other developers.
Investigate with the test input. Run the failing smoke test locally with the same version-controlled test input. Reproducing the failure locally eliminates CI-specific variables (container differences, resource limits, parallelism issues).
Fix and add a unit test. Once the root cause is identified, fix the code and add a unit test at the appropriate level that would have caught the failure. This prevents the same class of regression from recurring.
Verify the fix passes smoke tests. Run the full smoke test suite locally before pushing the fix. This ensures the fix does not introduce new failures in other pipeline stages.
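The final step amounts to running every scripts/smoke_*.py file and confirming that all of them pass. A minimal local runner could look like the sketch below. It assumes each script exits non-zero on failure and may print a JSON object with a passed field as its last stdout line; run_suite and the passed field are hypothetical, not part of TarmacView’s documented tooling.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Dict


def run_suite(script_dir: Path, pattern: str = "smoke_*.py",
              timeout_s: float = 60.0) -> Dict[str, bool]:
    """Run each matching smoke script in its own interpreter; return {name: passed}."""
    results: Dict[str, bool] = {}
    for script in sorted(script_dir.glob(pattern)):
        try:
            proc = subprocess.run([sys.executable, str(script)],
                                  capture_output=True, text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            results[script.name] = False  # a hung stage counts as a failure
            continue
        passed = proc.returncode == 0
        # If the last stdout line is a JSON result with a "passed" field, respect it too.
        lines = proc.stdout.strip().splitlines()
        if lines:
            try:
                payload = json.loads(lines[-1])
            except json.JSONDecodeError:
                payload = None
            if isinstance(payload, dict) and "passed" in payload:
                passed = passed and bool(payload["passed"])
        results[script.name] = passed
    return results


if __name__ == "__main__":
    outcome = run_suite(Path("scripts"))
    for name, ok in outcome.items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
    # An empty suite is treated as a failure so a bad glob cannot pass silently.
    sys.exit(0 if outcome and all(outcome.values()) else 1)
```

Each script runs in a fresh interpreter, so one stage’s imports or global state cannot hide another stage’s failure, and the per-script timeout matches the 60-second budget for a single smoke test.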
Smoke tests can produce false positives — the test fails but the pipeline is actually functional. Common causes include:
Flaky smoke tests erode trust in the testing infrastructure. Developers who see smoke tests fail randomly will start ignoring failures. The solution is to de-flake aggressively: identify the root cause of flakiness and harden the test. If a test cannot be made reliable with reasonable effort, it should be moved to a lower-priority test suite or replaced with a more robust design.
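Before hunting for the root cause, it helps to confirm that a test really is flaky by rerunning it several times with unchanged code and inputs. A deterministic test gives the same verdict on every run, so observing both outcomes is strong evidence of flakiness; the converse does not hold, since identical results over a handful of runs do not prove determinism. A minimal sketch, with detect_flaky as a hypothetical helper:

```python
import random
from typing import Callable, Dict, List


def detect_flaky(run_once: Callable[[], bool], repeats: int = 5) -> Dict[str, object]:
    """Rerun a test with unchanged code and inputs; mixed verdicts mean it is flaky."""
    outcomes: List[bool] = [bool(run_once()) for _ in range(repeats)]
    passes = sum(outcomes)
    return {
        "runs": repeats,
        "passes": passes,
        # Flaky only if both outcomes were observed; all-pass or all-fail is consistent.
        "flaky": 0 < passes < repeats,
    }


if __name__ == "__main__":
    # An unseeded random threshold: the kind of hidden nondeterminism that causes flakiness.
    print(detect_flaky(lambda: random.random() > 0.3, repeats=20))
```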
For aviation software, including airport inspection pipelines, smoke tests carry additional significance due to the safety-critical nature of the domain. Under DO-178C (Software Considerations in Airborne Systems and Equipment Certification), software verification follows a rigorous requirements-based process. While DO-178C does not explicitly name “smoke testing”, the requirements-based hardware/software integration testing described in Section 6.4.3, which checks that the software running on the target computer satisfies its high-level requirements, covers the ground a smoke test checks first: that the integrated system starts and its basic functions work.
ICAO Annex 14, Volume I (Aerodrome Design and Operations) sets requirements for aerodrome pavements, whose condition directly affects aircraft safety. Pavement failure on a runway can cause aircraft damage or loss of control during takeoff and landing. Software used for automated pavement inspection must therefore meet reliability standards commensurate with the safety implications of its output. Smoke tests provide the first line of defense against software errors that could compromise pavement condition assessments.
For TarmacView, smoke tests are part of a broader software quality assurance framework that also includes integration tests, validation tests, and performance benchmarks.
This multi-layered approach ensures that smoke tests serve as the first gate without being the only gate. A passing smoke test means the pipeline is structurally sound, but it is the integration tests, validation tests, and performance benchmarks that provide the confidence needed to use TarmacView’s output for real airport maintenance decisions.
Smoke testing is a foundational software quality practice that delivers outsized value relative to its implementation cost. For inspection software pipelines like TarmacView, smoke tests catch integration failures (broken imports, missing files, schema mismatches) in seconds rather than hours. They serve as the first gate in CI/CD, rejecting unstable builds before they consume testing infrastructure. The TarmacView smoke test suite covers every pipeline phase from image ingestion through crack detection, defect classification, surface assessment, survey mapping, and visualization, with each test verifying that the stage runs, produces output, and maintains the expected column schema.
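The structural checks named above (the stage produces output, and that output has the expected columns) fit in a few lines for CSV outputs. This is a sketch under assumptions: stage outputs are CSV files with a header row, and check_output and the column names in the usage example are hypothetical. It returns None for a structurally sound file and a short failure reason otherwise, and it deliberately ignores whether the values are correct.

```python
import csv
from pathlib import Path
from typing import Iterable, Optional


def check_output(path: Path, required_columns: Iterable[str]) -> Optional[str]:
    """Return None if a CSV output is structurally sound, else a short failure reason."""
    if not path.exists():
        return f"FileNotFoundError: {path} was not written"
    with path.open(newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)
        if header is None:
            return f"Empty output file: {path} has no header"
        missing = [col for col in required_columns if col not in header]
        if missing:
            return f"Missing column: {', '.join(missing)}"
        if next(reader, None) is None:
            return f"Empty output file: {path} has a header but no rows"
    # Structure only: values are not checked, accuracy belongs to other test types.
    return None
```

For example, check_output(Path("out/defects.csv"), ["image_id", "label"]) would return "Missing column: label" if an upstream change renamed that column, which is exactly the kind of shallow, fast signal a smoke test is meant to give.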
The key to effective smoke testing is understanding its narrow scope. Smoke tests verify that the pipeline runs and produces output. They do not verify accuracy, performance, edge-case handling, or data quality; these are validated by other test types in the broader quality assurance framework. When smoke tests are designed with this discipline (representative minimal inputs, fast execution times, broad but shallow coverage), they provide the rapid, reliable failure detection that makes CI/CD pipelines effective for safety-critical aviation software.
TarmacView uses automated smoke tests to validate every stage of its inspection pipeline. Contact us to learn how we maintain software quality for mission-critical airport infrastructure analysis.