Semgrep for Solidity: Custom Rules for Audit Workflows — Darkwave Log

Automated tools are a force-multiplier in an audit. They do not replace judgment, but they compress the time it takes to answer a specific class of question: does this pattern appear anywhere in the codebase? Semgrep is singularly good at answering that question, and in a Solidity audit it becomes a programmable grep that understands code structure rather than raw text.

This article covers the mechanics of Semgrep pattern matching as it applies to Solidity, how to write rules for the vulnerability classes auditors encounter most often, how the tool compares with Slither, and how to embed both into a CI pipeline that provides continuous feedback without becoming noise.

How Semgrep Pattern Matching Works

Semgrep uses rules, which encapsulate pattern matching logic and data flow analysis, to scan your code for security issues, style violations, bugs, and more. The mental model is simpler than most static analysis tools: you write a pattern that looks like the code you want to find, and Semgrep reports every place in the codebase where that pattern matches.

Under the hood, when Semgrep performs an analysis of the code, it creates an abstract syntax tree (AST), which is then translated into an analysis-friendly intermediate language (IL). This means your patterns are matched against a structured tree representation of the source, not raw text — so whitespace, comment placement, and minor formatting differences do not affect results.
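Semgrep's Solidity AST can't be demonstrated without the tool itself. Python's standard-library ast module shows the same principle, though. Two snippets that differ only in whitespace, comments, and line breaks are different text but the same tree. This is an analogy for structural matching, not Semgrep's implementation.

```python
import ast

# Two spellings of the same check: different text, same structure.
compact = "require(owner == sender, 'not owner')"
sprawling = """
require(
    owner   ==   sender,   # ownership check
    'not owner',
)
"""

def same_text(a: str, b: str) -> bool:
    """Naive textual comparison, as a grep-style tool would see it."""
    return a.strip() == b.strip()

def same_tree(a: str, b: str) -> bool:
    """Compare parsed trees; ast.dump omits line/column info by default."""
    return ast.dump(ast.parse(a.strip())) == ast.dump(ast.parse(b.strip()))

print(same_text(compact, sprawling))  # False
print(same_tree(compact, sprawling))  # True
```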

Two constructs make Semgrep patterns expressive:

Metavariables: a metavariable is an abstraction for matching code whose value or contents you do not know in advance. Metavariables work much like capture groups in regular expressions and can track values across a specific code scope, including variables, functions, arguments, classes, object methods, imports, exceptions, and more. They begin with a $ and may contain only uppercase characters, _, or digits. For example, $OWNER matches any identifier or expression in that syntactic position. If $OWNER appears twice in one pattern, both occurrences must bind to the same code.
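The capture-group analogy can be made concrete with a toy, token-level matcher. It is hypothetical: Semgrep matches tree nodes and lets a metavariable bind a whole expression, while this sketch binds single tokens. Each $NAME binds a token on first use, and every later occurrence must match that same token. This consistency requirement is what lets a pattern like $X == $X find self-comparisons.

```python
from __future__ import annotations
import re

METAVAR = re.compile(r"^\$[A-Z_][A-Z0-9_]*$")

def tokenize(code: str) -> list[str]:
    # Identifiers/numbers (optionally $-prefixed) or single punctuation characters.
    return re.findall(r"\$?\w+|[^\s\w]", code)

def match(pattern: str, code: str) -> dict[str, str] | None:
    """Return metavariable bindings if the token sequences match, else None."""
    p, c = tokenize(pattern), tokenize(code)
    if len(p) != len(c):
        return None
    bindings: dict[str, str] = {}
    for pt, ct in zip(p, c):
        if METAVAR.match(pt):
            # First occurrence binds; later occurrences must agree.
            if bindings.setdefault(pt, ct) != ct:
                return None
        elif pt != ct:
            return None
    return bindings

print(match("$X == $X", "a == a"))  # {'$X': 'a'}
print(match("$X == $X", "a == b"))  # None
```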

The ellipsis operator: Semgrep patterns reuse the target language's own syntax, extended with a few operators that add generality. The ellipsis operator (...) says “I don’t care what’s in here” and matches zero or more arguments, statements, and so on. Placing ... between statements inside a function body means that zero or more intervening statements are acceptable.
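The statement-level ellipsis can be sketched the same way. The hypothetical matcher below works over a list of statement strings, where "..." absorbs zero or more statements. It is a simplification: real Semgrep matches statements structurally and also supports ... inside argument lists. Still, it shows a practical consequence. A pattern body of require(...) followed by ... only recognizes a require in first position unless another ... precedes it.

```python
from __future__ import annotations

def match_body(pattern: list[str], body: list[str]) -> bool:
    """True if `body` matches `pattern`; "..." matches zero or more statements."""
    if not pattern:
        return not body
    head, rest = pattern[0], pattern[1:]
    if head == "...":
        # Let the ellipsis absorb 0, 1, ..., len(body) statements.
        return any(match_body(rest, body[i:]) for i in range(len(body) + 1))
    return bool(body) and body[0] == head and match_body(rest, body[1:])

body = ["uint256 bal = balances[msg.sender];",
        "require(bal > 0);",
        "balances[msg.sender] = 0;"]

print(match_body(["require(bal > 0);", "..."], body))         # False: require is not first
print(match_body(["...", "require(bal > 0);", "..."], body))  # True
```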

Rules live in YAML files. A rule can combine multiple patterns, a detailed output message, and a rule-defined fix, and individual patterns can be composed with Boolean operators.

A minimal Solidity rule skeleton looks like this:

rules:
  - id: example-rule
    languages: [solidity]
    severity: WARNING
    message: |
      Found a potentially dangerous pattern at $FUNC.
    patterns:
      - pattern: |
          function $FUNC(...) external {
            ...
          }
      - pattern-not: |
          function $FUNC(...) external {
            require(...);
            ...
          }

Currently Semgrep supports Solidity in experimental mode. This matters practically: some parser edge cases may not be handled correctly, and patterns that rely on deeply nested or unusual Solidity constructs should always be validated against a known-good test file before being deployed to CI.

Semgrep provides a simple syntax for writing rules: if you can write code, you can write a Semgrep rule — no program analysis Ph.D. required.

Semgrep vs. Slither: Different Tools, Different Questions

Auditors frequently ask whether they should use Semgrep or Slither. The honest answer is that they are not alternatives — they answer different questions.

Slither is a Solidity & Vyper static analysis framework written in Python3. It runs a suite of vulnerability detectors, prints visual information about contract details, and provides an API to easily write custom analyses. Slither enables developers to find vulnerabilities, enhance their code comprehension, and quickly prototype custom analyses.

The architectural difference is fundamental. Slither receives the Abstract Syntax Tree (AST) from the Solidity compiler and transforms it into SlithIR. SlithIR is an intermediate representation in Static Single Assignment (SSA) form with a reduced set of fewer than 40 instructions. The small instruction set simplifies writing analyses. At the same time, SlithIR preserves semantic information that would be lost by compiling Solidity down to bytecode. On top of it, Slither applies standard program analysis techniques such as dataflow and taint tracking.

Semgrep, by contrast, matches patterns primarily against the AST. It does offer dataflow features such as taint mode, but these are far more limited, and in the Community Edition they stay within a single function. Slither, meanwhile, builds control-flow graphs and runs dataflow and taint analyses over its purpose-built IR.

In practice this means:

Capability	Slither	Semgrep
Type-aware analysis	✓ (via SlithIR)	Limited
Cross-function dataflow	✓	CE: No; Pro: Yes
Cross-file dataflow	Partial	Pro only
Custom pattern writing	Requires Python	YAML (simple)
Solidity-specific detectors	80+ built-in	Registry + custom
Codebase-wide search speed	Fast	Very fast
CI integration	Good	Excellent

Each tool has different strengths. Slither excels at semantic static analysis, Mythril uses symbolic execution to find deeper bugs, Semgrep enables custom rule creation, and Aderyn focuses on Solidity-specific detectors.

Semgrep's search technique is fundamentally different from Slither's, so the two tools tend to surface different issues. Running both significantly raises the quality of an audit.

Semgrep’s advantage is programmability with a low barrier. You can write a rule in ten minutes that searches every .sol file for a pattern you observed during manual review. Slither’s advantage is that its detectors reason about program semantics — reentrancy detection, for example, requires understanding state mutation relative to external calls, which is precisely what SlithIR’s SSA form makes tractable.

Use Slither for its built-in detectors and semantic analysis. Use Semgrep for rapid pattern searches, variant analysis, and enforcing conventions that your team defines — including patterns that Slither's fixed detector set does not cover.

Writing Custom Rules for Common Vulnerability Patterns

Rule Structure

Every Semgrep rule is a YAML document. The required fields are id, languages, severity, message, and at least one pattern operator. Optional fields include metadata (for CWE tags, confidence, and references) and fix (for suggested auto-remediation).
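As a sketch of that checklist, the hypothetical helper below takes a rule already parsed into a Python dict (for example by a YAML loader) and reports missing required fields. It is not Semgrep's own validator. Semgrep's --validate option is the authoritative check. This sketch also covers only search-mode rules.

```python
from __future__ import annotations

REQUIRED_FIELDS = ("id", "languages", "severity", "message")
# Top-level operators of a search-mode rule; taint-mode rules use
# pattern-sources/pattern-sinks instead and are not handled here.
PATTERN_OPERATORS = {"pattern", "patterns", "pattern-either", "pattern-regex"}

def rule_problems(rule: dict) -> list[str]:
    """Return problems found; an empty list means the basic fields are present."""
    problems = [f"missing '{field}'" for field in REQUIRED_FIELDS if field not in rule]
    if not rule.keys() & PATTERN_OPERATORS:
        problems.append("no top-level pattern operator")
    return problems

tx_origin_rule = {
    "id": "tx-origin-authentication",
    "languages": ["solidity"],
    "severity": "ERROR",
    "message": "Authentication using tx.origin is vulnerable to phishing.",
    "pattern-either": [{"pattern": "require(tx.origin == $ADDR, ...)"}],
}
print(rule_problems(tx_origin_rule))  # []
```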

Semgrep's pattern-syntax and rule-syntax documentation cover both topics in more depth, with worked examples of the ellipsis operator and metavariables.

The key composition operators are:

patterns — all sub-patterns must match (logical AND)
pattern-either — any sub-pattern must match (logical OR)
pattern-not — exclude matches where this pattern also matches
pattern-not-inside — exclude matches that appear inside a surrounding pattern
pattern-inside — require the match to occur inside a surrounding pattern
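One way to reason about these operators is as set operations over matched source ranges. The sketch below is a deliberately simplified model that assumes each match is a (start_line, end_line) range. Semgrep's actual evaluation works on AST nodes and metavariable bindings, and this model ignores both.

```python
from __future__ import annotations

def inside(inner: tuple[int, int], outer: tuple[int, int]) -> bool:
    """True if the `inner` line range lies within the `outer` range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def patterns_and(*match_sets: set) -> set:
    # patterns: every positive sub-pattern must match the same range.
    return set.intersection(*match_sets)

def pattern_either(*match_sets: set) -> set:
    # pattern-either: a range matched by at least one alternative.
    return set().union(*match_sets)

def pattern_not(matches: set, excluded: set) -> set:
    # pattern-not: drop ranges that the excluded pattern matches exactly.
    return matches - excluded

def pattern_inside(matches: set, contexts: set) -> set:
    # pattern-inside: keep ranges enclosed by some context match.
    return {m for m in matches if any(inside(m, c) for c in contexts)}

def pattern_not_inside(matches: set, contexts: set) -> set:
    # pattern-not-inside: drop ranges enclosed by any context match.
    return {m for m in matches if not any(inside(m, c) for c in contexts)}

calls = {(12, 12), (30, 30)}   # lines matching $ADDR.call(...)
guarded = {(25, 35)}           # bodies of functions with a nonReentrant modifier
print(sorted(pattern_not_inside(calls, guarded)))  # [(12, 12)]
```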

Access Control Checks

A recurring audit finding is a function that modifies privileged state without any ownership check. Semgrep can surface externally callable functions that carry no modifier and no require or revert comparing msg.sender against an owner. The rule below targets external functions; a copy with public in place of external covers public ones.

rules:
  - id: missing-access-control-modifier
    languages: [solidity]
    severity: WARNING
    message: >
      Function $FUNC is externally callable and has no visible
      access-control modifier or ownership check. Verify that
      this is intentional.
    metadata:
      category: security
      cwe: "CWE-284: Improper Access Control"
      confidence: LOW
    patterns:
      - pattern: |
          function $FUNC(...) external {
            ...
          }
      - pattern-not: |
          function $FUNC(...) external $MOD {
            ...
          }
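      # The leading ... lets the check appear anywhere in the body,
      # not only as the first statement.
      - pattern-not-inside: |
          function $FUNC(...) external {
            ...
            require($OWNER == msg.sender, ...);
            ...
          }
      - pattern-not-inside: |
          function $FUNC(...) external {
            ...
            require(msg.sender == $OWNER, ...);
            ...
          }
      - pattern-not-inside: |
          function $FUNC(...) external {
            ...
            if (msg.sender != $OWNER) revert(...);
            ...
          }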

Low-confidence rules like this one are intended for auditors, not for gating CI. This rule will produce false positives on intentionally permissionless functions. Its value lies in triage: every match is a candidate that deserves a few seconds of human attention, not an automatic finding.

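The generic $MOD exclusion above suppresses every function that carries any modifier, including unrelated ones such as nonReentrant, so it can hide real gaps. To tune the rule for a specific project, replace that exclusion with pattern-not clauses for the access-control modifiers the project actually uses (onlyOwner, onlyRole, onlyGovernance, etc.):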

      - pattern-not: |
          function $FUNC(...) external onlyOwner {
            ...
          }
      - pattern-not: |
          function $FUNC(...) external onlyRole($ROLE) {
            ...
          }

Reentrancy Surface Detection

Reentrancy detection at the semantic level — tracking state writes relative to external calls — is Slither’s domain. But Semgrep is useful for a complementary task: identifying the surface area of reentrancy exposure by finding all functions that make an external call before modifying state.

rules:
  - id: reentrancy-surface-state-after-call
    languages: [solidity]
    severity: WARNING
    message: >
      Function $FUNC contains an external call ($CALL) followed
      by a state-variable write. Verify that this follows the
      checks-effects-interactions pattern or is guarded by a
      reentrancy lock.
    metadata:
      category: security
      cwe: "CWE-841: Improper Enforcement of Behavioral Workflow"
      confidence: LOW
    patterns:
      - pattern: |
          function $FUNC(...) {
            ...
            $CALL{value: $VAL}(...);
            ...
            $STATE = $EXPR;
            ...
          }

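The first rule only matches calls that forward ether with {value: ...}. A complementary rule targets low-level .call{...}(...) invocations, a frequent reentrancy vector, and skips functions guarded by a nonReentrant modifier: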

rules:
  - id: low-level-call-followed-by-state-write
    languages: [solidity]
    severity: WARNING
    message: >
      Low-level call via $ADDR.call(...) detected before a state
      mutation. Audit for reentrancy.
    patterns:
      - pattern: |
          function $FUNC(...) {
            ...
            $ADDR.call{...}(...);
            ...
            $STATE = $EXPR;
            ...
          }
      - pattern-not-inside: |
          function $FUNC(...) nonReentrant {
            ...
          }
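Both rules approximate, syntactically, the ordering that checks-effects-interactions forbids. The sketch below states that check over a simplified, hypothetical statement model in which each statement is tagged as an external call or as a write to a named variable. The check itself is a single pass. Unlike the Semgrep rules above, this sketch can tell state variables apart from locals, but only because the caller passes it the list of state variables explicitly.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Stmt:
    kind: str          # "call" (external call) or "write" (assignment)
    target: str = ""   # variable written, for kind == "write"

def state_writes_after_call(body: list[Stmt], state_vars: set[str]) -> list[str]:
    """Return state variables written after the first external call."""
    seen_call = False
    late_writes = []
    for stmt in body:
        if stmt.kind == "call":
            seen_call = True
        elif stmt.kind == "write" and seen_call and stmt.target in state_vars:
            late_writes.append(stmt.target)
    return late_writes

# withdraw(): send ether first, zero the balance afterwards (vulnerable order).
vulnerable = [Stmt("write", "amount"), Stmt("call"), Stmt("write", "balances")]
# Fixed order: effects before interactions.
fixed = [Stmt("write", "amount"), Stmt("write", "balances"), Stmt("call")]

print(state_writes_after_call(vulnerable, {"balances"}))  # ['balances']
print(state_writes_after_call(fixed, {"balances"}))       # []
```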

Unsafe Casting

Solidity’s explicit casts can silently truncate values. Since Solidity 0.8, arithmetic overflow reverts by default, but explicit conversions between integer types are still unchecked. Casting a uint256 down to a smaller type such as uint128, uint64, or uint8 without a bounds check discards the high-order bits. Semgrep can sweep a codebase for all such casts.

rules:
  - id: unsafe-downcast-uint256
    languages: [solidity]
    severity: WARNING
    message: >
      Explicit downcast of $VAR to a narrower integer type
      without a prior bounds check. This may silently truncate
      the value. Consider using OpenZeppelin SafeCast.
    metadata:
      category: security
      cwe: "CWE-681: Incorrect Conversion between Numeric Types"
      confidence: MEDIUM
    pattern-either:
      - pattern: uint128($VAR)
      - pattern: uint64($VAR)
      - pattern: uint32($VAR)
      - pattern: uint16($VAR)
      - pattern: uint8($VAR)
      - pattern: int128($VAR)
      - pattern: int64($VAR)
      - pattern: int32($VAR)
      - pattern: int16($VAR)
      - pattern: int8($VAR)

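Because the rule uses a top-level pattern-either, any exclusion requires moving that list under a patterns block. Calls such as SafeCast.toUint128(x) need no exclusion, because they contain no uint128(...) expression at the call site. The casts inside the SafeCast library itself are better skipped through .semgrepignore. A cast that follows an explicit bounds check is worth excluding, shown here for the uint128 case:

      - pattern-not-inside: |
          require($VAR <= type(uint128).max, ...);
          ...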
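The arithmetic behind the truncation is easy to check outside the EVM. An explicit conversion to uintN keeps only the low N bits, which is the value modulo 2**N. The sketch below models that in Python, alongside a checked variant in the spirit of OpenZeppelin's SafeCast that refuses instead of truncating. Both function names are hypothetical, and the model covers non-negative values and unsigned types only.

```python
def unchecked_downcast(value: int, bits: int) -> int:
    """Model Solidity's explicit uintN(x): keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)

def checked_downcast(value: int, bits: int) -> int:
    """SafeCast-style conversion: refuse values that do not fit."""
    if value >= 1 << bits:
        raise OverflowError(f"value does not fit in {bits} bits")
    return value

deposit = 2**128 + 5                       # a uint256 just above uint128's range
print(unchecked_downcast(deposit, 128))    # 5: the high bit is silently dropped
print(unchecked_downcast(300, 8))          # 44
```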

Tx.origin Authentication

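Using tx.origin for authentication is a classic Solidity pitfall. This rule is simple and high-confidence. The main false positive to expect is require(tx.origin == msg.sender), an idiom used to reject contract callers rather than to authenticate an owner: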

rules:
  - id: tx-origin-authentication
    languages: [solidity]
    severity: ERROR
    message: >
      Authentication using tx.origin is vulnerable to phishing
      attacks. Use msg.sender instead.
    metadata:
      confidence: HIGH
      cwe: "CWE-290: Authentication Bypass by Spoofing"
    pattern-either:
      - pattern: require(tx.origin == $ADDR, ...)
      - pattern: require(tx.origin != $ADDR, ...)
      - pattern: |
          if (tx.origin != $ADDR) {
            revert(...);
          }
      - pattern: |
          if (tx.origin == $ADDR) {
            ...
          }
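The reason msg.sender is the right check becomes visible in a two-hop call chain. In the hypothetical model below (plain Python, no EVM), the owner calls a malicious contract, which then calls the wallet. At that point tx.origin is still the owner, but msg.sender is the attacker's contract. The addresses are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallContext:
    tx_origin: str   # externally owned account that signed the transaction
    msg_sender: str  # immediate caller of the current function

OWNER = "0xOwner"
ATTACKER_CONTRACT = "0xPhishingContract"

def authorized_by_origin(ctx: CallContext) -> bool:
    return ctx.tx_origin == OWNER      # the pattern the rule flags

def authorized_by_sender(ctx: CallContext) -> bool:
    return ctx.msg_sender == OWNER     # the recommended check

# The owner is lured into calling the phishing contract, which then calls the wallet.
hop_into_wallet = CallContext(tx_origin=OWNER, msg_sender=ATTACKER_CONTRACT)

print(authorized_by_origin(hop_into_wallet))  # True: the attacker passes the check
print(authorized_by_sender(hop_into_wallet))  # False
```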

Using Semgrep for Codebase-Wide Pattern Searches During Audit

One of the most productive uses of Semgrep in an audit is variant analysis: once you find a bug manually, you immediately write a Semgrep rule to answer the question “does this pattern appear anywhere else in the codebase?” This is faster and more reliable than a text search.

In the Decurity semgrep-smart-contracts repository, rules look for patterns of vulnerabilities in smart contracts based on actual DeFi exploits, as well as gas optimization rules that can be used as part of the CI pipeline. The structure of those rules — grounded in real exploit post-mortems — is a good model for how to think about custom rules during an engagement.

Common codebase-wide searches that pay off:

Find all uses of delegatecall:

rules:
  - id: delegatecall-usage
    languages: [solidity]
    severity: WARNING
    message: >
      delegatecall to $ADDR detected. Verify storage layout
      compatibility and that $ADDR is a trusted address.
    pattern: $ADDR.delegatecall(...)

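Find low-level calls whose return value is discarded. Note that the rule only checks that the result is assigned, not that the success flag is tested afterwards: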

rules:
  - id: unchecked-low-level-call-return
    languages: [solidity]
    severity: WARNING
    message: >
      Return value of low-level call to $ADDR is not checked.
      A failed call will silently continue execution.
    patterns:
      - pattern: $ADDR.call(...)
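      # pattern-not-inside, because the matched call expression is
      # only part of the assignment statement.
      - pattern-not-inside: |
          (bool $SUCCESS, ...) = $ADDR.call(...)
      - pattern-not-inside: |
          bool $SUCCESS = $ADDR.call(...)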

Find all functions that accept address parameters without validation:

rules:
  - id: unvalidated-address-parameter
    languages: [solidity]
    severity: INFO
    message: >
      Function $FUNC accepts an address parameter $ADDR with no
      zero-address check. Confirm this is intentional.
    patterns:
      - pattern: |
          function $FUNC(..., address $ADDR, ...) {
            ...
          }
      - pattern-not-inside: |
          function $FUNC(..., address $ADDR, ...) {
            ...
            require($ADDR != address(0), ...);
            ...
          }

Running a suite of such rules against the entire scope of an engagement takes seconds and produces a structured list of candidates to triage — a qualitatively different experience from scrolling through source files manually.

Rolling rules out gradually, starting with high-confidence, low-noise rules, makes Semgrep a trusted part of the workflow rather than a noisy tool that developers learn to ignore. The same holds in an audit context: leading with high-confidence findings ensures that developers take the results seriously during review.

Integrating Semgrep into CI for Continuous Security Checks

Semgrep Code is a static application security testing (SAST) tool that detects security vulnerabilities in your first-party code. You can use Semgrep Code to scan local repositories or integrate it into your CI/CD pipeline to automate the continuous scanning of your code.

The following GitHub Actions workflow integrates a custom Solidity rule directory alongside the Decurity smart-contracts registry rules:

name: Semgrep Solidity Security

on:
  pull_request: {}
  push:
    branches: [main]
  workflow_dispatch: {}

permissions:
  contents: read
  security-events: write   # required to upload SARIF results

jobs:
  semgrep:
    name: Semgrep Scan
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep

    steps:
      - uses: actions/checkout@v4

      - name: Fetch community smart-contract rules
        uses: actions/checkout@v4
        with:
          repository: Decurity/semgrep-smart-contracts
          path: semgrep-rules

      - name: Run Semgrep
        # "|| true" keeps the job green during rollout; remove it once
        # only high-confidence rules remain, so findings fail the build.
        run: |
          semgrep ci \
            --sarif \
            --output=semgrep.sarif \
            || true
        env:
          SEMGREP_RULES: >-
            semgrep-rules/solidity/security
            .semgrep/custom

      - name: Upload SARIF to GitHub code scanning
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
        if: always()

When scans are connected to the Semgrep AppSec Platform, each rule can run in one of three modes: Monitor (finding recorded, no CI impact), Comment (PR comment posted, CI passes), or Block (PR comment posted, CI fails with exit code 1). For an audit-integrated workflow, keep rules in Monitor mode during the initial rollout and promote only high-confidence rules to Block. This prevents alert fatigue.
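When scans are not connected to the platform, you can approximate Block mode by post-processing the SARIF file and failing only on the most severe results. The hypothetical gate below reads the standard runs[].results[] structure. It assumes that Semgrep's ERROR severity appears as SARIF level "error", which as far as we know is the mapping current versions use; check your own output before relying on it. A result with no level is treated as "warning", the SARIF default.

```python
from __future__ import annotations
import json
import sys

def blocking_results(sarif: dict, blocking_levels=frozenset({"error"})) -> list[str]:
    """Return 'path:line ruleId' summaries of results that should fail the build."""
    found = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            if result.get("level", "warning") in blocking_levels:
                loc = result["locations"][0]["physicalLocation"]
                found.append(f'{loc["artifactLocation"]["uri"]}:'
                             f'{loc["region"]["startLine"]} {result["ruleId"]}')
    return found

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        problems = blocking_results(json.load(fh))
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```

Run it as an extra workflow step after the scan, passing semgrep.sarif as the argument (the script name is up to you). This keeps the decision about what blocks a merge in version control next to the rules.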

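For pull request workflows, semgrep ci runs a diff-aware scan. It scans the files changed in the PR and reports only findings introduced relative to a baseline commit, typically the PR's base branch. This significantly reduces scan times and noise on large codebases, where full scans might take minutes.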
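Conceptually, a diff-aware scan reports findings that are present at the head of the PR but absent at the baseline. The sketch below models that comparison in a simplified way. It keys each finding by rule ID, file, and matched source text rather than by line number, so unrelated edits that shift lines do not resurface old findings. This keying scheme is an assumption of the sketch, and Semgrep's own matching across commits is more robust.

```python
from __future__ import annotations
from typing import NamedTuple

class Finding(NamedTuple):
    rule_id: str
    path: str
    line: int
    code: str   # matched source text

def key(f: Finding) -> tuple[str, str, str]:
    # Ignore the line number (and whitespace) so findings that merely moved are not "new".
    return (f.rule_id, f.path, " ".join(f.code.split()))

def new_findings(baseline: list[Finding], head: list[Finding]) -> list[Finding]:
    known = {key(f) for f in baseline}
    return [f for f in head if key(f) not in known]

baseline = [Finding("delegatecall-usage", "Proxy.sol", 40, "impl.delegatecall(data)")]
head = [
    Finding("delegatecall-usage", "Proxy.sol", 52, "impl.delegatecall(data)"),  # moved
    Finding("tx-origin-authentication", "Wallet.sol", 9, "require(tx.origin == owner)"),
]
print([f.rule_id for f in new_findings(baseline, head)])  # ['tx-origin-authentication']
```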

Organize your custom rules in a .semgrep/ directory at the repository root:

.semgrep/
  custom/
    access-control.yaml
    unsafe-casting.yaml
    reentrancy-surface.yaml
    low-level-calls.yaml
.semgrepignore

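The .semgrepignore file works similarly to .gitignore. It should exclude test fixtures, mock contracts, and vendored libraries, all of which would otherwise generate noise. In the CI example above, it should also exclude the checked-out community rules repository, which contains its own Solidity test files:

# .semgrepignore
lib/
node_modules/
test/
mocks/
semgrep-rules/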

Some of the Decurity rules use taint mode, which the open-source version of Semgrep restricts to a single function. To take advantage of inter-procedural (cross-function and cross-file) taint analysis, include the --pro flag with each command. Note that this requires Semgrep Pro.

Limitations of Semgrep for Solidity Analysis

Understanding what Semgrep cannot do is as important as knowing what it can. Treating it as a comprehensive vulnerability scanner will produce false confidence; treating it as a targeted pattern search engine will make it genuinely useful.

No Full Type Awareness for Solidity

Semgrep’s pattern matching is primarily syntactic. While typed metavariables exist for certain languages, Solidity support is experimental and type inference is limited. This means a pattern like uint128($VAR) will match regardless of what $VAR actually is — a uint256, a function return value, or anything else. You cannot write a rule that says “only match this cast if $VAR is of type uint256” with confidence in the Solidity context.


Slither’s SlithIR, by contrast, preserves type information through the intermediate representation, enabling type-aware detectors.

No Cross-File Dataflow in the Open-Source Version

By design, Semgrep open source software (Community Edition) can only analyze interactions within a single function, also known as intraprocedural analysis. This limited scope makes Semgrep CE fast and easy to integrate into developer workflows.

This is a significant limitation for smart contract audits. Common vulnerability patterns involve attacker-controlled data that enters one contract and reaches a sink in another, as in cross-contract reentrancy. Another example is an authorization check in a base contract that a derived contract bypasses. Cross-file analysis accounts for how information flows between files. In particular, cross-file taint analysis tracks unsanitized variables flowing from a source to a sink through arbitrarily many files. This capability requires the Pro engine.
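A toy model makes the difference concrete. In the hypothetical program below, an external function receives attacker-controlled data and passes it to an internal helper that contains the sink. Analysed one function at a time, neither function shows a source reaching a sink. Following the call into the helper reveals the flow. The statement encoding and the function names are invented for illustration, and real taint engines track far more (fields, storage, sanitizers).

```python
from __future__ import annotations

# Hypothetical program: each function has parameters and a list of statements.
#   ("assign", dst, [srcs])   dst is computed from srcs
#   ("call", callee, [args])  internal call
#   ("sink", var)             e.g. var used as a delegatecall target
PROGRAM = {
    "execute": {"params": ["target"],
                "body": [("assign", "t", ["target"]), ("call", "_forward", ["t"])]},
    "_forward": {"params": ["to"], "body": [("sink", "to")]},
}
SOURCES = {"execute": {"target"}}  # attacker-controlled parameters

def taint_findings(program, sources, interprocedural: bool) -> list[tuple[str, str]]:
    findings: set[tuple[str, str]] = set()

    def visit(name: str, tainted: set[str], stack: frozenset[str]) -> None:
        if name in stack:  # stop on recursion
            return
        tainted = set(tainted)
        for stmt in program[name]["body"]:
            if stmt[0] == "assign":
                _, dst, srcs = stmt
                if any(s in tainted for s in srcs):
                    tainted.add(dst)
                else:
                    tainted.discard(dst)
            elif stmt[0] == "call" and interprocedural and stmt[1] in program:
                _, callee, args = stmt
                params = program[callee]["params"]
                # Map tainted arguments onto the callee's parameters.
                visit(callee, {p for p, a in zip(params, args) if a in tainted},
                      stack | {name})
            elif stmt[0] == "sink" and stmt[1] in tainted:
                findings.add((name, stmt[1]))

    for name in program:  # every function is also analysed on its own
        visit(name, sources.get(name, set()), frozenset())
    return sorted(findings)

print(taint_findings(PROGRAM, SOURCES, interprocedural=False))  # []
print(taint_findings(PROGRAM, SOURCES, interprocedural=True))   # [('_forward', 'to')]
```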


No Path Sensitivity

Besides being intraprocedural, Semgrep's analysis is path-insensitive. All potential execution paths are considered, even ones that are not feasible. A pattern that matches inside an if branch will fire regardless of whether that branch is reachable in practice.

False Positives Are Inherent

Semgrep also makes no soundness guarantees. Instead of worst-case sound assumptions, it makes “reasonable” ones; its documentation gives the example of ignoring the effects of eval-like functions on program state. Expect both false positives and false negatives.

Semgrep is only as good as your rules. It requires expertise to write effective patterns and has no built-in Solidity vulnerability knowledge. The quality of the tool in practice is entirely determined by the quality of the rule set.

No Semantic Understanding of Solidity-Specific Constructs

Semgrep does not understand Solidity’s execution model. It has no notion of view and pure semantics, the difference between a storage and memory reference, or which functions are reachable from outside via the ABI. Rules that attempt to reason about these properties will either be overly broad or require complex pattern composition that approaches the complexity of writing a Slither detector.

Never treat a clean Semgrep scan as a security guarantee. Semgrep finds the patterns you told it to look for. Bugs that do not match any of your rules will not appear in the output — regardless of how severe they are.

Combining Semgrep with Other Tools

No single security tool catches all vulnerabilities. A mature audit workflow layers tools so that the weaknesses of each are compensated by the strengths of others.

Semgrep + Slither

Run Slither first for its semantic detectors — reentrancy, unprotected functions, incorrect ERC-20 implementations, and storage collision. Then run Semgrep for pattern-level searches that Slither’s fixed detector set does not cover: project-specific conventions, custom access control modifiers, unusual casting patterns, or any variant analysis triggered by manual findings.

A recommended layered approach is to use Slither and Semgrep together in CI/CD for fast coverage that catches most common issues, then add deeper analysis tools for pre-audit work on critical functions.

Semgrep + Foundry Fuzzing

Semgrep identifies surface area — functions, code paths, and patterns that warrant deeper investigation. Once Semgrep surfaces a suspicious cast or an unchecked external call, Foundry invariant tests or fuzz targets can be written to confirm whether the pattern is actually exploitable under realistic inputs.

Semgrep + Manual Review

The most direct pairing is Semgrep as a force-multiplier for manual review. When an auditor identifies a bug manually, a Semgrep rule codifies the pattern and sweeps the entire codebase for variants in seconds. This is variant analysis — and it is one of the highest-leverage uses of Semgrep in any engagement.

Semgrep's extensible rule system also makes it the most improvable tool in this list, because each gap identified during review can be closed with a custom rule. The structural limitations of pattern-based SAST remain, but Semgrep's particular strength is how cheaply those gaps can be narrowed.

Running the Full Stack

A practical audit toolkit invocation might look like:

# 1. Slither for semantic analysis
slither . --checklist --markdown-root .

# 2. Semgrep for pattern-level searches
semgrep scan \
  --config .semgrep/custom \
  --config semgrep-rules/solidity/security \
  --sarif \
  --output semgrep.sarif \
  contracts/

# 3. Review SARIF output or redirect to audit tooling
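jq '.runs[].results[] | {ruleId, message: .message.text, file: .locations[0].physicalLocation.artifactLocation.uri, line: .locations[0].physicalLocation.region.startLine}' semgrep.sarif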

Writing High-Quality Rules: Practical Guidance

Start with a Known-Vulnerable Example

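Before writing a rule, create a small Solidity file that contains the exact pattern you want to detect, plus a fixed version. Keep it next to the rule under the same base name (my-rule.yaml and my-rule.sol), which is how semgrep --test pairs rules with test files when run over a directory. In the test file, put a // ruleid: <rule-id> comment on the line above each expected match and // ok: <rule-id> above lines that must not match. Then verify that the rule fires on the vulnerable example and stays silent on the fixed one:

semgrep --test --config my-rule.yaml my-rule.sol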
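The pass/fail logic behind that test run can be sketched as a comparison between annotated lines and reported lines. The hypothetical checker below assumes that an annotation applies to the line directly after it. It reimplements nothing from Semgrep. It only shows which mismatches count as a missed detection and which count as a false positive.

```python
from __future__ import annotations
import re

ANNOTATION = re.compile(r"//\s*(ruleid|ok):\s*([\w.-]+)")

def expected_lines(source: str, rule_id: str) -> tuple[set[int], set[int]]:
    """Return (must_match, must_not_match) as 1-based line numbers."""
    must, must_not = set(), set()
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = ANNOTATION.search(line)
        if m and m.group(2) == rule_id:
            # The annotation applies to the next line.
            (must if m.group(1) == "ruleid" else must_not).add(lineno + 1)
    return must, must_not

def check(source: str, rule_id: str, reported: set[int]) -> dict[str, set[int]]:
    must, must_not = expected_lines(source, rule_id)
    return {"missed": must - reported, "false_positives": reported & must_not}

TEST_SOL = """\
contract Wallet {
    address owner;
    function bad() external {
        // ruleid: tx-origin-authentication
        require(tx.origin == owner);
    }
    function good() external {
        // ok: tx-origin-authentication
        require(msg.sender == owner);
    }
}
"""
print(check(TEST_SOL, "tx-origin-authentication", reported={5}))
# {'missed': set(), 'false_positives': set()}
```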

Use pattern-not Aggressively to Reduce Noise

A rule with high false positive rates will be ignored. Invest time in adding pattern-not and pattern-not-inside clauses that exclude common safe patterns. For access control rules, enumerate the modifiers actually used in the project.

Annotate Rules with Metadata

Include the appropriate Common Weakness Enumeration (CWE) identifier so that readers know which vulnerability class the rule targets. Consistent metadata makes Semgrep's output actionable inside issue trackers and audit reports.

Separate CI Rules from Audit Rules

Maintain two directories: .semgrep/ci/ for high-confidence rules that block PRs, and .semgrep/audit/ for lower-confidence rules that are run manually during engagements. High confidence security rules are appropriate for CI pipelines. Rules with significant false positive rates belong in the audit directory, where a human is triaging every finding.

Version-Control Your Rules

Rule files are code. They should live in version control, have associated test fixtures, and be reviewed with the same rigor as any other security-critical artifact. When a rule is added because of a finding in an engagement, document the origin in the rule’s metadata so future readers understand why it exists.

Conclusion

Semgrep occupies a specific and valuable niche in the Solidity audit toolkit. It is not a replacement for Slither’s semantic analysis, for fuzzing, or for manual review. It is a programmable, fast, codebase-wide pattern search engine that rewards auditors who take the time to write precise rules.

The workflow is straightforward: identify a pattern that matters — an access control gap, an unsafe cast, an unchecked return value — express it in YAML, validate it against a known example, and deploy it. The same rule then runs in CI on every pull request, ensuring that a vulnerability class found once does not quietly re-enter the codebase.

Semgrep and Slither both let you filter findings and customize what they search for, but they are fundamentally different in nature, which is why using both is recommended. The auditor who understands the strengths and limits of each tool, and combines them deliberately, is in a qualitatively stronger position than one who relies on any single tool's output.