I enjoy using Semgrep for any task that requires parsing text. I often find myself writing custom rules and processing the JSON output of Semgrep to answer questions about a code base.
Over the years, I have tuned my text editor configuration so that I can create Semgrep YAML rules more quickly, since YAML rules are the only way to leverage the full power of Semgrep’s engine.
When using Semgrep in the CLI, it is only possible to search for a single pattern at a time using the --pattern flag.
To leverage Semgrep’s other operators, the rule must be written to a YAML file.
This year, I finally started to program in Go. To get familiar with the language, the first project I created is a small CLI tool called semsearch that helps me run temporary Semgrep rules directly from the terminal.
Most of the useful rule operators from Semgrep’s YAML DSL are available as semsearch CLI flags.
Unlike a typical CLI, semsearch works more like a stack-based language, so the order of the CLI flags matters.
CLI flags that expect a list of operators, such as --pattern-either and --patterns, collect every operator that follows them until the --pop flag is used.
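The stack-based behaviour described above can be sketched in Go roughly as follows. This is not semsearch's actual source: the Node type, the listOps table, and the Parse function are hypothetical names invented for illustration. The sketch also assumes that every non-list flag takes exactly one value and that groups still open at the end of the command line are closed implicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified stand-in for one rule operator.
type Node struct {
	Op       string  // operator name, e.g. "pattern" or "pattern-either"
	Value    string  // argument of a leaf operator
	Children []*Node // nested operators of a list operator
}

// listOps holds the flags that open a new group (an assumed subset).
var listOps = map[string]bool{"pattern-either": true, "patterns": true}

// Parse walks the flags left to right and keeps a stack of open groups.
// New operators are always attached to the group on top of the stack.
func Parse(args []string) (*Node, error) {
	root := &Node{Op: "patterns"} // implicit top-level group (assumption)
	stack := []*Node{root}
	for i := 0; i < len(args); i++ {
		if !strings.HasPrefix(args[i], "--") {
			return nil, fmt.Errorf("expected a flag, got %q", args[i])
		}
		flag := strings.TrimPrefix(args[i], "--")
		top := stack[len(stack)-1]
		switch {
		case flag == "pop":
			// Close the innermost group; the root can never be popped.
			if len(stack) == 1 {
				return nil, fmt.Errorf("--pop without an open group")
			}
			stack = stack[:len(stack)-1]
		case listOps[flag]:
			// Open a new group and make it the current target.
			n := &Node{Op: flag}
			top.Children = append(top.Children, n)
			stack = append(stack, n)
		default:
			// Leaf operator: consume the next argument as its value.
			if i+1 >= len(args) {
				return nil, fmt.Errorf("flag --%s needs a value", flag)
			}
			i++
			top.Children = append(top.Children, &Node{Op: flag, Value: args[i]})
		}
	}
	return root, nil
}

// String renders the tree in a compact bracketed form for inspection.
func (n *Node) String() string {
	if !listOps[n.Op] {
		return n.Op + "=" + n.Value
	}
	parts := make([]string, len(n.Children))
	for i, c := range n.Children {
		parts[i] = c.String()
	}
	return n.Op + "[" + strings.Join(parts, " ") + "]"
}

func main() {
	root, err := Parse([]string{
		"--pattern-either",
		"--pattern", "func ($S $R) $F(...) {...}",
		"--pattern", "func $F(...) $R {...}",
		"--pop",
		"--focus-metavariable", "F",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(root)
}
```

Because the stack only ever grows on a list operator and shrinks on --pop, groups can be nested to any depth without extra syntax, which is presumably why the order of the flags matters.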
As an example using the semsearch source code, the following command will search for Go functions that are named NewState or Exec.
semsearch --language go \
--pattern-either \
--pattern 'func ($S $R) $F(...) {...}' \
--pattern 'func $F(...) $R {...}' \
--pop \
--metavariable-regex 'F=(NewState|Exec)' \
--focus-metavariable F
By default, semsearch will invoke semgrep and run the rule right away in the current directory.
When the --export flag is used, semsearch will instead output the YAML rule it intends to run.
This is useful after a few iterations to save the rule for later or contribute it to a repository.
The above command would output the following YAML rule:
rules:
  - id: id
    patterns:
      - pattern-either:
          - pattern: func ($S $R) $F(...) {...}
          - pattern: func $F(...) $R {...}
      - metavariable-regex:
          metavariable: $F
          regex: (NewState|Exec)
      - focus-metavariable: $F
    severity: WARNING
    message: ""
    languages:
      - go
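Comparing the command with the exported rule, the value F=(NewState|Exec) ends up as a metavariable named $F and a separate regex. A minimal Go sketch of that split might look like the following. The MetavariableRegex type and ParseMetavariableRegex function are hypothetical names, and splitting on the first = sign is an assumption inferred from the example rather than confirmed behaviour of semsearch.

```go
package main

import (
	"fmt"
	"strings"
)

// MetavariableRegex mirrors the two keys of the YAML operator.
// The type itself is hypothetical.
type MetavariableRegex struct {
	Metavariable string
	Regex        string
}

// ParseMetavariableRegex splits "NAME=REGEX" on the first "=" so that
// the regex itself may still contain "=" characters (assumption).
func ParseMetavariableRegex(arg string) (MetavariableRegex, error) {
	name, regex, ok := strings.Cut(arg, "=")
	if !ok || name == "" || regex == "" {
		return MetavariableRegex{}, fmt.Errorf("expected NAME=REGEX, got %q", arg)
	}
	return MetavariableRegex{
		// Accept both "F" and "$F" and always emit the "$" prefix.
		Metavariable: "$" + strings.TrimPrefix(name, "$"),
		Regex:        regex,
	}, nil
}

func main() {
	mr, err := ParseMetavariableRegex("F=(NewState|Exec)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("metavariable: %s\nregex: %s\n", mr.Metavariable, mr.Regex)
}
```

Leaving the $ off on the command line avoids having to escape it in the shell, which may be why the flag takes a bare F.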
As an alternative to using --pop, square brackets can also be used to group operators more explicitly.
semsearch --language go \
--pattern-either \[ \
--pattern 'func ($S *State) $F(...) {...}' \
--pattern 'func $F(...) *State {...}' \
\] \
--focus-metavariable F
In most cases, I use the shorthand flag syntax to create a rule quickly. The above command can be written more succinctly as follows:
semsearch -l go -fm F -pe -p 'func ($S *State) $F(...) {...}' -p 'func $F(...) *State {...}'
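One way such shorthands could be handled is to expand them to their long names before any further parsing. The Go sketch below covers only the four aliases visible in the commands above. The shortToLong table and the Expand function are hypothetical, and semsearch may well resolve its flags differently.

```go
package main

import (
	"fmt"
	"strings"
)

// shortToLong lists only the aliases that appear in the post.
var shortToLong = map[string]string{
	"-l":  "--language",
	"-p":  "--pattern",
	"-pe": "--pattern-either",
	"-fm": "--focus-metavariable",
}

// Expand replaces every known short flag with its long name and leaves
// all other arguments untouched. Caveat of this naive sketch: a value
// that is literally "-p" would be expanded as well.
func Expand(args []string) []string {
	out := make([]string, 0, len(args))
	for _, a := range args {
		if long, ok := shortToLong[a]; ok {
			out = append(out, long)
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	args := []string{
		"-l", "go", "-fm", "F", "-pe",
		"-p", "func ($S *State) $F(...) {...}",
		"-p", "func $F(...) *State {...}",
	}
	fmt.Println(strings.Join(Expand(args), " "))
}
```

Note that the shorthand command places -pe last and never closes it, so the group presumably ends implicitly at the end of the command line.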
I hope this tool is useful for other Semgrep users. If you encounter any bugs or have any feature requests, please open an issue on the GitHub repository.