Commit Graph

57 Commits

Author SHA1 Message Date
Jordan Tunstill e9734c1ff2 Fix pre-receive hook hangs and missing logs by flushing logs on signal and using CommandContext for git commands (#4714)
When TruffleHog times out in pre-receive hooks, it fails to output
diagnostic logs.

Changes:
- Ensure log output is flushed before process termination in all exit paths:
  * Defer log sync in run() function for normal exits
  * Sync logs in signal handler before os.Exit(0)
  * Sync logs before os.Exit(183) when results are found
  * Sync logs in logFatalFunc before os.Exit() calls
- Use exec.CommandContext instead of exec.Command for git log/diff
  to ensure processes are killed when context is cancelled
- Add WaitDelay to git commands to provide grace period for cleanup

This ensures diagnostic output is captured when git operations block in
pre-receive hook environments, and that logs are visible when the process is killed.
2026-02-06 11:48:02 -08:00
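
The exec.CommandContext and WaitDelay part of e9734c1ff2 can be sketched roughly as follows. This is a minimal illustration, not the TruffleHog code: the helper name runWithDeadline, the 2-second WaitDelay, and the use of `sleep` as a stand-in for a blocking git command are all assumptions. WaitDelay requires Go 1.20 or later.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runWithDeadline is a hypothetical helper, not a TruffleHog function.
// It runs a command under a context deadline and reports whether the
// deadline expired before the command finished.
func runWithDeadline(timeout time.Duration, name string, args ...string) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// CommandContext kills the process once ctx is done. A command built
	// with plain exec.Command would keep running after the deadline.
	cmd := exec.CommandContext(ctx, name, args...)

	// WaitDelay bounds how long Wait keeps waiting, after ctx is done, for
	// the process to exit and for its I/O pipes to close.
	// The 2-second value is illustrative only.
	cmd.WaitDelay = 2 * time.Second

	err := cmd.Run()
	return ctx.Err() != nil, err
}

func main() {
	// "sleep 5" stands in for a git command that blocks (assumption).
	timedOut, err := runWithDeadline(200*time.Millisecond, "sleep", "5")
	fmt.Println("timed out:", timedOut, "error:", err)
}
```

The sketch covers only process cancellation. The log flushing described in the same commit is a separate concern and is not shown here.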
Gleb Haranin 6a0bc788d2 fix(git): use --iso-strict git arg to prevent locale issue (#4653)
Co-authored-by: Kashif Khan <70996046+kashifkhan0771@users.noreply.github.com>

Fixes #3338

TruffleHog fails to parse git commit dates when the system locale is non-English (e.g., German outputs Sa. instead of Sat), which prevents users from scanning git repos (all git-based sources such as local git repos, GitHub, GitLab, etc.). The format in use, --date=format:%a %b %d %H:%M:%S %Y %z, depends on the strftime(3) C library function, which is locale-dependent (it honors POSIX LC_ALL, LC_TIME, and LANG). Setting LC_ALL=C, or any other locale such as LC_ALL=de_DE.UTF-8 trufflehog ..., on the user's side doesn't help because the code overwrites the subprocess environment.

This means git only receives GIT_DIR and no locale settings. Rather than fixing env inheritance or adding LC_TIME=C, this PR switches to --date=iso-strict which outputs locale-independent ISO 8601 timestamps (2024-09-28T07:59:21+00:00). I think this is a pretty good solution: iso format has been stable since Git 2.2 (2014), requires no environment manipulation, and uses Go's native time.RFC3339 parser.

Error message when parsing:

2026-01-10T00:28:06-05:00	error	trufflehog	failed to parse commit date	{"source_manager_worker_id": "rc4SK", "unit_kind": "dir", "unit": "/var/folders/hw/r5j4bcyd3472ccwjz0klh5840000gn/T/trufflehog-25309-2688385567", "repo": "file:///Users/xxx/xxx/sample-repo", "commit": "5f506baa305831998a2e15aa07cd381a69fde48f", "latestState": "AuthorDateLine", "error": "parsing time \"Sa. Jan. 10 00:11:44 2026 -0500\" as \"Mon Jan 2 15:04:05 2006 -0700\": cannot parse \"Sa. Jan. 10 00:11:44 2026 -0500\" as \"Mon\""}

Tested on Homebrew git on macOS (Apple git seems to ignore locales completely) and on Linux (Debian).
2026-01-15 10:25:04 -05:00
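
The parsing difference described in 6a0bc788d2 can be reproduced with Go's standard library alone. In this minimal sketch the function names are hypothetical. The old layout string and the German-locale input are taken from the error message quoted above, and the ISO timestamp is the example from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// oldLayout is the Go layout quoted in the error message above. It expects
// English day and month abbreviations.
const oldLayout = "Mon Jan 2 15:04:05 2006 -0700"

// parseOldLayout (hypothetical name) parses output of the locale-dependent
// --date=format:... form.
func parseOldLayout(s string) (time.Time, error) {
	return time.Parse(oldLayout, s)
}

// parseISOStrict (hypothetical name) parses --date=iso-strict output, which
// is compatible with time.RFC3339.
func parseISOStrict(s string) (time.Time, error) {
	return time.Parse(time.RFC3339, s)
}

func main() {
	// German-locale output fails: "Sa." is not an English weekday.
	if _, err := parseOldLayout("Sa. Jan. 10 00:11:44 2026 -0500"); err != nil {
		fmt.Println("old layout failed:", err)
	}

	// ISO 8601 strict output parses regardless of locale.
	t, err := parseISOStrict("2024-09-28T07:59:21+00:00")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("parsed:", t.UTC().Format(time.RFC3339))
}
```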
Cody Rose c0e66ca24b Make log command extensible internally (#3888)
I'm experimenting with ways to manipulate the git log that's being used to generate diffs (specifically, limiting its depth in some cases). Tweaking RepoPath for this locally isn't a huge deal but making the log command extensible seems like it could generally be useful, so I figured I'd do it.
2025-02-13 15:55:53 -05:00
ahrav ad0bc11a5b [fix] - integer types (#3793)
* fix types

* fix
2024-12-18 09:02:12 -08:00
ahrav 72b1de6b0b fix const type (#3792) 2024-12-17 20:12:55 -08:00
Dustin Decker f3630da1e0 Improve process cleanup (#3339)
* ensures that cmd.Wait() is always called, even if there's a panic in the FromReader function or if stdOut.Close() returns an error

* close stdout and ensure wait is called when handling binaries

* process cleanup improvements

* lint
2024-09-26 10:17:47 -07:00
ahrav fd257350dd [chore] - address linter (#3133)
* address linter

* fix
2024-07-31 17:30:51 -07:00
Richard Gomez 11e5febeee feat(git): scan commit metadata (#2754)
This is a follow-up to #2713 that fixes the strange test error.

As suspected, the failure was caused by additional diffs not being included in the test's expected data.
2024-04-29 16:58:45 -04:00
ahrav b430dae83e [refactor] - lazy buffer retrieval (#2745)
* only create the contentWriter once

* update test

* Lazily fetch buffer from the pool

* fix tests

* fix test

* remove ctx
2024-04-25 08:27:15 -07:00
ahrav 8ceeb5d5a1 [bug] - Refactor newDiff constructor to avoid double initialization of contentWriter (#2742)
* only create the contentWriter once

* update test

* correctly use mock

* remove deprecated pkg
2024-04-25 08:01:38 -07:00
Cody Rose 11452e8a57 Revert "feat(git): scan commit metadata (#2713)" (#2747)
This reverts commit 81a9c813a1.
2024-04-25 10:56:48 -04:00
Richard Gomez 81a9c813a1 feat(git): scan commit metadata (#2713)
This fixes #2683. It scans the commit author, committer (which is typically GitHub <noreply@github.com> for GitHub, but can be different), and message.

It also scans Git notes.
2024-04-25 10:13:09 -04:00
ahrav 4a5fbf8417 [refactor] - Update Write method signature in contentWriter interface (#2721)
* Update write method in contentWriter interface

* fix lint
2024-04-23 08:47:53 -07:00
Richard Gomez baf7ea1458 feat(gitparse): avoid unneeded calls to strconv.Unquote (#2605) 2024-03-22 08:35:10 -07:00
Richard Gomez aa862e46bb fix(git): decode unicode paths (#2585) 2024-03-19 08:50:27 -07:00
ahrav 40bbab8add [cleanup] - Extract buffer logic (#2409)
* extract the buffer logic into its own package

* address comments
2024-02-15 11:40:34 -08:00
ahrav e8006f1bee 2396 since commit stopped working (#2402)
* Ensure we handle commits with no diffs correctly.

* cleanup

* add nil check

* address comments

* move comment

* revert

* add comment
2024-02-13 07:21:22 -08:00
Richard Gomez 3b40c4fa63 Update GitParse to handle quoted binary filenames (#2391)
* fix(gitparse): quoted binary files

* fix(gitparse): use bytes.Cut instead of regexp

* fix lint warning

---------

Co-authored-by: Zachary Rice <zachary.rice@trufflesec.com>
2024-02-08 09:25:04 -06:00
ahrav 7b492a690a [feat] - use diff chan (#2387)
* use diff chan

* address comments

* add comment

* address comments

* use old ordering

* add correct author line

* Add required *Commit arg to newDiff

* address comments
2024-02-06 10:06:10 -08:00
ahrav 843334222c [not-fixup] - Reduce memory consumption for Buffered File Writer (#2377)
* correctly use the buffered file writer

* use value from source

* reorder fields

* use only the DetectorKey as a map field

* correctly use the buffered file writer

* use value from source

* reorder fields

* add tests and update

* Fix issue with buffer slices growing

* fix test

* fix

* add singleton

* use shared pool

* optimize

* rename and cleanup

* use correct calculation to grow buffer

* only grow if needed

* address comments

* remove unused

* remove

* rip out Grow

* address comment

* use 2k default buffer

* update comment allow large buffers to be garbage collected
2024-02-06 09:22:25 -08:00
Miccah 01c9ac7b59 Fix binary file hanging bug in git sources (#2388)
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
2024-02-05 15:28:49 -08:00
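
The hang described in 01c9ac7b59 and its fix can be sketched as below. This is an illustration under assumptions, not the TruffleHog code: firstLine is a hypothetical helper, and the `yes` command stands in for a git subprocess that produces more output than the caller reads.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
)

// firstLine is a hypothetical helper. It starts a command, reads only the
// first line of its stdout, and then stops reading early.
func firstLine(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", err
	}

	line, readErr := bufio.NewReader(stdout).ReadString('\n')

	// We return early without draining stdout. Closing the read end makes
	// the child's next write fail, so it exits instead of blocking on a
	// full pipe, and Wait can return.
	_ = stdout.Close()

	// The child was most likely terminated by the failed write, so the
	// Wait error is expected here and deliberately ignored.
	_ = cmd.Wait()

	return line, readErr
}

func main() {
	// "yes" writes lines forever and stands in for a large git output.
	line, err := firstLine("yes")
	fmt.Printf("first line: %q, read error: %v\n", line, err)
}
```

Without the stdout.Close() call, a child that keeps writing would fill the pipe and never exit, and Wait would block indefinitely.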
ahrav 135cc3eb69 [fixup] - correctly use the buffered file writer (#2373)
* correctly use the buffered file writer

* use value from source

* reorder fields

* use only the DetectorKey as a map field

* address comments and use factory function

* fix optional params

* remove commented out code
2024-02-05 10:43:55 -08:00
ahrav 9867ce8eb8 Allow for configuring the buffered file writer (#2319)
* Write large diffs to tmp files

* address comments

* Move bufferedfilewriter to own pkg

* update test

* swallow write err

* use buffer pool

* use size vs len

* use interface

* fix test

* update comments

* fix test

* Allow for configuring the buffered file writer

* remove unused

* add missing method

* remove

* remove unused

* move parser and commit struct closer to where they are used

* linter change

* fix snifftest

* address comments

* add more kvp pairs to error

* fix test

* update

* add back missing metadata fields

* address comments

* remove bufferedfile writer

* fix

* address comments

* use uint8

* update interface

* adjust interface

* fix tests

* make linter happy

* fix finalize

* address comments

* update test

* address comments

* lint

* remove guard

* fix test

* fix

* add TODO

* fix tests
2024-01-30 12:51:58 -08:00
ahrav 7c59ff95d5 [feat] - tmp file diffs (#2306)
* Write large diffs to tmp files

* address comments

* Move bufferedfilewriter to own pkg

* update test

* swallow write err

* use buffer pool

* use size vs len

* use interface

* fix test

* update comments

* fix test

* remove unused

* remove

* remove unused

* move parser and commit struct closer to where they are used

* linter change

* add more kvp pairs to error

* fix test

* update

* address comments

* remove bufferedfile writer

* address comments

* adjust interface

* fix finalize

* address comments

* lint

* remove guard

* fix

* add TODO
2024-01-30 12:30:51 -08:00
Richard Gomez 241e153dfb fix(gitparse): handle fromFileLine edge case (#2206) 2024-01-04 14:53:08 -08:00
Richard Gomez e72fdb62e4 fix(gitparse): don't trim filename (#2201) 2023-12-14 08:29:46 -08:00
Bill Rich 00a00ef651 Fix binary handling (#1999) 2023-10-26 10:07:02 -07:00
Savely Krasovsky d062834997 initial support for bare repositories (#1499)
* feat: initial support for bare repositories

* feat: use concatenation instead of formatting and os.Getenv instead of os.Environ

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

* fix: go-git update with pre-receive hooks fix

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

* fix: remove info about pre-receive hook from README.md for now

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

* fix: don't scan staged while using --bare option, fixes to make it work with the latest master

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

* fix: small refactor according to #1518

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

---------

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
2023-08-03 11:23:41 -05:00
Miccah b54683acb9 gitparse: Use an object for currentDiff (#1573)
* gitparse: Use an object for currentDiff instead of a pointer

* gitparse: Use an object for currentCommit instead of a pointer

* Revert "gitparse: Use an object for currentCommit instead of a pointer"

This reverts commit c5f0708b4a.
2023-07-31 11:39:14 -05:00
Miccah 6bd48583ae Fix gitparse from panicking on a nil-pointer (#1570) 2023-07-28 11:15:02 -05:00
Zachary Rice 3897454dbb add merge support (#1561) 2023-07-27 09:24:49 -05:00
Richard Gomez f48a635c34 feat: update gitparse logic (#1486) 2023-07-25 17:52:34 -05:00
Zachary Rice 452734adc8 remove head from git diff command, rename unstaged to staged (#1439) 2023-06-29 15:33:30 -05:00
Zachary Rice c28c70b399 fix new git file plus plus plus bug (#1386) 2023-06-08 18:29:11 -05:00
Bill Rich f2924f3061 Make sure context lines are properly handled (#1331)
* Make sure context lines are properly handled

* Fix git test to account for context change
2023-05-05 12:51:27 -07:00
ahrav 714c480931 Add log to track git log size (#1325)
* Add log to track git log size.

* Add calc for large commits and last commit.
2023-05-02 16:36:39 -07:00
Zachary Rice 458c79165a fix extra log messages (#1253)
* fix extra log messages

* add small test, move flag to isindex
2023-04-13 09:53:21 -05:00
Dustin Decker 8f10938bf7 forager requires direct access to gitparse.FromReader (#1233) 2023-04-02 17:54:43 -07:00
Zachary Rice fb9ae75661 Support for exclude globs at the git log level (#1202)
* init

* seems to be working

* better comment

* rm conditional

* Add more context to exclude-globs description
2023-03-28 10:46:03 -05:00
ahrav aa47e5e248 Only scan staged git changes. (#1143) 2023-03-01 08:58:36 -08:00
Bill Rich ae2d510ced Gitparse message fix (#1125)
* Fix messages being reused

* Add comment about change.
2023-02-23 15:20:54 -08:00
Bill Rich f1582aafa9 Drop tabs for filenames with spaces (#1115) 2023-02-16 17:15:32 -08:00
Bill Rich 9158dcaa80 Correctly parse most filenames with ' and ' (#1113) 2023-02-16 14:11:35 -08:00
Miccah 161e499142 [chore] Remove logrus from trufflehog (#1095)
* [chore] Remove logrus from trufflehog

* Minor fixes

* Fix logFatal call

* Fix logrus call
2023-02-14 17:00:07 -06:00
Bill Rich b37080e6a5 Add max commit size (#1079)
* Add max commit size

* Use common.IsDone

* Use breaks instead of return
2023-02-07 15:25:00 -08:00
Bill Rich af6e3f8fdf Pull gitparse config options out of pkg consts (#1072)
* Pull gitparse config options out of pkg consts.

* Adjust naming
2023-02-04 13:19:23 -08:00
Bill Rich 00ebb2ed64 Full git log when targeting base merge commit (#1044)
* Full git log when targeting merge commits

* Full log is needed whenever base is specified.
2023-01-26 09:17:54 -08:00
Bill Rich ac1dd23d37 Limit diff size to prevent out of control memory use. (#1035)
* Limit diff size to prevent out of control memory use.

* Group consts
2023-01-23 10:14:10 -08:00
Miccah 4aab7b7276 Buffer commit log processing (#845)
Some very large commits take a lot of time to process, which we can make
progress on while we are scanning the contents of other commits.
2022-10-12 14:55:08 -05:00
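
The idea in 4aab7b7276, letting log parsing run ahead while earlier commits are still being scanned, maps naturally onto a buffered channel. The following is a minimal sketch under assumptions: the commit type, the function names, and the buffer size are hypothetical and do not reflect the actual gitparse implementation.

```go
package main

import "fmt"

// commit is a hypothetical stand-in for a parsed commit.
type commit struct {
	hash string
}

// parseCommits simulates the parser. It sends commits on a buffered channel,
// so it can get up to `buffer` commits ahead of the consumer before blocking.
func parseCommits(hashes []string, buffer int) <-chan commit {
	out := make(chan commit, buffer)
	go func() {
		defer close(out) // signal the consumer that parsing is finished
		for _, h := range hashes {
			out <- commit{hash: h}
		}
	}()
	return out
}

// scanAll simulates the scanner. It consumes commits in the order they were
// parsed and returns the hashes it saw.
func scanAll(in <-chan commit) []string {
	var seen []string
	for c := range in {
		seen = append(seen, c.hash)
	}
	return seen
}

func main() {
	// The hashes are taken from the log above, and the buffer size of 2 is arbitrary.
	hashes := []string{"4aab7b7276", "92f40c2031", "ac1dd23d37"}
	fmt.Println(scanAll(parseCommits(hashes, 2)))
}
```

With an unbuffered channel the parser would wait for the scanner on every commit. The buffer lets a slow scan of one large commit overlap with parsing of the next ones.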
ahrav 92f40c2031 [THOG-709] - Recover from detector panics (#810) 2022-09-22 07:01:10 -07:00