This article is part of a series of ChatGPT experiments, in which I give ChatGPT a prompt, share its response, and offer some reflections.
Prompt
Can you write a golang program that will scan a directory and all subdirectories and identify duplicate files within them?
Response
Yes, here's a simple program in Go that scans a directory and all its subdirectories and identifies duplicate files based on their content:
package main

import (
	"crypto/sha256"
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
)

func hashFile(filePath string) ([]byte, error) {
	data, err := ioutil.ReadFile(filePath)
	if err != nil {
		return nil, err
	}
	hash := sha256.Sum256(data)
	return hash[:], nil
}

func findDuplicates(root string) (map[string][]string, error) {
	fileHashes := make(map[string][]string)
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if info.IsDir() {
			return nil
		}
		hash, err := hashFile(path)
		if err != nil {
			return err
		}
		fileHashes[string(hash)] = append(fileHashes[string(hash)], path)
		return nil
	})
	if err != nil {
		return nil, err
	}
	duplicates := make(map[string][]string)
	for _, files := range fileHashes {
		if len(files) > 1 {
			duplicates[files[0]] = files[1:]
		}
	}
	return duplicates, nil
}

func main() {
	duplicates, err := findDuplicates(".")
	if err != nil {
		fmt.Println(err)
		return
	}
	for file, duplicates := range duplicates {
		fmt.Printf("%s is duplicated in:\n", file)
		for _, duplicate := range duplicates {
			fmt.Printf("\t%s\n", duplicate)
		}
	}
}
Reflections
This was impressive. I compiled and ran this program to confirm that it works as expected, and it built and executed without any errors.
I did have to submit the same prompt twice, because ChatGPT stopped responding after a few lines the first time. The second time, it finished writing the program without interruption.
I appreciate that ChatGPT is aware of:
- how to import different Go packages cleanly in a single import block
- how to use those packages correctly
- how to give variables and functions meaningful names
- how to follow error handling conventions in Go
- how to follow indentation conventions in Go (one actual hard tab per indentation level)
- how to identify duplicate files by comparing their hashes
As a developer, I also realize some of ChatGPT's limitations, or at least its lack of intuition:
- There is no console output to declare that the program has started.
- There is no progress reporting while the program is running.
- If the directory has no duplicates, the program exits successfully without printing anything.
- The program always scans the current directory, and cannot take a directory as a command-line argument.
All of the above would be useful features that I would want to include in this tool. However, I did not ask for them in my prompt, so I can't fault ChatGPT for not including them. They are easy enough to add myself.
This is still a huge time saver for me. I've written a program with the same functionality in Node.js, but this is a case where Go's inherent speed (from compiling down to machine code) is really important.
I wanted to rewrite my program in Go at some point, but I didn't have the depth of knowledge in Go to write it as cleanly or quickly as ChatGPT. Now I can just copy/paste this solution and start using it to be productive!