Ask ChatGPT: Use Go to find duplicate files in a directory

This article is part of a series of ChatGPT experiments, in which I give ChatGPT a prompt, share its response, and offer some reflections.

Prompt

Can you write a golang program that will scan a directory and all subdirectories and identify duplicate files within them?

Response

Yes, here's a simple program in Go that scans a directory and all its subdirectories and identifies duplicate files based on their content:

package main

import (
	"crypto/sha256"
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
)

func hashFile(filePath string) ([]byte, error) {
	data, err := ioutil.ReadFile(filePath)
	if err != nil {
		return nil, err
	}

	hash := sha256.Sum256(data)
	return hash[:], nil
}

func findDuplicates(root string) (map[string][]string, error) {
	fileHashes := make(map[string][]string)
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if info.IsDir() {
			return nil
		}

		hash, err := hashFile(path)
		if err != nil {
			return err
		}

		fileHashes[string(hash)] = append(fileHashes[string(hash)], path)
		return nil
	})
	if err != nil {
		return nil, err
	}

	duplicates := make(map[string][]string)
	for _, files := range fileHashes {
		if len(files) > 1 {
			duplicates[files[0]] = files[1:]
		}
	}

	return duplicates, nil
}

func main() {
	duplicates, err := findDuplicates(".")
	if err != nil {
		fmt.Println(err)
		return
	}

	for file, duplicates := range duplicates {
		fmt.Printf("%s is duplicated in:\n", file)
		for _, duplicate := range duplicates {
			fmt.Printf("\t%s\n", duplicate)
		}
	}
}

Reflections

This was impressive. I compiled and ran the program to confirm that it worked as expected: it built without errors and ran cleanly.

I did have to submit the same prompt twice, because the first time ChatGPT stopped responding after a few lines. The second time, it wrote the whole program without interruption.

I appreciate that ChatGPT is aware of:

  • how to import several Go packages cleanly in a single import block
  • how to use those packages correctly
  • how to give variables and functions meaningful names
  • how to follow error handling conventions in Go
  • how to follow indentation conventions in Go (one actual hard tab per indentation level)
  • how to identify duplicate files by comparing their hashes

As a developer, I also recognize some of ChatGPT's limitations, or at least its lack of intuition:

  • There is no console output to declare that the program has started.
  • There is no progress reporting while the program is running.
  • If the directory has no duplicates, the program exits successfully without printing anything.
  • The program always scans the current directory, and cannot take a directory as a command line argument.

All of the above are features I would want in this tool. However, I did not ask for them in my prompt, so I can't fault ChatGPT for leaving them out. They are easy enough to add myself.
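
As a rough sketch of three of those additions (a startup message, a directory taken from the command line, and an explicit "no duplicates" message), something like the following would work. The helper names rootFromArgs and formatReport are hypothetical, and the duplicates map in main is placeholder data standing in for the result of findDuplicates from the program above. Progress reporting is left out here; it would hook into the walk callback.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// rootFromArgs (hypothetical helper) returns the directory to scan: the
// first command-line argument if one was given, otherwise the current
// directory.
func rootFromArgs(args []string) string {
	if len(args) > 1 {
		return args[1]
	}
	return "."
}

// formatReport (hypothetical helper) renders the duplicates map as text.
// It returns an explicit message when the map is empty, so a clean run is
// distinguishable from a run that silently did nothing.
func formatReport(duplicates map[string][]string) string {
	if len(duplicates) == 0 {
		return "No duplicate files found.\n"
	}

	// Sort the keys so the output order is stable between runs.
	originals := make([]string, 0, len(duplicates))
	for original := range duplicates {
		originals = append(originals, original)
	}
	sort.Strings(originals)

	var b strings.Builder
	for _, original := range originals {
		fmt.Fprintf(&b, "%s is duplicated in:\n", original)
		for _, copyPath := range duplicates[original] {
			fmt.Fprintf(&b, "\t%s\n", copyPath)
		}
	}
	return b.String()
}

func main() {
	root := rootFromArgs(os.Args)
	fmt.Printf("Scanning %s for duplicate files...\n", root)

	// Placeholder data: in the real tool this map would come from
	// findDuplicates(root).
	duplicates := map[string][]string{
		"a.txt": {"backup/a.txt", "old/a.txt"},
	}
	fmt.Print(formatReport(duplicates))
}
```

Built into a binary, this would be invoked with the directory as its only argument, and with no argument it falls back to the current directory as before.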

This is still a huge time saver for me. I've written a program with the same functionality in Node.js, but this is a case where Go's speed, which comes from compiling ahead of time to machine code, really matters.

I had wanted to rewrite my program in Go at some point, but I didn't have the depth of knowledge in Go to write it as cleanly or as quickly as ChatGPT did. Now I can just copy and paste this solution and start using it!