When working with large codebases, having clear documentation about each directory’s purpose and contents is crucial. This guide shows how to use Codegen and AI to automatically generate a hierarchical README that explains your codebase structure.
Generating Directory READMEs
Here’s how to recursively generate README files for each directory using AI:
def generate_directory_readme(directory):
    # Skip non-source directories
    if any(skip in directory.name for skip in [
        'node_modules', 'venv', '.git', '__pycache__', 'build', 'dist'
    ]):
        return

    # Collect directory contents for context
    files = [f for f in directory.files if f.is_source_file]
    functions = directory.functions
    classes = directory.classes

    # Create context for AI
    context = {
        "Directory Name": directory.name,
        "Files": [f"{f.name} ({len(f.source.splitlines())} lines)" for f in files],
        "Functions": [f.name for f in functions],
        "Classes": [c.name for c in classes]
    }

    # Generate directory summary using AI
    readme_content = codebase.ai(
        prompt="""Generate a README section that explains this directory's:
        1. Purpose and responsibility
        2. Key components and their roles
        3. How it fits into the larger codebase
        4. Important patterns or conventions

        Keep it clear and concise.""",
        target=directory,
        context=context
    )

    # Add file listing
    if files:
        readme_content += "\n\n## Files\n"
        for file in files:
            # Get file summary from AI
            file_summary = codebase.ai(
                prompt="Describe this file's purpose in one line:",
                target=file
            )
            readme_content += f"\n### {file.name}\n{file_summary}\n"

            # List key components
            if file.classes:
                readme_content += "\nKey classes:\n"
                for cls in file.classes:
                    readme_content += f"- `{cls.name}`\n"
            if file.functions:
                readme_content += "\nKey functions:\n"
                for func in file.functions:
                    readme_content += f"- `{func.name}`\n"

    # Create or update README.md
    readme_path = f"{directory.path}/README.md"
    if codebase.has_file(readme_path):
        readme_file = codebase.get_file(readme_path)
        readme_file.edit(readme_content)
    else:
        readme_file = codebase.create_file(readme_path)
        readme_file.edit(readme_content)

    # Recursively process subdirectories
    for subdir in directory.subdirectories:
        generate_directory_readme(subdir)


# Generate READMEs for the entire codebase
generate_directory_readme(codebase.root_directory)
This will create a hierarchy of README.md files that explain each directory’s purpose and contents. For example:
# src/

Core implementation directory containing the main business logic and data models.
This directory is responsible for the core functionality of the application.

## Key Patterns
- Business logic is separated from API endpoints
- Models follow the Active Record pattern
- Services implement the Repository pattern

## Files

### models.py
Defines the core data models and their relationships.

Key classes:
- `User`
- `Product`
- `Order`

### services.py
Implements business logic and data access services.

Key classes:
- `UserService`
- `ProductService`

Key functions:
- `initialize_db`
- `migrate_data`
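Because every directory and every file costs an AI call, it can help to preview which READMEs a run would touch first. The following is a minimal dry-run sketch that uses only `pathlib` and does not call Codegen at all. The helper name `planned_readmes` and the `SKIP_DIRS` set are hypothetical, and the sketch matches directory names exactly, which is a stricter rule than the substring check in the generator above.

```python
from pathlib import Path

# Hypothetical skip set mirroring the generator's list. Names are matched
# exactly, so a directory called "builders" is not skipped because of "build".
SKIP_DIRS = {"node_modules", "venv", ".git", "__pycache__", "build", "dist"}


def planned_readmes(root: Path) -> list[Path]:
    """Return the README.md paths a recursive run would create or update."""
    if root.name in SKIP_DIRS:
        return []
    planned = [root / "README.md"]
    # Visit subdirectories in sorted order so the plan is reproducible.
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        planned.extend(planned_readmes(child))
    return planned
```

Calling `planned_readmes(Path("."))` from the repository root and printing the result gives a quick estimate of how many directories the real generator would process.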
Customizing the Generation
You can customize the README generation by modifying the prompts and adding more context:
def get_directory_patterns(directory):
    """Analyze common patterns in a directory"""
    patterns = []

    # Check for common file patterns
    if any('test_' in f.name for f in directory.files):
        patterns.append("Contains unit tests")
    if any('interface' in f.name.lower() for f in directory.files):
        patterns.append("Uses interface-based design")
    if any(c.is_dataclass for c in directory.classes):
        patterns.append("Uses dataclasses for data models")

    return patterns


def generate_enhanced_readme(directory):
    # Get additional context
    patterns = get_directory_patterns(directory)
    dependencies = [imp.module for imp in directory.imports]

    # Enhanced context for AI
    context = {
        "Common Patterns": patterns,
        "Dependencies": dependencies,
        "Parent Directory": directory.parent.name if directory.parent else None,
        "Child Directories": [d.name for d in directory.subdirectories],
        "Style": "technical but approachable"
    }

    # Generate README with enhanced context
    # ... rest of the generation logic
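An enhanced context like the one above can grow quickly, since a dependency list often contains duplicates and empty entries add noise. As a sketch only, the helper below trims a context dictionary before it is passed on. `compact_context` and its `max_items` parameter are hypothetical names, not part of Codegen, and the cap of 20 items is an arbitrary assumption.

```python
def compact_context(context: dict, max_items: int = 20) -> dict:
    """Drop empty entries, de-duplicate lists, and cap their length."""
    compact = {}
    for key, value in context.items():
        # Skip entries that carry no information.
        if value is None or value == [] or value == "":
            continue
        if isinstance(value, list):
            # dict.fromkeys removes duplicates while preserving order.
            unique = list(dict.fromkeys(value))
            if len(unique) > max_items:
                omitted = len(unique) - max_items
                unique = unique[:max_items] + [f"... ({omitted} more)"]
            value = unique
        compact[key] = value
    return compact
```

The result has the same shape as the input, so it could be supplied wherever the original `context` dictionary was used.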
Best Practices
- Keep Summaries Focused: Direct the AI to generate concise, purpose-focused summaries.
- Include Key Information:
  - Directory purpose
  - Important patterns
  - Key files and their roles
  - How components work together
- Maintain Consistency: Use consistent formatting and structure across all READMEs.
- Update Regularly: Regenerate READMEs when directory structure or purpose changes.
- Version Control: Commit generated READMEs to track documentation evolution.
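One way to decide when a README needs regenerating is to fingerprint each directory and compare it with the value from the previous run. The sketch below is an assumption about how this could be done with the standard library. `directory_fingerprint` is a hypothetical helper, and hashing entry names and file sizes is a cheap proxy for change, not a full content hash.

```python
import hashlib
from pathlib import Path


def directory_fingerprint(directory: Path) -> str:
    """Hash the names and sizes of a directory's immediate entries."""
    digest = hashlib.sha256()
    for entry in sorted(directory.iterdir(), key=lambda p: p.name):
        if entry.name == "README.md":
            # The generated README itself should not trigger regeneration.
            continue
        kind = "d" if entry.is_dir() else "f"
        size = entry.stat().st_size if entry.is_file() else 0
        digest.update(f"{kind}:{entry.name}:{size}\n".encode("utf-8"))
    return digest.hexdigest()
```

Storing the fingerprint next to each README (for example in a small JSON file) and regenerating only when it changes would keep repeated runs cheap. An edit that leaves a file's size unchanged would go unnoticed by this approach.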
The AI-generated summaries are a starting point. Review and refine them to ensure accuracy and completeness.
Be mindful of sensitive information in your codebase. Configure the generator to skip sensitive files or directories.
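Skipping sensitive paths could be done with a small filter that runs before any file or directory is sent to the AI. The sketch below assumes glob-style patterns matched against each path component. The pattern list and the `is_sensitive` helper are illustrative assumptions, not a Codegen feature, and should be adapted to your project.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative patterns only; extend or replace these for your codebase.
SENSITIVE_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "secrets", "credentials*"]


def is_sensitive(path: str, patterns=SENSITIVE_PATTERNS) -> bool:
    """Return True if any component of the path matches a sensitive pattern."""
    return any(
        fnmatchcase(part, pattern)
        for part in PurePosixPath(path).parts
        for pattern in patterns
    )
```

A generator could call `is_sensitive` on each directory and file path and return early, or omit the file from the listing, whenever it returns `True`.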