1. Handling Large Data Streams
Generators excel at memory efficiency. Use them to process large files, databases, or infinite sequences without loading everything into memory.
Example: Large File Processing
```python
def read_large_file(file_path, chunk_size=1024):
    with open(file_path, 'r') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Process a 10GB file in 1KB chunks
for chunk in read_large_file('huge_data.log'):
    process(chunk)  # Custom processing function
```
2. Parallel Processing with Generators
Combine generators with concurrent.futures (threads or processes) or asyncio (coroutines) to process multiple data streams concurrently.
Example: Async Generator (Python 3.7+)
```python
import asyncio

async def async_data_fetcher(urls):
    for url in urls:
        data = await fetch(url)  # Assume `fetch` is an async function
        yield data

async def main():
    urls = ['url1', 'url2', 'url3']
    async for data in async_data_fetcher(urls):
        print(data)

asyncio.run(main())
```
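The asyncio version covers async I/O; `concurrent.futures` handles the same fan-out with threads. A minimal sketch, assuming a hypothetical blocking `fetch_url` function (here just a placeholder), that yields results as each worker finishes:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_url(url):
    # Placeholder for a real blocking download (e.g. urllib.request.urlopen)
    return f"data from {url}"

def parallel_fetch(urls, max_workers=4):
    # Submit all URLs to the pool, then yield results as they complete
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch_url, url) for url in urls]
        for future in as_completed(futures):
            yield future.result()

for result in parallel_fetch(['url1', 'url2', 'url3']):
    print(result)
```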
3. Generator Pipelines
Chain generators to create data processing pipelines (e.g., filtering → transforming → aggregating).
Example: Data Pipeline
```python
def filter_positive(numbers):
    for n in numbers:
        if n > 0:
            yield n

def square(numbers):
    for n in numbers:
        yield n ** 2

def pipeline(data):
    filtered = filter_positive(data)
    squared = square(filtered)
    yield from squared

# Execute the pipeline
data = [-2, -1, 0, 1, 2]
result = pipeline(data)
print(list(result))  # Output: [1, 4]
```
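The pipeline above filters and transforms; an aggregation stage is just one more generator on the end. A small sketch extending the same functions with a running total (the stage name is illustrative):

```python
def running_total(numbers):
    # Aggregation stage: yield the cumulative sum after each item
    total = 0
    for n in numbers:
        total += n
        yield total

data = [-2, -1, 0, 1, 2]
print(list(running_total(square(filter_positive(data)))))  # Output: [1, 5]
```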
4. Stateful Generators
Retain complex state between iterations (e.g., tracking progress, managing connections).
Example: Batch Database Queries
```python
def batch_query(query, batch_size=1000):
    offset = 0
    while True:
        results = execute_query(f"{query} LIMIT {batch_size} OFFSET {offset}")
        if not results:
            break
        yield results
        offset += batch_size

# Process batches of 1000 records
for batch in batch_query("SELECT * FROM huge_table"):
    process_batch(batch)
```
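`execute_query` above is assumed; with a real DB-API driver the same idea is usually expressed with `cursor.fetchmany`, which avoids re-running the query for every offset. A sketch using the standard-library `sqlite3` module (database path and table name are illustrative):

```python
import sqlite3

def batch_rows(db_path, query, batch_size=1000):
    # Yield lists of rows, batch_size rows at a time, from a single cursor
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(query)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            yield rows
    finally:
        conn.close()

for batch in batch_rows('app.db', "SELECT * FROM huge_table"):
    process_batch(batch)  # Same hypothetical handler as above
```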
5. Infinite Generators
Generate sequences indefinitely (e.g., sensor data, real-time streams).
Example: Real-Time Sensor Data
```python
import random
import time

def sensor_data_stream():
    while True:
        yield {
            'timestamp': time.time(),
            'value': random.uniform(0, 100)
        }
        time.sleep(1)  # Simulate delay

# Monitor indefinitely
for data in sensor_data_stream():
    print(f"Time: {data['timestamp']}, Value: {data['value']}")
```
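An infinite generator never stops on its own, so the consumer decides when to stop; `itertools.islice` takes a bounded window without modifying the generator. A small sketch reusing `sensor_data_stream` from above:

```python
import itertools

# Take only the first 5 readings from the otherwise endless stream
for data in itertools.islice(sensor_data_stream(), 5):
    print(f"Time: {data['timestamp']}, Value: {data['value']}")
```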
6. Advanced Generator Control
Use .send(), .throw(), and .close() for bidirectional communication and error handling.
Example: Generator with .send()
```python
def interactive_generator():
    total = 0
    while True:
        value = yield total  # Pause and wait for input via `.send()`
        if value is None:
            break
        total += value

gen = interactive_generator()
next(gen)            # Prime the generator
print(gen.send(10))  # Output: 10
print(gen.send(20))  # Output: 30
gen.close()          # Terminate the generator
```
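`.throw()` injects an exception at the paused `yield`, which the generator can catch to recover instead of dying. A minimal sketch (the reset-on-ValueError behavior is illustrative):

```python
def resettable_accumulator():
    total = 0
    while True:
        try:
            value = yield total
            total += value
        except ValueError:
            total = 0  # Recover from an injected error by resetting

acc = resettable_accumulator()
next(acc)                     # Prime the generator
print(acc.send(5))            # Output: 5
print(acc.throw(ValueError))  # Caught inside the generator; Output: 0
print(acc.send(3))            # Output: 3
acc.close()
```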
7. Memory-Optimized Generators
Avoid materializing large datasets with yield from and itertools.
Example: Combining Generators with itertools
```python
import itertools

def generate_combinations(elements):
    for r in range(1, len(elements) + 1):
        yield from itertools.combinations(elements, r)

# Generate combinations on-the-fly
for combo in generate_combinations(['A', 'B', 'C']):
    print(combo)
```
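`yield from` also keeps nested data lazy: each sub-iterable is walked on demand rather than flattened into a list first (the effect is equivalent to `itertools.chain.from_iterable`). A small illustrative sketch:

```python
def flatten(nested):
    # Lazily flatten one level of nesting; no intermediate list is built
    for sub in nested:
        yield from sub

rows = ([i * 10 + j for j in range(3)] for i in range(3))  # Generator of rows
for value in flatten(rows):
    print(value)  # 0, 1, 2, 10, 11, 12, 20, 21, 22
```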
8. Error Handling in Heavy-Duty Generators
Gracefully handle exceptions during iteration.
Example: Fault-Tolerant Generator
```python
def fault_tolerant_generator(data):
    for item in data:
        try:
            result = process_risky_item(item)  # May raise exceptions
            yield result
        except Exception as e:
            log_error(e)
            continue
```
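`process_risky_item` and `log_error` are assumed above; a runnable sketch with stand-in implementations shows failing items being logged and skipped while iteration continues:

```python
def process_risky_item(item):
    # Stand-in for real work: fails on zero and on non-numeric input
    return 100 / item

def log_error(exc):
    print(f"skipped item: {exc!r}")

data = [4, 0, 'x', 10]
print(list(fault_tolerant_generator(data)))
# Logs the ZeroDivisionError and TypeError, then prints [25.0, 10.0]
```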
Performance Tips
- Lazy Evaluation: Delay computation until needed.
- Avoid Materialization: Use `yield` instead of building lists (see the sketch after this list).
- Optimize Chunk Sizes: For I/O-bound tasks (e.g., file reading), tune `chunk_size` to balance memory and speed.
- Use C Extensions: For CPU-heavy tasks, pair generators with libraries like `numpy` or Cython.
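The first two tips are easy to verify directly. A small sketch comparing a list comprehension to the equivalent generator expression (exact byte counts vary by Python version and platform):

```python
import sys

numbers = range(1_000_000)
as_list = [n * n for n in numbers]  # Materializes every element up front
as_gen = (n * n for n in numbers)   # Computes each element only when asked

print(sys.getsizeof(as_list))  # Several megabytes for the list object alone
print(sys.getsizeof(as_gen))   # A couple hundred bytes, regardless of length
print(sum(as_gen))             # Same result as sum(as_list), computed lazily
```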
When to Use Heavy-Duty Generators
- Big Data: Process terabytes of data without OOM errors.
- Real-Time Streams: Handle live data feeds (e.g., logs, sensors).
- Resource-Constrained Environments: Minimize memory footprint.