Architecture - DRAGON

Architecture overview

DRAGON is built on a modular, layered architecture that cleanly separates responsibilities and lets each component evolve independently.

Layered architecture

flowchart TB
    subgraph UI["Interface Layer"]
        TUI["dragon-analyze-tui<br/>Console Interface"]
        CLI["dragon-analyze<br/>Command Line"]
        API["REST API<br/>(future)"]
    end
    subgraph Core["Business Layer"]
        ANALYZER["Analyzer<br/>Orchestration"]
        CHUNKER["Chunker<br/>Chunking"]
        PATTERN["Pattern Detector<br/>Duplicate detection"]
        CATALOG["Catalog Manager<br/>Metadata"]
    end
    subgraph GPU["Acceleration Layer"]
        SCHEDULER["GPU Scheduler<br/>Scheduling"]
        HASHER["GPU Hasher<br/>Fingerprint computation"]
        INDEX["GPU Index<br/>B+Tree / Bloom"]
    end
    subgraph Infra["Infrastructure Layer"]
        LOCAL["Local GPU<br/>CUDA/DirectX"]
        REMOTE["Remote GPU<br/>TCP Workers"]
        STORAGE["Storage<br/>Files/Blocks"]
    end
    UI --> Core
    Core --> GPU
    GPU --> Infra
    style UI fill:#1e40af,stroke:#3b82f6
    style Core fill:#065f46,stroke:#10b981
    style GPU fill:#7c2d12,stroke:#f59e0b
    style Infra fill:#581c87,stroke:#a855f7

Main components

1. User Interface (TUI)

The interactive console interface controls the analysis in real time, with a detailed view of progress and statistics.

TUI structure

flowchart LR
    subgraph TUI["dragon-analyze-tui"]
        MAIN["Main Loop<br/>FTXUI"]
        TABS["Tab Manager"]
        subgraph TabViews["Tabs"]
            T1["Progress<br/>Bar + Stats"]
            T2["Files<br/>File list"]
            T3["GPU<br/>GPU metrics"]
            T4["Logs<br/>Messages"]
        end
    end
    subgraph Events["Events"]
        KEY["Keyboard Input"]
        TICK["Timer 100ms"]
        CALLBACK["Progress Callback"]
    end
    KEY --> MAIN
    TICK --> MAIN
    CALLBACK --> MAIN
    MAIN --> TABS
    TABS --> TabViews

TUI features

4 tabs: Progress, Files, GPU, Logs
Real-time progress with ETA
Deduplication chart
Resource limit configuration
Chunking mode selection

Keyboard shortcuts

Tab/Shift+Tab: Navigate between tabs
Enter: Start the analysis
Esc: Cancel / Quit
+/-: Adjust CPU/RAM limits
3/4: Change the chunking mode
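
The event flow in the TUI diagram (keyboard input, the 100 ms timer and the progress callback all feeding one main loop) can be sketched without FTXUI as a single handler over a variant of event types. This is only an illustration: `Event`, `UiState` and `handle` are hypothetical names, the tab order follows the feature list, and treating Esc as "cancel while running, quit otherwise" is an assumption about how the two meanings of that key are separated.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

// Hypothetical event types mirroring the three sources in the diagram.
enum class Key { Tab, ShiftTab, Enter, Escape };
struct KeyEvent { Key key; };
struct TickEvent {};                        // fired by the 100 ms timer
struct ProgressEvent { double fraction; };  // pushed by the progress callback
using Event = std::variant<KeyEvent, TickEvent, ProgressEvent>;

constexpr std::size_t kTabCount = 4;  // Progress, Files, GPU, Logs

struct UiState {
    std::size_t active_tab = 0;  // index into the four tabs
    bool running = false;        // an analysis is in progress
    bool quit = false;           // the main loop should exit
    double progress = 0.0;       // last reported completion fraction
    unsigned redraws = 0;        // number of timer-driven refreshes
};

// Applies one event to the UI state; the main loop would call this
// for every event it dequeues.
void handle(UiState& s, const Event& ev) {
    if (const auto* k = std::get_if<KeyEvent>(&ev)) {
        switch (k->key) {
            case Key::Tab:
                s.active_tab = (s.active_tab + 1) % kTabCount;
                break;
            case Key::ShiftTab:
                s.active_tab = (s.active_tab + kTabCount - 1) % kTabCount;
                break;
            case Key::Enter:
                s.running = true;
                break;
            case Key::Escape:
                // Assumption: Esc cancels a running analysis, else quits.
                if (s.running) { s.running = false; } else { s.quit = true; }
                break;
        }
    } else if (const auto* p = std::get_if<ProgressEvent>(&ev)) {
        assert(p->fraction >= 0.0 && p->fraction <= 1.0);
        s.progress = p->fraction;
    } else {
        ++s.redraws;  // TickEvent: refresh the screen
    }
}
```

Funnelling every source through one handler keeps the state single-threaded, which is one common way to avoid locking in a console UI. The +/- and 3/4 shortcuts are left out for brevity.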

2. Analyzer (Orchestration)

The central component, which coordinates every stage of analysis and deduplication.

Analyzer processing flow

stateDiagram-v2
    [*] --> Idle: Initialization
    Idle --> Scanning: scan(path)
    Scanning --> Scanning: Recursive traversal
    Scanning --> Chunking: Files discovered
    Chunking --> Hashing: Blocks created
    Hashing --> Indexing: Hashes computed
    Indexing --> Detecting: Index updated
    Detecting --> Reporting: Patterns detected
    Reporting --> Idle: Report generated
    Scanning --> Cancelled: cancel()
    Chunking --> Cancelled: cancel()
    Hashing --> Cancelled: cancel()
    Cancelled --> Idle: reset()
    note right of Scanning
        Recursive FileWalker
        Filter by extension
        Size estimation
    end note
    note right of Hashing
        GPU Scheduler
        Massive parallelism
        Real-time metrics
    end note
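
The transitions in the state diagram can be encoded as a small transition function. The sketch below is an illustration rather than the Analyzer's real interface: `State`, `Trigger`, `next_state` and `apply` are hypothetical names, and the `Scanning --> Scanning` self-loop is folded into the Scanning state itself. As in the diagram, `cancel()` is accepted only from Scanning, Chunking and Hashing.

```cpp
#include <cassert>
#include <optional>

enum class State {
    Idle, Scanning, Chunking, Hashing, Indexing, Detecting, Reporting, Cancelled
};

// Hypothetical triggers: scan(path), "current stage finished", cancel(), reset().
enum class Trigger { Scan, StageDone, Cancel, Reset };

// Returns the next state, or std::nullopt when the diagram has no such edge.
std::optional<State> next_state(State s, Trigger t) {
    switch (t) {
        case Trigger::Scan:
            if (s == State::Idle) return State::Scanning;
            break;
        case Trigger::StageDone:
            switch (s) {
                case State::Scanning:  return State::Chunking;
                case State::Chunking:  return State::Hashing;
                case State::Hashing:   return State::Indexing;
                case State::Indexing:  return State::Detecting;
                case State::Detecting: return State::Reporting;
                case State::Reporting: return State::Idle;
                default: break;
            }
            break;
        case Trigger::Cancel:
            if (s == State::Scanning || s == State::Chunking ||
                s == State::Hashing) {
                return State::Cancelled;
            }
            break;
        case Trigger::Reset:
            if (s == State::Cancelled) return State::Idle;
            break;
    }
    return std::nullopt;
}

// Convenience wrapper for callers that treat an illegal transition as a bug.
State apply(State s, Trigger t) {
    const auto n = next_state(s, t);
    assert(n.has_value() && "transition not present in the state diagram");
    return *n;
}
```

Returning `std::optional` makes the missing edges explicit: for example, cancelling during Indexing yields no transition, exactly as drawn.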

Main Analyzer algorithm

Initialization: Load the configuration and initialize the GPU manager
Scan: Recursively walk the source directory
For each file:
- Check the minimum size (GPU threshold)
- Split it into chunks according to the mode (Fixed/CDC)
- Submit the blocks to the GPU Scheduler
Collection: Retrieve the hashes computed by the GPU
Detection: Identify duplicate blocks via the index
Report: Generate statistics and the catalog
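
The steps above can be condensed into a CPU-only sketch that chunks, hashes and counts duplicates. It makes several simplifying assumptions that are not part of DRAGON: files are in-memory byte strings, only Fixed mode is shown, a 64-bit FNV-1a hash stands in for the XXH3 + SHA256 fingerprint, and a hash set replaces the GPU index. The GPU threshold check and the scheduler are omitted, and `Report`, `chunk_fixed` and `analyze` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_set>
#include <vector>

struct Report {
    std::size_t total_chunks = 0;
    std::size_t duplicate_chunks = 0;
};

// Fixed mode: every chunk has chunk_size bytes except possibly the last one.
std::vector<std::string_view> chunk_fixed(std::string_view data,
                                          std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::string_view> chunks;
    for (std::size_t off = 0; off < data.size(); off += chunk_size) {
        chunks.push_back(data.substr(off, chunk_size));
    }
    return chunks;
}

// Stand-in fingerprint (64-bit FNV-1a). A 64-bit hash can collide, which
// would be reported as a false duplicate; a real fingerprint is wider.
std::uint64_t fnv1a(std::string_view bytes) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Chunk every file, hash every chunk, and count hashes seen before.
Report analyze(const std::vector<std::string_view>& files,
               std::size_t chunk_size) {
    std::unordered_set<std::uint64_t> seen;  // stand-in for the index
    Report report;
    for (std::string_view file : files) {
        for (std::string_view chunk : chunk_fixed(file, chunk_size)) {
            ++report.total_chunks;
            if (!seen.insert(fnv1a(chunk)).second) {
                ++report.duplicate_chunks;
            }
        }
    }
    return report;
}
```

With a chunk size of 4, the two inputs `"AAAABBBBAAAA"` and `"BBBBCC"` produce five chunks, two of which repeat an earlier chunk.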

3. GPU Scheduler

The scheduler distributes work across the available GPUs, both local and remote.

GPU Scheduler architecture

flowchart TB
    subgraph Scheduler["GPU Scheduler"]
        QUEUE["Work Queue<br/>FIFO + Priority"]
        DISPATCH["Dispatcher<br/>Load Balancing"]
        MONITOR["Monitor<br/>Metrics"]
        RETRY["Retry Manager<br/>Fault Tolerance"]
    end
    subgraph Workers["GPU Workers"]
        W1["Worker 1<br/>GPU Local 0"]
        W2["Worker 2<br/>GPU Local 1"]
        W3["Worker 3<br/>Remote 192.168.1.10"]
        W4["Worker 4<br/>Remote 192.168.1.11"]
    end
    subgraph Results["Results"]
        BUFFER["File Result Buffer<br/>Reassembly"]
        CALLBACK["Completion Callback"]
    end
    QUEUE --> DISPATCH
    DISPATCH --> W1
    DISPATCH --> W2
    DISPATCH --> W3
    DISPATCH --> W4
    W1 --> BUFFER
    W2 --> BUFFER
    W3 --> BUFFER
    W4 --> BUFFER
    BUFFER --> CALLBACK
    W1 -.-> MONITOR
    W2 -.-> MONITOR
    W3 -.-> MONITOR
    W4 -.-> MONITOR
    MONITOR --> DISPATCH
    RETRY --> QUEUE

Component	Role	Strategy
Work Queue	Queue of pending blocks	FIFO with priority by file size
Dispatcher	Assigns each block to the best worker	Weighted Round Robin + metrics
Monitor	Tracks worker performance	Moving average (smoothing 0.3)
Retry Manager	Failure handling	2 retries, exponential timeout
Result Buffer	Per-file reassembly	Out-of-order completion
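
To make the Monitor and Dispatcher rows concrete, here is a minimal sketch that combines an exponential moving average with smooth weighted round robin. It rests on assumptions the table does not confirm: the smoothing factor 0.3 is taken as the weight of the newest sample, the smoothed throughput is used directly as the round-robin weight, and `Worker`, `record_sample` and `pick_worker` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Worker {
    double weight = 0.0;     // smoothed throughput, used as the WRR weight
    double current = 0.0;    // running credit for smooth weighted round robin
    bool has_sample = false; // false until the first measurement arrives
};

// Monitor: exponential moving average,
//   weight = alpha * sample + (1 - alpha) * weight.
// The first sample initializes the average directly.
void record_sample(Worker& w, double sample, double alpha = 0.3) {
    if (!w.has_sample) {
        w.weight = sample;
        w.has_sample = true;
    } else {
        w.weight = alpha * sample + (1.0 - alpha) * w.weight;
    }
}

// Dispatcher: smooth weighted round robin. Every worker earns credit equal
// to its weight, the worker with the most credit is chosen, and the chosen
// worker pays back the total weight. Ties go to the lowest index.
std::size_t pick_worker(std::vector<Worker>& workers) {
    assert(!workers.empty());
    double total = 0.0;
    std::size_t best = 0;
    for (std::size_t i = 0; i < workers.size(); ++i) {
        workers[i].current += workers[i].weight;
        total += workers[i].weight;
        if (workers[i].current > workers[best].current) {
            best = i;
        }
    }
    workers[best].current -= total;
    return best;
}
```

With weights 3 and 1, four consecutive picks select workers 0, 0, 1, 0, so the faster worker receives three blocks out of four without long bursts on either side.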

Data flow

Complete processing pipeline

flowchart LR
    subgraph Input["Input"]
        FILE["File<br/>source.dat"]
    end
    subgraph Chunk["Chunking"]
        C1["Chunk 1<br/>16 KB"]
        C2["Chunk 2<br/>12 KB"]
        C3["Chunk 3<br/>20 KB"]
        CN["..."]
    end
    subgraph Block["WorkBlocks"]
        B1["Block 1<br/>+ metadata"]
        B2["Block 2<br/>+ metadata"]
        BN["..."]
    end
    subgraph GPU["GPU Processing"]
        HASH["Hash<br/>XXH3 + SHA256"]
    end
    subgraph Result["Results"]
        H1["Hash256 1"]
        H2["Hash256 2"]
        HN["..."]
    end
    subgraph Index["Indexing"]
        BLOOM["Bloom Filter<br/>Pre-check"]
        BTREE["B+Tree<br/>Lookup"]
    end
    subgraph Output["Output"]
        NEW["New block"]
        DUP["Duplicate<br/>-> Reference"]
    end
    FILE --> Chunk
    Chunk --> Block
    Block --> GPU
    GPU --> Result
    Result --> Index
    BLOOM --> |"maybe"| BTREE
    BLOOM --> |"no"| NEW
    BTREE --> |"found"| DUP
    BTREE --> |"not found"| NEW
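
The last two stages of the pipeline, a Bloom filter pre-check followed by an exact lookup, can be sketched on the CPU as follows. This is an illustration under stated assumptions: `std::map` stands in for the B+Tree, the filter size (65,536 bits) and probe count (3) are arbitrary, probe positions are taken from the leading bytes of the hash on the assumption that it is uniformly distributed, and `DedupIndex`, `Lookup` and `insert_or_find` are hypothetical names.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

using Hash256 = std::array<std::uint8_t, 32>;

struct Lookup {
    bool duplicate;          // true when the hash was already indexed
    std::uint64_t block_id;  // existing block id, or the newly assigned one
};

class DedupIndex {
public:
    // Bloom "no"        -> new block, no exact lookup needed.
    // Bloom "maybe"     -> exact lookup: found = duplicate, else new block.
    Lookup insert_or_find(const Hash256& h, std::uint64_t new_block_id) {
        if (maybe_present(h)) {
            const auto it = blocks_.find(h);
            if (it != blocks_.end()) {
                return {true, it->second};
            }
        }
        for (int i = 0; i < kProbes; ++i) {
            filter_.set(probe(h, i));
        }
        blocks_.emplace(h, new_block_id);
        return {false, new_block_id};
    }

private:
    static constexpr std::size_t kBits = std::size_t{1} << 16;
    static constexpr int kProbes = 3;

    // Probe i uses bytes [4i, 4i+4) of the hash as a bit position.
    static std::size_t probe(const Hash256& h, int i) {
        assert(i >= 0 && i < kProbes);
        std::uint32_t v = 0;
        for (int b = 0; b < 4; ++b) {
            v = (v << 8) | h[4 * i + b];
        }
        return v % kBits;
    }

    // False means "definitely absent"; true means "possibly present".
    bool maybe_present(const Hash256& h) const {
        for (int i = 0; i < kProbes; ++i) {
            if (!filter_.test(probe(h, i))) return false;
        }
        return true;
    }

    std::bitset<kBits> filter_;
    std::map<Hash256, std::uint64_t> blocks_;  // stand-in for the B+Tree
};
```

A Bloom filter never yields false negatives, so skipping the exact lookup on "no" is safe; false positives only cost one extra lookup that ends in "not found".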

WorkBlock structure

Anatomy of a WorkBlock

classDiagram
    class WorkBlock {
        +uint64 block_id
        +uint64 file_id
        +uint64 offset
        +uint64 size
        +vector~uint8~ data
        +TimePoint created_at
        +atomic~bool~ completed
        +atomic~bool~ failed
        +set~string~ excluded_workers
        +vector~FingerprintResult~ results
        +size_mb() double
        +elapsed_ms() double
        +processing_time_ms() double
    }
    class FingerprintResult {
        +Hash256 hash
        +uint32 chunk_size
        +uint64 offset_in_block
    }
    class Hash256 {
        +array~uint8, 32~ bytes
        +is_zero() bool
        +operator==()
        +operator<()
    }
    WorkBlock "1" --> "*" FingerprintResult
    FingerprintResult "1" --> "1" Hash256
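
In C++, the class diagram could translate roughly into the declarations below. Field names and types follow the diagram; everything else is an assumption made for this sketch: `std::chrono::steady_clock` as the `TimePoint` clock, `size_mb()` dividing by 1024 * 1024, and lexicographic byte order for `operator<`. `processing_time_ms()` is omitted because the diagram does not show the start and end timestamps it would need.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

struct Hash256 {
    std::array<std::uint8_t, 32> bytes{};  // zero-initialized

    bool is_zero() const {
        return std::all_of(bytes.begin(), bytes.end(),
                           [](std::uint8_t b) { return b == 0; });
    }
    bool operator==(const Hash256& other) const { return bytes == other.bytes; }
    // Lexicographic order, so hashes can be keys in ordered containers.
    bool operator<(const Hash256& other) const { return bytes < other.bytes; }
};

struct FingerprintResult {
    Hash256 hash;
    std::uint32_t chunk_size = 0;
    std::uint64_t offset_in_block = 0;
};

struct WorkBlock {
    using Clock = std::chrono::steady_clock;  // assumed clock for TimePoint

    std::uint64_t block_id = 0;
    std::uint64_t file_id = 0;
    std::uint64_t offset = 0;  // position of the block within its file
    std::uint64_t size = 0;    // block size in bytes
    std::vector<std::uint8_t> data;
    Clock::time_point created_at = Clock::now();
    std::atomic<bool> completed{false};
    std::atomic<bool> failed{false};
    std::set<std::string> excluded_workers;  // workers not to retry on
    std::vector<FingerprintResult> results;

    double size_mb() const {
        return static_cast<double>(size) / (1024.0 * 1024.0);
    }
    double elapsed_ms() const {
        return std::chrono::duration<double, std::milli>(
                   Clock::now() - created_at).count();
    }
};
```

Because of the `std::atomic` members, a `WorkBlock` written this way is neither copyable nor movable, so queues would typically hold it through a `std::unique_ptr` or `std::shared_ptr`.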

Configuration

DRAGON uses several configuration levels to adapt to different environments and use cases.

Global configuration

config/default.yaml

num_threads: auto (detected from the CPU)
max_ram_usage: 70% of system RAM
use_gpu: true
chunk_size_default: 16 KB
hash_algorithm: xxh3_sha256

GPU configuration

dragon_gpu.yaml

local_gpus: [0, 1]
memory_limit: 70%
block_size_min: 1 MB
block_size_max: 64 MB
remote_gpus: [...]

Parameter	Default	Description
gpu_threshold	10 MB	Minimum file size for GPU processing (CPU otherwise)
chunk_sizes	[4K, 8K, 16K, 32K, 64K]	Chunk sizes to test
max_retries	2	Retry attempts before permanent failure
timeout_base_ms	5000	Base timeout for GPU operations
timeout_multiplier	2.5	Multiplier for the adaptive timeout
smoothing_factor	0.3	Smoothing of GPU metrics
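
One plausible reading of `max_retries`, `timeout_base_ms` and `timeout_multiplier` is a timeout that grows geometrically with each attempt. The formula below is an assumption, not something the table states: attempt 0 uses the base timeout and each retry multiplies it by the multiplier. `RetryPolicy`, `timeout_ms` and `should_retry` are hypothetical names.

```cpp
#include <cassert>

// Defaults copied from the parameter table.
struct RetryPolicy {
    int max_retries = 2;
    double timeout_base_ms = 5000.0;
    double timeout_multiplier = 2.5;
};

// Assumed formula: timeout = base * multiplier^attempt,
// where attempt 0 is the first try.
double timeout_ms(const RetryPolicy& policy, int attempt) {
    assert(attempt >= 0);
    double timeout = policy.timeout_base_ms;
    for (int i = 0; i < attempt; ++i) {
        timeout *= policy.timeout_multiplier;
    }
    return timeout;
}

// True while another retry is still allowed after a failure.
bool should_retry(const RetryPolicy& policy, int retries_done) {
    return retries_done < policy.max_retries;
}
```

Under this reading the three attempts allowed by the defaults would wait up to 5000 ms, 12500 ms and 31250 ms before the block is marked as permanently failed.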

File formats

Deduplication catalog

Catalog structure

erDiagram
    CATALOG ||--o{ FILE : contains
    FILE ||--o{ CHUNK_REF : has
    CHUNK_REF }o--|| UNIQUE_BLOCK : references
    CATALOG {
        uint32 magic_number
        uint32 version
        uint64 total_files
        uint64 total_chunks
        uint64 unique_blocks
        uint64 total_size
        uint64 dedup_size
        datetime created_at
    }
    FILE {
        uint64 file_id
        string path
        uint64 size
        datetime mtime
        uint32 chunk_count
    }
    CHUNK_REF {
        uint64 file_id
        uint32 chunk_index
        uint64 block_id
        uint32 offset_in_block
        uint32 size
    }
    UNIQUE_BLOCK {
        uint64 block_id
        Hash256 hash
        uint64 size
        uint32 ref_count
        uint64 storage_offset
    }
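
The CHUNK_REF to UNIQUE_BLOCK relationship is what allows a file to be rebuilt from deduplicated storage: each reference names a block, an offset inside it and a length. The sketch below illustrates this with an in-memory block store. `BlockStore` and `rebuild_file` are hypothetical names, and the on-disk layout (`storage_offset`, magic number, versioning) is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Fields follow the CHUNK_REF entity of the catalog diagram.
struct ChunkRef {
    std::uint64_t file_id;
    std::uint32_t chunk_index;
    std::uint64_t block_id;
    std::uint32_t offset_in_block;
    std::uint32_t size;
};

// Hypothetical in-memory stand-in for unique block storage: id -> bytes.
using BlockStore = std::map<std::uint64_t, std::string>;

// Concatenates the referenced byte ranges of one file in chunk_index order.
// Throws std::out_of_range if a referenced block is missing from the store.
std::string rebuild_file(std::uint64_t file_id, std::vector<ChunkRef> refs,
                         const BlockStore& store) {
    // Keep only the references that belong to the requested file.
    refs.erase(std::remove_if(refs.begin(), refs.end(),
                              [file_id](const ChunkRef& r) {
                                  return r.file_id != file_id;
                              }),
               refs.end());
    std::sort(refs.begin(), refs.end(),
              [](const ChunkRef& a, const ChunkRef& b) {
                  return a.chunk_index < b.chunk_index;
              });

    std::string out;
    for (const ChunkRef& r : refs) {
        const std::string& block = store.at(r.block_id);
        // A reference that runs past the end of its block means a corrupt
        // catalog; append() alone would silently truncate it.
        assert(static_cast<std::size_t>(r.offset_in_block) + r.size <=
               block.size());
        out.append(block, r.offset_in_block, r.size);
    }
    return out;
}
```

Two files that share a block simply carry references with the same `block_id`, which is also what `ref_count` on UNIQUE_BLOCK would count.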

Technology stack

C++

Language

C++20 (modules, concepts, ranges)
CMake 3.20+ for the build
MSVC / GCC / Clang

GPU

Acceleration

CUDA 11+ (NVIDIA)
DirectX Compute (Windows)
Potential: OpenCL, Vulkan

Lib

Libraries

FTXUI: Console interface
CLI11: Argument parsing
xxHash: Fast hashing
GoogleTest: Unit tests
