
During the summer of 2024 I ran an independent research project on the CIC-DDoS2019 dataset, focused specifically on DDoS amplification traffic: reflection floods that bounce off misconfigured UDP services.
I trained a supervised classifier in scikit-learn, with train/validation splits chosen so that the metrics reflected generalization rather than leakage across time windows. The final model exceeded 99% accuracy on the held-out slices.
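One way to picture a split that avoids leakage across time windows is the sketch below. It is an illustration under stated assumptions, not the project's code: the function name `chronological_split`, the `holdout_fraction` parameter, and the sample timestamps are all hypothetical, and it uses only the Python standard library rather than scikit-learn.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    timestamps: Sequence[float], holdout_fraction: float = 0.25
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, holdout_idx) so every held-out row is later than every training row."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    # Order row indices by capture time; sorted() is stable, so ties keep input order.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    cut = len(order) - int(len(order) * holdout_fraction)
    # Push the cut past rows that share the boundary timestamp, so a single
    # instant never contributes rows to both sides of the split.
    while 0 < cut < len(order) and timestamps[order[cut]] == timestamps[order[cut - 1]]:
        cut += 1
    return order[:cut], order[cut:]


# Hypothetical flow start times in seconds, deliberately out of order,
# with a tie (60.0) sitting right where the cut would otherwise fall.
flow_times = [30.0, 10.0, 50.0, 20.0, 40.0, 60.0, 60.0, 70.0]
train_idx, holdout_idx = chronological_split(flow_times, holdout_fraction=0.25)
```

The returned index lists can be used to slice a feature matrix and label vector before fitting any estimator. Cutting on time instead of shuffling means the held-out rows come strictly after the training rows, which is the property a random split would not guarantee.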
Exploratory work at five-million-row scale used Vaex for lazy, out-of-core aggregates and Matplotlib for the figures that showed me which features actually separated amplification from benign bursts. Java handled some of the heavier preprocessing glue where the Python scientific stack was awkward.