Playing with the LANL ARCS Data Sets

Los Alamos National Laboratory (LANL) has an Advanced Research in Cyber Systems (ARCS) group that provides an interesting data set for cybersecurity purposes.

Since cybersecurity data sets are difficult to come by, I decided to play a bit with this one. I had no particular purpose in mind besides playing around and maybe exploring some different technologies (DuckDB, Neo4j, LLMs). Basically, this is a memo to self about playing with data sets.

The LANL page provides a very nice overview of each data set, including example data. This post focuses on the Comprehensive, Multi-Source Cyber-Security Events data set.

My main takeaways were:

Just throwing data together gets you nowhere.
It is still lotsa fun to play with all kinds of technology.
Maybe DuckDB isn’t that bad a format for exchanging data.
