Wednesday, November 27, 2024

GIAC Machine Learning Engineer (GMLE) Certification: A Comprehensive Pre-course Prep and Study Guide

As a recent graduate of the GIAC Machine Learning Engineer (GMLE) certification program, I've been frequently asked about my experience and recommendations for those considering or currently pursuing this new course (SEC595). In this article, I'll share my insights and a curated list of supplemental resources that I found invaluable during my studies.

My Learning Approach

I believe in reinforcing concepts through diverse explanations and examples. While the SANS SEC595 course material is comprehensive, I sought additional resources to deepen my understanding. This approach may be more than some students need, but for those who want a broader perspective or struggle with certain concepts, these supplemental materials can be incredibly helpful.

General Recommendations

Before diving into specific resources, I highly recommend exploring the work of Andrew Ng. His explanations of machine learning, deep learning, and AI concepts are exceptional and complement the SANS course material well. You can find his content on platforms like deeplearning.ai.

Originally, the course description specifically stated that you did not need to know much Python or mathematics before taking the course. However, it appears to have been updated (probably based on some passionate feedback) to say "Intermediate Python fluency is important. Pre-calculus mathematics skills are important but not required." I've included Python resources below that range from absolute beginner to intermediate.

That said, the course still claims that it will show you the math but not expect you to do it. That is not strictly true. You will still need to be able to calculate things like the mean, median, mode and standard deviation of a dataset using Python. You could argue that you are not required to do the math because Python performs the calculations, but you do need to understand the math. If I said "Find the most common grade for these three tests," you should be able to say you're looking for the mode. When making your index, I recommend creating a separate index of just the mathematical equations and the page numbers where they can be found.
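As a rough illustration of the kind of calculation described above, here is a minimal sketch using Python's built-in statistics module. The grades list is made up for this example, and the course labs may well use other libraries such as NumPy or pandas instead.

```python
import statistics

# Hypothetical test grades, invented purely for illustration
grades = [88, 92, 75, 92, 60, 88, 92]

mean = statistics.mean(grades)          # arithmetic average
median = statistics.median(grades)      # middle value once sorted
mode = statistics.mode(grades)          # most common value
pop_std = statistics.pstdev(grades)     # population standard deviation
sample_std = statistics.stdev(grades)   # sample standard deviation (n - 1)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"population std={pop_std:.2f} sample std={sample_std:.2f}")
```

The point is less the function calls than knowing which statistic a question is asking for: "most common grade" means the mode, and "typical grade when a few scores are extreme" usually points to the median.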

Section-by-Section Supplemental Resources

Below, I've mapped out supplemental learning resources to each section of the GMLE course. Most of these are free, though some may require a subscription. These are the courses and resources I used to help me pass the certification and more fully understand the material.

1. Introduction and Foundations

1.1-1.2: Course Overview and Technology Terms

YouTube: What is Machine Learning?

Medium Article: Unveiling the Depths of AI/ML 

DeepLearning.AI: AI for Everyone

1.3: Python Refresher 

For beginners: CodeCombat Python Courses

For refreshers: Codecademy: Learn Python 3 

Recommended pre-course: Codecademy: Data and Programming Foundations for AI

1.4-1.7: Data Visualization, SQL, Document Stores, and Web Scraping

Codecademy: Intro to Data Visualization with Python 

Codecademy: Analyze Data With Python 

TutorialsPoint: MongoDB Query Document

Codecademy: Learn Web Scraping


2. Statistics and Data Exploration

2.1-2.3: Statistics, Data Exploration, and Probability

Codecademy: Statistics - Mean, Median and Mode 

Codecademy: Learn Statistics with Python

Codecademy: Fundamental Math for Data Science

2.4: Time Domain vs Frequency Domain 

YouTube: Time and Frequency Domains Explained


3. Machine Learning Algorithms

3.1-3.4: Clustering, Support Vector Machines, Decision Trees, and Random Forests

Codecademy: Build a Machine Learning Model

Codecademy: Machine Learning - Clustering with K-Means

YouTube: Support Vector Machines Explained 

Codecademy: Machine Learning - Random Forests and Decision Trees


4. Advanced Machine Learning Concepts

4.1-4.4: Linear Regression, Neural Networks, Feature Selection, and Categorical Outputs 

Codecademy: Linear Regression in Python

Codecademy: Introduction to Deep Learning with Tensorflow 

Codecademy: Principal Component Analysis Intro 

Codecademy: Deep Learning with Tensorflow Classification


5. Advanced Topics in AI/ML

5.2-5.4: Convolutional Neural Networks, Embeddings, and Autoencoders

Codecademy: Deep Learning with TensorFlow - Image Classification 

Codecademy: Intro to Language Models in Python 

YouTube: Autoencoders in Deep Learning


6. Advanced Applications

6.1-6.2: Convolutional Neural Networks and Genetic Algorithms

DeepLearning.AI: Deep Learning Specialization

Codecademy: Intro to Hyperparameter Tuning with Python


Conclusion

While these resources greatly enhanced my learning experience, it's important to note that the SANS course material alone is sufficient for most students to pass the GMLE certification. These supplemental materials are for those who, like me, benefit from multiple perspectives or seek a deeper understanding of the concepts.

Remember, the key to success in this course is consistent practice and hands-on application of the concepts. Don't just passively consume the material – engage with it, experiment, and most importantly, enjoy the learning process!

Good luck on your #GMLE #SEC595 journey!

Tuesday, November 26, 2024

CTI Guide for the Use of "Targeted" in Analytical Threat Intelligence Reporting

Introduction

When discussing cyber threat actors and their activities, it's important to use precise language, especially regarding terms like "targeted." Threat intelligence providers and cyber threat intelligence (CTI) reports often use the word targeted when describing the victimology of a threat actor group or campaign, even when the attack was opportunistic in nature. For example, in describing the MOVEit exploit campaign, CTI reports would note that the attack “targeted multiple industries including healthcare, retail and education,” which can be misleading because it may imply to a non-CTI reader that the threat actors specifically selected companies in those industries before launching the attack.

While there are some examples of specific organizations that were truly pre-selected by threat actors for persistent campaigns, most threat activity is opportunistic in nature, and the word targeted may only be appropriate to describe the practice of Big Game Hunting (BGH) by some groups. CTI teams risk desensitizing their readers when targeted or targeting is used excessively or inappropriately, as it undermines the significance of the word when targeted activity does take place.

Further, it is important that CTI teams are able to communicate that opportunistic targeting generally comprises the vast majority of cyber breaches and likely presents the most significant risk to an organization. Those outside the CTI practice often regard a threat actor group specifically targeting an industry or organization as the greater risk.

This guide openly acknowledges there is a lot of variability and debate around use of the word targeted. For example, if a threat actor group has a large victimology in healthcare and education, do you describe that as targeted, or do you acknowledge that companies in those sectors are often included in campaigns due to their reliance on critical data and generally lower levels of cybersecurity investment compared to financial or government entities? If a threat actor group mass-exploits vulnerabilities and then selects specific organizations or sectors from its victimology to infect with ransomware, based on a belief they are more likely to pay, do you consider that a type of "targeting"?

These are all very important concepts that CTI teams should consider standardizing in their analytical writing style guidance for analysts. It is less important that everyone agrees on the "right" way to use the word targeting than it is that a framework is agreed on and standardized so reports from different analysts on the same team do not conflict.

An example:

To understand why this can be a problem in CTI reporting, let’s look at a specific real-world example.

In 2023, many CTI reports described the Qilin group as "targeting healthcare," trying to communicate either that the Qilin victimology included healthcare and pharmaceutical companies, or that Qilin's campaigns attacked organizations across many verticals but had statistically notable success against companies classified as pharmaceutical or healthcare. However, Qilin is a financially motivated and opportunistic group. It was not targeting healthcare organizations specifically in 2023; in fact, only 7% of the total victims on its data leak site were healthcare organizations. To make matters more confusing, even HC3’s recently published advisory on the group stated, “Qilin is a ransomware-as-a-service (RaaS) offering in operation since 2022, and which continues to target healthcare organizations and other industries worldwide... The group’s targeting appears to be opportunistic rather than targeted.”[1]

However, in June 2024 Qilin posted on their data leak site, “We also officially declare that in the near future there will be a series of attacks on medical institutions U.S.A”. Although the truthfulness of this statement may be debated[i], it marks a distinct shift in how we understand Qilin to operate.

If CTI teams have used the word “targeted” historically to describe this group, reporting on this supposed shift in tactics may lose its impact on non-CTI readers.
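One way to ground this kind of claim is to compute each sector's share of a group's victimology before reaching for the word "targeted." The sketch below is only an illustration: sector_share is a hypothetical helper, and the victim counts are invented (chosen so that healthcare works out to 7 of 100). They are not real Qilin leak-site data.

```python
from collections import Counter

def sector_share(victim_sectors):
    """Return each sector's fraction of the total victim count."""
    counts = Counter(victim_sectors)
    total = sum(counts.values())
    return {sector: n / total for sector, n in counts.items()}

# Invented tally for illustration only, NOT real leak-site data
victims = (
    ["manufacturing"] * 40
    + ["retail"] * 30
    + ["education"] * 23
    + ["healthcare"] * 7
)

shares = sector_share(victims)
for sector, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sector}: {share:.0%}")
```

A low share does not by itself prove an actor is opportunistic, but it is a useful check against describing a sector as "targeted" based on a handful of memorable victims.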

Purpose

This guide aims to provide example guidelines for CTI teams on the language used to describe targeting, so that teams speak with a consistent voice in reports and communications. It does not claim to be the only correct approach; it only acknowledges that there is a real need for consistency.

When to Use "Targeted"

"Targeted" should be used when there is clear evidence that a threat actor specifically chose and pursued a particular organization, industry, or group. This typically involves:

  • Customized tactics: The threat actor tailored their techniques or malware specifically for the victim.
  • Persistent efforts: Multiple attempts or a sustained campaign focused on the same target.
  • Specific victim selection: Evidence that the actor deliberately chose the victim based on certain attributes.

When to Avoid "Targeted"

Avoid using "targeted" in the following scenarios:

  • Opportunistic attacks: When threat actors cast a wide net and attack any vulnerable system they encounter.
  • Broad campaigns: Attacks affecting multiple industries or a large number of organizations without clear focus.
  • Big Game Hunting: While these attacks go after specific types of victims, they're often based on general criteria like company size or potential ransom value rather than targeting a specific entity.

Alternative Terminology

Instead of "targeted," consider using more precise language:

  • "Affected" or "impacted" for general victims of an attack
  • "Focus on" or "prioritize" for industries or sectors that receive more attention from threat actors
  • "Opportunistically compromised" for victims of non-targeted attacks
  • "The organization was impacted by a campaign associated with [APT group]."

This phrasing acknowledges the effect on the organization without implying that they were the primary target.

  • "Organizations in [specific industry/sector] are frequently affected by the campaigns of [APT group]."

This approach highlights the trend of certain industries being commonly affected without asserting direct targeting.

  • "The attack victims typically include companies from [specific industries/sector], often due to [reasons like size, data value, etc.]."

This emphasizes that while certain types of companies are commonly affected, the focus may be on characteristics like company size or the value of data rather than specific targeting.

  • "The campaign is known to affect a broad range of organizations, with a particular impact on companies in [industry]."

This phrasing allows for the inclusion of industry-related patterns without suggesting direct targeting.
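A team that adopts guidance like this could also check drafts for it automatically. The following is a minimal sketch, not an established tool: flag_targeting_language and the ALTERNATIVES list are hypothetical names, and the regular expression is a simple assumption about which word forms to flag.

```python
import re

# Alternatives taken from the list above; how they are offered is an assumption
ALTERNATIVES = ["affected", "impacted", "focus on", "prioritize",
                "opportunistically compromised"]

# Matches target, targets, targeted, targeting (case-insensitive)
TARGET_PATTERN = re.compile(r"\btarget(?:s|ed|ing)?\b", re.IGNORECASE)

def flag_targeting_language(text):
    """Return (line_number, matched_word, line) for each use of 'target*'."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in TARGET_PATTERN.finditer(line):
            findings.append((line_no, match.group(0), line.strip()))
    return findings

# Made-up draft text for illustration
draft = """The campaign targeted multiple industries including healthcare.
Organizations in education were also impacted."""

for line_no, word, line in flag_targeting_language(draft):
    print(f"line {line_no}: '{word}' found. Is there evidence of deliberate "
          f"selection? If not, consider: {', '.join(ALTERNATIVES)}")
```

A check like this only flags the word for review; deciding whether the evidence supports "targeted" still rests with the analyst.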

Key Considerations

  • Evidence-based language: Only use "targeted" when you have concrete evidence of specific targeting.
  • Avoid assumptions: Don't assume targeting without clear indicators.
  • Contextual analysis: Consider the broader context of the threat actor's activities and motivations.

By adhering to these guidelines, cyber threat intelligence professionals can provide more accurate and nuanced reporting on threat actor activities, avoiding the common pitfall of overusing the term "targeted" when it may not be appropriate.

 
