Research Note

Reverse Engineering Obfuscated JavaScript | Donnie Celestre

A practical research note on turning layered JavaScript obfuscation into behavior-level understanding that supports triage, detection, and downstream investigation.

Thesis: the goal is not just to make ugly JavaScript readable. The goal is to recover enough structure and behavior that an analyst can explain what the code does, why it matters, and what should happen next.

Why This Problem Matters

Obfuscated JavaScript appears in phishing kits, skimmers, malicious browser loaders, staged droppers, and suspicious third-party web assets. In many cases the real difficulty is not the language itself. It is the way the code is intentionally shaped to waste analyst time: string arrays, runtime decoding, flattened control flow, dead branches, and dynamically constructed execution paths.

If the workflow stops at “partially readable,” the output is still weak. Defensive teams need something more concrete:

  • what the script is trying to achieve
  • what inputs or browser surfaces it touches
  • where the network or credential handling risk appears
  • what indicators, patterns, or detections can be derived from it

A Practical Workflow

I approach this type of sample as a behavior recovery problem, not just a beautification problem.

1. Profile The Sample Before Transforming It

Before editing or executing anything, establish a quick profile:

  • how many layers of encoding or indirection are visible
  • whether the script uses packed string arrays or decoder helpers
  • whether dangerous behavior is obvious from API names, DOM access, storage access, fetch/XHR usage, or dynamic evaluation
  • whether the file looks like a standalone payload, a web skimmer component, or a loader stage

This initial pass narrows the likely family of transforms needed and prevents unnecessary execution.
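A static profile along these lines can be sketched as a set of text counters that never execute the sample. The trait names and patterns below are illustrative assumptions, not a standard taxonomy, and the counts are leads for the next step rather than findings.

```javascript
// Static profile of a sample: counts textual traits without executing anything.
// Trait names and patterns are illustrative assumptions.
const TRAITS = {
  dynamicEval: /\beval\s*\(|\bFunction\s*\(/g,
  network: /\bfetch\s*\(|\bXMLHttpRequest\b|\bsendBeacon\b/g,
  storage: /\blocalStorage\b|\bsessionStorage\b|\bdocument\.cookie\b/g,
  domAccess: /\bquerySelector(All)?\s*\(|\bgetElementById\s*\(|\baddEventListener\s*\(/g,
  hexIdentifiers: /\b_0x[0-9a-f]+\b/gi,
  encodedStrings: /\batob\s*\(|\\x[0-9a-f]{2}|\\u[0-9a-f]{4}/gi,
};

function profileSample(source) {
  const profile = {};
  for (const [name, pattern] of Object.entries(TRAITS)) {
    // String.prototype.match with a global regex returns every match or null.
    profile[name] = (source.match(pattern) || []).length;
  }
  return profile;
}

// Made-up one-line sample: a hex-named string array, an eval of decoded text, one fetch.
const sample = 'var _0x1a2b=["\\x68\\x69"];eval(atob(_0x1a2b[0]));fetch("/c");';
console.log(profileSample(sample));
```

High counts for hex-style identifiers and escaped strings suggest a packed string array, which points toward string resolution as the first transform.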

flowchart LR
  A["Suspicious JavaScript sample"] --> B["Profile the sample"]
  B --> C["Identify obfuscation traits"]
  C --> D["Apply AST transforms"]
  D --> E["Recover meaningful behavior"]
  E --> F["Summarize analyst findings"]
  F --> G["Derive detections and next actions"]

2. Normalize Structure With AST Passes

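For heavily obfuscated JavaScript, regex cleanup becomes fragile very quickly: the same logic can be spelled with different quoting, whitespace, or bracket versus dot access, and each variation needs its own pattern. AST-driven transforms are more reliable because they work on structure rather than text shape.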

A typical sequence looks like this:

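// `ast` is the parsed sample; `transforms` is the ordered list of passes
// chosen after profiling. Each pass takes an AST and returns the rewritten AST.
for (const pass of transforms) {
  ast = pass(ast);
}

// `tagBehavior` stands for a helper that walks the cleaned AST and returns
// the nodes matching the watch list below.
const suspiciousNodes = tagBehavior(ast, {
  watchCalls: ["eval", "Function", "fetch", "XMLHttpRequest"],
  watchStorage: true,
  watchDomCollection: true,
});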

The exact transforms change by sample, but the usual priorities are:

  • resolve packed string arrays
  • fold simple constant expressions
  • remove unreachable or noise-heavy branches
  • simplify flattened control flow where possible
  • rename helper functions and temporary values into something analyst-readable
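As a minimal sketch of the first two priorities, the passes below resolve string-array lookups and fold literal concatenation. To stay dependency-free, the ESTree-shaped nodes are built by hand; in practice they would come from a parser. The array name `_0x1a2b`, its contents, and the helper names `rewrite`, `resolveStringArray`, and `foldConstants` are all hypothetical.

```javascript
// Post-order rewrite over ESTree-shaped nodes (plain objects with a `type` field).
// Children are rebuilt first, then `visit` may replace the rebuilt node.
function rewrite(node, visit) {
  if (Array.isArray(node)) return node.map((child) => rewrite(child, visit));
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [key, value] of Object.entries(node)) copy[key] = rewrite(value, visit);
  return visit(copy);
}

// Pass 1: replace lookups such as _0x1a2b[1] with the string they resolve to.
// Assumes the array name and its contents were already recovered from the sample.
const resolveStringArray = (arrayName, strings) => (ast) =>
  rewrite(ast, (node) => {
    const isLookup =
      node.type === "MemberExpression" &&
      node.computed &&
      node.object.type === "Identifier" &&
      node.object.name === arrayName &&
      node.property.type === "Literal" &&
      typeof node.property.value === "number";
    if (!isLookup) return node;
    const value = strings[node.property.value];
    return typeof value === "string" ? { type: "Literal", value } : node;
  });

// Pass 2: fold `+` between two string literals or two number literals.
const foldConstants = (ast) =>
  rewrite(ast, (node) => {
    if (node.type !== "BinaryExpression" || node.operator !== "+") return node;
    const { left, right } = node;
    if (left.type !== "Literal" || right.type !== "Literal") return node;
    if (typeof left.value !== typeof right.value) return node;
    if (typeof left.value !== "string" && typeof left.value !== "number") return node;
    return { type: "Literal", value: left.value + right.value };
  });

// Hand-built AST for: fetch(_0x1a2b[0] + _0x1a2b[1])
const lookup = (index) => ({
  type: "MemberExpression",
  computed: true,
  object: { type: "Identifier", name: "_0x1a2b" },
  property: { type: "Literal", value: index },
});
const ast = {
  type: "CallExpression",
  callee: { type: "Identifier", name: "fetch" },
  arguments: [{ type: "BinaryExpression", operator: "+", left: lookup(0), right: lookup(1) }],
};

const transforms = [
  resolveStringArray("_0x1a2b", ["https://example.invalid", "/collect"]),
  foldConstants,
];
let result = ast;
for (const pass of transforms) result = pass(result);
console.log(result.arguments[0]);
```

Order matters here: folding only succeeds because the lookups were turned into literals first, which is why string resolution usually sits at the front of the pipeline.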

3. Separate Readability From Meaning

Readable output is helpful, but it is not enough by itself. The real goal is to extract behavior:

  • credential harvesting or form interception
  • DOM scraping or hidden-field collection
  • staged fetch/download behavior
  • cookie or local storage access
  • anti-analysis checks or environment gating
  • redirection, injection, or loader behavior

That separation matters because some samples never become fully clean. Even partial deobfuscation can still produce a strong behavioral answer if the analysis is organized correctly.
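One way to organize that extraction is to map the API references recovered so far onto the behavior categories above. The sketch below assumes the analyst already has a list of observed API names; the rule table and tag names are hypothetical, and a hit is a lead to verify in the code, not a verdict.

```javascript
// Hypothetical mapping from browser API references to candidate behavior tags.
const BEHAVIOR_RULES = {
  "form-interception": ["FormData", "addEventListener"],
  "dom-scraping": ["document.querySelectorAll", "document.forms"],
  "staged-loading": ["fetch", "XMLHttpRequest", "document.createElement"],
  "storage-access": ["document.cookie", "localStorage", "sessionStorage"],
  "environment-gating": ["navigator.webdriver", "navigator.userAgent"],
  "dynamic-execution": ["eval", "Function"],
};

// Returns only the tags that have at least one matching API reference.
function tagObservedApis(observedApis) {
  const tags = {};
  for (const [tag, knownApis] of Object.entries(BEHAVIOR_RULES)) {
    const hits = observedApis.filter((api) =>
      knownApis.some((known) => api === known || api.startsWith(known + "."))
    );
    if (hits.length > 0) tags[tag] = hits;
  }
  return tags;
}

// Works on partial output: only the references that were recovered are needed.
const observed = ["document.cookie", "localStorage.getItem", "fetch", "navigator.webdriver"];
console.log(tagObservedApis(observed));
```

Because the input is a list of references rather than fully clean code, the same tagging step applies to samples that only partially deobfuscate.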

What Good Output Looks Like

For a defensive workflow, a successful reverse engineering pass should produce at least four useful outputs.

flowchart TD
  A["Recovered behavior"] --> B["Behavioral summary"]
  A --> C["Technical evidence"]
  A --> D["Detection ideas"]
  A --> E["Analyst caveats"]
  B --> F["Incident handling guidance"]
  C --> G["Evidence-backed review"]
  D --> H["Detection engineering and hunting"]
  E --> I["Known uncertainty and next pivots"]

Behavioral Summary

This is the short explanation an analyst or incident handler can act on. It should answer:

  • what the script collects, modifies, executes, or transmits
  • which browser or runtime surfaces it depends on
  • whether the behavior looks like skimming, credential interception, staged loading, or evasion

Technical Evidence

This is the supporting layer: notable functions, decoded strings, endpoint patterns, storage keys, suspicious selectors, or dynamic call sites.

Detection Ideas

Recovered behavior should be convertible into something useful for downstream teams:

  • suspicious string or selector patterns
  • risky browser API combinations
  • loader patterns tied to dynamic execution
  • endpoint naming or request-shape indicators
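As a rough sketch, recovered strings can be turned into match patterns, and API combinations into simple all-of rules. The selector, endpoint path, and API names below are made up for illustration, and `buildDetections` and `matchesCombination` are hypothetical helpers rather than part of any detection product.

```javascript
// Escape a recovered literal so it can be embedded in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build simple detection candidates from evidence recovered during analysis.
function buildDetections(evidence) {
  const detections = [];
  for (const selector of evidence.selectors) {
    detections.push({ kind: "selector", pattern: new RegExp(escapeRegExp(selector)) });
  }
  for (const path of evidence.endpointPaths) {
    // Match the path only when it ends there or is followed by a query string.
    detections.push({ kind: "endpoint", pattern: new RegExp(escapeRegExp(path) + "(\\?|$)") });
  }
  return detections;
}

// Combination rule: true only when every required API appears in the observed set.
const matchesCombination = (observedApis, required) =>
  required.every((api) => observedApis.includes(api));

// Illustrative evidence, not taken from a real sample.
const detections = buildDetections({
  selectors: ['input[name="cc-number"]'],
  endpointPaths: ["/gate.php"],
});
console.log(detections.map((d) => `${d.kind}: ${d.pattern}`));
console.log(matchesCombination(["document.forms", "fetch", "document.cookie"], ["document.forms", "fetch"]));
```

Escaping matters because selectors and paths routinely contain characters such as `[`, `.`, and `?` that would otherwise change the meaning of the pattern.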

Analyst Caveats

Every serious note should also say what remains uncertain. Some paths only resolve at runtime. Some strings are environment-dependent. Some loaders only activate under particular DOM conditions. Ambiguity should be documented, not hidden.
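The four outputs can be kept together in one record so that a missing section is visible before handoff. This is a minimal sketch under assumed field names; the example values are invented for illustration.

```javascript
// Hypothetical finding record holding the four outputs described above.
// `missing` lists any section that was left empty.
function buildFinding({ summary = "", evidence = [], detections = [], caveats = [] }) {
  const sections = { summary, evidence, detections, caveats };
  const missing = Object.keys(sections).filter((name) => sections[name].length === 0);
  return { ...sections, complete: missing.length === 0, missing };
}

// Illustrative values only.
const finding = buildFinding({
  summary: "Reads payment form fields on submit and posts them to a remote endpoint.",
  evidence: ["decoded string: /gate.php", 'selector: input[name="cc-number"]'],
  detections: ["outbound POST to a path ending in /gate.php"],
  caveats: [],
});
console.log(finding.complete, finding.missing);
```

Here the record is flagged as incomplete because no caveats were written down, which is the section most easily skipped.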

Where Analysts Lose Time

There are a few common failure modes in JavaScript deobfuscation work:

Chasing Full Reconstruction Too Early

Trying to make every line pristine before tagging behavior slows the entire investigation. The better path is to recover enough structure to identify the meaningful branches first.

Treating Every Helper Function As Equally Important

Most obfuscated samples contain large amounts of noise. The priority should be code that affects execution flow, decoding, network behavior, data collection, or persistence.
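One way to apply that priority is to rank helpers by how many of those categories they touch. The sketch below assumes each helper has already been annotated by hand or by an earlier pass; the helper names and the `touches` labels are hypothetical.

```javascript
// Categories named above as worth the analyst's attention.
const PRIORITY = new Set(["executionFlow", "decoding", "network", "dataCollection", "persistence"]);

// Rank helpers by the number of priority categories they touch, highest first.
function rankHelpers(helpers) {
  return helpers
    .map((helper) => ({
      name: helper.name,
      score: helper.touches.filter((category) => PRIORITY.has(category)).length,
    }))
    .sort((a, b) => b.score - a.score);
}

// Illustrative annotations, not taken from a real sample.
const ranked = rankHelpers([
  { name: "_0x77b0", touches: [] },
  { name: "_0x3f1a", touches: ["decoding", "network"] },
  { name: "_0x9c2d", touches: ["dataCollection"] },
]);
console.log(ranked.map((helper) => `${helper.name}=${helper.score}`));
```

Helpers that score zero are candidates to skip or to rename last, not proof that they are dead code.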

Producing Output That Stops At The Researcher

If the result cannot be handed to detection engineering, incident response, or a threat-hunting workflow, then the analysis has not fully landed. Reverse engineering should feed operations, not just documentation.

What I Would Capture In A Full Case Study

If this note were expanded into a larger write-up or portfolio case study, I would include:

  • the initial sample profile and visible obfuscation traits
  • the transform pipeline used to reduce indirection
  • before/after examples of decoded logic
  • the final behavioral map of the script
  • the detection hypotheses derived from the analysis

That structure makes the work legible to both technical reviewers and hiring teams. It shows not only that the code can be unpacked, but that the result becomes operationally useful.

Closing Takeaway

The real benchmark for reverse engineering obfuscated JavaScript is not whether the code becomes elegant. It is whether the analysis can move from opaque script to clear behavior to actionable defensive output without wasting time on unnecessary cleanup.