Balancing Old Tricks with New Feats: AI-Powered Conversion From Enzyme to React Testing Library at Slack

In the world of frontend development, one thing remains certain: change is the only constant. New frameworks emerge, and libraries can become obsolete without warning. Keeping up with the ever-changing ecosystem involves dealing with code conversions, both big and small. One significant shift for us was the transition from Enzyme to React Testing Library (RTL), prompting many engineers to convert their test code to the more user-focused RTL approach. While both Enzyme and RTL have their own strengths and weaknesses, Enzyme's lack of native support for React 18 presented a compelling rationale for transitioning to RTL. It was so compelling that we at Slack decided to convert more than 15,000 of our frontend unit and integration Enzyme tests to RTL as part of the update to React 18.

We started by exploring the most straightforward avenue: seeking out potential Enzyme adapters for React 18. Unfortunately, our search yielded no viable options. In his article titled "Enzyme is dead. Now what?", Wojciech Maj, the author of the React 17 adapter, unequivocally suggested, "you should consider looking for an Enzyme alternative right now."

confused adapter
Adapter illustrating the mismatch and incompatibility of React 18 and Enzyme

Considering our ultimate goal of updating to React 18, which Enzyme doesn't support, we began with a thorough analysis of the problem's scope and ways to automate this process. Our initiative started with the monumental task of converting more than 15,000 Enzyme test cases, which translated to more than 10,000 potential engineering hours. At that scale, with that many engineering hours required, it was practically obligatory to optimize and automate the process. Despite thorough reviews of existing tools and extensive Google searches, we found no suitable solutions for this quite common problem. In this blog post, I'll walk you through our approach to automating the Enzyme-to-RTL conversion process. It includes analyzing and scoping the challenge, applying traditional Abstract Syntax Tree (AST) transformations and an AI Large Language Model (LLM) independently, followed by our custom hybrid approach of combining AST and LLM methodologies.

Abstract Syntax Tree (AST) transformations

Our initial approach centered around a more conventional way of performing automated code conversions: Abstract Syntax Tree (AST) transformations. These transformations allow us to represent code as a tree structure with nodes and to create targeted queries that convert one code node into another. For example, `wrapper.find('selector');` can be represented as:

ast representation
AST representation of `wrapper.find('selector');`
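To make the idea concrete, here is a minimal sketch of a single AST rule. It uses the TypeScript compiler API rather than whatever codemod framework was actually used (an assumption for illustration), and it rewrites `anything.find('selector')` calls into `screen.getByTestId('selector')` — a toy version of one conversion pattern:

```typescript
import * as ts from "typescript";

// Rewrite `<expr>.find('<string>')` calls into `screen.getByTestId('<string>')`.
// This is a simplified, hypothetical rule for illustration only.
function convertFindCalls(source: string): string {
  const sourceFile = ts.createSourceFile(
    "test.tsx",
    source,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true,
  );

  const transformer: ts.TransformerFactory<ts.SourceFile> = (ctx) => {
    const visit = (node: ts.Node): ts.Node => {
      // Match a call expression of the shape `<expr>.find('<string literal>')`.
      if (
        ts.isCallExpression(node) &&
        ts.isPropertyAccessExpression(node.expression) &&
        node.expression.name.text === "find" &&
        node.arguments.length === 1 &&
        ts.isStringLiteral(node.arguments[0])
      ) {
        // Replace the receiver and method name, keeping the selector argument.
        return ctx.factory.createCallExpression(
          ctx.factory.createPropertyAccessExpression(
            ctx.factory.createIdentifier("screen"),
            "getByTestId",
          ),
          undefined,
          node.arguments,
        );
      }
      return ts.visitEachChild(node, visit, ctx);
    };
    return (root) => ts.visitNode(root, visit) as ts.SourceFile;
  };

  const result = ts.transform(sourceFile, [transformer]);
  return ts.createPrinter().printFile(result.transformed[0]);
}
```

A real rule set would be far larger: as discussed below, deciding between `getByTestId` and `getByRole` requires DOM context that a purely syntactic transform does not have.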

Naturally, we aimed to create rules to address the most common conversion patterns. Besides focusing on the rendering methods, such as `mount` and `shallow`, and the various helpers using them, we identified the most frequently used Enzyme methods to prioritize the conversion efforts. These are the top 10 methods in our codebase:

{ method: 'find', count: 13244 },
{ method: 'prop', count: 3050 },
{ method: 'simulate', count: 2755 },
{ method: 'text', count: 2181 },
{ method: 'update', count: 2147 },
{ method: 'instance', count: 1549 },
{ method: 'props', count: 1522 },
{ method: 'hostNodes', count: 1477 },
{ method: 'exists', count: 1174 },
{ method: 'first', count: 684 },
... and 55 more methods
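How such a ranking might be produced is not shown in the post; a hypothetical sketch is a simple regex tally of Enzyme method calls across test sources (an AST-based count would be more precise, but this conveys the idea):

```typescript
// Hypothetical usage counter: tally `.method(` occurrences across test files.
// The method list is truncated here for brevity.
const ENZYME_METHODS = ["find", "prop", "simulate", "text", "update"] as const;

function countEnzymeMethodUsage(sources: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const method of ENZYME_METHODS) {
    // Count occurrences of `.method(` in every source string.
    const pattern = new RegExp(`\\.${method}\\(`, "g");
    counts[method] = sources.reduce(
      (sum, src) => sum + (src.match(pattern)?.length ?? 0),
      0,
    );
  }
  return counts;
}
```

Sorting the resulting counts in descending order yields a priority list like the one above.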

One important requirement for our conversion was achieving 100%-correct transformations, because any deviation would result in incorrect code generation. This challenge was particularly pronounced with AST conversions, where, in order to create transformations with 100% accuracy, we had to painstakingly create highly specific rules for each scenario manually. Within our codebase, we found 65 Enzyme methods, each with its own quirks, leading to a rapidly expanding rule set and growing concerns about the feasibility of our efforts.

Take, for example, the Enzyme method `find`, which accepts a variety of arguments like selector strings, component types, constructors, and object properties. It also supports nested filtering methods like `first` or `filter`, offering powerful element-targeting capabilities but adding complexity to AST manipulation.

In addition to the large number of manual rules needed for method conversions, certain logic depended on the rendered component's Document Object Model (DOM) rather than the mere presence or absence of comparable methods in RTL. For instance, the choice between `getByRole` and `getByTestId` depended on the accessibility roles or test IDs present in the rendered component. However, an AST lacks the capability to incorporate such contextual information. Its functionality is confined to processing the conversion logic based solely on the contents of the file being transformed, without considering external sources such as the actual DOM or the React component code.

With each new transformation rule we tackled, the problem seemed to escalate. After establishing patterns for 10 Enzyme methods and addressing other obvious patterns related to our custom Jest matchers and query selectors, it became apparent that AST alone couldn't handle the complexity of this conversion task. Consequently, we opted for a pragmatic approach: we achieved reasonably satisfactory conversions for the most common cases while resorting to manual intervention for the more complex scenarios. For each line of code requiring manual adjustments, we added comments with suggestions and links to relevant documentation. This hybrid strategy yielded a modest success rate of 45% automatically converted code across the selected files used for evaluation. Ultimately, we decided to offer this tool to our frontend developer teams, advising them to run our AST-based codemod first and then handle the remaining conversions manually.

Exploring the AST provided valuable insights into the complexity of the problem. We faced the challenge of differing testing methodologies in Enzyme and RTL with no straightforward mapping between them. Moreover, there were no suitable tools available to automate this process effectively. As a result, we had to find alternative approaches to address this challenge.

Large Language Model (LLM) transformations

use of LLMs
Team members enthusiastically discussing AI applications

Amid the widespread conversations about AI solutions and their potential applications across the industry, our team felt compelled to explore their applicability to our own challenges. Collaborating with the DevXP AI team at Slack, which specializes in integrating AI into the developer experience, we integrated the capabilities of Anthropic's AI model, Claude 2.1, into our workflows. We created the prompts and sent the test code along with them to our recently implemented API endpoint.
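The post does not show the request shape, so the following is purely a hypothetical sketch: the payload fields and the model identifier are assumptions for illustration, not Slack's actual internal API.

```typescript
// Hypothetical request payload for an internal LLM conversion endpoint.
// Field names and the model identifier are illustrative assumptions.
interface ConversionRequest {
  model: string;
  prompt: string;
  testCode: string;
}

function buildConversionPayload(testCode: string, prompt: string): string {
  const request: ConversionRequest = {
    model: "claude-2.1",
    prompt,
    testCode,
  };
  return JSON.stringify(request);
}
```

The serialized payload would then be POSTed to the team's internal endpoint, and the converted test file extracted from the response.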

Despite our best efforts, we encountered significant variation and inconsistency. Conversion success rates fluctuated between 40% and 60%. The results ranged from remarkably effective conversions to disappointingly inadequate ones, depending largely on the complexity of the task. While some conversions proved impressive, particularly in transforming highly Enzyme-specific methods into functional RTL equivalents, our attempts to refine the prompts had limited success. Our efforts to fine-tune prompts may even have complicated matters, potentially confusing the AI model rather than aiding it. The scope of the task was too large and multifaceted, so the standalone application of AI failed to provide the consistent results we sought, highlighting the complexities inherent in our conversion task.

The realization that we would have to resort to manual conversions with minimal automation was disheartening. It meant dedicating a substantial amount of our team's and company's time to test migration, time that could otherwise be invested in building new features for our customers or improving the developer experience. However, at Slack we highly value creativity and craftsmanship, and we didn't halt our efforts there. Instead, we remained determined to explore every potential avenue available to us.

AST + LLM transformations

We decided to examine how real humans perform test conversions and to identify any elements we might have missed. One notable advantage in comparing manual human conversion with automated processes was the wealth of information available to humans during conversion tasks. Humans benefit from invaluable insights drawn from various sources, including the rendered React component DOM, the React component code (often authored by the same people), AST conversions, and extensive experience with frontend technologies. Recognizing the significance of this, we reviewed our workflows and integrated most of this relevant information into our conversion pipeline. This is our final pipeline flowchart:

pipeline chart
Project pipeline flowchart

This strategic pivot, and the integration of both AST and AI technologies, helped us achieve a remarkable 80% conversion success rate, based on selected files, demonstrating the complementary nature of these approaches and their combined efficacy in addressing the challenges we faced.

In our pursuit of optimizing the conversion process, we implemented several strategic decisions that led to a notable 20-30% improvement beyond the out-of-the-box capabilities of our LLM. Among these, two innovative approaches stood out, which I'll cover next:

  1. DOM tree collection
  2. LLM control with prompts and AST

DOM tree collection

One crucial aspect of our approach was collecting the DOM tree of React components. This step proved vital because RTL testing relies heavily on the DOM structure of a component rather than its internal structure. By capturing the actual rendered DOM for each test case, we provided our AI model with essential contextual information that enabled more accurate and relevant conversions.

This collection step was essential because each test case might have different setups and properties passed to the component, resulting in varying DOM structures per test case. As part of our pipeline, we ran the Enzyme tests and extracted the rendered DOM. To streamline this process, we developed adapters for Enzyme rendering methods and saved the rendered DOM for each test case in a format consumable by the LLM. For example:

// Import original methods
import enzyme, { mount as originalMount, shallow as originalShallow } from 'enzyme';
import fs from 'fs';

let currentTestCaseName: string | null = null;

beforeEach(() => {
   // Set the current test case name before each test
   const testName = expect.getState().currentTestName;
   currentTestCaseName = testName ? testName.trim() : null;
});

afterEach(() => {
   // Reset the current test case name after each test
   currentTestCaseName = null;
});

// Override mount method
enzyme.mount = (node: React.ReactElement, options?: enzyme.MountRendererProps) => {
   const wrapper = originalMount(node, options);
   const htmlContent = wrapper.html();
   if (process.env.DOM_TREE_FILE) {
      // Append the test case title and its rendered DOM tree to the output file
      fs.appendFileSync(
         process.env.DOM_TREE_FILE,
         `<test_case_title>${currentTestCaseName}</test_case_title> and <dom_tree>${htmlContent}</dom_tree>;\n`,
      );
   }
   return wrapper;
};
LLM control with prompts and AST

The second creative change we needed to integrate was a more robust and strict mechanism for controlling hallucinations and erratic responses from our LLM. We achieved this through two key mechanisms: prompts and in-code instructions produced by the AST codemod. Through a strategic combination of these approaches, we created a more coherent and reliable conversion process, ensuring greater accuracy and consistency in our AI-driven transformations.

We initially experimented with prompts as the primary means of instructing the LLM. However, this proved to be a time-consuming task. Our attempts to create a universal prompt for all requests, including initial and feedback requests, were met with challenges. While we could have condensed our code by employing a single, comprehensive prompt, we found that this approach led to a significant increase in the complexity of requests made to the LLM. Instead, we opted to streamline the process by formulating a prompt with the most critical instructions, consisting of three parts: an introduction and general context setting, the main request (10 explicit required tasks and 7 optional ones), followed by instructions on how to evaluate and present the results:

Context setting:

`I need assistance converting an Enzyme test case to the React Testing Library framework.
I will provide you with the Enzyme test file code inside <code></code> xml tags.
I will also give you the partially converted test file code inside <codemod></codemod> xml tags.
The rendered component DOM tree for each test case will be provided in <component></component> tags with this structure for multiple test cases "<test_case_title></test_case_title> and <dom_tree></dom_tree>".`

Main request:

`Please perform the following tasks:
1. Complete the conversion for the test file within <codemod></codemod> tags.
2. Convert all test cases and ensure the same number of tests in the file. ${numTestCasesString}
3. Replace Enzyme methods with the equivalent React Testing Library methods.
4. Update Enzyme imports to React Testing Library imports.
5. Adjust Jest matchers for React Testing Library.
6. Return the entire file with all converted test cases, enclosed in <code></code> tags.
7. Do not modify anything else, including imports for React components and helpers.
8. Preserve all abstracted functions as they are and use them in the converted file.
9. Maintain the original organization and naming of describe and it blocks.
10. Wrap component rendering into <Provider store={createTestStore()}><Component></Provider>. In order to do that you need to do two things:
First, import these:
import { Provider } from '.../provider';
import createTestStore from '.../test-store';
Second, wrap component rendering in <Provider>, if it was not done before:
<Provider store={createTestStore()}>
<Component {...props} />
</Provider>
Ensure that all 10 conditions are met. The converted file should be runnable by Jest without any manual changes.

Other instructions section, use them when applicable:
1. "data-qa" attribute is configured to be used with "screen.getByTestId" queries.
2. Use these 4 augmented matchers that have "DOM" at the end to avoid conflicts with Enzyme:
toBeCheckedDOM: toBeChecked,
toBeDisabledDOM: toBeDisabled,
toHaveStyleDOM: toHaveStyle,
toHaveValueDOM: toHaveValue
3. For user simulations use userEvent and import it with "import userEvent from '@testing-library/user-event';"
4. Prioritize queries in the following order: getByRole, getByPlaceholderText, getByText, getByDisplayValue, getByAltText, getByTitle, then getByTestId.
5. Use query* variants only for non-existence checks: Example "expect(screen.query*('example')).not.toBeInTheDocument();"
6. Ensure all texts/strings are converted to lowercase regex expressions. Example: screen.getByText(/your text here/i), screen.getByRole('button', {name: /your text here/i}).
7. When asserting that a DOM renders nothing, replace isEmptyRender()).toBe(true) with toBeEmptyDOMElement() by wrapping the component into a container. Example: expect(container).toBeEmptyDOMElement();`

Instructions to evaluate and present results:

`Now, please evaluate your output and make sure your converted code is between <code></code> tags.
If there are any deviations from the specified conditions, list them explicitly.
If the output adheres to all conditions and makes use of the instructions section, you can simply state "The output meets all specified conditions."`

The second, and arguably more effective, approach we used to control the output of the LLM was the use of AST transformations. This technique isn't seen elsewhere in the industry. Instead of relying solely on prompt engineering, we integrated the partially converted code and suggestions generated by our initial AST-based codemod. Including the AST-converted code in our requests yielded remarkable results. By automating the conversion of simpler cases and providing annotations for all other instances through comments in the converted file, we successfully minimized hallucinations and nonsensical conversions from the LLM. This technique played a pivotal role in our conversion process. We have now established a robust framework for managing complex and dynamic code conversions, leveraging a multitude of information sources including prompts, the DOM, test file code, React code, test run logs, linter logs, and AST-converted code. It's worth noting that only an LLM was capable of assimilating such disparate types of information; no other tool available to us possessed this capability.
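Putting the pieces together, the final request could be assembled roughly as follows. This is a sketch under stated assumptions: the function name and signature are ours, but the XML-style tags mirror the prompt's context-setting section and the DOM collection format shown earlier.

```typescript
// A rendered DOM tree captured for one test case, as written by the
// Enzyme render adapter earlier in the pipeline.
interface TestCaseDom {
  title: string;
  domTree: string;
}

// Assemble the hybrid request: original Enzyme test, partially converted
// AST codemod output, and per-test-case DOM trees.
function buildHybridPrompt(
  enzymeCode: string,
  codemodOutput: string,
  domTrees: TestCaseDom[],
): string {
  const domSection = domTrees
    .map(
      (t) =>
        `<test_case_title>${t.title}</test_case_title> and <dom_tree>${t.domTree}</dom_tree>`,
    )
    .join("\n");
  return [
    "I need assistance converting an Enzyme test case to the React Testing Library framework.",
    `<code>${enzymeCode}</code>`,
    `<codemod>${codemodOutput}</codemod>`,
    `<component>${domSection}</component>`,
  ].join("\n");
}
```

The main-request and evaluation instructions shown above would be appended to this context before sending the request.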

Evaluation and impact

Evaluation and impact assessments were crucial components of our project, allowing us to measure the effectiveness of our methods, quantify the benefits of AI-powered solutions, and validate the time savings achieved through AI integration.

We streamlined the conversion process with on-demand runs, delivering results in just 2-5 minutes, as well as with CI nightly jobs that handled hundreds of files without overloading our infrastructure. The files converted in each nightly run were categorized based on their conversion status (fully converted; partially converted with 50-99% of test cases passing; partially converted with 20-49% passing; or partially converted with less than 20% passing), which allowed developers to easily identify and use the most effectively converted files. This setup not only saved time by freeing developers from running scripts but also enabled them to locally tweak and refine the original files for better LLM performance with the local on-demand runs.
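The bucketing logic for the nightly report can be sketched directly from the thresholds above (the function name and exact labels are ours):

```typescript
// Classify a converted file by its test-case pass rate, using the
// reporting buckets described in the post.
function conversionCategory(passedTests: number, totalTests: number): string {
  if (totalTests === 0) return "no test cases found";
  const percentPassing = (passedTests / totalTests) * 100;
  if (percentPassing === 100) return "fully converted";
  if (percentPassing >= 50) return "partially converted (50-99% passing)";
  if (percentPassing >= 20) return "partially converted (20-49% passing)";
  return "partially converted (<20% passing)";
}
```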

Notably, our adoption rate, calculated as the number of files our codemod ran on divided by the total number of files converted to RTL, reached roughly 64%. This adoption rate highlights the significant usage of our codemod tool by the frontend developers who were its primary consumers, resulting in substantial time savings.

We assessed the effectiveness of our AI-powered codemod along two key dimensions: manual evaluation of code quality on specific test files, and the pass rate of test cases across a larger set of test files. For the manual evaluation, we analyzed nine test files of varying complexity (three easy, three medium, and three complex) that were converted by both the LLM and frontend developers. Our benchmark for quality was the standard achieved by the frontend developers, based on our quality rubric covering imports, rendering methods, JavaScript/TypeScript logic, and Jest assertions. We aimed to match their level of quality. The evaluation revealed that 80% of the content within these files was accurately converted, while the remaining 20% required manual intervention.

The second dimension of our assessment examined the pass rate of test cases across a comprehensive set of files. We examined the conversion rates of roughly 2,300 individual test cases spread across 338 files. Among these, roughly 500 test cases were successfully converted, executed, and passed. This highlights how effective AI can be, leading to a significant saving of 22% of developer time. It's important to note that this 22% time saving represents only the documented cases where the test case passed. However, it's conceivable that some test cases were converted properly, yet issues such as setup or import syntax caused the test file not to run at all, and time savings weren't accounted for in those instances. This data-centric approach provides clear evidence of tangible time savings, ultimately affirming the powerful impact of AI-driven solutions. It's worth noting that the generated code was manually verified by humans before merging into our main repository, ensuring the quality and accuracy of the automated conversion process while keeping human expertise in the loop.

impact chart
Chart with conversion results

As our project nears its conclusion in May 2024, we're still in the process of collecting data and evaluating our progress. So far, it's apparent that LLMs offer invaluable support for the developer experience and have a positive effect on productivity, adding another tool to our repertoire. However, the scarcity of data surrounding code generation, and Enzyme-to-RTL conversion specifically, suggests that this is a highly complex topic and that AI may not be able to serve as the ultimate tool for this kind of conversion. While our experience was somewhat fortunate in that the model we used had out-of-the-box capabilities for JavaScript and TypeScript, and we didn't have to do any additional training, it's clear that custom implementations may be necessary to fully realize any LLM's potential.

Our custom Enzyme-to-RTL conversion tool has proven effective so far. It has demonstrated reliable performance for large-scale migrations, saved frontend developers noticeable time, and received positive feedback from our users. This success confirms the value of our investment in this automation. Looking ahead, we're eager to explore automated frontend unit test generation, a topic that has generated excitement and optimism among our developers regarding the potential of AI.

Additionally, as a member of the Frontend Test Frameworks team, I'd like to express my gratitude for the collaboration, support, and dedication of our team members. Together, we created this conversion pipeline, performed rigorous testing, made prompt improvements, and contributed exceptional work on the AST codemod, significantly elevating the quality and efficiency of our AI-powered project. We also extend our thanks to the Slack DevXP AI team for providing an outstanding experience in utilizing our LLM and for patiently addressing all inquiries. Their support has been instrumental in streamlining our workflows and achieving our development goals. Together, these teams exemplify collaboration and innovation, embodying the spirit of excellence within our Slack engineering organization.

Interested in building innovative projects and making developers' work lives easier? We're hiring 💼

Apply now