
@IvanIhnatsiuk (Contributor) commented Dec 13, 2025

Summary

This PR has two main goals:

  1. Improve the performance of HTML serialization.
  2. Make the serializer generic: when a new style is added, the HTML serializer should handle it automatically.

Current situation

The existing HTML serialization logic is quite difficult to follow and maintain.
During serialization we need to:

  • track previous and current styles,
  • manually manage which parameters should become HTML attributes,
  • hardcode tag names for each style,
  • and handle paragraph and inline styles together.

All of this makes the processing complex and error-prone.


Proposed approach

The main idea is to move style knowledge into style classes and build an intermediate HTML node tree, which is then serialized in a clean, deterministic way.


1. Style definitions

Each style class is responsible for describing how it maps to HTML.

Every style must implement:

  • tagName
  • attributeKey
  • subTagName
  • isSelfClosing
  • isParagraphStyle

Styles that require parameters (e.g. color, mention parameters, or href) additionally conform to ParameterizedStyleProtocol:

@protocol ParameterizedStyleProtocol <NSObject>
+ (NSDictionary *_Nullable)getParametersFromValue:(id)value;
@end

This removes the need for:

  • hardcoded tag names in the serializer,
  • manual attribute handling,
  • special-case logic scattered across the parser.
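To illustrate the idea outside Objective-C, here is a minimal sketch in plain C (which is also valid Objective-C) of a per-style descriptor carrying the same five members, with an optional function pointer standing in for ParameterizedStyleProtocol. The struct, the `LinkAttribute`/`BoldAttribute` keys, and the helper names are hypothetical and not the PR's code; the point is only that tag names and attributes are read from the descriptor instead of being hardcoded in the serializer.

```c
#include <stddef.h>

/* One HTML attribute produced by a parameterized style. */
typedef struct {
    const char *key;
    const char *value;
} HtmlAttribute;

/* Hypothetical C mirror of the per-style metadata. */
typedef struct {
    const char *tag_name;       /* tagName, e.g. "a" or "strong" */
    const char *attribute_key;  /* attributeKey read from the attributed string */
    const char *sub_tag_name;   /* subTagName; assumed to be a child tag such as "li", or NULL */
    int is_self_closing;        /* isSelfClosing */
    int is_paragraph_style;     /* isParagraphStyle */
    /* Stand-in for ParameterizedStyleProtocol: fills `out` and returns the
       number of attributes written. NULL for styles without parameters. */
    size_t (*get_parameters)(const char *value, HtmlAttribute *out, size_t max);
} StyleDescriptor;

static size_t link_parameters(const char *value, HtmlAttribute *out, size_t max) {
    if (value == NULL || max == 0) {
        return 0;
    }
    out[0].key = "href";
    out[0].value = value;
    return 1;
}

static const StyleDescriptor kLinkStyle = { "a", "LinkAttribute", NULL, 0, 0, link_parameters };
static const StyleDescriptor kBoldStyle = { "strong", "BoldAttribute", NULL, 0, 0, NULL };

/* Generic: the caller never needs to know which concrete style it holds. */
static size_t style_attributes(const StyleDescriptor *style, const char *value,
                               HtmlAttribute *out, size_t max) {
    return style->get_parameters != NULL ? style->get_parameters(value, out, max) : 0;
}
```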

Because styles now fully describe themselves, we can separate paragraph styles and inline styles upfront, which avoids unnecessary checks during traversal.
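A minimal C sketch of that upfront split, assuming a hypothetical `Style` record that holds only a tag name and a paragraph flag: the registry is partitioned once, and each later pass loops only over the styles relevant to it.

```c
#include <stddef.h>

/* Minimal style record for this sketch (hypothetical). */
typedef struct {
    const char *tag_name;
    int is_paragraph_style;
} Style;

/* Splits the registry once, before traversal, so the paragraph pass and the
   inline pass each iterate only over the styles that can apply to them.
   `paragraph` and `inline_styles` must each have room for `count` pointers. */
static void partition_styles(const Style *all, size_t count,
                             const Style **paragraph, size_t *paragraph_count,
                             const Style **inline_styles, size_t *inline_count) {
    *paragraph_count = 0;
    *inline_count = 0;
    for (size_t i = 0; i < count; i++) {
        if (all[i].is_paragraph_style) {
            paragraph[(*paragraph_count)++] = &all[i];
        } else {
            inline_styles[(*inline_count)++] = &all[i];
        }
    }
}
```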


2. HTML element tree

Instead of writing HTML directly while walking the attributed string, we first build a tree of HTML nodes.

There are two node types:

HTMLElementNode

{
  tag: NSString,
  attributes: NSDictionary<NSString *, NSString *> | nil,
  children: NSArray<HTMLNode *>,
  selfClosing: BOOL
}

HTMLTextNode

{
  source: NSString,   // text
  range: NSRange      // substring range
}

This gives us a clear, inspectable intermediate representation before serialization.
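One possible C layout for these two node kinds, as a sketch only (`HtmlNode`, `Range`, `make_text`, and `text_equals` are hypothetical names). Text nodes keep a pointer to the shared source string plus a range, so building the tree does not copy substrings.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t location;
    size_t length;
} Range;

typedef enum { NODE_ELEMENT, NODE_TEXT } NodeKind;

typedef struct HtmlNode HtmlNode;

struct HtmlNode {
    NodeKind kind;
    union {
        struct {
            const char *tag;
            const char *const *attributes; /* NULL-terminated key/value pairs, or NULL */
            HtmlNode **children;
            size_t child_count;
            int self_closing;
        } element;
        struct {
            const char *source; /* full plain text, shared by all text nodes */
            Range range;        /* the slice this node represents */
        } text;
    } as;
};

static HtmlNode make_text(const char *source, size_t location, size_t length) {
    HtmlNode node;
    node.kind = NODE_TEXT;
    node.as.text.source = source;
    node.as.text.range.location = location;
    node.as.text.range.length = length;
    return node;
}

/* Compares a text node's slice with an expected string without allocating. */
static int text_equals(const HtmlNode *node, const char *expected) {
    size_t n = strlen(expected);
    return node->kind == NODE_TEXT && node->as.text.range.length == n &&
           strncmp(node->as.text.source + node->as.text.range.location, expected, n) == 0;
}
```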


3. Converting an attributed string into nodes

Step 1: Root node

  • Create a root <html> node with an empty children array.

Step 2: Paragraph enumeration

We enumerate paragraphs of the attributed string:

  • Empty paragraph
    → add a <br /> node (self-closing)

  • Non-empty paragraph

    • Read attributes from the first character
    • Evaluate paragraph styles using styleCondition

Checking only the first character is sufficient because mixed paragraph types (e.g. H1 + UL in the same paragraph) are not supported.

Note:
If we ever want to support mixed paragraph styles, we can evaluate the entire range and emit paragraph nodes in sequence.
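The paragraph pass can be sketched in C as follows, assuming a hypothetical stand-in for the attributed string: a plain text buffer plus one paragraph-style code per character (0 = none, 1 = H1, 2 = UL). Paragraphs are split on `\n`, an empty paragraph becomes a `<br />` entry, and only the first character's style is consulted. In this sketch a trailing newline does not produce an extra empty paragraph.

```c
#include <stddef.h>

typedef enum { PARA_BR, PARA_PLAIN, PARA_H1, PARA_UL } ParagraphKind;

typedef struct {
    ParagraphKind kind;
    size_t location;
    size_t length;
} Paragraph;

/* Maps the hypothetical per-character style code to a paragraph kind. */
static ParagraphKind kind_for(unsigned char style) {
    switch (style) {
        case 1: return PARA_H1;
        case 2: return PARA_UL;
        default: return PARA_PLAIN;
    }
}

/* Empty paragraphs become <br />; otherwise only the first character's style
   is read, because mixed paragraph styles are not supported. */
static size_t enumerate_paragraphs(const char *text, size_t length,
                                   const unsigned char *styles,
                                   Paragraph *out, size_t max) {
    size_t count = 0, start = 0;
    for (size_t i = 0; i <= length && count < max; i++) {
        int at_break = i < length && text[i] == '\n';
        int at_end = i == length && start < length; /* no extra paragraph after a trailing '\n' */
        if (!at_break && !at_end) {
            continue;
        }
        size_t len = i - start;
        out[count].location = start;
        out[count].length = len;
        out[count].kind = len == 0 ? PARA_BR : kind_for(styles[start]);
        count++;
        start = i + 1;
    }
    return count;
}
```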


Step 3: Inline attribute processing

For each paragraph, we enumerate attribute runs:

  1. Create an HTMLTextNode with:

    • the full plain text
    • the run range
  2. Traverse all inline styles and wrap the text node when a style applies:

for (id<BaseStyleProtocol> inlineStyle in inlineStyles) {
  if ([inlineStyle styleCondition:attribute range:range]) {
    // Wrap the current node in an HTMLElementNode;
    // the tag name and selfClosing flag come from the style
  }
}

Example:
For the text "Hello world", where "Hello" is bold and "world" is italic, this results in:

  • "Hello" → <strong>TextNode</strong>
  • "world" → <em>TextNode</em>

Each attribute run produces its own small inline subtree.
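The per-run wrapping from Step 3 can be sketched in C like this. The sketch assumes each inline style is identified by a hypothetical bit in a run mask, which stands in for `styleCondition`; every matching style wraps the current node, so the style visited last ends up outermost.

```c
#include <stddef.h>

/* Hypothetical node: tag == NULL marks the text node at the bottom. */
typedef struct Node Node;
struct Node {
    const char *tag;
    const Node *child;       /* one child is enough for a single run's subtree */
    size_t location, length; /* run range into the plain text */
};

typedef struct {
    const char *tag_name;
    unsigned mask; /* hypothetical bit meaning "this style is active on the run" */
} InlineStyle;

/* Builds the subtree for one attribute run. `pool` must hold at least
   1 + style_count nodes; the returned pointer is the outermost node. */
static const Node *build_run(unsigned run_mask, size_t location, size_t length,
                             const InlineStyle *styles, size_t style_count, Node *pool) {
    size_t used = 0;
    Node *text = &pool[used++];
    text->tag = NULL;
    text->child = NULL;
    text->location = location;
    text->length = length;
    const Node *current = text;
    for (size_t i = 0; i < style_count; i++) {
        if (run_mask & styles[i].mask) { /* stands in for styleCondition */
            Node *wrapper = &pool[used++];
            wrapper->tag = styles[i].tag_name;
            wrapper->child = current;
            wrapper->location = location;
            wrapper->length = length;
            current = wrapper;
        }
    }
    return current;
}
```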


4. Building HTML from the node tree

Once the tree is built, HTML generation is straightforward and isolated.

The serializer performs a depth-first recursive traversal:

visit(node):
  if node is a text node:
    write the substring source[range]
  else if node is self-closing:
    write <tag attributes /> and stop
  else:
    write the opening tag, with attributes if present
    for each child:
      visit(child)
    write the closing tag
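A runnable C sketch of this depth-first serializer, under stated assumptions: a single hypothetical `Node` struct where `tag == NULL` marks a text node, NULL-terminated child and attribute arrays, and a fixed-size output buffer that truncates rather than grows. Unlike the pseudocode, it also escapes `&`, `<`, `>` and `"`; the PR description does not say whether the real serializer escapes text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node: tag == NULL marks a text node. */
typedef struct Node Node;
struct Node {
    const char *tag;
    const char *const *attributes; /* NULL-terminated key/value pairs, or NULL */
    const Node *const *children;   /* NULL-terminated, or NULL */
    int self_closing;
    const char *source;            /* text nodes: shared plain text */
    size_t location, length;       /* text nodes: range into source */
};

/* Fixed-capacity output; cap must be at least 1. Output is truncated, not grown. */
typedef struct {
    char *data;
    size_t len, cap;
} Buffer;

static void put(Buffer *b, const char *s, size_t n) {
    if (b->len + n >= b->cap) {
        n = b->cap - b->len - 1;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

static void put_str(Buffer *b, const char *s) { put(b, s, strlen(s)); }

/* Escapes the characters that would otherwise change the markup. */
static void put_escaped(Buffer *b, const char *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        switch (s[i]) {
            case '&': put_str(b, "&amp;"); break;
            case '<': put_str(b, "&lt;"); break;
            case '>': put_str(b, "&gt;"); break;
            case '"': put_str(b, "&quot;"); break;
            default: put(b, &s[i], 1); break;
        }
    }
}

/* Depth-first traversal: no style logic, it only writes what the tree says. */
static void visit(const Node *node, Buffer *b) {
    if (node->tag == NULL) {
        put_escaped(b, node->source + node->location, node->length);
        return;
    }
    put_str(b, "<");
    put_str(b, node->tag);
    for (const char *const *a = node->attributes; a != NULL && a[0] != NULL; a += 2) {
        put_str(b, " ");
        put_str(b, a[0]);
        put_str(b, "=\"");
        put_escaped(b, a[1], strlen(a[1]));
        put_str(b, "\"");
    }
    if (node->self_closing) {
        put_str(b, " />");
        return;
    }
    put_str(b, ">");
    for (const Node *const *c = node->children; c != NULL && *c != NULL; c++) {
        visit(*c, b);
    }
    put_str(b, "</");
    put_str(b, node->tag);
    put_str(b, ">");
}
```

With the Hello world example from Step 3, the tree html > p > [strong > "Hello", " ", em > "world"] serializes to `<html><p><strong>Hello</strong> <em>world</em></p></html>`.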

Key properties

  • No style logic during serialization
  • Structure and formatting are cleanly separated
  • Output HTML directly reflects the node tree
  • Easy to reason about, debug, and extend

Summary

This approach:

  • simplifies the serializer,
  • moves HTML knowledge into style definitions,
  • removes stateful style tracking,
  • introduces a clear intermediate representation,
  • and makes the whole pipeline easier to maintain and extend.

Benchmarks

| HTML size (characters) | Old parser time | New parser time | Speedup | Notes |
|---|---|---|---|---|
| 7,687 | 34 ms | 1 ms | 34× faster | Style condition lookup took 1 ms in the new implementation. |
| 199,180 | 1.9 s | 20 ms | 95× faster | Large-document performance improved drastically due to single-pass parsing, reduced attribute enumeration, and minimized allocations. |

New style example

To add a new style, we only need to implement a style class like this:

@interface HighlightStyle : NSObject <BaseStyleProtocol, ParameterizedStyleProtocol>
@end

@implementation HighlightStyle

+ (StyleType)getStyleType {
    return Highlight;
}

// Inline style → not a paragraph style
+ (BOOL)isParagraphStyle {
    return NO;
}

// The HTML tag name this style generates
+ (const char *)tagName {
    return "highlight";
}

// Attribute we look for in NSAttributedString
+ (NSAttributedStringKey)attributeKey {
    return @"CustomHighlightAttribute";
}

// Whether `<highlight/>` or `<highlight>...</highlight>` 
+ (BOOL)isSelfClosing {
    return NO;
}

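// Optional: only parameterized styles (ParameterizedStyleProtocol) implement this;
// the returned key/value pairs become HTML attributes
+ (NSDictionary *_Nullable)getParametersFromValue:(id)value {
    // Only string values can be emitted as an attribute value
    if (![value isKindOfClass:[NSString class]]) {
        return nil;
    }
    return @{ @"color": (NSString *)value };
}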

// Style detection logic
- (BOOL)styleCondition:(id)value range:(NSRange)range {
    return value != nil;
}

@end
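The same "add a row, not a branch" idea as a C sketch: a hypothetical descriptor table (mirroring `tagName`, `attributeKey`, `isSelfClosing`, `isParagraphStyle`, and a single parameter name as a simplification of `getParametersFromValue:`) and a generic writer that never mentions a specific style. Registering `highlight` only touches the table. Escaping is omitted for brevity.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor mirroring HighlightStyle's class methods. */
typedef struct {
    const char *tag_name;       /* tagName */
    const char *attribute_key;  /* attributeKey */
    int is_self_closing;        /* isSelfClosing */
    int is_paragraph_style;     /* isParagraphStyle */
    const char *parameter_name; /* simplification of getParametersFromValue:, or NULL */
} StyleDescriptor;

/* Adding a style means adding a row here; nothing below changes. */
static const StyleDescriptor kStyles[] = {
    { "strong", "BoldAttribute", 0, 0, NULL },
    { "a", "LinkAttribute", 0, 0, "href" },
    { "highlight", "CustomHighlightAttribute", 0, 0, "color" },
};

static const StyleDescriptor *find_style(const char *attribute_key) {
    for (size_t i = 0; i < sizeof kStyles / sizeof kStyles[0]; i++) {
        if (strcmp(kStyles[i].attribute_key, attribute_key) == 0) {
            return &kStyles[i];
        }
    }
    return NULL;
}

/* Writes one styled run using only descriptor data. Returns snprintf's result. */
static int write_styled(char *out, size_t cap, const StyleDescriptor *style,
                        const char *value, const char *text) {
    char attr[64] = "";
    if (style->parameter_name != NULL && value != NULL) {
        snprintf(attr, sizeof attr, " %s=\"%s\"", style->parameter_name, value);
    }
    if (style->is_self_closing) {
        return snprintf(out, cap, "<%s%s />", style->tag_name, attr);
    }
    return snprintf(out, cap, "<%s%s>%s</%s>", style->tag_name, attr, text, style->tag_name);
}
```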

Test Plan

  1. Test all possible HTML style combinations on iOS.


Screenshots / Videos

Screen.Recording.2025-12-13.at.23.06.56.mov


Compatibility

OS Implemented
iOS ✅❌
Android ✅❌

@IvanIhnatsiuk IvanIhnatsiuk changed the title feat: fast html parser feat: fast attributed string to html serializer Dec 15, 2025
@IvanIhnatsiuk IvanIhnatsiuk changed the title feat: fast attributed string to html serializer feat(iOS): attributed string to html serializer Dec 15, 2025
@IvanIhnatsiuk (Contributor, Author) commented Dec 15, 2025

@szydlovsky, could you please review this one?

@szydlovsky (Collaborator)

Hey @IvanIhnatsiuk, we're having quite a busy week. We'll take a look once we have more capacity.

@exploIF (Collaborator) commented Dec 16, 2025

We have to think about a merging strategy. The PR is quite big and it affects an already working part of the app. Ideally, we should have tests that verify there is no regression.

@IvanIhnatsiuk (Contributor, Author) commented Dec 16, 2025

@exploIF, I don't have enough time to write unit tests for this. If you could write them, that would be great. For now, I can suggest adding a feature flag, for example:
useIOSExperementalFastHtmlSerializer
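A minimal C sketch of how such a flag could gate the two implementations. `select_serializer`, the placeholder functions, and the boolean parameter are hypothetical; they only illustrate keeping both paths selectable at runtime.

```c
#include <stdbool.h>

/* Signature shared by both serializers in this sketch. */
typedef const char *(*SerializeFn)(const char *input);

/* Hypothetical placeholders standing in for the old and new implementations. */
static const char *legacy_serialize(const char *input) { (void)input; return "legacy"; }
static const char *fast_serialize(const char *input) { (void)input; return "fast"; }

/* Chooses the implementation from a single flag, so the new serializer can be
   enabled or disabled without removing either code path. */
static SerializeFn select_serializer(bool use_fast_serializer) {
    return use_fast_serializer ? fast_serialize : legacy_serialize;
}
```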
