PublishedTags

World domination, part one

Intro

Some time ago, I was able to merge in a huge refactoring effort into the next major version of Fantomas.
The result of these changes make Fantomas at least twice as fast as the v4 release.

Before

BenchmarkDotNet=v0.13.1, OS=ubuntu 20.04
Intel Xeon Platinum 8171M CPU 2.60GHz, 1 CPU, 2 logical and 2 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT DEBUG
  DefaultJob : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT


| Method |    Mean |    Error |   StdDev | Rank |      Gen 0 |      Gen 1 |     Gen 2 | Allocated |
|------- |--------:|---------:|---------:|-----:|-----------:|-----------:|----------:|----------:|
| Format | 2.434 s | 0.0415 s | 0.0388 s |    1 | 92000.0000 | 33000.0000 | 2000.0000 |      2 GB |

After

BenchmarkDotNet=v0.13.1, OS=ubuntu 20.04
Intel Xeon Platinum 8171M CPU 2.60GHz, 1 CPU, 2 logical and 2 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT DEBUG
  DefaultJob : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT


| Method |     Mean |    Error |   StdDev | Rank |      Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|------- |---------:|---------:|---------:|-----:|-----------:|----------:|----------:|----------:|
| Format | 715.2 ms | 13.48 ms | 18.46 ms |    1 | 12000.0000 | 4000.0000 | 1000.0000 |    202 MB |

In this blogpost, I’ll elaborate a bit how we did this and what you can expect from the V5 release.

F# eXchange 2021

Last October, I had the opportunity to speak at F# eXchange. There I announced that what the plan was for the next major of Fantomas and how to get there. A crucial part of that talk was about how improving the Syntax tree at dotnet/fsharp was the key to everything. In short, a better syntax tree, leads to fewer shenanigans in the Fantomas codebase.

In short, a better syntax tree, leads to fewer shenanigans in the Fantomas codebase.

Trivia

A significant part of improving the syntax tree was the introduction of Trivia. Trivia has a bit of dual meaning:

  • In Fantomas, we use it as a denominator for additional information that was present in the source code, but not captured in the AST. For example code comments or newlines.
  • In the F# compiler, we use it as a denominator for addition information about syntax, that the compiler doesn’t need to compile to binary. For example the range of the let keyword in a binding.

Having additional trivia information in the syntax tree is of immense value to Fantomas.

Keywords

Having more information about keywords allows us to better release what the user originally wrote. If we take a look at the AST for

type MyType =
    abstract One : unit -> unit
    abstract member Two: unit -> unit

we get something along the lines of

Types
 ([SynTypeDefn
     (SynComponentInfo
        ([], None, [], [MyType],
         PreXmlDoc ((1,0), FSharp.Compiler.Xml.XmlDocCollector),
         false, None, tmp.fsx (1,5--1,11)),
      ObjectModel
        (Unspecified,
         [AbstractSlot
            (SynValSig
               ([], SynIdent (One, None), ...
             { IsInstance = true
               IsDispatchSlot = true
               IsOverrideOrExplicitImpl = false
               IsFinal = false
               MemberKind = Member
               Trivia = { MemberRange = None
                          OverrideRange = None
                          AbstractRange = Some tmp.fsx (2,4--2,12)
                          StaticRange = None
                          DefaultRange = None } },
             tmp.fsx (2,4--2,31));
          AbstractSlot
            (SynValSig
               ([], SynIdent (Two, None), ...
             { IsInstance = true
               IsDispatchSlot = true
               IsOverrideOrExplicitImpl = false
               IsFinal = false
               MemberKind = Member
               Trivia = { MemberRange = Some tmp.fsx (3,13--3,19)
                          OverrideRange = None
                          AbstractRange = Some tmp.fsx (3,4--3,12)
                          StaticRange = None
                          DefaultRange = None } }

We can see in the Trivia of SynMemberFlags what exact keywords were used for One and Two. Because of this new information, we can restore exactly what was written.

Identifier

Consider

let (+) a b = a + b + 1

The compiled function name the F# compiler will use in the typed tree is op_Addition. We can determine this from tracing it back to the + operator, however, there is no mentioning of the parentheses. Luckily the AST looks like:

SynLongIdent(
    [op_Addition], 
    [], 
    [Some (OriginalNotationWithParen(tmp.fsx (1,4--1,5), "+" ]
)

Because of OriginalNotationWithParen we can very efficiently restore the function name as (+).

This is very significant because beforehand we needed to check every identifier within a file to determine if it is an operator or not. Identifiers are all over the place so this really is an immense performance boost.

No Tokens, No Cry

Talking about Trivia, not everything was present in the syntax tree. Originally, we also processed the tokens of each file. Inside these tokens were addition clues to what the user actually wrote. We needed to find these clues (trivia) and link them to the actual syntax tree nodes (trivia nodes). This was a tedious operation.

Though we still have the concept of trivia, we don’t need to process the tokens anymore to detect the missing pieces. The syntax tree carries enough information for us to extract the remaining trivia from source code.

For example the ranges of any code comments are now part of the syntax tree:

let v = 42 // some comment

the matching tree

ImplFile
  (ParsedImplFileInput
     ("tmp.fsx", true, QualifiedNameOfFile Tmp$fsx, [], [],
      [SynModuleOrNamespace
         ([Tmp], false, AnonModule,
          [Let
             (false,
              [SynBinding
                 (None, Normal, false, false, [],
                  PreXmlDoc ((1,0), FSharp.Compiler.Xml.XmlDocCollector),
                  SynValData
                    (None, SynValInfo ([], SynArgInfo ([], false, None)), None),
                  Named (SynIdent (v, None), false, None, tmp.fsx (1,4--1,5)),
                  None, Const (Int32 42, tmp.fsx (1,8--1,10)),
                  tmp.fsx (1,4--1,5), Yes tmp.fsx (1,0--1,10),
                  { LetKeyword = Some tmp.fsx (1,0--1,3)
                    EqualsRange = Some tmp.fsx (1,6--1,7) })],
              tmp.fsx (1,0--1,10))], PreXmlDocEmpty, [], None,
          tmp.fsx (1,0--1,26), { ModuleKeyword = None
                                 NamespaceKeyword = None })], (false, false),
      { ConditionalDirectives = []
        CodeComments = [LineComment tmp.fsx (1,11--1,26)] }))

The example above will contain the range of // some comment. This doesn’t tell us yet that it belong to v, but it is a step in the right direction. We don’t need to process any tokens to learn about the existence of the comment. And that’s a good thing!

Fullmetal Alchemist

All these changes on the compiler side at dotnet/fsharp are shipped as the FSharp.Compiler.Service NuGet package. The release schedule of these packages is a bit of mystery and appears to be tied to the .NET SDK releases. I started looking for a way we could have these changes faster and we are now creating our own Fantomas flavoured FSharp.Compiler.Service package.

I wrote some prose on the technical details and I invite you to read it. But the gist is that we take the files we need from dotnet/fsharp at a known commit pointer and expose the parser tailored to the needs of Fantomas. End-users don’t need to worry about this, as this is all happening under the hood.

Code generation

If you are using Fantomas to generate code, that is still possible. Fantomas.FCS has exactly the same namespaces as the FSharp.Compiler.Service has, so migrating should be doable.

Fantomas Five

Besides the performance there are some other topics planned for the next major release.

fantomas-tool -> fantomas

We renamed the .NET tool from fantomas-tool to fantomas. From now on, you can just install it using dotnet tool install fantomas --prerelease.

dotnet tool install fantomas —prerelease

We posted a poll for a logo contest. For the next major release, we want to work on the Fantomas branding as well. A first step here was looking for a new logo:

New Fantomas logo

Get Back

Having a faster formatter won’t necessarily bring you in if you are not using Fantomas today. Even though Fantomas follows the F# style guide, some people still don’t use it.

At the end of last year, I had a change of heart about Stroustrup bracket style. The re-opening of this issue was well received by the community:

Community feedback

I prefer an opinionated view on the style of things, yet at the same time I want to be open to feedback of the community.

Ragnarok

Before re-opening that issue, I made an initial proof of concept and got something working.

Later Josh DeGraw ported the code and you can now activate it by adding:

[*.fs]
fsharp_multiline_block_brackets_on_same_column=true
fsharp_experimental_stroustrup_style=true

Thank you Josh!

It is a small step for Fantomas, but a giant leap for the F# community

It has been around since 5.0.0-alpha-001 (March 19th 2022), yet I haven’t really received any feedback on this. This is a known problem when developing any software, and one I’ve seen in Fantomas: people will only try new features once they are considered stable.

Please try this out and participate on GitHub!!

There are a lot of open technical and philosophical questions regarding this topic, so if this matters to you, please help to push this forward!

Elmish no more

Beware the vanity alignment PD

The Fantomas default settings do not respect the F# style guides when you have a function application that take a list (or two lists) as its last argument. In the common tongue, this often reflects to an Elmish DSL.

let v =
    Input.input [ Input.Custom [ Placeholder placeholder
                                 OnChange(fun ev -> ev.Value |> onChange)
                                 DefaultValue value
                                 Key key ] ]

Though this looks okay-ish when you are coding something with Fable, it doesn’t make sense for other things like:

let sorted =
    List.sortDescending [ "Alpha"
                          "Beta"
                          "Gamma"
                          "Delta"
                          "Epsilon" ]

And if you change the List.sort to List.sortDescending, all the items of the list will jump around. This is known as the dreaded vanity alignment problem, where the name of the identifier influences the positioning of the remainder of the expression.

The style guide would suggest to format this as:

let v =
    Input.input
        [ Input.Custom
              [ Placeholder placeholder
                OnChange(fun ev -> ev.Value |> onChange)
                DefaultValue value
                Key key ] ]

let sorted =
    List.sortDescending
        [ "Alpha"
          "Beta"
          "Gamma"
          "Delta"
          "Epsilon" ]

In the 4.x series, you were able to do this using the setting fsharp_disable_elmish_syntax=true, but it was unfortunate this wasn’t the default behavior. With a major release, we can address these things.

Expanding on lists at the end

The Elmish-like shapes that did not follow the style guide were fairly restricted. If a small detail was altered, the shape would not match anymore and so there would have been a difference between:

// matches the AST shape of a function application with two lists
div [] [
    // some comment
    p [] [ str "x" ]
]

// does not match the AST shape of a function application with two lists
// because of that extra string argument
div
    "some string"
    []
    [
      // some comment
      p [] [ str "y" ] ]

This had its limitations and we want to revisit this as well. We want to get rid of all the Elmish specific settings (fsharp_max_elmish_width, fsharp_single_argument_web_mode, fsharp_disable_elmish_syntax) and instead consider this as a part of the Stroustrup setting.

Maybe something like:

// list at the end
div [] [
    // some comment
    p [] [ str "x" ]
]

// also list at the end
div "some string" [] [
    // some comment
    p [] [ str "y" ]
]

Jimmy Byrd started working on this in PR 2200.
Thank you Jimmy!

Closing thoughts

The changes are significant in version five and I believe that the best is yet to come. Please try this one out, report issues using our online tool and let us know on Discord how thing are going.

Cheers,

Florian

Photo by David Bruyndonckx on Unsplash