ASF Generative Tooling Guidance

Version: 1.0

Can contributions to ASF projects include AI generated content?

The Apache-2.0 license and the Apache Individual Contributor License Agreement both remind contributors that they are responsible for disclosing any copyrighted materials in submitted contributions that are not their original creation. This is as true when using generative AI tooling as it is when using materials from public websites or code from other open source projects.

When disclosing these materials, contributors should also identify the licensing for these materials. The ASF maintains a 3rd Party Licensing Policy that provides guidance on which licenses are acceptable, along with instructions on the treatment of 3rd Party Works.

In general, content generated by a non-human (e.g., a machine or a monkey) is not copyrightable. However, if content consists of some portions generated by AI and other portions authored by a human, the portions authored by a human may be copyrightable.

As explained by the following U.S. Copyright Office Registration Guidance (3/16/2023):

“For example, a human may select or arrange AI-generated material in a sufficiently creative way that ‘the resulting work as a whole constitutes an original work of authorship.’ Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are ‘independent of’ and do ‘not affect’ the copyright status of the AI-generated material itself.”

These human-authored portions may simply come from the prompt the human provided or from subsequent changes they make. However, a prominent concern with generative AI is the risk that a tool reproduces portions of the materials it was trained on, some of which may be copyrightable subject matter. Thus, a recommended practice when using generative AI tooling is to use tools with features that identify any included content that is similar to parts of the tool’s training data, as well as the license of that content.

Given the above, code generated in whole or in part using AI can be contributed if the contributor ensures that:

  1. The terms and conditions of the generative AI tool do not place any restrictions on use of the output that would be inconsistent with the Open Source Definition (e.g., ChatGPT’s terms are inconsistent).
  2. At least one of the following conditions is met:
    1. The output is not copyrightable subject matter (and would not be even if produced by a human)
    2. No third party materials are included in the output
    3. Any third party materials that are included in the output are being used with permission (e.g., under a compatible open source license) of the third party copyright holders and in compliance with the applicable license terms
  3. A contributor can obtain reasonable certainty that condition 2.2 or 2.3 is met either from the AI tool itself, if it provides sufficient information about materials that may have been copied, or from code scanning results
    1. E.g. AWS CodeWhisperer recently added a feature that provides notice and attribution
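
The conditions above combine one mandatory check (condition 1) with at least one of three alternatives (2.1, 2.2, or 2.3). As a rough sketch of that logic only, it might be written as follows. The class and field names are hypothetical and not part of any ASF tooling, and each boolean still has to come from the contributor's own judgment.

```python
from dataclasses import dataclass


@dataclass
class ContributionCheck:
    """Hypothetical record of a contributor's answers; not an ASF-defined format."""
    tool_terms_compatible: bool       # condition 1
    output_not_copyrightable: bool    # condition 2.1
    no_third_party_materials: bool    # condition 2.2
    third_party_use_permitted: bool   # condition 2.3


def meets_conditions(check: ContributionCheck) -> bool:
    """Condition 1 must hold, together with at least one of 2.1, 2.2, or 2.3."""
    at_least_one = (
        check.output_not_copyrightable
        or check.no_third_party_materials
        or check.third_party_use_permitted
    )
    return check.tool_terms_compatible and at_least_one
```

Condition 3 is not modeled here, since it describes how a contributor gains confidence in the answers rather than being a separate requirement.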

When providing contributions authored using generative AI tooling, a recommended practice is for contributors to indicate the tooling used to create the contribution. This should be included as a token in the source control commit message, for example by including the phrase “Generated-by: ”. This leaves room for future release tooling that pulls this content into a machine-parsable Tooling-Provenance file.
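
To illustrate how such a token could be read back by tooling, here is a minimal sketch that extracts “Generated-by: ” values from a commit message. It assumes the token sits on its own line and is followed by a free-form tool name. The function name, the example tool name, and that line format are assumptions made for illustration; the guidance does not define a format for the token or for the Tooling-Provenance file.

```python
import re

# Assumed format: a line of the form "Generated-by: <tool name>".
# The guidance only names the phrase, so this pattern is a guess at a reasonable shape.
GENERATED_BY = re.compile(
    r"^Generated-by:[ \t]*(\S.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_generated_by(commit_message: str) -> list[str]:
    """Return the tool names from Generated-by lines, in order of appearance."""
    return GENERATED_BY.findall(commit_message)


# "ExampleAssistant 1.2" is a made-up tool name used only for this example.
example_message = """Add retry logic to the HTTP client

Generated-by: ExampleAssistant 1.2
"""

tools = extract_generated_by(example_message)
```

A release script could run a function like this over each commit message in a release range and collect the results, though how that output would be laid out in a Tooling-Provenance file is left open by the guidance.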

Finally, please note that while the above seems like a reasonable set of guidelines in June 2023, this is a rapidly evolving area. Whatever we recommend to PMCs today, policies will need to be re-evaluated and updated in response to:

We will continue communicating with PMC and ASF members as updates to this FAQ get discussed and merged in.

What about Documentation?

The above guidance applies to documentation as well. However, the most popular tooling for documentation, ChatGPT, has restrictive licensing, so caution is warranted.

What about Images?

As with documentation, the above principles still apply. However, because images are a non-textual form, the details quickly become complex. We expect this to continue to be a rapidly evolving area.

What do we do if a contribution includes AI generated content and some form of tooling has identified materials that have been copied?

Refer to the 3rd Party Licensing Policy as with any other contribution.