Coders spent more time prompting and reviewing AI generations than they saved on coding. On the surface, METR’s results seem to contradict other benchmarks and experiments that demonstrate increases in coding efficiency when AI tools are used. But those often also measure productivity in terms of total lines of code or the number of discrete tasks/code commits/pull requests completed, all of which can be poor proxies for actual coding efficiency. These factors lead the researchers to conclude that current AI coding tools may be particularly ill-suited to “settings with very high quality standards, or with many implicit requirements (e.g., relating to documentation, testing coverage, or linting/formatting) that take humans substantial time to learn.” While those factors may not apply in “many realistic, economically relevant settings” involving simpler code bases, they could limit the impact of AI tools in this study and similar real-world situations.

  • 1984@lemmy.today · edited · 4 days ago

    Sounds reasonable. The time and energy I've lost trying very confident ChatGPT suggestions that don't work must add up to weeks at this point.

    Sometimes it's very good though and really helps, which is why it's so frustrating. You never know if it's going to work until you go through the process.

    It has also changed how my coworkers and I work. We just talk to ChatGPT instead of even trying to look something up in the docs and understand it. That feels too slow now. There is a pressure to solve everything quickly now that ChatGPT exists.

    • Alex@lemmy.ml · edited · 3 days ago

      You have to ignore the obsequious optimism bias LLMs often have. It all comes down to their training set and whether they have seen more than you have.

      I don’t generally use them on projects I’m already familiar with, unless it’s for fairly boring repetitive work that would be fiddly with search and replace, e.g. extracting the common code out of these functions and refactoring.

      When working with unfamiliar code they can have an edge, so if I needed a simple mobile app I’d probably give the LLM a go and then tidy up the code once it’s working.

      At most I’ll give it 2 or 3 attempts to correct the original approach before I walk away and try something else. If it starts making up functions or APIs that don’t exist, that is usually a sign it doesn’t know, so it’s time to cut your losses and move on.

      Their real strengths come in when digesting large amounts of text and summarising it. Great for saving you from reading all the documentation on a project just to try a small thing. But if you’re going to work on the project going forward, you’re going to want to invest in that knowledge yourself.