Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent/wrong list arg parsing #239

Open
purpleidea opened this issue Mar 2, 2024 · 2 comments
Open

Inconsistent/wrong list arg parsing #239

purpleidea opened this issue Mar 2, 2024 · 2 comments

Comments

@purpleidea
Copy link
Contributor

It seems the parsing of lists is done differently than what is common in some other parsing libraries like urfave/cli. I detected this because some tests in https://github.com/purpleidea/mgmt/ started failing after a direct port to this lib. I investigated the behaviour and I believe what we are doing isn't optimal or correct. Minimally I think it's inconsistent or surprising.

Overall the library is great though! I'm happy to find such issues and report.

TL;DR: extra positional args are "chomped" up when we have a []string arg or similar. For example:

Note how sub is part of the list:

$ go run parsingbug.go --some-list list1 sub input
args: {SomeInt:<nil> SomeList:[list1 sub input] Sub:<nil>}
args.SomeList(3): [list1 sub input]

If we add the equals symbol this works around this:

$ go run parsingbug.go --some-list=list1 sub input
args: {SomeInt:<nil> SomeList:[list1] Sub:0xc0000667b0}
args.Sub: {Input:input}
args.SomeList(1): [list1]

Strange things happen with two list args. Where did list1 go?

$ go run parsingbug.go --some-list list1 --some-list list2 sub input
args: {SomeInt:<nil> SomeList:[list2 sub input] Sub:<nil>}
args.SomeList(3): [list2 sub input]

Three specifiers gives something perhaps inconsistent if the last one has an equals:

$ go run parsingbug.go --some-list list1 --some-list list2 --some-list=list3 sub input
args: {SomeInt:<nil> SomeList:[list3] Sub:0xc000068830}
args.Sub: {Input:input}
args.SomeList(1): [list3]

The below code for this works all by itself:

package main

import (
	"fmt"

	"github.com/alexflint/go-arg"
)

var args struct {
	//Argv0 string `arg:"positional"`

	SomeInt *int `arg:"-i" arg:"--int" help:"some integer"`

	SomeList []string `arg:"--some-list"`

	Sub *cmd `arg:"subcommand:sub"`
}

type cmd struct {
	Input string `arg:"positional"`
}

func main() {
	arg.MustParse(&args)

	fmt.Printf("args: %+v\n", args)
	if sub := args.Sub; sub != nil {
		fmt.Printf("args.Sub: %+v\n", *sub)
	}
	fmt.Printf("args.SomeList(%d): %+v\n", len(args.SomeList), args.SomeList)
}

I'll be happy to test and then convert this into a PR of test cases if someone wants to prepare a patch.

Cheers!

@alexflint
Copy link
Owner

Thanks for these James. Here are my thoughts:

First case

go run parsingbug.go --some-list list1 sub input

At the moment, list arguments without equals signs consume all remaining tokens until encountering one that starts with a hyphen. So in your first example, yes --some-list list1 sub input is handled by creating a slice of strings []{"list1", "sub", "input"}. Now in the case that you have here, the string "sub" is one of the subcommands, so there is the question of whether we should treat it as a subcommand or as an element of the list. When I originally put this together, I considered this case and decided that it wouldn't be good to treat "sub" here as a subcommand. My thought was that if you're doing something like this:

do_something | xargs ./myprogram --some-list

then it could be confusing if there are certain tokens (e.g. the string "sub" in your example) that could be generated by "do_something" and would change the way the arguments are parsed. Now it's already the case that tokens that start with a hyphen do change the way the arguments are parsed (they terminate the consumption of tokens), but that at least doesn't depend on the particularities of the args struct being parsed into, but rather is consistent no matter what your args struct looks like.

If you want to put a list of tokens on the command line and then continue with more tokens afterwards, then it seems to me that it would be best to use the equals sign, as in ./some-list=aa,bb,cc as per your second example.

Second case

go run parsingbug.go --some-list=list1 sub input

In the case that you use an equals sign with a list argument, as in go run parsingbug.go --some-list=list1 sub input I think it makes most sense to treat the value on the right side of the equals sign as the entire list, and not consume any more tokens. So in this case, the use of the equals sign completely changes the way that parsing takes place, but I have the sense that someone reading a command like go run parsingbug.go --some-list=list1 sub input would intuitively understand that --some-list=list1 describes the entire contents of the list, and that what follows next will not be part of --some-list.

Third case

go run parsingbug.go --some-list list1 --some-list list2 sub input

In this case, go-arg encounters two values for --some-list. The first value it encounters is []string{"list1"}, and the second value it encounters is []string{"list2", "sub", "input"}. In the library as it stands, you can repeat command line arguments and the last value will overwrite all previous values. You can do this with non-list arguments too, as in:

./myprogram --value 1 --value 2 --value 3

The above will end up with "3" parsed into the struct field corresponding to --value. Perhaps it should be an error to repeat the same argument multiple times on the command line.

Fourth case

go run parsingbug.go --some-list list1 --some-list list2 --some-list=list3 sub input

The reason you're getting the behavior that you are in this case is due to the same "overwrite" behavior that is behind the previous case. The library finds three separate values for --some-list, first []string{"list1"}, then []string{"list2"}, then []string{"list3"} (the last one does not include "sub" or "input" due to the equals sign). Each of these values overwrites the previous values, and at the end you see the value is []string{"list3"}.

Options for v2

What if anything should we do to avoid this complexity in v2? One possibility would be to remove support for space-separated list arguments. This would reduce complexity a lot but space-separated list arguments are widely used on the command line and they are nice.

A second possibility would be to throw an error if the same option appears twice on the command line. We could have a struct tag that explicitly re-enables this behavior if someone really wants it. This could make the whole situation less confusing.

We could add documentation about all of these cases so that at least people can get a good explanation of how it all works.

We could also make it so that subcommands terminate the consumption of tokens going into a list argument, but I still don't like this option.

It seems to me the best approach for v2 would be to throw an error when an option appears twice, and then give some really good documentation about how subcommands, equals signs, and list arguments interact together, with a nice table of examples.

@purpleidea
Copy link
Contributor Author

Thank you for your dedicated and thoughtful reply.

space-separated list arguments are widely used on the command line and they are nice.

Assuming I'm parsing this correctly, and you mean things like --somelist a b c, then where are they used? I can't recall where I would have seen this, although I might be living under a rock!

A second possibility would be to throw an error if the same option appears twice on the command line.

I think this would be a great idea. Either that or allowing --foo a --foo b --foo c to produce a list of three elements. You've at a minimum convinced me that the behaviour could be configurable with one your really clever struct tags!

PS: if you're curious, here is my WIP branch porting to go-arg. It will likely get merged as-is, unless of course you'd like to comment. purpleidea/mgmt@54456ff

Of interesting note, here is a workaround I added to help users who might accidentally hit this issue: purpleidea/mgmt@54456ff#diff-e59db0c8589637e59150ffb46afed207139f3273d37db6fa95260914254450d7R82

The same workaround for a more complicated sub-command is shown here: purpleidea/mgmt@54456ff#diff-d8b7e73d77fa79e1493c5173dbf6914d4048b87c83ae9edbb97ca3bd5593bc47R124

Thanks again for the nice library. You've got me motivated that CLI libraries are cool. If I was using regexp for anything I'd want to use your clever regexp library too!

Cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants