Efficient array serialization #38

Open
alexeyegorov opened this issue Aug 19, 2015 · 12 comments

Comments

@alexeyegorov

Hello,
we're currently considering protocol buffers for serializing our data. But they seem to be slow when deserializing in Java, and they don't support int16 (or int8) as Cap'n Proto does. The Cap'n Proto documentation says nothing about arrays, only lists. As our data mostly consists of short arrays and we need very fast access to them, lists seem inappropriate for us.
Can you help us learn whether Cap'n Proto supports array serialization in Java?
Thanks in advance.
Alexey

@dwrensha
Member

Cap'n Proto lists are just dynamically-sized arrays. Accessing their elements is fast, though it does need to go through a list pointer. I recommend trying them and measuring whether the performance is good enough for you. If you want to avoid the list-pointer indirection, then you probably are interested in inline lists, a proposed future feature of Cap'n Proto. Although inline lists don't exist yet in Cap'n Proto, you can fake them by manually creating fields for each element of the list, like this:

struct Foo {
    array0 @0 :Int16;
    array1 @1 :Int16;
    array2 @2 :Int16;
    array3 @3 :Int16;
}

Also, you may be interested in reading this discussion from last year.
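
To make the "lists are just dynamically-sized arrays" point concrete, here is a minimal sketch of what a contiguous run of 16-bit values looks like when read straight out of a buffer. It is an illustration only and does not use capnproto-java: the class name Int16ListSketch and its methods are hypothetical, and the only assumption taken from Cap'n Proto is that primitive list elements are stored tightly packed in little-endian order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Illustration only (hypothetical class, not part of capnproto-java):
 * a flat, little-endian run of 16-bit values with index-based access.
 */
final class Int16ListSketch {
    private final ByteBuffer data;
    private final int size;

    Int16ListSketch(int size) {
        this.size = size;
        // Two bytes per element, stored back to back.
        this.data = ByteBuffer.allocate(size * Short.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    }

    int size() {
        return size;
    }

    short get(int index) {
        checkIndex(index);
        // Element access is one bounds check plus one absolute read.
        return data.getShort(index * Short.BYTES);
    }

    void set(int index, short value) {
        checkIndex(index);
        data.putShort(index * Short.BYTES, value);
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }

    public static void main(String[] args) {
        // 55_000 is just an example size for a large short array.
        Int16ListSketch list = new Int16ListSketch(55_000);
        for (int i = 0; i < list.size(); i++) {
            list.set(i, (short) (i % 1000));
        }
        System.out.println(list.get(54_999));
    }
}
```

Whether the real generated accessors are fast enough still has to be measured, as suggested above; the sketch only shows why indexed access into such a layout is cheap in principle.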

@alexeyegorov
Author

Thanks for your reply! I was busy giving msgpack a try first, but now I would really love to try Cap'n Proto out. Unfortunately I don't get capnpc-java after I installed Capn Proto. Can you give me a hint about what I'm doing wrong? I've tried both installing from the downloaded tarball and building from the git source.

BTW,
our array has something around 55000 elements. So creating it manually is not an option! ;)

@dwrensha
Member

Unfortunately I don't get capnpc-java after I installed Capn Proto

capnpc-java depends on capnproto, but is not included with it. Once you've installed capnproto you'll need to build capnpc-java from source by running make at the root directory of the capnproto-java repository. If you try that and you get errors, please post what the errors are.

@alexeyegorov
Author

Ok, thanks for that. That was easier than I thought; I was kind of confused...
Now just one more question: I've got it working now, so what is your intended way to add the capnproto package to our project? Should I just manually copy it from the runtime? How would it be updated then, all manually?

(Sorry for the questions, but I just don't get it. A Maven repository would have been easier.)

@dwrensha
Member

Unfortunately, I know very little about Java packaging and build systems. I know that sbt package will create a Jar, and I have a setup where I've been uploading snapshot releases to OSS Sonatype (see #16), which you can get here: https://oss.sonatype.org/content/repositories/snapshots/org/capnproto/runtime/0.1.0-SNAPSHOT/

@alexeyegorov
Author

Oh yeah, that is really nice. Maybe you could provide this information on the documentation page?

Now another question: I want to send that message via ZeroMQ, and I found this thread helpful:
http://stackoverflow.com/questions/32041315/how-to-send-capn-proto-message-over-zmq

But I'm not sure whether there is such a method for Java.

@alexeyegorov
Author

I got a byte-array representation by using write(BufferedOutputStream, MessageBuilder), retrieving the ByteArray from the stream, and turning it into a byte[]. But for ZeroMQ you need to subscribe to certain filters. Does that mean something like the first byte of the message?

Can you help me retrieve this information?

Cheers
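
For readers with the same question: ZeroMQ SUB sockets match a subscription as a byte prefix of the incoming message, and an empty subscription matches everything. A minimal sketch of that idea, using only the JDK, is below. The class TopicFraming and its methods are hypothetical helpers, not part of any ZeroMQ binding or of capnproto-java, and prepending the topic to the payload is just one possible convention (many setups send the topic as a separate first frame instead).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helpers illustrating prefix-based topic filtering. */
final class TopicFraming {

    /** Returns topic bytes followed by the serialized payload. */
    static byte[] withTopic(String topic, byte[] payload) {
        byte[] prefix = topic.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(prefix, prefix.length + payload.length);
        System.arraycopy(payload, 0, out, prefix.length, payload.length);
        return out;
    }

    /** Prefix match in the style of a SUB filter; an empty filter matches all. */
    static boolean matches(byte[] message, byte[] filter) {
        if (filter.length > message.length) {
            return false;
        }
        for (int i = 0; i < filter.length; i++) {
            if (message[i] != filter[i]) {
                return false;
            }
        }
        return true;
    }

    /** Removes the topic prefix again so the payload can be deserialized. */
    static byte[] stripTopic(String topic, byte[] message) {
        int prefixLength = topic.getBytes(StandardCharsets.UTF_8).length;
        return Arrays.copyOfRange(message, prefixLength, message.length);
    }

    public static void main(String[] args) {
        byte[] payload = {10, 20, 30};            // stands in for a serialized message
        byte[] message = withTopic("img.", payload);
        byte[] filter = "img.".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(message, filter));
        System.out.println(Arrays.toString(stripTopic("img.", message)));
    }
}
```

The receiver has to strip the prefix before handing the bytes to a deserializer, since the prefix is not part of the serialized message.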

@alexeyegorov
Author

Ok, I've done it and got the whole thing running.
I don't know whether it would be interesting to provide such code as an example (not that it is the best solution, but it is a way to use Cap'n Proto with ZeroMQ in Java and would be a nice-to-have explanation of usage). :)

@dwrensha
Member

Cool, I'm glad you've got it working. I agree that it could be good to use your code as an example, perhaps in the documentation (the gh-pages branch) or in the examples/ directory. Did you manage to make any measurements with your comparison to msgpack and protobuf? I'd be curious to hear about how well capnproto-java fared.

@alexeyegorov
Author

The comparison looked good in terms of CPU time, as I sent the messages without serialization by getting the segments and sending them as ByteBuffers. While we were able to get around 1500 messages/second serialized and put into ZeroMQ with protocol buffers, we achieved around 2300 messages/second with Cap'n Proto. Our messages contain telescope image data, so they are pretty big.
BUT the negative side effect is the size of the messages (of course). I would love to get it running with the serialization, but if I understand it correctly, SerializePacked uses WritableByteChannel, which means I would either have to use FileDescriptor.out (as in the example), which is a no-go, or create a new file to which all the serialized output is written and then read it back in to get the byte array to be sent through ZeroMQ...
So maybe I'm getting it wrong, but it would be nice to understand how one is able to not use WritableByteChannel or to use it without writing into files or file descriptors... Maybe one can use something like ChannelBuffer from netty? Or, more generally, is there a way to write into any kind of buffer?
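
As a rough illustration of the "send the segments as they are" approach mentioned above, the sketch below turns an array of ByteBuffer segments into one byte[] frame per segment. It uses only the JDK. The class SegmentFrames is hypothetical, and how the segments are obtained from the message builder is not shown in this thread, so the ByteBuffer[] input is an assumption. Copying into byte[] is only needed when the transport wants arrays; sending the buffers directly avoids the copy.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: one byte[] frame per message segment. */
final class SegmentFrames {

    /** Copies each segment's remaining bytes without moving the caller's positions. */
    static byte[][] toFrames(ByteBuffer[] segments) {
        byte[][] frames = new byte[segments.length][];
        for (int i = 0; i < segments.length; i++) {
            // duplicate() shares content but has its own position and limit.
            ByteBuffer view = segments[i].duplicate();
            frames[i] = new byte[view.remaining()];
            view.get(frames[i]);
        }
        return frames;
    }

    /** Total payload size across all frames. */
    static int totalBytes(byte[][] frames) {
        int total = 0;
        for (byte[] frame : frames) {
            total += frame.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-ins for real message segments.
        ByteBuffer[] segments = {
            ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}),
            ByteBuffer.wrap(new byte[16])
        };
        byte[][] frames = toFrames(segments);
        System.out.println(frames.length + " frames, " + totalBytes(frames) + " bytes");
    }
}
```

The receiving side then has to rebuild the segment array in the same order before reading the message.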

@dwrensha
Member

So maybe I'm getting it wrong, but it would be nice to understand how one is able to not use WritableByteChannel or to use it without writing into files or file descriptors...

Note that org.capnproto.ArrayOutputStream implements WritableByteChannel. However, it sounds like that's not exactly what you want, because it requires you to preallocate the whole buffer. So I suppose that you need an implementation of WritableByteChannel that is backed by a growing buffer. Maybe that's even what netty's ChannelBuffer is? In any case, I don't imagine it would be difficult to implement one.
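
A WritableByteChannel backed by a growing buffer could look roughly like the sketch below. It relies only on the JDK; the class name GrowableByteChannel and its toByteArray() method are hypothetical and not part of capnproto-java. The assumption is that any writer accepting a WritableByteChannel can be handed an instance of it, after which the collected bytes are available as a byte[]. The JDK's Channels.newChannel(new ByteArrayOutputStream()) is a shorter alternative with similar behaviour.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;

/** Hypothetical in-memory channel whose backing array doubles when it runs out of room. */
final class GrowableByteChannel implements WritableByteChannel {
    private byte[] buffer;
    private int length = 0;
    private boolean open = true;

    GrowableByteChannel(int initialCapacity) {
        buffer = new byte[Math.max(16, initialCapacity)];
    }

    @Override
    public int write(ByteBuffer src) throws IOException {
        if (!open) {
            throw new ClosedChannelException();
        }
        int count = src.remaining();
        ensureCapacity(length + count);
        src.get(buffer, length, count);   // consumes src, as the channel contract expects
        length += count;
        return count;
    }

    private void ensureCapacity(int needed) {
        if (needed > buffer.length) {
            // Grow to at least double the current size to keep appends cheap on average.
            buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
        }
    }

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }

    /** Returns a copy of exactly the bytes written so far. */
    byte[] toByteArray() {
        return Arrays.copyOf(buffer, length);
    }

    public static void main(String[] args) throws IOException {
        GrowableByteChannel channel = new GrowableByteChannel(16);
        channel.write(ByteBuffer.wrap(new byte[100])); // larger than the initial capacity
        System.out.println(channel.toByteArray().length);
    }
}
```

The final copy in toByteArray() costs one extra pass over the data; for very large messages it may be worth exposing the backing array and length instead.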

@alexeyegorov
Author

ArrayOutputStream is a very good hint! :) I added another class that constructs a bigger buffer in case the original one is too small, and in general it is fine now. netty's ChannelBuffer could be worth trying to compare speed and timing... But for now I'm good with it. 👍
Also, adding such an example is a nice idea, at least to show which classes are available for different cases, as the example with FileDescriptor is far from my favourite...
I can provide you some plots of our comparison with protobuf and msgpack later.
