dkurok

throttle in API


Is there a way to get rid of the throttle on API calls (status code 429), or at least to raise it significantly?

Currently there is a limit of 2 calls/second. I'm working on a (private, non-commercial) inventory solution in C#/WPF, and I have a database with ~50,000 parts for which I want to update part information via the RB API (I also write my information back into my RB account). With a limit of 2 calls/second and handling the 429 status, it will take ~7 hours. I think without the throttle I could do it in ~2 hours.
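
Since the throttle apparently can't just be switched off, the practical approach is to pace the calls and back off on 429 responses. Below is a minimal C# sketch of that idea (not code from this thread); the authorization header format, the helper name and the delay values are assumptions for illustration.

```csharp
// Minimal sketch: pace requests to roughly 2/sec and back off when the API
// answers 429. Header format and delays are assumptions, not from this thread.
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class ThrottledRebrickableClient
{
    private static readonly HttpClient Http = new HttpClient();
    private static readonly TimeSpan MinDelay = TimeSpan.FromMilliseconds(500); // ~2 requests/sec

    public static async Task<string> GetAsync(string url, string apiKey)
    {
        while (true)
        {
            var request = new HttpRequestMessage(HttpMethod.Get, url);
            // Assumed header format; use whatever the API docs specify for your key.
            request.Headers.TryAddWithoutValidation("Authorization", "key " + apiKey);

            var response = await Http.SendAsync(request);
            if (response.StatusCode == (HttpStatusCode)429)
            {
                // Honour Retry-After if the server sends it, otherwise wait a second.
                var wait = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(1);
                await Task.Delay(wait);
                continue;
            }

            response.EnsureSuccessStatusCode();
            await Task.Delay(MinDelay); // stay under the documented 2 calls/sec
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```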

In the API documentation it is stated:


Normal user accounts are allowed to send on average 2 requests/sec, with some small allowance for burst traffic.

What is the meaning of "normal user"? How can I become a non-normal user?

Best regards

Dietmar


Well, you could always download the CSV files instead of calling the DB for each part. That should cover most of them; then call the API only for the parts that are left.


I do exactly that in the first place, but I also need the BrickLink IDs, thumbnail URLs, molds, prints, and so on, so I have to call the API again for every part I own (even the content returned by GET /api/v3/lego/parts doesn't include the needed fields).

The download files only have basic data...

 


I can't reduce the throttle; the number of calls via the API is starting to become a real problem. I will need to move it to its own dedicated hardware soon.

I will look into being able to make bulk calls or something instead.


Hi Nathan,

thank you for the explanation! Is the number of calls a bigger problem than calls with large results?

So as a (real) example:

What is better for your API's infrastructure:

A) 28 calls to /api/v3/lego/parts to get all of the current 27,651 parts (with 1000 parts returned per call), or

B) ~5000 calls to /api/v3/lego/parts/{part_num}, each returning information for one part?

If option A) is better for the infrastructure / performance / ..., could you just add year_from, year_to, prints, molds and alternates to the results object of each part returned by /api/v3/lego/parts? This would reduce my number of calls dramatically...

Many thanks

Dietmar


It's a database; adding more conditions will slow it down. So 28 calls are better than 5000 calls, even if each one returns a big chunk of data.


What you can do is cache the data on your end and only query again after a certain time, unless you know it has changed, and only query for what you need. I.e. if you intend to show only the first 40 items, don't fetch all 28,000, but perhaps the first batch of 1000. Think both of your own performance and that of the API, and you will help Nathan in the process.
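
To illustrate the caching idea, here is a rough C# sketch: keep each part's data locally together with a fetch timestamp and only call the API again once the entry has gone stale. The PartCache type, the 7-day lifetime and the fetchFromApi delegate are illustrative assumptions, not anything from this thread.

```csharp
// Rough sketch of a local cache with time-based expiry. The 7-day lifetime is
// an arbitrary choice for illustration.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class PartCache
{
    private readonly Dictionary<string, (string Json, DateTime FetchedAt)> _cache
        = new Dictionary<string, (string, DateTime)>();
    private readonly TimeSpan _maxAge = TimeSpan.FromDays(7);

    public async Task<string> GetPartAsync(string partNum, Func<string, Task<string>> fetchFromApi)
    {
        if (_cache.TryGetValue(partNum, out var entry) &&
            DateTime.UtcNow - entry.FetchedAt < _maxAge)
        {
            return entry.Json; // still fresh, no API call needed
        }

        var json = await fetchFromApi(partNum); // e.g. the throttled client sketched earlier
        _cache[partNum] = (json, DateTime.UtcNow);
        return json;
    }
}
```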


Hi biodreamer,

thank you for the explanation! I'm working on exactly those aspects. I fetch data and cache it in my local DB, but from time to time I have to consolidate, and that is what I'm currently working on. So it's good to know that the number of calls is more of a problem for the backend than "big" chunks.

BTW: It seems that page sizes greater than 1000 for /api/v3/lego/parts are ignored. Otherwise I could fetch all parts in just two calls: the first to get the first 1000 parts and the count, the second to get everything in one call, using the count from the first call as the page size. But this does not work... Is that intentional?
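
For what it's worth, the "option A" style bulk fetch can be sketched as a paging loop in C#: walk /api/v3/lego/parts at the 1000-item cap and follow the pagination link until it runs out. The "next" and "results" field names are assumed to follow the usual paged-JSON layout, and GetAsync is the throttled helper sketched earlier; adjust both to the actual responses.

```csharp
// Sketch of paging through /api/v3/lego/parts at page_size=1000 (~28 calls for
// ~27,651 parts). The "next"/"results" field names are assumptions.
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

static class BulkPartsFetcher
{
    public static async Task<List<JsonElement>> FetchAllPartsAsync(string apiKey)
    {
        var parts = new List<JsonElement>();
        string url = "https://rebrickable.com/api/v3/lego/parts/?page_size=1000";

        while (url != null)
        {
            // GetAsync is the throttled helper from the earlier sketch.
            string json = await ThrottledRebrickableClient.GetAsync(url, apiKey);
            using var doc = JsonDocument.Parse(json);

            foreach (var part in doc.RootElement.GetProperty("results").EnumerateArray())
                parts.Add(part.Clone()); // Clone so the element outlives the parsed document

            // Follow the pagination link until the last page.
            url = doc.RootElement.TryGetProperty("next", out var next) &&
                  next.ValueKind == JsonValueKind.String
                ? next.GetString()
                : null;
        }
        return parts;
    }
}
```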


Hi biodreamer,

just a question about your explanation / clarification:

Are you involved in the development of the Rebrickable API? Or in other words: do you KNOW for sure that the 28 calls with big chunks are better than the 5000 small calls? As a general consideration I would agree, but the answer for a concrete system like RB depends on a lot of factors (like available bandwidth, number of concurrent connections, DB cache, memory, CPUs, ...). So I can imagine a system where the answer would be the opposite (for example, a well-scaling DB with a huge cache that already holds all the per-part queries, but little bandwidth to the outside world).

 

22 hours ago, dkurok said:

The download files only have basic data.

Hi, dkurok, good to meet!

I have just talked with Nathan about this. I too would love to have more detail in the download files (not for application development, but for ad-hoc local queries and such), and he agrees that data items that are available through the API should also be available in the download files. Not sure about his schedule, but a more detailed download is therefore in the pipeline. Thought you guys should know.

Take care,

Simon


Hi Simon,

yes, some more information in the download files, updated on a regular basis (with the frequency determined by Nathan based on how often parts, sets and the like are updated), would be very nice.

Best regards

Dietmar

 

12 hours ago, dkurok said:

Hi biodreamer,

just a question about your explanation / clarification:

Are you involved in the development of the Rebrickable API? Or in other words: do you KNOW for sure that the 28 calls with big chunks are better than the 5000 small calls?

Just looking at what is returned by a call: you are not only getting the data you requested, but also a JSON header. Now consider this: do you want 5000 chunks of text you will do nothing with, or just 28?

I know what I would like.


Fewer calls that cover multiple items can query the database more efficiently and reduce the request overhead and network latency of multiple requests. However, each request runs longer, which is dangerous for concurrency. It would be much easier for a single user to ruin it for everyone else.

I have some ideas, if I'm lucky I might get some of it done this weekend.

21 hours ago, dkurok said:

BTW: It seems that page sizes greater than 1000 for /api/v3/lego/parts are ignored. Otherwise I could fetch all parts in just two calls: the first to get the first 1000 parts and the count, the second to get everything in one call, using the count from the first call as the page size. But this does not work... Is that intentional?

Yes, it's capped at 1000. I'd like to make it smaller in the future, so don't rely on that number.

Actually, I think I'll reduce the default to 100 but leave the max at 1000.

On 3/9/2018 at 8:49 AM, dkurok said:

Hi biodreamer,

thank you for the explanation! I'm working on exactly those aspects. I fetch data and cache it in my local DB, but from time to time I have to consolidate, and that is what I'm currently working on. So it's good to know that the number of calls is more of a problem for the backend than "big" chunks.

No, I am not, but that is how these things work; it's largely independent of hardware. Hardware only increases speed and capacity, it does not remove the fundamental rules. If you need to get all entries, a few simple large queries will take fewer resources than many specialised queries that boil down to the same result. The processing time of those 5000 calls will always be more than that of those 28. If the server can handle a lot of connections you might get the result faster by splitting the work up, but it will never take fewer server resources, only more.


I've made a few changes:

  • Changed the default page size to 100, but it can still be overridden up to 1000.
  • Added an optional inc_part_details parameter to endpoints that return lists of parts, to make them return the same fields as the /lego/parts/XXX/ lookup does.
  • Added a part_nums filter parameter to /lego/parts/ which takes a comma-separated list of part_num values to restrict the results (see the sketch below).
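
As an illustration, the part_nums filter plus inc_part_details should allow one batched request to replace many per-part lookups. The parameter names come from the list above; the exact URL shape and the inc_part_details value are otherwise assumptions, and GetAsync is the throttled helper sketched earlier.

```csharp
// Sketch of a batched lookup using the new parameters. Exact value formats are
// assumptions; check the API docs.
using System.Threading.Tasks;

static class BatchedPartLookup
{
    public static Task<string> GetPartsBatchAsync(string[] partNums, string apiKey)
    {
        string url = "https://rebrickable.com/api/v3/lego/parts/"
                   + "?part_nums=" + string.Join(",", partNums)  // comma-separated list of part_num values
                   + "&inc_part_details=1"                       // return the same fields as /lego/parts/{part_num}/
                   + "&page_size=1000";
        return ThrottledRebrickableClient.GetAsync(url, apiKey); // throttled helper from the earlier sketch
    }
}
```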

 

