What is the fastest way to look up a large number of values using R?


I have a list of more than 1,000,000 numbers and a lookup table that maps ranges of numbers to categories. For example, 0-200 is Category A and 201-650 is Category B (the ranges are not all the same length).

I just need to iterate over the list of 1,000,000 numbers and get a list of the 1,000,000 corresponding categories.

Edit:

For example, some elements of my list are 100, 125.5, 807.5, 345.2, and the lookup should return categories such as 1, 1, 8, 4. The lookup logic is implemented in a function, categoryLookup(cd), and I am using the following command to get the categories:

  cats <- sapply(list.cd, categoryLookup)

However, while this works well for lists of up to 10,000 elements, it takes a very long time for the complete list.

What is the fastest way to do this? Is there any indexing that can help speed up the process?

Numbers:

  numbers <- sample(1:1000000)

Groups:

  groups <- sort(rep(letters, 40000))

Lookup:

  categories <- groups[numbers]

Edit:

If you do not have a vector of "groups", you can create it first.

Assume that you have the range limits in a data frame:

  limits <- data.frame(group = c("a", "b", "c"), start = c(1, 300001, 600001), end = c(300000, 600000, 1000000))
  limits
  #   group  start   end
  # 1     a      1 3e+05
  # 2     b 300001 6e+05
  # 3     c 600001 1e+06

  # If groups are sorted and do not overlap:
  groups <- rep(limits$group, (limits$end - limits$start) + 1)

Then continue as before:

  categories <- groups[numbers]
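To sanity-check the expansion step, the same idea can be tried on a scaled-down table. This is only a sketch: the `limits_small` table and the test values are made up for illustration, and it assumes sorted, non-overlapping integer ranges that start at 1.

```r
# Hypothetical small lookup table: 1-3 is "a", 4-6 is "b", 7-10 is "c".
limits_small <- data.frame(
  group = c("a", "b", "c"),
  start = c(1, 4, 7),
  end   = c(3, 6, 10),
  stringsAsFactors = FALSE
)

# Repeat each group label once per integer in its range, so that
# position i in the vector holds the group of the number i.
groups_small <- rep(limits_small$group,
                    (limits_small$end - limits_small$start) + 1)

# Look up the values at the edges of each range by position.
values <- c(1, 3, 4, 6, 7, 10)
result_small <- groups_small[values]
print(result_small)
```

Each range should contribute exactly `end - start + 1` entries, so the expanded vector here has 10 elements and the boundary values fall into "a", "a", "b", "b", "c", "c".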

Edit: As @Jabau MS pointed out, in this case you need to add 1 to (limits$end - limits$start). (Already corrected in the example above.) Additionally, your first start coordinate should be 1, not 0.
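The question also mentions values that are not whole numbers (125.5, 807.5). Positional indexing truncates such values toward zero, and expanding every range into a vector only makes sense for integer ranges. One alternative, which is not part of the answer above, is base R's `findInterval`, which does a binary search over the sorted range starts. A minimal sketch, assuming the ranges are sorted and non-overlapping and that every value is at or above the first start; the sample values are illustrative:

```r
# Same limits table as in the answer, rebuilt here so the block is self-contained.
limits <- data.frame(
  group = c("a", "b", "c"),
  start = c(1, 300001, 600001),
  end   = c(300000, 600000, 1000000),
  stringsAsFactors = FALSE
)

# Illustrative values, including non-integers.
values <- c(100, 125.5, 300000.5, 345000.2, 807000.5)

# For each value, find the index of the last range start that is <= value.
idx <- findInterval(values, limits$start)

# Map the range index back to its group label.
categories <- limits$group[idx]
print(categories)
```

Note that a value falling between one range's end and the next range's start (such as 300000.5 here) is assigned to the lower range, and a value below the first start gets index 0, which drops it from the result. Both cases would need explicit handling if they can occur in the real data.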

