python - Remove multiple items from a numpy.ndarray without numpy.delete


I am using a large numpy.ndarray (11,000 x 3180) to develop an active learning algorithm (text mining). In this algorithm, I have to delete 16 samples (row vectors) from my dataset at every iteration and then integrate them into the training set (which grows by 16 samples per iteration). After performing this process for approximately 60 iterations, the algorithm is initialized again and the same process is repeated from the beginning, for 100 runs.

To delete the set of 16 elements from my dataset, I use the method numpy.delete(dataset[list_of_indexes], axis=0), where list_of_indexes holds the indexes of the selected items to be removed.

This method works for the first run (1 of 100), but when the algorithm is initialized again, I get the following error:

  new = empty(newshape, arr.dtype, arr.flags.fnc)
  MemoryError

Apparently, the numpy.delete method creates a copy of my dataset for each of the indexes (16 x 1.2 GB), which exceeds the amount of memory on my computer.

The question is: how can I remove items from a numpy.ndarray without using too much memory and without excessive execution time?

PS1: I have tried the reverse process, where I add the elements that are not in the list of indexes to remove, but that process is very slow.

PS2: Sometimes the error occurs before the algorithm is initialized again (before iteration number 60).

It may help to understand exactly what np.delete does. In your case

  newset = np.delete(dataset, list_of_indexes, axis=0)  # corrected

In essence it does:

  keep = np.ones(dataset.shape[0], dtype=bool)  # array of True values matching the first dimension
  keep[list_of_indexes] = False
  newset = dataset[keep, :]

In other words, it builds a boolean index of the rows to keep.

If I run

  dataset = np.delete(dataset, list_of_indexes, axis=0)

repeatedly in an interactive shell, there is no accumulation of intermediate arrays. Temporarily there will be this keep array and a new copy of dataset. But with the assignment, the old copy disappears.
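The equivalence described above can be checked on a small array. The following is only a sketch: the array shape, its values, and the contents of list_of_indexes are made up for illustration and are not taken from the question.

```python
import numpy as np

# Small stand-in for the real dataset (shape and values are made up).
dataset = np.arange(40, dtype=np.float64).reshape(10, 4)
list_of_indexes = [1, 4, 7]  # hypothetical rows to remove

# Route 1: np.delete along the row axis.
via_delete = np.delete(dataset, list_of_indexes, axis=0)

# Route 2: a boolean "keep" index over the rows.
keep = np.ones(dataset.shape[0], dtype=bool)
keep[list_of_indexes] = False
via_mask = dataset[keep, :]

# Both routes should give the same rows, and the result is a new array,
# not a view onto the original.
same_result = np.array_equal(via_delete, via_mask)
shares_memory = np.shares_memory(dataset, via_delete)

# Rebinding the name drops the last reference to the old, larger array,
# so it can be freed.
dataset = via_delete
print(same_result, shares_memory, dataset.shape)
```

At any moment only the old array, the new array, and the small keep mask are alive, which is why one copy per call (not one per index) is the expected cost.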

Are you sure that it is the delete that is increasing memory usage, as opposed to the growing training set? As for speed, you might improve it by maintaining a 'mask' of all 'removed' rows rather than actually removing anything. But depending on how list_of_indexes overlaps with previous deletions, updating this mask might be more trouble than it is worth. It is also likely to be more error prone.
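A minimal sketch of the mask idea, under several assumptions: the helper name take_batch, the removed array, the random stand-in data, and the placeholder selection rule (the first 16 rows still in the pool) are all hypothetical. The sketch also assumes the selection step returns positions relative to the rows still in the pool, which is where the overlap bookkeeping mentioned above comes in.

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.random((100, 5))                    # stand-in for the real matrix
removed = np.zeros(dataset.shape[0], dtype=bool)  # True = row moved to the training set


def take_batch(removed, positions_in_pool):
    """Mark rows as removed; positions are relative to the rows still in the pool."""
    pool_rows = np.flatnonzero(~removed)          # original row numbers still available
    original_rows = pool_rows[positions_in_pool]  # translate to original coordinates
    removed[original_rows] = True
    return original_rows


train_rows = []
for _ in range(3):                                # three iterations of 16 samples each
    picked = take_batch(removed, np.arange(16))   # placeholder selection rule
    train_rows.append(picked)

train_rows = np.concatenate(train_rows)
training_set = dataset[train_rows]                # copies only the selected rows
pool_size = int((~removed).sum())
print(training_set.shape, pool_size)
```

The full dataset is never copied here; only the selected rows are. Starting a new run would amount to resetting the mask with removed[:] = False.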

